Thread: Bug in iterator comparison in C++ standard?

  1. #1
    Registered User
    Join Date
    Jul 2008
    Posts
    1

    Bug in iterator comparison in C++ standard?

    We've encountered an ... anomaly, let's say, in the iterator comparison behaviour in g++ 4.1.1. The STL implementation in this release is fairly standard, so we don't believe it's specific to g++. Just in case we're overlooking something really obvious, I thought I'd post this here first to get some feedback, rather than accusing the (frankly excellent) g++ guys of writing ropey code :-)

    Imagine you have two vectors of the same type, and you'd like to iterate over them in order (ie. all of a's stuff, then all of b's stuff). How about writing:

    Code:
    vector<foo> a, b;
    // Fill in a, b contents here.
    for (vector<foo>::iterator it = a.begin(); it != b.end(); ++it)
    {
      if (it == a.end())
      {
        it = b.begin();
      }
      // Do something with *it
    }
    Except this doesn't work in the rare case that the space allocated for b's foos is just in front of the space for a's foos in memory - in this case, a.begin() and b.end() are the same location, and the comparison (at least in g++'s STL) gives them as being equal. Oh dear. Nondeterministic behaviour results because of the dependence on memory layout. Ouch. Took one of our best guys three days to track down ...

    Can you compare iterators from two different objects, we thought? Well, the C++ standard doesn't prohibit it. It has a number of things to say on this subject:

    "An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to the same container."

    So if i and j are different containers, then i is not reachable from j, and hence !(i==j). Except we have a case where they're coming out equal.

    It also says "If a and b are equal, then either a and b are both dereferenceable or else neither is dereferenceable.". Unfortunately, in this case, a.begin() is dereferenceable and b.end() isn't, so they shouldn't be equal. Except they can be.

    So we're thinking that there's either a bug in the standard (someone should have said that it's undefined behaviour to compare iterators from different containers) or a bug in the implementation (there should be one element's gap between the start of a container and the end of the previous one in memory). Currently, the code above seems to be valid and correct, and the compiler turns it into something that doesn't work.

    Anyone smart enough to work out where we've gone wrong? :-)

    Many thanks for any thoughts. Clearly we've re-written the code to have two separate loops, so it now works - we're just worried that our One True Standard (all hail the Standard!) could have missed something.

    Regards, Steev

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    So we're thinking that there's either a bug in the standard (someone should have said that it's undefined behaviour to compare iterators from different containers) or a bug in the implementation (there should be one element's gap between the start of a container and the end of the previous one in memory).
    I think it is a bug in the C++ Standard, at least its 2003 edition. Consider section 5.7 paragraph 6: "Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined."

    Clearly, if iterators are to be a generalisation of pointers, then unless both iterators point to elements of the same range, or one past the last element of the range, the behaviour should be undefined.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    You should search comp.std.c++ and see if they've figured it out already (comp.lang.c++.moderated might also be a place to check). You can check the draft for the next standard (the one I have is n2315) to see if they've changed the language of any of the pertinent areas.

  4. #4
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Most recent draft is n2691, actually. No fix in there. And csc++ is still down.

    Definitely a bug, though. The introduction to iterators states the intent that all functions expecting iterators could be given pointers. But if comparison of iterators over different containers is allowed, then this is not the case.

    Peculiar problem. I wonder if it affects other STLs too, or if other STLs' memory allocators are not as freaky. (Why can a.end() == b.begin()? There must be some heap management data in-between!)
    Last edited by CornedBee; 07-10-2008 at 09:43 AM.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  5. #5
    Registered User
    Join Date
    Jul 2008
    Posts
    1
    CornedBee writes: "Why can a.end() == b.begin()? There must be some heap management data in-between!"

    There is no need to have heap management data held between allocated blocks if blocks of identical size are grouped, as is the case with TCMalloc for example.

  6. #6
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    838
    Except this doesn't work in the rare case that the space allocated for b's foos is just in front of the space for a's foos in memory - in this case, a.begin() and b.end() are the same location,
    i'm no expert by any means, but i'm not so sure about this. the following works as expected for me, and the contents of a and b are definitely contiguous.

    Code:
            int ia[20];
            vector<int> a, b;
            // Fill in a, b contents here.
    
            for(int i=0;i<10;i++)
            {
                    b.push_back(ia[i]);
                    a.push_back(ia[i+10]);
            }
            for (vector<int>::iterator it = a.begin(); it != b.end(); ++it)
            {
              if (it == a.end())
              {
                it = b.begin();
              }
            }
    the closest i can come to making a.end() == b.begin() (or vise versa) is a difference of 32bits, when size() ==1.

    am i missing something?
    Last edited by m37h0d; 07-11-2008 at 12:31 AM.

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Yes. How do you know they're definitely contiguous? Also, what compiler, what memory manager? As I said, most memory managers have management data at the start of each block. Seems george16 solved that puzzle for the failing case.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  8. #8
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by m37h0d View Post
    i'm no expert by any means, but i'm not so sure about this. the following works as expected for me, and the contents of a and b are definitely contiguous.
    Nope they're probably not contiguous.
    am i missing something?
    Yes. The vector objects themselves are both next to each other on the stack, but vectors internally typically consist of 3 pointers pointing to different points within a block of memory that they allocate from the heap. They probably allocate from different locations in the heap.

    The case the OP ran into would be where they happened to allocate from adjacent areas on the heap.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  9. #9
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    838
    right. of course. the location of an element in a vector has no relation to the location of the object being added to it since vectors allocate new memory off the heap when something is added to them.

    thanks for setting me straight. i should have seen that to begin with though. please, don't mind me. carry on.
    Last edited by m37h0d; 07-11-2008 at 09:16 AM.

  10. #10
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    Quote Originally Posted by ISO/IEC 14882:2003(E)
    24.1.3 Forward iterators
    ...
    [Table 74—Forward iterator requirements]
    1. ...
    -- If a and b are equal, then either a and b are both dereferenceable or else neither is dereferenceable.
    ...
    2. [Note: The condition that a == b implies ++a == ++b (which is not true for input and output iterators) and the removal of the restrictions on the number of the assignments through the iterator (which applies to output iterators) allows the use of multi-pass one-directional algorithms with forward iterators. —end note]
    In my opinion, the implementation is at fault here since it doesn't satisfy the "If a and b are equal" condition statement - and it definitely doesn't satisfy (2) since the ++ operation requires that the iterator be dereferenceable.

    There are a couple of new/open library issues that are somewhat related, but I think the current standard covers the case of equality between a dereferenceable and a non-dereferenceable iterator - even if they are from two different containers.

    492. Invalid iterator arithmetic expressions
    446. Iterator equality between different containers

    gg

  11. #11
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    This is exactly the case of 446. If equality between iterators of different containers is undefined, then it may report true and yet violate T74/1.

    Consider:
    Code:
    int main()
    {
      int ar1[4];
      int ar2[4];
    
      int *last_ar1 = ar1 + 4;
      int *first_ar2 = ar2;
    
      if(last_ar1 == first_ar2) std::cout << "But last_ar1 is not dereferenceable!\n";
    }
    Are you saying the compiler has to introduce padding between the two arrays so that this program doesn't print anything?
    Well, it doesn't - comparing pointers to different arrays is already undefined.
    Last edited by CornedBee; 07-11-2008 at 04:47 PM.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  12. #12
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Whether there is a bug in the STL or not, I'd say the more basic thing being overlooked is that steev's code is trying to be a bit too clever. Any code that relies on relationships between iterators that act on two distinct containers is asking for trouble.

    Code:
    vector<foo> a, b;
    // Fill in a, b contents here.
    for (vector<foo>::iterator it = a.begin(); it != b.end(); ++it)
    {
      if (it == a.end())
      {
        it = b.begin();
      }
      // Do something with *it
    }
    would be more clearly expressed as;
    Code:
    void do_something(std::vector<foo> & thing)
    {
       for (vector<foo>::iterator it = thing.begin(); it != thing.end(); ++it)
       {
          // Do something with *it
       }
    }
    
    vector<foo> a, b;
    // Fill in a, b contents here.
    
    do_something(a);
    do_something(b);
    with no need to compare iterators from different containers at all.

  13. #13
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> Any code that relies on relationships between iterators that act on two distinct containers is asking for trouble.

    which is precisely why this shouldn't be categorized as an implementation bug (CornedBee's point about pointer comparisons actually demonstrates clearly why this type of comparison should be considered undefined behavior).
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  14. #14
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> ...this type of comparison should be considered undefined behavior
    I totally agree. I only meant to "point the finger" as the standard stands today.

    gg

  15. #15
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by Sebastiani View Post
    >> Any code that relies on relationships between iterators that act on two distinct containers is asking for trouble.

    which is precisely why this shouldn't be categorized as an implementation bug (CornedBee's point about pointer comparisons actually demonstrates clearly why this type of comparison should be considered undefined behavior).
    No problems Sebastiani. I wasn't actually arguing against that point.

    My comments were more along the lines that this is an example of code that attempts to be too clever (trickery to use one loop construct to iterate over two containers) so tends to be problematical in practice. IMHO, anyone doing such things deserves a sound whacking around the ears because such code is harder to understand therefore harder to get right and harder to maintain (and makes it easier to encounter obscure forms of undefined behaviour). Even if there was no potential for undefined behaviour, it would be very difficult to justify using such practices.

    While I consider CornedBee's example and reasoning is valid, I also haven't delved into the standard to confirm/deny what is actually the case.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Standard efficiency
    By Jorl17 in forum C Programming
    Replies: 3
    Last Post: 06-18-2009, 11:48 AM
  2. array initialization & C standard
    By jim mcnamara in forum C Programming
    Replies: 2
    Last Post: 09-12-2008, 03:25 PM
  3. Replies: 8
    Last Post: 03-08-2008, 08:50 PM
  4. Standard C++ API
    By sparks in forum C++ Programming
    Replies: 9
    Last Post: 08-26-2005, 05:35 PM
  5. standard language, standard compiler?
    By doubleanti in forum A Brief History of Cprogramming.com
    Replies: 8
    Last Post: 09-03-2001, 04:21 AM