Thread: Setting an iterator past the end of a container somehow causes a crash

  1. #1
    Registered User
    Join Date
    Jun 2016
    Posts
    40

    Setting an iterator past the end of a container somehow causes a crash

    I was working on an assignment from Alex Allain's Jumping into C++, Exercise 1 of Chapter 19. The user enters a bunch of text (a "haystack"), then enters some other text (a "needle"). The program then tells the user how many times the needle appeared in the haystack.

    Everything was going well; I tested a bunch of inputs, and things seemed to work. But, when using the needle "roo" against the haystack "inacatroofcatdogcatyocareler", the program crashes when it gets to the last "r" in the haystack. Here's the function:
    Code:
    /*
    Notes:
    strSize = string::size_type typedef
    conStrIter = string::const_iterator typedef
    strIter = string::iterator typedef
    */
    
    strSize needleCounter(const string& needle, const string& haystack)
    {
        //if the haystack has less letters than the needle, there are no more 
        //instances of the needle to find, so...
        if (haystack.length() < needle.length())
            return 0; //we're done here!
    
        //iterate through the haystack
        
        conStrIter iter;
        conStrIter secondIter; 
        const conStrIter endOfHaystack = haystack.end();
    
        strSize charCount = needle.length(); 
        char firstLetter = needle[0]; //we find the needle based on its first letter
    
        for (iter = haystack.begin(); iter < endOfHaystack; iter++)
        {
            if (*iter == firstLetter) //if we find the first letter of the needle, 
            {
                //set up a string that includes that letter, and all the ones after
                //(based on how many letters in the needle there are)
    
                secondIter = iter + charCount; //get to the where the last letter of the needle may be
    
                //if the second iter gets to or past the endpoint of the haystack, there
                //are no instances of the needle from here onward, so...
                if (secondIter >= endOfHaystack)
                    return 0; //we're done here!
    
                //otherwise, just take the letters, and see if it is our needle
                string needleSub(iter, secondIter);
    
                if (needleSub == needle) //if we found a needle,
                {
                    //call this function again, passing a haystack without all the letters up to
                    //the end of the needle
                    string newHaystack(secondIter, endOfHaystack);
    
                    return 1 + needleCounter(needle, newHaystack);
                }
                else //otherwise, save some resources by moving the iter just before the end of needleSub
                {
                    iter = secondIter - 1;
                }
    
        
            }
        }
    
        return 0; 
        //^this happens when the needle couldnt be found anywhere in the given haystack, despite
        //all the other measures in place
    
    }
    The bug is on the line:
    Code:
    secondIter = iter + charCount; //get to the where the last letter of the needle may be
    Where the iterator meant to point to the last character in a substring gets made to point to said character. For some reason, before that iterator is even bounds-checked, the program crashes when it processes that line. Only when that last "r" in the haystack is reached, which at the time, was also pointed to by the variable iter.

    Which is weird; not only did it work perfectly fine in other inputs in the same haystack, I even tested this iterator-setting in a separate file; setting an iterator past the end of a container shouldn't cause a crash unless you try to access what it is pointing to.

    I'd very much appreciate help figuring this out.

  2. #2
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    Your use of recursion is senseless. You're already in a loop. You don't need the recursive call.

    It is incorrect to advance iter to secondIter (minus one or whatever). Consider needleCounter("abc", "aabc").

    And clearly secondIter can end up pointing beyond one-past-the-end since you're adding charCount to iter, and iter could be pointing anywhere in haystack (i.e., it could be less than charCount from the one-past-the-end).

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    You can't just point to arbitrary points beyond the end of the object.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    C++ strings support a find method as well, if that's applicable in this instance. Though I do suppose the point of the exercise is to give you a realistic yet still doable programming problem.

  5. #5
    Registered User
    Join Date
    Jun 2016
    Posts
    40
    Quote Originally Posted by algorism View Post
    Your use of recursion is senseless. You're already in a loop. You don't need the recursive call.

    It is incorrect to advance iter to secondIter (minus one or whatever). Consider needleCounter("abc", "aabc").


    And clearly secondIter can end up pointing beyond one-past-the-end since you're adding charCount to iter, and iter could be pointing anywhere in haystack (i.e., it could be less than charCount from the one-past-the-end).
    Oh, I see. I feel silly for not realizing that before ^^;
    And yes, I know that secondIter can point to any point after the end of a container. What I want to know is why it crashes in this one specific instance, when it worked just fine with a different needle.

    Also, why is it incorrect to have secondIter point to iter minus one? It's how I made the algorithm avoid looking at any letters it already inspected.

    Though I will try to make a version that doesn't use recursion, and see how that works.

    Quote Originally Posted by Salem View Post
    You can't just point to arbitrary points beyond the end of the object.
    I can, actually. Before implementing this algorithm, I tested it with this:

    Code:
     string haystack = "3f4q8gg7u8qh";
        string::iterator beginning = haystack.begin();
        string::iterator other = beginning + 5;
        string::iterator ending = haystack.end() + 5;
    
        while (true)
        {
    
            int8_t num;
            cin >> num;
            cout << num + 0 << endl;
        }
    
        if (ending > haystack.end())
        {
            cout << "\n\nSo, you can do pointer arithmetic that compares an iterator's \n"
            << "position to the end of the container, even if the iterator is past the\n"
            << "end of the container.\n\n";
    
            cout << "other iter equals " << *other << "\n\n";
            cout << "beginning equals: " << *beginning << "\n\n";
    
        }
    Hence my confusion as to why it suddenly crashed in my program in one specific instance.

    Quote Originally Posted by MutantJohn View Post
    C++ strings support a find method as well, if that's applicable in this instance. Though I do suppose the point of the exercise is to give you a realistic yet still doable programming problem.
    Yup, it is; the assignment is about implementing that find method by yourself, essentially. Though involving a counter.
    Last edited by Tesp; 12-17-2016 at 09:40 AM. Reason: Forgot to add something to the beginning of this post

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Just because it compiles and 'seems to work' in one case isn't the same as guaranteed behaviour.

    You can get away with all sorts of rubbish in short test programs, because the program never lives long enough for the errors to propagate to the point where it crashes. The time taken to realise that something is broken in the code varies anywhere between seconds and years.

    Let me quote chapter and verse at you.
    Quote Originally Posted by c99_standards_committee
    When two pointers are compared, the result depends on the relative locations in the
    address space of the objects pointed to. If two pointers to object or incomplete types both
    point to the same object, or both point one past the last element of the same array object,
    they compare equal. If the objects pointed to are members of the same aggregate object,
    pointers to structure members declared later compare greater than pointers to members
    declared earlier in the structure, and pointers to array elements with larger subscript
    values compare greater than pointers to elements of the same array with lower subscript
    values. All pointers to members of the same union object compare equal. If the
    expression P points to an element of an array object and the expression Q points to the
    last element of the same array object, the pointer expression Q+1 compares greater than
    P. In all other cases, the behavior is undefined.
    That is to say, to compare two pointers, both have to be either in bounds, or be pointing to the last element +1.
    Having a pointer at +N (N>1) is undefined behaviour.

    Having said that, I tried a SSCCE and didn't see what you're seeing (not that it means much).
    Perhaps you can post your actual SSCCE that crashes, and that may throw further light on the problem (like for example the problem is really somewhere else).
    Code:
    #include <iostream>
    #include <iterator>
    #include <string>
    using namespace std;
    
    typedef  string::size_type strSize;
    typedef  string::const_iterator conStrIter;
    typedef  string::iterator strIter;
    
    strSize needleCounter(const string& needle, const string& haystack)
    {
        //if the haystack has less letters than the needle, there are no more 
        //instances of the needle to find, so...
        if (haystack.length() < needle.length())
            return 0; //we're done here!
    
        //iterate through the haystack
        
        conStrIter iter;
        conStrIter secondIter; 
        const conStrIter endOfHaystack = haystack.end();
    
        strSize charCount = needle.length(); 
        char firstLetter = needle[0]; //we find the needle based on its first letter
    
        for (iter = haystack.begin(); iter < endOfHaystack; iter++)
        {
            if (*iter == firstLetter) //if we find the first letter of the needle, 
            {
                //set up a string that includes that letter, and all the ones after
                //(based on how many letters in the needle there are)
    
                secondIter = iter + charCount; //get to the where the last letter of the needle may be
    
                //if the second iter gets to or past the endpoint of the haystack, there
                //are no instances of the needle from here onward, so...
                if (secondIter >= endOfHaystack)
                    return 0; //we're done here!
    
                //otherwise, just take the letters, and see if it is our needle
                string needleSub(iter, secondIter);
    
                if (needleSub == needle) //if we found a needle,
                {
                    //call this function again, passing a haystack without all the letters up to
                    //the end of the needle
                    string newHaystack(secondIter, endOfHaystack);
    
                    return 1 + needleCounter(needle, newHaystack);
                }
                else //otherwise, save some resources by moving the iter just before the end of needleSub
                {
                    iter = secondIter - 1;
                }
    
        
            }
        }
    
        return 0; 
        //^this happens when the needle couldnt be found anywhere in the given haystack, despite
        //all the other measures in place
    
    }
    
    int main () {
      std::cout << "N=" << needleCounter("roo","inacatroofcatdogcatyocareler") << endl;
      return 0;
    }
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    why is it incorrect to have secondIter point to iter minus one? It's how I made the algorithm avoid looking at any letters it already inspected.
    Firstly, that's the opposite of what you're doing. You're setting iter to secondIter - 1.
    Anyway, did you consider my example?
    Code:
    needleCounter("abc", "aabc")
    Think about it. Your algorithm will not find "abc" in "aabc". It will initially match the first 'a'. Then it will hit the second 'a' and see that it doesn't match the 'b' in the needle. Then it will jump ahead 3 chars (from the initial 'a'), up to the 'c' in "aabc", which will of course not match the 'a' in "abc", and the algorithm will fail.

    What I want to know is why it crashes in this one specific instance, when it worked just fine with a different needle.
    You'll have to remind me what the "one specific instance" is. Post a whole compilable/runnable program (complete with main) that crashes (and PLEASE delete all of the comments; they are just clutter).

  8. #8
    Registered User
    Join Date
    Jun 2016
    Posts
    40
    Quote Originally Posted by Salem View Post
    Just because it compiles and 'seems to work' in one case isn't the same as guaranteed behaviour.

    You can get away with all sorts of rubbish in short test programs, because the program never lives long enough for the errors to propagate to the point where it crashes. The time taken to realise that something is broken in the code varies anywhere between seconds and years.

    Let me quote chapter and verse at you.

    That is to say, to compare two pointers, both have to be either in bounds, or be pointing to the last element +1.
    Having a pointer at +N (N>1) is undefined behaviour.

    Having said that, I tried a SSCCE and didn't see what you're seeing (not that it means much).
    Perhaps you can post your actual SSCCE that crashes, and that may throw further light on the problem (like for example the problem is really somewhere else).

    <sscce of the problem>
    I see what you mean about the pointers; it explains why merely setting secondIter past endOfHaystack caused a crash; endOfHaystack is the last element + 1, and thus the last valid point to point to.

    And I think that's why with other inputs in the same haystack, it didn't crash; secondIter never pointed past endOfHaystack.

    Though, when trying to run your SSCCE in my comp, it crashed just like it did before. I even made my own (which was almost exactly like yours), and it also crashed. Here:

    Code:
    #include <iostream>
    #include <iterator>
    #include <string>
    using namespace std;
    
    typedef  string::size_type strSize;
    typedef  string::const_iterator conStrIter;
    typedef  string::iterator strIter;
    
    strSize needleCounter(const string& needle, const string& haystack)
    {
        /*
        Looks through a given haystack, and returns how many times the needle
        appeared in that haystack.
        */
    
        strSize result = 0; 
    
        conStrIter iter;
        conStrIter secondIter;
        const conStrIter endOfHaystack = haystack.end();
    
        //helps raise performance by skipping letters that need not be inspected
        char firstLetter = needle[0];
        const strSize lettersInNeedle = needle.size();
    
        for (iter = haystack.begin(); iter < endOfHaystack; iter++)
        {
            if (*iter != firstLetter) 
                continue; 
    
            if (string(iter, endOfHaystack).length < lettersInNeedle)
                break;
    
            secondIter = iter + lettersInNeedle; 
    
            if (secondIter >= endOfHaystack) //case when there are no more needles to find
                break; 
    
            string subNeedle(iter, secondIter);
    
            if (subNeedle == needle)
            {
                result++;
                iter = secondIter - 1; //avoid looking at letters already checked in the next iteration
            }
    
            
        }
    
        return result;
    }
    
    int main() {
        //std::cout << "N=" << needleCounter("roo", "inacatroofcatdogcatyocareler") << endl;
        strSize result = needleCounter("roo", "inacatroofcatdogcatyocareler");
        cout << "\nN = " << result;
        
        cin.get();
        return 0;
    }

    If it helps any, I ran this in Visual Studio Community 2015.

    Quote Originally Posted by algorism View Post
    Firstly, that's the opposite of what you're doing. You're setting iter to secondIter - 1.
    Anyway, did you consider my example?
    Code:
    needleCounter("abc", "aabc")
    Think about it. Your algorithm will not find "abc" in "aabc". It will initially match the first 'a'. Then it will hit the second 'a' and see that it doesn't match the 'b' in the needle. Then it will jump ahead 3 chars (from the initial 'a'), up to the 'c' in "aabc", which will of course not match the 'a' in "abc", and the algorithm will fail.
    I see what you mean about how setting iter to secondIter - 1 messed things up; upon testing it, it went exactly as you said. That line shouldn't even execute unless a needle match is found. I managed to fix it by changing this:

    Code:
    if (secondIter >= endOfHaystack)
    to this:

    Code:
    if (secondIter > endOfHaystack)
    I assume this works because it doesn't end the loop when secondIter gets to endOfHaystack, which is the last valid point to point to. Is that correct?

    Quote Originally Posted by algorism View Post
    You'll have to remind me what the "one specific instance" is. Post a whole compilable/runnable program (complete with main) that crashes (and PLEASE delete all of the comments; they are just clutter).
    It's using the needle "roo" with the haystack "inacatroofcatdogcatyocareler"; I posted an SSCCE earlier in this post, so you can try to run it yourself. It still crashes (despite the above partial fix). Also, I removed the clutter; I need to get over my bad habit of overcommenting ^^;


    Though, after some more fiddling around, I managed to fix the "roo" problem by adding this:
    Code:
    //avoid pointing past endOfHaystack
    string lettersRemaining(iter, endOfHaystack);
    if (lettersRemaining.length() < lettersInNeedle)
        break;
    under this part of needleCounter:

    Code:
    if (*iter != firstLetter) 
        continue;
    I'll keep testing with other inputs and haystacks to make sure my algorithm is actually fixed.

    Also, as a side note, remember that code I posted that set an iterator 5 positions past the end of a haystack, just to prove my point about how that's allowed? It ran just fine in CodeBlocks, but when I ran the same thing in Visual Studio, it crashed. Thought it'd be worth mentioning.

  9. #9
    Registered User
    Join Date
    Jun 2016
    Posts
    40
    I tested as I said, and everything seems to work perfectly! So, thanks to all of you who helped

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. getting offset from iterator for use in another container
    By CodeMonkey in forum C++ Programming
    Replies: 3
    Last Post: 07-08-2011, 09:00 PM
  2. Container - Iterator ++it
    By thescratchy in forum C++ Programming
    Replies: 1
    Last Post: 12-06-2010, 09:37 AM
  3. iterator to unspecified container?
    By StainedBlue in forum C++ Programming
    Replies: 16
    Last Post: 06-04-2010, 05:02 AM
  4. Access via iterator to pointer container
    By atlmallu in forum C++ Programming
    Replies: 2
    Last Post: 07-20-2009, 01:13 PM
  5. Replies: 4
    Last Post: 03-21-2004, 03:34 PM

Tags for this Thread