Thread: strtok safety

  1. #1
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445

    strtok safety

    How safe is strtok() when I get the string from std::string::c_str()? I'd imagine it's considerably safer than if I were dealing with pointers from other origins, but I figured I'd see what you guys thought. strtok() is a lot faster than other methods which use the functionality of std::string or std::stringstream.

  2. #2
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Not safe at all: strtok() modifies the null-terminated string it is given, but the pointer returned by c_str() points to the first character of a null-terminated string that must not be modified.
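
    A minimal sketch of why that matters, assuming a hypothetical helper named buffer_after_strtok (not from this thread). It runs strtok() over a writable copy and hands the buffer back, so you can see the null characters strtok() wrote over the delimiters. The same writes aimed at the storage behind c_str() would be undefined behaviour.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenises a private, writable copy of text and
// returns that copy so the caller can inspect what strtok() did to it.
std::vector<char> buffer_after_strtok(const std::string& text, const char* delims)
{
    assert(delims != nullptr);

    // Writable, null-terminated copy; the original std::string is untouched.
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0');

    // Walk every token. Each call overwrites the delimiter that ended the
    // token with '\0' inside buf.
    for (char* tok = std::strtok(&buf[0], delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        // Nothing to do here; only the side effect on buf is of interest.
    }
    return buf;
}
```

    For "One Two Three" with a space as the delimiter, the returned buffer holds '\0' at indices 3 and 7, where the spaces used to be.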
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Since string::c_str() returns a const char* string, you'd need to use a const_cast to pass it to strtok(), and when you use const_cast you're throwing safety right out the window.
    "I am probably the laziest programmer on the planet, a fact with which anyone who has ever seen my code will agree." - esbo, 11/15/2008

    "the internet is a scary place to be thats why i dont use it much." - billet, 03/17/2010

  4. #4
    Lurking whiteflags
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    If you really want strtok for strings you could roll your own, and I don't think it would be any slower than strtok necessarily, since you would have to do things like make a copy of the string, anyway.

    Code:
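    #include <string>
    #include <algorithm>
    
    using namespace std;
    
    // PRE: haystack.begin() <= last <= haystack.end()
    // Returns the token that starts at last, then moves last past the delimiter
    // that ended it. needle is a set of delimiter characters, as with strtok().
    string myStrtok (string& haystack, const string& needle, string::iterator& last)
    {
       string::iterator start = last;
       // find_first_of stops at the first delimiter; find_end would look for the
       // last occurrence of the whole needle sequence instead.
       last = find_first_of(start, haystack.end(), needle.begin(), needle.end());
       // Construct the token from the range; copy() into an empty string's
       // begin() would write past its storage.
       string result(start, last);
       if(last != haystack.end()) {
          ++last;   // step over the delimiter
       }
       return result;
    }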
    I didn't try it, but it would be something like that.
    Last edited by whiteflags; 06-30-2009 at 01:06 PM.

  5. #5
    The larch
    Join Date
    May 2006
    Posts
    3,573
    The reason why strtok would be faster is that it doesn't create new strings and instead puts null-terminators in the existing one and returns pointers to substrings.

    I think boost::split can store the results as a collection of iterator_ranges without copying the substrings. You might try that or use a similar idea. (Because you get the beginning and end of the substrings, rather than a single pointer to a C-string, you won't need to modify the string at all.)
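
    A rough sketch of that "similar idea" without Boost, assuming a single-character delimiter. split_ranges and Range are made-up names for illustration, not Boost APIs. Each token is a pair of iterators into the original string, so nothing is copied or modified.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: a token described by [first, second) iterators
// into the string being split.
typedef std::pair<std::string::const_iterator, std::string::const_iterator> Range;

// Hypothetical helper: splits text on delim without copying any characters.
std::vector<Range> split_ranges(const std::string& text, char delim)
{
    std::vector<Range> out;
    std::string::const_iterator start = text.begin();
    for (;;) {
        // Find the end of the current token.
        std::string::const_iterator stop = std::find(start, text.end(), delim);
        out.push_back(Range(start, stop));
        if (stop == text.end()) {
            break;
        }
        start = stop + 1;   // continue after the delimiter
    }
    return out;
}
```

    Unlike strtok(), this version keeps empty tokens between adjacent delimiters, and the ranges are only valid while the original string is alive and unmodified. A token can be turned into a std::string when needed with std::string(r.first, r.second).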
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  6. #6
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    You could try copying the std::string into a std::vector<char> and use strtok() on the vector instead.

  7. #7
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by anon
    The reason why strtok would be faster is that it doesn't create new strings and instead puts null-terminators in the existing one and returns pointers to substrings.
    And in my case, I'm copying the returned strings into elements of a std::vector<std::string>. Since the original string, returned from std::string::c_str(), is basically guaranteed to be constructed sanely, and the value returned from strtok() is basically guaranteed to be a sanely constructed substring of that original string, I would think that this method would be reasonably safe under most circumstances.

    Please correct me if I'm wrong.

  8. #8
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    It doesn't matter how safely you deal with the answer, if you're passing a c_str() in ANY form to strtok, then the code is broken at the moment the call is made.

    > I would think that this method would be reasonably safe under most circumstances.
    Pure dumb luck would be my assessment.

    > strtok is a lot faster than other methods which use the functionality of std::string or std::stringstream.
    Here's a tip.
    Make it 'right' before you try and make it 'fast'.

    Other methods would be a lot 'faster' to write and your code would be up and running by now, rather than fiddling with strtok to make it work.

    WHEN your program is finished, and you've PROVEN this is a bottleneck with a profiler, THEN you can think about performance tuning.
    All the tokenising approaches are fast, compared to say file I/O.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  9. #9
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by Salem
    Other methods would be a lot 'faster' to write and your code would be up and running by now, rather than fiddling with strtok to make it work.
    The code has in fact been up and running for almost two years now, and has been working just fine. I just want it to be faster.

    > WHEN your program is finished, and you've PROVEN this is a bottleneck with a profiler, THEN you can think about performance tuning.
    > All the tokenising approaches are fast, compared to say file I/O.
    It is a bottleneck. I wouldn't be asking these questions if it weren't an issue.

    Doing away with the const_cast and first copying the string to a buffer on the heap would obviously be safer. Copying a block of memory is relatively fast, so there wouldn't be a big performance hit there. So long as I check for errors allocating memory and free the buffer when I'm done, it should be only slightly less safe than std::string and its friends. As long as strtok() and strcpy(), or for that matter strncpy(), don't misbehave, it should be fine.

  10. #10
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Why not do something like this and avoid managing the dynamic memory yourself?

    Code:
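    #include <cstring>
    #include <string>
    #include <vector>
    
    std::string str = "One Two Three";
    std::vector<char> vec( str.begin(), str.end() );   // writable copy of the characters
    vec.push_back( '\0' );                             // strtok() needs a null terminator
    const char* result = std::strtok( &vec[0], " " );  // first token: "One"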
    I haven't tried it, but it should work.
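
    A fuller sketch along the same lines, assuming the goal is a std::vector<std::string> of tokens as described earlier in the thread. tokenize_with_strtok is a hypothetical name. The vector owns the writable copy, so there is nothing to free by hand.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies text into a writable buffer, runs strtok()
// over the copy, and returns each token as its own std::string.
std::vector<std::string> tokenize_with_strtok(const std::string& text, const char* delims)
{
    assert(delims != nullptr);

    // strtok() writes into this copy, never into text.
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(&buf[0], delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);   // copies the token out of buf
    }
    return tokens;   // buf is released automatically here
}
```

    Keep in mind that strtok() holds its position in hidden static state, so a loop like this should not be nested inside another strtok() loop or run from several threads at once. It also skips empty tokens between adjacent delimiters.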
    Last edited by cpjust; 07-02-2009 at 06:42 AM.

  11. #11
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    I haven't read that part of the C++ standard, but isn't this legal:
    Code:
    std::string str;
    /* Build a string */
    char *c = &str[0];
    I'd imagine it's guaranteed that &str[0] will be the null terminated string. Or doesn't the data have to be sequential/zero terminated until you call c_str() (in which case, of course, only the returned value is sequential and zero terminated, and &str[0] still might not be).

    Does anybody know whether it's legal?

  12. #12
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by EVOEx
    I'd imagine it's guaranteed that &str[0] will be the null terminated string. Or doesn't the data have to be sequential/zero terminated until you call c_str() (in which case, of course, only the returned value is sequential and zero terminated, and &str[0] still might not be).

    Does anybody know whether it's legal?
    At the moment, there is no such definite guarantee, though due to a defect in the standard the wording could be regarded as ambiguous. This defect will be rectified in C++0x.

    EDIT:
    Oh, but if I remember correctly the guarantee will be that the storage is contiguous. I do not think that there will be any guarantee of null termination (such a change would not make sense), so you would have to add a null character to the end in order to use &str[0] with strtok().
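
    For reference, a sketch assuming a C++11 or later compiler. As far as I know, the standard that was eventually published requires std::string storage to be contiguous and requires str[str.size()] to refer to a null character, so &copy[0] on a non-empty string can be handed to strtok() directly. tokenize_string_copy is a hypothetical name; taking the string by value gives strtok() a private copy to write into.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, assuming C++11 or later string guarantees.
// copy is passed by value on purpose: strtok() modifies it, and the
// caller's string is left alone.
std::vector<std::string> tokenize_string_copy(std::string copy, const char* delims)
{
    assert(delims != nullptr);

    std::vector<std::string> tokens;
    if (copy.empty()) {
        return tokens;   // nothing to tokenise
    }
    for (char* tok = std::strtok(&copy[0], delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

    Under a pre-C++11 compiler this assumption does not hold, and copying into a std::vector<char> with an explicit '\0' remains the portable choice.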
    Last edited by laserlight; 07-02-2009 at 08:11 AM.

  13. #13
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by laserlight
    you would have to add a null character to the end in order to use &str[0] with strtok().
    Which brings me back to using strncpy() to copy to a buffer allocated on the heap. When I run a loop ahead of the strtok() call to count the elements (I'm only using a single character as a delimiter anyway) and reserve space in the vector that stores the results, the difference in performance between strtok() and std::getline() is negligible.
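
    A sketch of the std::getline() variant described above, assuming a single-character delimiter: count the delimiters first, reserve, then split. split_getline is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on one delimiter character using
// std::getline(), reserving the result vector up front.
std::vector<std::string> split_getline(const std::string& text, char delim)
{
    std::vector<std::string> tokens;

    // One pass to count delimiters, so the vector does not have to
    // reallocate while it is being filled.
    const std::size_t delim_count =
        static_cast<std::size_t>(std::count(text.begin(), text.end(), delim));
    tokens.reserve(delim_count + 1);

    std::istringstream in(text);
    std::string tok;
    while (std::getline(in, tok, delim)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

    Note that std::getline() keeps empty fields between adjacent delimiters, whereas strtok() skips them, so the two are not drop-in replacements for each other.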
