Parsing Strings

  1. #1
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879

    Parsing Strings

    I downloaded MSVC 2005 Beta Express from MS, and am now using it in place of my old MSVC 6 Professional, in the hopes that it will benefit the standard-ness of my code.

    So, I'm writing this nice solution to the third problem here, when suddenly the compiler starts yelling at me that strtok() and just about every other standard C function that I've used is deprecated. Very well, I can use stringstreams to achieve much the same result using std::getline() - but with one important exception: getline() only accepts one delimiter character.

    So naturally, though I don't need the extra delimiters at the moment, I'm wondering if there's some standard C++ equivalent of strtok() that allows for multiple delimiters. Or would you just have to use find_first_of(), then substr() out the section that you want?
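    A minimal sketch of the find_first_of()/substr() approach mentioned above. The helper name split_any is hypothetical (it is not a standard function), and the sketch assumes that runs of delimiters should be skipped the way strtok() skips them:

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of the standard library: split `text` on any
// character found in `delims`, skipping empty tokens as strtok() does.
std::vector<std::string> split_any(const std::string& text,
                                   const std::string& delims)
{
    std::vector<std::string> tokens;
    // Position of the first character that is not a delimiter.
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // Position of the delimiter that ends this token (npos if none).
        std::string::size_type end = text.find_first_of(delims, start);
        std::string::size_type count =
            (end == std::string::npos) ? std::string::npos : end - start;
        tokens.push_back(text.substr(start, count));
        // Skip the delimiter run; returns npos when nothing is left.
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

    For example, split_any("alpha+beta-/gamma;", "+-/;") should give the three tokens alpha, beta and gamma.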
    Just Google It. √

    (\ /)
    ( . .)
    c(")(") This is bunny. Copy and paste bunny into your signature to help him gain world domination.

  2. #2
    Registered User
    Join Date
    Aug 2003
    Posts
    470
    For simple lexical scanning, I typically draw a state diagram and use getc along with ungetc, C++ equivalents included. Anything more complex is better left to a regex library, I think.

  3. #3
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    So in short, you either deal with input on a byte-by-byte basis or use a third-party library? Sounds to me like the former would be for extremely simple cases while the latter would be for very complex cases. What happens to everything in between, i.e. breaking up a sequence of words delimited by + - / ;?

  4. #4
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    I'm having a hard time figuring out your problem.
    The compiler is complaining that strtok() is deprecated. Are you using <cstring> and std::strtok()?
    What does this problem have to do with std::getline() which is an input function and strtok() which parses an array of characters?

    I don't know of any standard input function that allows for multiple delimiters but it wouldn't be hard to write one.

  5. #5
    Registered User
    Join Date
    Aug 2003
    Posts
    470
    >So in short, you either deal with input on a byte-by-byte basis or use a third-party library? Sounds to me like the former would be for extremely simple cases while the latter would be for very complex cases. What happens to everything in between, i.e. breaking up a sequence of words delimited by + - / ;?
    Breaking up a sequence of words delimited by + - / ; is something I'd do by hand. Simply read characters, group them into words, and when a +, -, / or ; is reached, end the current word and start the next one. Using regex, you have some complications because you must not only check whether a string matches [-+/;]([a-zA-Z]*) but also handle the special cases for the beginning (usually the "^" character) and the end of the file (usually the "$" character).

  6. #6
    Registered User
    Join Date
    Aug 2003
    Posts
    470
    Note also that the beginning-of-file and end-of-file characters don't exist in the file. They're just notation that makes the regex less complicated to write. You should be able to write a regex such as "[a-zA-Z]/[-+/;$]", where the middle "/" means that those characters must follow for the pattern to be matched but are not part of the match. (This all depends on which library you use, however.)
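    For comparison, a regex-based split might be sketched as follows. This assumes a C++11 compiler with the standard <regex> header, which postdates this thread; Boost.Regex offers a similar interface. The name split_regex and the fixed delimiter set are assumptions made for the example:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on runs of the characters + - / ;
std::vector<std::string> split_regex(const std::string& text)
{
    // '-' is listed first in the class so it is a literal, not a range.
    static const std::regex delim("[-+/;]+");
    std::vector<std::string> tokens;
    // Submatch index -1 selects the text between matches, not the matches.
    std::sregex_token_iterator it(text.begin(), text.end(), delim, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it) {
        // A leading delimiter produces one empty piece; skip it.
        if (it->length() > 0)
            tokens.push_back(it->str());
    }
    return tokens;
}
```

    No explicit handling of the start or end of the input is needed here, because the token iterator yields whatever lies before the first match and after the last one.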

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,892
    The Boost library is freely available and does everything you want. Yes, it's third party, but very popular.
    http://www.boost.org/

    And I believe Thantos has the right of it. Use <cstring> and std::strtok and the problem should go away.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  8. #8
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Well I was kinda bored, ok really bored, so I wrote an input function that will allow for multiple delimiters. When I go to sleep and then wake up I'll try to optimize it a little and fix any mistakes that were caused by being up so late:
    Code:
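    #include <istream>
    #include <string>

    // Reads one token into arr (at most size-1 characters plus '\0').
    // Leading delimiters are skipped; the delimiter that ends the token
    // is left in the stream.
    std::istream& mygetline(std::istream& in, char *arr, int size, const std::string &delim="\n")
    {
      int effsize = 0;
      while(effsize < size -1)
      {
        int next = in.peek();
        if ( next == std::char_traits<char>::eof() )
          break; // end of input: stop instead of storing EOF as a character
        if ( delim.find_first_of(static_cast<char>(next)) == std::string::npos )
        {
          arr[effsize++] = static_cast<char>(in.get());
        }
        else
        {
          if ( effsize == 0 )
            in.ignore();
          else
            break;
        }
      }
      if ( size > 0 )
        arr[effsize] = '\0';
      if ( effsize == 0 )
        in.setstate(std::ios_base::failbit); // nothing read: report failure like std::getline
      return in;
    }

    // Same idea, but grows a std::string instead of filling a fixed buffer.
    std::istream& mygetline(std::istream& in, std::string& str, const std::string& delim="\n")
    {
      str.erase();
      while ( true )
      {
        int next = in.peek();
        if ( next == std::char_traits<char>::eof() )
          break; // without this check the loop never ends at end of input
        if ( delim.find_first_of(static_cast<char>(next)) == std::string::npos )
          str += static_cast<char>(in.get());
        else if ( !str.empty() )
          break;
        else
          in.ignore();
      }
      if ( str.empty() )
        in.setstate(std::ios_base::failbit); // nothing read: report failure like std::getline
      return in;
    }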

  9. #9
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    Quote Originally Posted by Thantos
    I'm having a hard time figuring out your problem.
    The compiler is complaining that strtok() is deprecated. Are you using <cstring> and std::strtok()?
    What does this problem have to do with std::getline() which is an input function and strtok() which parses an array of characters?

    I don't know of any standard input function that allows for multiple delimiters but it wouldn't be hard to write one.
    Originally I included <cstring> and called strtok() without the std::. I recompiled with the std:: in front of it, and it still warns me that "strtok() was declared deprecated". I mentioned std::getline because you can stick the string in an istringstream, then use istringstream::getline() with a user-specified delimiter to extract each token, assuming there is only one delimiter between each token. And yes, I know it's not hard to write a standard C++ version of strtok() - I posted one here earlier, which duplicates the results of strtok(), albeit suboptimally, except that it stores the tokens in a vector or list of C++ strings.
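    The istringstream technique described above can be sketched like this. The helper name split_one is hypothetical, and the sketch uses the free function std::getline() with a std::string rather than the member getline() with a character buffer:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on a single delimiter character by
// wrapping it in an istringstream and calling std::getline() repeatedly.
std::vector<std::string> split_one(const std::string& text, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream in(text);
    std::string token;
    // Each call extracts up to the next `delim` and discards the delimiter.
    while (std::getline(in, token, delim))
        tokens.push_back(token);
    return tokens;
}
```

    Note the limitation the post mentions: only one delimiter character is accepted, and two adjacent delimiters produce an empty token, so split_one("a,,b", ',') yields three tokens with an empty one in the middle.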

    >>The Boost library is freely available and does everything you want. Yes, it's third party, but very popular.
    Indeed, when I did a search on 'regex library', Boost came up as one of the first results (and yes I know, Boost has a lot more than regex). But I wanted to know, if C++ has an equivalent for the C strtok(), since *apparently* strtok() has been deprecated in C++ (and the compiler whines about strcpy() too).

    okinrus: Ah thanks, but I don't anticipate using regex anytime soon; seems a little complex at the moment, and I don't see any immediate application for it in any of my projects anyway

  10. #10
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796
    >"strtok() was declared deprecated".
    Your compiler is stupid. Nowhere in the standard does it say that strtok is deprecated. I suggest you find a switch to turn off that warning, because it's incorrect.
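    One way to silence the warning, sketched under the assumption that the beta behaves like the released MSVC 2005: define _CRT_SECURE_NO_DEPRECATE before including any CRT header (later MSVC releases call it _CRT_SECURE_NO_WARNINGS), or disable warning C4996 with a pragma. Other compilers simply ignore these macros. The helper split_strtok is hypothetical and only shows std::strtok() handling several delimiters at once:

```cpp
// Assumption: these macros only affect Microsoft's CRT headers and must be
// defined before the first #include. Other compilers ignore them.
#define _CRT_SECURE_NO_DEPRECATE   // name used by MSVC 2005
#define _CRT_SECURE_NO_WARNINGS    // name used by later MSVC releases

#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenize with std::strtok(), which accepts a whole
// set of delimiter characters.
std::vector<std::string> split_strtok(const std::string& text, const char* delims)
{
    // strtok() writes '\0' into its input, so work on a private copy.
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(&buf[0], delims); tok != 0;
         tok = std::strtok(0, delims))
        tokens.push_back(tok);
    return tokens;
}
```

    Keep in mind that std::strtok() keeps hidden static state between calls, so two tokenizing loops cannot be interleaved.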
    My best code is written with the delete key.

  11. #11
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    >Your compiler is stupid
    Did you expect anything less from Microsoft?

  12. #12
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    Well, since the later MSVC's are all cracked up to be extra standards-compliant, I figured at least they'd get their deprecations right

    Thanks Prelude, I'll get cracking on that right now.

  13. #13
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796
    >I figured at least they'd get their deprecations right
    My copy of Visual C++ .NET doesn't seem to have a problem, and I keep it on the most pedantic settings most of the time. Are you using the free beta?

    >since the later MSVC's are all cracked up to be extra standards-compliant
    From what I've seen, they do pretty well.

  14. #14
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    Yes, I'm using the free beta. But the deprecations look very authentically intentional.

    Interesting link here:
    http://www.open-std.org/jtc1/sc22/wg.../docs/n997.pdf

    I checked the source of the strcpy() and strtok() implementations, and look what it has:
    Code:
    _CRT_INSECURE_DEPRECATE _CRTIMP char *  __cdecl strtok(char *, const char *);
    _CRTIMP char *  __cdecl strtok_s(char *, const char *, char **);
    
    
    _CRT_INSECURE_DEPRECATE char *  __cdecl strcpy(char *, const char *);
    _CRTIMP errcode __cdecl strcpy_s(char *, size_t, const char *);
    Are you sure the standard hasn't been updated recently to incorporate these changes?

  15. #15
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    From Annex C of the ISO/IEC 14882 C++ standard which I just downloaded last month

    The C++ Standard library provides 209 standard functions from the C library, as shown in Table 99:
    Table 99—Standard Functions
    abort fmod isupper mktime strftime wcrtomb
    abs fopen iswalnum modf strlen wcscat
    acos fprintf iswalpha perror strncat wcschr
    asctime fputc iswcntrl pow strncmp wcscmp
    asin fputs iswctype printf strncpy wcscoll
    atan fputwc iswdigit putc strpbrk wcscpy
    atan2 fputws iswgraph putchar strrchr wcscspn
    atexit fread iswlower puts strspn wcsftime
    atof free iswprint putwc strstr wcslen
    atoi freopen iswpunct putwchar strtod wcsncat
    atol frexp iswspace qsort strtok wcsncmp
    bsearch fscanf iswupper raise strtol wcsncpy
    btowc fseek iswxdigit rand strtoul wcspbrk
    calloc fsetpos isxdigit realloc strxfrm wcsrchr
    ceil ftell labs remove swprintf wcsrtombs
    clearerr fwide ldexp rename swscanf wcsspn
    clock fwprintf ldiv rewind system wcsstr
    cos fwrite localeconv scanf tan wcstod
    cosh fwscanf localtime setbuf tanh wcstok
    ctime getc log setlocale time wcstol
    difftime getchar log10 setvbuf tmpfile wcstombs
    div getenv longjmp signal tmpnam wcstoul
    exit gets malloc sin tolower wcsxfrm
    exp getwc mblen sinh toupper wctob
    fabs getwchar mbrlen sprintf towctrans wctomb
    fclose gmtime mbrtowc sqrt towlower wctrans
    feof isalnum mbsinit srand towupper wctype
    ferror isalpha mbsrtowcs sscanf ungetc wmemchr
    fflush iscntrl mbstowcs strcat ungetwc wmemcmp
    fgetc isdigit mbtowc strchr vfprintf wmemcpy
    fgetpos isgraph memchr strcmp vfwprintf wmemmove
    fgets islower memcmp strcoll vprintf wmemset
    fgetwc isprint memcpy strcpy vsprintf wprintf
    fgetws ispunct memmove strcspn vswprintf wscanf
    floor isspace memset strerror vwprintf
