how to parse a string

This is a discussion on how to parse a string within the C Programming forums, part of the General Programming Boards category.

  1. #31
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    Which means, if you'll pardon the C idiom, that you need something like
    Code:
    while ((*p != *strDelimiter) && *(++strDelimiter));
    to get through all the characters of strDelimiter.

    It also appears you're missing the "it doesn't count if it starts that way" rule -- strtok skips all appearances of the delimiter character at the beginning of the string. (You skip one character -- regardless of whether it's a delimiter or not -- with the while p++ thing, but the space is still given in the tokenized version as well.) You can argue that that's the way that you want it, but if you add two spaces to the front of your test string you won't be happy with the results.

    ETA: Well, we can't use strDelimiter in that while loop up there, since that was our original. But we can do it with a copy.
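
    As a rough sketch of the two points above (testing a character against every delimiter through a copy of the pointer, and skipping all leading delimiters the way strtok does), something like the following could work. The names is_delim and skip_leading_delims are made up for illustration and are not from the code under discussion; the second helper leans on the standard strspn.

```c
#include <string.h>

/* Returns 1 if c matches any character in delims, else 0.
   Walks a copy of the pointer, so the caller's delims is left untouched. */
int is_delim(char c, const char *delims)
{
    const char *d = delims;            /* the copy */
    while (*d != '\0' && *d != c)
        d++;
    return *d != '\0';                 /* stopped early => found a match */
}

/* Returns a pointer past every leading delimiter in s.
   strspn counts how many leading characters of s appear in delims. */
const char *skip_leading_delims(const char *s, const char *delims)
{
    return s + strspn(s, delims);
}
```

    With these, a test string that starts with two spaces would begin tokenizing at the first non-space character instead of yielding a space as part of a token.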
    Last edited by tabstop; 04-20-2008 at 02:58 PM.

  2. #32
    C++まいる!Cをこわせ! Elysia's Avatar
    Join Date
    Oct 2007
    Posts
    22,543
    I suppose. But then again, I don't think that's a good idea.
    If that's what you want, then trim the string first.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  3. #33
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    And look what happens when your test string is
    Code:
    "This   is   a   test   string"
    That actually managed to clobber the original string (on my machine at any rate). Not being able to deal with several spaces in a row seems like a bad thing.

  4. #34
    C++まいる!Cをこわせ! Elysia's Avatar
    Join Date
    Oct 2007
    Posts
    22,543
    I didn't test that, but I will now.
    Nope, works fine for me.

  5. #35
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Elysia View Post
    If that's what you want, then trim the string first.
    But then we're back at "you may need to make a copy of the original before you do that"...

    If you are going to replace a standard function with a "better one", then it should do what the original function does, and be better. Not "my own interpretation of what the original function does". Of course, it's fine if you are only using it for your own purpose and under such conditions where the difference between "my interpretation" and the standard aren't noticeable. But expect criticism if you say "Look, I wrote a better version of function X", and it's not actually doing the same as the original.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #36
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    Quote Originally Posted by Elysia View Post
    I didn't test that, but I will now.
    Nope, works fine for me.
    Really? I took a guess and put it into Visual Studio, added the extra spaces in the string, and got Run-Time Check Failure #2 - Stack around the variable 'str' corrupted. (Of course, that comes from your while loop not checking index bounds, not from the strtokv3 itself.)
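
    The loop in question isn't shown on this page, so the following is only a guess at the kind of bounds check it needs. split_bounded is a hypothetical helper that uses plain strtok and refuses to store more tokens than the caller's array can hold.

```c
#include <stddef.h>
#include <string.h>

/* Stores at most max_tokens token pointers from s into tokens[].
   s is modified in place by strtok. Returns the number of tokens stored. */
size_t split_bounded(char *s, const char *delims, char *tokens[], size_t max_tokens)
{
    size_t n = 0;
    char *tok = strtok(s, delims);

    /* Checking n first-class alongside tok keeps writes inside tokens[]. */
    while (tok != NULL && n < max_tokens) {
        tokens[n++] = tok;
        tok = strtok(NULL, delims);
    }
    return n;
}
```

    Because the index is checked on every pass, a string that produces more tokens than expected is cut short rather than written past the end of the array.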

  7. #37
    C++まいる!Cをこわせ! Elysia's Avatar
    Join Date
    Oct 2007
    Posts
    22,543
    Quote Originally Posted by matsp View Post
    But then we're back at "you may need to make a copy of the original before you do that"...

    If you are going to replace a standard function with a "better one", then it should do what the original function does, and be better. Not "my own interpretation of what the original function does". Of course, it's fine if you are only using it for your own purpose and under such conditions where the difference between "my interpretation" and the standard aren't noticeable. But expect criticism if you say "Look, I wrote a better version of function X", and it's not actually doing the same as the original.

    --
    Mats
    Alright, I'll just squeeze in
    Code:
    	while (p < strEnd && *p == ' ') p++;
    Between:
    Code:
    	const char* p = strToSearch;
    And
    Code:
    	while (p++ < strEnd)
    And no, sorry tabstop, I couldn't reproduce it, though the code should check for out-of-bounds access. If you can, give me a better test case or make a small fix yourself.
    Maybe I'll give it another try tomorrow...

  8. #38
    Registered User whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    7,666
    So Elysia stated something to the effect that strtok is the devil and asked if there was an alternative. While I can understand this, I'm a bit upset: in fact, alternatives were given lip service on page one. Emphasis mine, since some people apparently can't be asked to read before they open their mouths.

    Quote Originally Posted by citizen View Post

    [...]

    strtok is an okay solution for simple patterns like this I suppose, though there might be issues, especially if you have blank fields (lines with just ":"). If that is the case, you might want to look toward sscanf or strcspn for a parser.
    You can view the entire post by clicking the quote arrow. Most of strtok's problems are solved if you simply dupe the string first (except it will still use a static buffer and it will still puke on empty fields). Now I'm sorry I didn't mention it.

    Dave Sinkula's tried to spread his alternative to strtok around as well, if anyone's inclined to look.
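
    For what a strcspn-based parser might look like, here is a minimal sketch. next_field is a hypothetical name, not anyone's posted code. It never writes to the input, keeps no static state, and reports empty fields as empty strings instead of skipping them.

```c
#include <stddef.h>
#include <string.h>

/* Copies the next field of *cursor into out and advances *cursor.
   Returns 1 if a field was produced, 0 once the input is exhausted.
   Empty fields come back as "". Fields longer than outsize - 1
   characters are truncated. The input string is never modified. */
int next_field(const char **cursor, const char *delims, char *out, size_t outsize)
{
    const char *s = *cursor;
    size_t len, n;

    if (s == NULL || outsize == 0)
        return 0;

    len = strcspn(s, delims);              /* length up to the next delimiter */
    n = len < outsize - 1 ? len : outsize - 1;
    memcpy(out, s, n);
    out[n] = '\0';

    /* Step over exactly one delimiter; NULL marks the end of the input. */
    *cursor = (s[len] != '\0') ? s + len + 1 : NULL;
    return 1;
}
```

    Stepping over exactly one delimiter per call is a deliberate choice here: it is what lets a line such as "a::b:" come back as four fields, two of them empty, where strtok would report only two.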
    Last edited by whiteflags; 04-20-2008 at 07:03 PM.

  9. #39
    C++まいる!Cをこわせ! Elysia's Avatar
    Join Date
    Oct 2007
    Posts
    22,543
    I reacted to the "how do I know if the delimiter was..." question since strtok would destroy them, as evil as it is. I then asked for alternative solutions, but since no one seemed able to offer any, I proceeded to write a small utility of my own.

  10. #40
    Registered User
    Join Date
    Jan 2008
    Posts
    569
    okay I found another problem:

    cat : dog mouse mice hot

    if I do a while (strtok(NULL, " ")); then I get : after I got cat

    but for this one

    cat:dog mouse mice hot

    I got dog

    so this is not consistent..

  11. #41
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    I don't see how that's possible. The first token for the second string should be cat:dog (since it doesn't break until the space), unless you did the first token differently.

    If you used " :" for the first tokenizer, then that's what you should get -- the first token of the first string was "cat", since it stopped at the space; but the second time you didn't worry about the colon, so that became a legitimate token the second time. (If you had used " :" for the tokenizer in the while loop as well, you wouldn't have gotten ":", but "dog".)
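
    The effect described above can be reproduced with a small sketch. second_token is a hypothetical helper, and it assumes the first call used " :" while the later call used a different set. It runs strtok once with each delimiter set and returns whatever the second call produces.

```c
#include <string.h>

/* Returns the second token of s, using first_delims for the first strtok
   call and rest_delims for the second. s is modified in place.
   Returns NULL if there is no first or no second token. */
char *second_token(char *s, const char *first_delims, const char *rest_delims)
{
    if (strtok(s, first_delims) == NULL)
        return NULL;
    return strtok(NULL, rest_delims);
}
```

    Given "cat : dog mouse mice hot" with " :" and then " ", the second token is ":"; given "cat:dog mouse mice hot" with the same two sets, it is "dog". Using " :" for both calls yields "dog" for the first string as well.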

  12. #42
    Registered User whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    7,666
    > so this is not consistent..

    You're positive? The delimiter string matters. " " is not the same as ": " at all. If you have a string delimited by spaces like

    cat : dog mouse mice hot

    Then the substring ":" is just as much of a token as the other words.

    On the other hand, if you used the other delimiter string, then you get "cat " along with the other words. So now you have a problem with trailing whitespace. You can decide if this matters and fix it quite easily.

    If you've learned about strings, then you know "cat " is just
    Code:
    { 'c', 'a', 't', ' ', 0 }
    So, if it matters, make a function that can trim excess whitespace as an exercise, or use one of the alternatives to strtok already discussed.
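
    One possible shape for that exercise, as a sketch only: trim_trailing is a made-up name, and it treats anything isspace accepts as whitespace, which may be broader than you need.

```c
#include <ctype.h>
#include <string.h>

/* Removes trailing whitespace from s in place and returns s.
   s must be a writable, NUL-terminated buffer, not a string literal. */
char *trim_trailing(char *s)
{
    size_t len = strlen(s);

    /* Back up over whitespace; the cast keeps isspace well-defined
       for characters with negative values. */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';
    return s;
}
```

    Applied to the "cat " token above, this overwrites the trailing space with the terminator, leaving "cat".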

  13. #43
    Registered User
    Join Date
    Jan 2008
    Posts
    569
    okay so here's basically what I want:

    cat : dog mouse mice hot

    I want to be able to get cat up to the : but I want to eliminate the extra space that cat has if any,

    so cat_ should be just cat

    if it doesn't find the ":" then it doesn't find cat
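
    Taken literally, those requirements could be sketched without strtok at all. key_before_colon below is a hypothetical helper. It assumes the key is everything before the first ':' and that whitespace on either side of it should be dropped.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies the text before the first ':' in line into out, with surrounding
   whitespace removed. Returns 1 on success, 0 if line has no ':' or the
   key does not fit in out. line is not modified. */
int key_before_colon(const char *line, char *out, size_t outsize)
{
    const char *colon = strchr(line, ':');
    const char *start = line;
    const char *end;
    size_t len;

    if (colon == NULL)
        return 0;                          /* no ':' means no key */

    while (start < colon && isspace((unsigned char)*start))
        start++;                           /* drop leading whitespace */
    end = colon;
    while (end > start && isspace((unsigned char)end[-1]))
        end--;                             /* drop whitespace before ':' */

    len = (size_t)(end - start);
    if (len + 1 > outsize)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

    Both "cat : dog mouse mice hot" and "cat:dog mouse mice hot" give "cat", while a line with no colon reports failure and leaves out untouched.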

  14. #44
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    So, post your code.

    We can sit here and guess 'til we are blue in the face, but you could have done 118 other things...

    --
    Mats

  15. #45
    Registered User
    Join Date
    May 2008
    Posts
    5

    Just for grins and because I actually enjoy parsing text, here's my rewrite of a better strtok.

    Code:
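    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Copies the next token of str into result and returns a pointer to where
       scanning stopped, or NULL when no tokens remain. str is not modified. */
    const char *better_strtok(const char *str, const char *delim, char *result, int nresult) {
        const char *start = str;
        /* Skip leading delimiters, as strtok does. */
        while (*start && strchr(delim, *start)) start++;
        if (*start == '\0') return NULL;
        const char *ptr = start;
        while (1) {
            /* strchr also matches the terminating '\0', so this stops at the end of str too. */
            if (strchr(delim, *ptr)) {
                assert(nresult >= ptr - start + 1);
                strncpy(result, start, ptr - start);
                result[ptr - start] = '\0';
                return ptr;
            }
            ptr++;
        }
    }

    int main(void) {
        const char *str = "this : is    :   a:test:123";
        const char *ptr = str;
        char buf[16];
        while ((ptr = better_strtok(ptr, ": ", buf, sizeof(buf))) != NULL) {
            printf("> %s\n", buf);
        }
        return 0;
    }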
    
    > this
    > is
    > a
    > test
    > 123
    Last edited by fredb; 05-21-2008 at 12:06 PM. Reason: Removed an unnecessary conditional
