Thread: String Tokenizing

  1. #1
    Registered User
    Join Date
    May 2003
    Posts
    11

    String Tokenizing

    I currently have a string delimited with pipes:
    e.g.: pMsg = str1|str2|str3|str4

    This string is tokenized using strtok. As long as all the strings are not empty, this works fine. After strtok is complete, I have the following:

    token1 = str1
    token2 = str2
    token3 = str3
    token4 = str4

    BUT, I want to be able to have an empty string as one of the tokens in the string,
    e.g.: pMsg = |str2|str3|str4

    strtok seems to skip over the first pipe in this instance and assigns str2 to token1. Everything is getting bumped up one spot.

    token1 = str2
    token2 = str3
    token3 = str4
    token4 =

    My goal is to still read token1 in as an empty string, resulting in the following:

    token1 =
    token2 = str2
    token3 = str3
    token4 = str4


    Any ideas????????

  2. #2
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Use boost::string_tokenizer (I think it does that) or my split_string function.
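    For reference, a minimal sketch of what such a split function could look like. This is not CornedBee's actual split_string; the name split_keep_empty and its signature are made up for illustration. It works on std::string and keeps empty fields instead of skipping them:

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not CornedBee's split_string): splits `text` on
// `delim` and keeps empty fields, so "|a" yields {"", "a"}.
std::vector<std::string> split_keep_empty(const std::string& text, char delim)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = text.find(delim, start);
        if (pos == std::string::npos) {
            // No more delimiters: the rest is the last field (possibly empty).
            fields.push_back(text.substr(start));
            break;
        }
        fields.push_back(text.substr(start, pos - start));
        start = pos + 1;  // continue right after the delimiter
    }
    return fields;
}
```

    Called as split_keep_empty("|str2|str3|str4", '|'), this should give four fields, the first one empty. It copies the fields, so the source string is left untouched.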
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    Magically delicious LuckY's Avatar
    Join Date
    Oct 2001
    Posts
    856
    It is recommended that you use an alternative to strtok(), since it destroys the string... Try writing something on your own : ) it might be a good exercise for you.

  4. #4
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    strtok() doesn't destroy the string, but it isn't properly reentrant and not thread-safe unless programmed that way.

  5. #5
    Just a Member ammar's Avatar
    Join Date
    Jun 2002
    Posts
    953
    Yes, strtok() doesn't destroy the string!
    and it's a good solution for his simple case...

    >>irncty99
    I think the problem is in your code, you might not be using strtok() right, maybe if you post some code it will help!
    none...

  6. #6
    Magically delicious LuckY's Avatar
    Join Date
    Oct 2001
    Posts
    856
    Originally posted by CornedBee
    strtok() doesn't destroy the string, but it isn't properly reentrant and not thread-safe unless programmed that way.
    Perhaps it's a matter of opinion and what your definition of "destroy" is, but strtok() replaces delimiter characters in your string with null bytes, and that fits my description.

    Pardon me, but what do you mean by "it isn't properly reentrant?"

  7. #7
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    >>strtok() doesn't destroy the string
    Err.. yeah, it does.

    >>what do you mean by "it isn't properly reentrant?"
    Readme
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]

  8. #8
    Magically delicious LuckY's Avatar
    Join Date
    Oct 2001
    Posts
    856
    Originally posted by Hammer
    >>strtok() doesn't destroy the string
    Err.. yeah, it does.
    Thank you.. Seeing two people in a row stating otherwise made me start feeling like I'd just entered the twilight zone...

    Thanks for the link.

  9. #9
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    The evidence (and documentation) speaks for itself:
    Code:
    #include <cstring>   // strtok
    #include <iostream>
    using namespace std;
    
    int main(void)
    {
      char buf[] = "This is a string";
      char *p;
      
      cout <<"buf is >" <<buf <<"<" <<endl;
      p = strtok (buf, " ");   // overwrites the first space in buf with '\0'
      cout <<"p is >" <<p <<"<" <<endl;
      cout <<"buf is >" <<buf <<"<" <<endl;
    }
    
    /*
     * Program output:
    buf is >This is a string<
    p is >This<
    buf is >This<
    */

  10. #10
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Hmmm, I was sure it stored a copy internally.

    Apparently it's only a pointer. Well, I never cared about strtok anyway.

  11. #11
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Originally posted by Hammer
    >>what do you mean by "it isn't properly reentrant?"
    Readme
    I mean that you can only tokenize one string at a time.
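    A small sketch of what that limitation looks like in practice, assuming records separated by ';' and fields separated by ','. The function name is made up for illustration. Because strtok() keeps a single saved position, the inner pass clobbers the outer one:

```cpp
#include <cstring>

// Illustrative only: counts how many outer (';'-separated) tokens the loop
// visits when an inner strtok() pass on ',' runs in between.
int count_outer_tokens_with_nested_strtok(char* buf)
{
    int outer = 0;
    for (char* rec = std::strtok(buf, ";"); rec != NULL; rec = std::strtok(NULL, ";")) {
        ++outer;
        // The inner pass overwrites strtok's single saved position...
        for (char* f = std::strtok(rec, ","); f != NULL; f = std::strtok(NULL, ",")) {
            /* the fields of the current record would be handled here */
        }
        // ...so the outer strtok(NULL, ";") resumes where the inner pass
        // stopped, not where the outer pass left off.
    }
    return outer;
}
```

    With a buffer holding "a,b;c,d" this returns 1 instead of 2: the second record is never seen. POSIX offers strtok_r(), which takes the saved position as an explicit argument and avoids this, but it is not part of standard C++.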

  12. #12
    Magically delicious LuckY's Avatar
    Join Date
    Oct 2001
    Posts
    856
    Originally posted by CornedBee
    Hmmm, I was sure it stored a copy internally.

    Apparently it's only a pointer. Well, I never cared about strtok anyway.

    I mean that you can only tokenize one string at a time.
    I was going to point out that you can tokenize more than one string with it, just not at the same time: strtok() remembers only one position, so if you intersperse the tokenizing of two strings, the second one makes it lose its place in the first (although I don't know why you would do that).

    FYI, the way strtok() works is it starts from the given address, skips any leading delimiters, and moves up the array until the next delimiter is located. Once it is, the delimiter is replaced with a null character, a pointer to the start of the token is returned, and the position after the delimiter is remembered so that you can continue with your next call... Thus, the destroying of your source string.
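    To make that concrete, here is a simplified stand-in for strtok(). It is a sketch only: the name my_strtok is made up, and real library implementations differ in detail. It shows the single saved pointer, the skipping of leading delimiters, and the write into the caller's buffer:

```cpp
#include <cstring>

// Simplified stand-in for strtok(), for illustration only.
char* my_strtok(char* str, const char* delims)
{
    static char* next = NULL;       // the one and only saved position
    if (str == NULL) str = next;    // NULL means "continue the previous string"
    if (str == NULL) return NULL;   // nothing left to continue

    // Leading delimiters are skipped, which is why "|a" loses its empty field.
    str += std::strspn(str, delims);
    if (*str == '\0') { next = NULL; return NULL; }

    // Find the first delimiter after the token.
    char* end = str + std::strcspn(str, delims);
    if (*end != '\0') {
        *end = '\0';                // the caller's buffer is modified here
        next = end + 1;
    } else {
        next = NULL;                // token ran to the end of the string
    }
    return str;
}
```

    The strspn() step is the part that makes runs of delimiters (and a leading one) count as a single separator.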
    Last edited by LuckY; 05-07-2003 at 08:25 AM.

  13. #13
    Registered User
    Join Date
    Feb 2002
    Posts
    329
    Here's a string tokenizer function. I didn't test it, but I think it should work..
    Code:
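    #include <cstring>
    #include <iostream>
    using namespace std;
    
    /*****************************************\
     * String tokenizer                      *
     * Input:                                *
     *  inStr   -> Input string (modified)   *
     *  p       -> Position array (output)   *
     *  chDelim -> Delimiter                 *
     * Returns the number of tokens found.   *
    \*****************************************/
    int fixInStr(char *inStr, short *p, char chDelim){
    	char *p1=inStr;
    	int i(0);
    
    	p[i++]=0;	// First token always starts at pos. 0 (may be empty)
    	while((p1=strchr(p1, chDelim))){	// Find all delimiters
    		*p1++=0;	// Terminate the token in front of the delimiter
    		p[i++]=(short)(p1-inStr);	// Next token starts right after it
    	}
    
    	return(i);	// Caller must make sure p has room for this many entries
    }
    
    // Test
    int main(){
    	char achTest[] = "Test|of|string|tokenizer";
    	short posArray[10], i(0), j(0);
    	if((i=fixInStr(achTest, posArray, '|'))){
    		cout<<"Result\n";
    		while(j<i)
    			cout<<&achTest[posArray[j++]]<<endl;	// Print token j
    	}
    	return(0);
    }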
    Last edited by knutso; 05-07-2003 at 01:54 AM.

  14. #14
    Registered User
    Join Date
    May 2003
    Posts
    11
    This is what I'm doing:

    Code:
    char* pToken = strtok(pMsg, "|");
    if (pToken)
        token1 = pToken;
    else
        token1 = "";

    pToken = strtok(NULL, "|");
    if (pToken)
        token2 = pToken;
    else
        token2 = "";

    pToken = strtok(NULL, "|");
    if (pToken)
        token3 = pToken;
    else
        token3 = "";

    pToken = strtok(NULL, "|");
    if (pToken)
        token4 = pToken;
    else
        token4 = "";

    My goal here is to allow for blank fields. When I have a pMsg missing one of its fields (e.g. pMsg = |field2|field3|field4), I want token1 to get assigned the empty string instead of field2. strtok is skipping over that first delimiter and assigning field2 to token1, field3 to token2, and so on. It's shifting everything up by one and not recognizing the blank field.

    Thanks to all who have replied to this point. I appreciate it.
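    One possible way to get blank fields, sketched under the assumption that pMsg is a writable char buffer with a single-character delimiter. The helper name next_field is hypothetical. It is similar in spirit to the BSD strsep() function: the caller owns the cursor, and every delimiter ends a field, so empty fields are returned as "":

```cpp
#include <cstring>

// Hypothetical helper: returns the field starting at *cursor and advances
// *cursor past the delimiter. Empty fields come back as "". Returns NULL once
// all fields have been consumed. Modifies the buffer, like strtok() does.
char* next_field(char** cursor, char delim)
{
    char* field = *cursor;
    if (field == NULL) return NULL;     // previous call returned the last field

    char* end = std::strchr(field, delim);
    if (end != NULL) {
        *end = '\0';                    // terminate this field in place
        *cursor = end + 1;              // next field starts after the delimiter
    } else {
        *cursor = NULL;                 // this was the last field
    }
    return field;
}
```

    Usage would look like: char* cur = pMsg; token1 = next_field(&cur, '|'); token2 = next_field(&cur, '|'); and so on, still checking each result for NULL in case the message has fewer fields than expected.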

  15. #15
    Registered User
    Join Date
    Feb 2002
    Posts
    329
    Then do it this way:

    Code:
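    if(*pMsg=='|'){	// Handle a blank first field
    	token1 = "";
    	pToken = strtok(pMsg, "|");	// strtok skips the leading pipe, so this is already field 2
    }else{
    	pToken = strtok(pMsg, "|");
    	token1 = pToken ? pToken : "";
    	pToken = strtok(NULL, "|");
    }
    token2 = pToken ? pToken : "";
    	
    // Next (note: strtok also collapses "||" in the middle, so only a blank first field is covered)
    pToken = strtok(NULL, "|");
    token3 = pToken ? pToken : "";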
    etc..
