Thread: a question about strtok.

  1. #1
    بابلی ریکا Masterx's Avatar
    Join Date
    Nov 2007
    Location
    Somewhere nearby,Who Cares?
    Posts
    497

    a question about strtok.

    Hello, I'm trying to tokenise a string with two delimiters, ' ' (space) and ':'.
    How can I tell which delimiter was just used?
    For example, consider the following code:

    Code:
    
    #include <cstdio>
    #include <cstring>
    #include <string>
    using namespace std;
    
    int main()
    {
        char string1[] = "start: RED X 12";
        char *pointer;
        string label[10];   // not used yet; meant to hold the labels
    
        printf("Splitting string \"%s\"\n", string1);
        pointer = strtok(string1, " :");
        while (pointer != NULL)
        {
            /* Note that the delimiters (space, :) */
            /* are not themselves tokenized. */
            printf("%s\n", pointer);
            pointer = strtok(NULL, " :");
        }
        return 0;
    }

    Please note that the sample is C-style, just to show you what I mean (I modified a sample I found when I googled strtok()).

    As you can see, I want to know which delimiter was used, so that if ':' was used the first token goes into a variable "Label", marking it as a label,
    and if ' ' was used the second token goes into a variable "Command", marking it as a command, and so on.

    The problem is that I have no idea which delimiter was used! The only way I can think of right now is to search the string first, and if it contains a ':' the first token goes into the label variable, and likewise for the command. But I think it would be better not to do it this way, which is why I'm asking whether it is possible to find out which delimiter was used.

    By the way, is it possible to do all of this inside the loop that is tokenizing the string?

    Thanks
    Last edited by Masterx; 11-10-2008 at 12:58 PM.
    Highlight Your Codes
    The Boost C++ Libraries (online Reference)

    "...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are,in short, a perfect match.."
    Bill Bryson


  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    You probably need to use "more clever" parsing than strtok. Depending on the exact specification for the assembler syntax, you could possibly use strtok with space as the separator - that works if "label:mov r 7" is not valid - that is, you need a space between "label:" and "mov".

    If a space isn't required, then you will need to check whether ':' is present on the line and, if so, whether it comes BEFORE the first space (otherwise it may be something like mov r ':', or perhaps a comment containing ':'). Personally, I would probably write the whole thing as a loop that checks whether each character is one of the separators. You can use strchr() if there are many characters to check, e.g. strchr(" :\t,", str[x]) will return NULL if str[x] is not one of the characters (note that it returns non-NULL for the terminating '\0', so stop the loop at the end of the string first).
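
    A minimal sketch of the character loop described above, assuming the separators are passed in as a C string. The Token struct and the function name splitKeepingSeparator are hypothetical, made up for this illustration; the point is only that a hand-written loop can remember which separator ended each token, which strtok cannot report.

```cpp
#include <cstring>
#include <string>
#include <vector>

// One token plus the separator character that ended it
// ('\0' when the token ran to the end of the line).
struct Token {
    std::string text;
    char endedBy;
};

// Hypothetical helper: split `line` on any character in `seps`,
// remembering which separator terminated each token.
std::vector<Token> splitKeepingSeparator(const char *line, const char *seps)
{
    std::vector<Token> tokens;
    std::string current;

    // The loop stops at '\0', so strchr never sees the terminator.
    for (std::size_t i = 0; line[i] != '\0'; ++i) {
        // strchr returns NULL when line[i] is not one of the separators.
        if (std::strchr(seps, line[i]) != NULL) {
            if (!current.empty()) {
                Token t;
                t.text = current;
                t.endedBy = line[i];
                tokens.push_back(t);
                current.clear();
            }
        } else {
            current += line[i];
        }
    }

    // A token that runs to the end of the line has no separator after it.
    if (!current.empty()) {
        Token t;
        t.text = current;
        t.endedBy = '\0';
        tokens.push_back(t);
    }
    return tokens;
}
```

    With the input "start: RED X 12" and separators " :", the first token is "start" with endedBy == ':', so it can be treated as a label. A line written as "start : RED" would report ' ' for the first token, so this sketch only handles labels where the colon directly follows the name.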

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Tokenise with " ", then if the current token ends with ':', remove it and call it a label.
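
    One way this suggestion might look in code, as a sketch rather than a definitive implementation. ParsedLine and parseLine are hypothetical names, and the rule that only the first token on a line may be a label is an assumption about the assembler syntax.

```cpp
#include <cstring>
#include <string>
#include <vector>

struct ParsedLine {
    std::string label;                 // empty when the line has no label
    std::vector<std::string> words;    // command and operands, in order
};

// Hypothetical helper. strtok modifies its input, so it works on a copy.
ParsedLine parseLine(const std::string &line)
{
    ParsedLine result;
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    for (char *tok = std::strtok(&buffer[0], " ");
         tok != NULL;
         tok = std::strtok(NULL, " ")) {
        // strtok never returns an empty token, so len is at least 1.
        std::size_t len = std::strlen(tok);
        bool first = result.label.empty() && result.words.empty();
        if (first && tok[len - 1] == ':') {
            tok[len - 1] = '\0';       // remove the trailing ':'
            result.label = tok;
        } else {
            result.words.push_back(tok);
        }
    }
    return result;
}
```

    For "start: RED X 12" this yields the label "start" and the words "RED", "X", "12". It relies on the colon being attached to the label, as in "label: mov".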
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    بابلی ریکا Masterx's Avatar
    Join Date
    Nov 2007
    Location
    Somewhere nearby,Who Cares?
    Posts
    497
    Quote: Originally posted by matsp
    You probably need to use "more clever" parsing than strtok. Depending on the exact specification for the assembler syntax, you could possibly use strtok with space as the separator - that works if "label:mov r 7" is not valid - that is, you need a space between "label:" and "mov".

    If a space isn't required, then you will need to check whether ':' is present on the line and, if so, whether it comes BEFORE the first space (otherwise it may be something like mov r ':', or perhaps a comment containing ':'). Personally, I would probably write the whole thing as a loop that checks whether each character is one of the separators. You can use strchr() if there are many characters to check, e.g. strchr(" :\t,", str[x]) will return NULL if str[x] is not one of the characters.

    --
    Mats
    Great tip, thanks Mats, I didn't think of that! OK, I'll give it a try and tell you the result.
    Well, it seems it's getting more and more complex!
    By the way, does strchr() belong to the <cstring> header?
    Quote: Originally posted by Salem
    Tokenise with " ", then if the current token ends with ':', remove it and call it a label.
    Many thanks, Salem.
    I really appreciate your answers.


  5. #5
    بابلی ریکا Masterx's Avatar
    Join Date
    Nov 2007
    Location
    Somewhere nearby,Who Cares?
    Posts
    497
    Well, it seems I'm having trouble figuring these out properly now!
    I'll think about solutions and tell you whether they were successful.
    My mind is completely paralyzed (mixed up)!


  6. #6
    Registered User
    Join Date
    Apr 2006
    Posts
    137
    If you're looking for some code to separate your text strings based on delimiters or separators, you can use the Perfect C++ String explode/split for ease of use.

    You should split on one delimiter first, such as the colon ':', and then split on the spaces. You can then tell which delimiter was present from how many pieces each split produced.
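
    A rough sketch of the "colon first, then spaces" idea using only the standard library rather than the linked explode/split code. The function name splitColonThenSpaces is hypothetical, and treating everything before the first ':' as the label is an assumption.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the words that follow an optional "label:"
// prefix and stores the label (or an empty string) in `label`.
std::vector<std::string> splitColonThenSpaces(const std::string &line,
                                              std::string &label)
{
    std::string rest = line;
    label.clear();

    // Step 1: separate at the first colon, if there is one.
    std::string::size_type colon = line.find(':');
    if (colon != std::string::npos) {
        label = line.substr(0, colon);
        // Trim trailing blanks; npos + 1 wraps to 0, which clears the string.
        label.erase(label.find_last_not_of(" \t") + 1);
        rest = line.substr(colon + 1);
    }

    // Step 2: separate the remainder on whitespace.
    std::vector<std::string> words;
    std::istringstream in(rest);
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}
```

    For "Start : ADD 5" the label is "Start" and the words are "ADD" and "5". As noted earlier in the thread, a colon that appears later on the line (in an operand or a comment) would be mistaken for a label separator by this simple version.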
    ★ Inferno provides Programming Tutorials in a variety of languages. Join our Programming Forums. ★

  7. #7
    بابلی ریکا Masterx's Avatar
    Join Date
    Nov 2007
    Location
    Somewhere nearby,Who Cares?
    Posts
    497
    Quote: Originally posted by execute
    If you're looking for some code to separate your text strings based on delimiters or separators, you can use the Perfect C++ String explode/split for ease of use.

    You should split on one delimiter first, such as the colon ':', and then split on the spaces. You can then tell which delimiter was present from how many pieces each split produced.
    Many thanks, that's a great help to me! I needed such a function.

    By the way, I've already written a function that tokenizes a string.
    It's not working properly and I would like to consult you about it.
    The code is:
    Code:
    #include <iostream>
    #include <cstring>
    #include <vector>
    #include <string>
    #include <iomanip>
    using namespace std;
    
    int main()
    {
            char string[]={"Start : ADD 5"};
            char token[]="Start";
            char r[15]={0};//result string, this array will store new modified string without token
            int scounter=0;
            int tcounter=0;
            int rcounter=0;//result counter, starts from zero
            bool check;
    
            for (scounter=0;string[scounter]!=0;scounter++)
            {
                    if (string[scounter]==token[0])
                    {
                            for (tcounter=1;token[tcounter]!='\0';tcounter++)
                            {
                                    if (string[scounter+tcounter]!=token[tcounter])
                                    {
                                            check = false;//first character to be saved
                                            break;
                                    }
                                   else
                                            check = true;//means the current character is found
                            }
                                    if (check)
    
                                            scounter +=tcounter;//go to check the next character
                                  
                    }
                    else//if the current element of string is not the first element of token
                            r[rcounter++]=string[scounter];
            }
    
            for (int k=0;string[k]!='\0';k++)
            cout<<string[k];
    
            cout<<endl;
    
            for (int m=0;m<15;m++)
            cout<<r[m];
    
            cout<<endl;
    
            return 0;
    }
    First of all, the odd thing is that it doesn't print the ':'! I don't know why. When I add a space between the ':' and the token (here "Start") it works, but normally it doesn't print it.

    The second problem with this function is that it is case sensitive. How can I turn that off?

    What other mistakes are there likely to be in this tokenizer?
    What would your own solution look like (how would you implement such a function)?
    Is there a standard library function that does the same job?

    Thanks a lot!
    (The problem is that I don't know much about strings yet (I've just reached chapter 8 of Deitel's book), and I haven't mastered the other topics of C++, which is why I'm having so many problems figuring things out. Thanks for your patience, and for your kind answers and solutions.)


  8. #8
    Registered User
    Join Date
    Apr 2006
    Posts
    137
    So are you trying to rewrite strtok?

    If you are, you can use the string class and its find_first_of, replace, and substr functions to do it easily. You don't need so many loops.
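
    A small sketch of what that could look like with std::string, using find_first_not_of, find_first_of and substr. The function name tokenize is hypothetical, and like strtok it simply discards the delimiters.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` on any character in `delims`.
std::vector<std::string> tokenize(const std::string &text,
                                  const std::string &delims)
{
    std::vector<std::string> tokens;

    // Skip leading delimiters to find the start of the first token.
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter (or at the end of the string).
        std::string::size_type end = text.find_first_of(delims, start);
        // substr clamps the count, so end == npos means "up to the end".
        tokens.push_back(text.substr(start, end - start));
        // Searching from npos simply returns npos, which ends the loop.
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

    Calling tokenize("Start : ADD 5", " :") gives "Start", "ADD" and "5" without modifying the original string, which is one advantage over strtok.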

  9. #9
    Registered User
    Join Date
    Oct 2008
    Posts
    55
    Here's a strtok replacement I wrote that returns non-space delimiters as tokens also. Spaces are automatically considered delimiters but are not returned.
    Code:
    #include <cctype>   // isspace
    #include <cstring>  // strchr, strcpy
    #include <iostream>
    using namespace std;
    
    //  mystrtok()
    //    Works similarly to strtok but returns delimiters as
    //    tokens too. Also assumes all space chars (as determined
    //    by isspace) are delimiters but does not return space tokens.
    
    char *mystrtok( char *str, const char *delim)
    {
        static char *s, last_delim;
        char *start;
    
        if (str) // If str not NULL, init s
            s = str;
        else {
            // if last delim was null char, there are no more tokens
            if (last_delim == 0) return 0;
            *s = last_delim; // replace last delim
        }
    
        // cast to unsigned char: isspace is undefined for negative values
        while (isspace((unsigned char)*s)) ++s; // skip spaces
        if (!*s) {               // if null char reached, return 0
            last_delim = 0;      //   and make later calls return 0 as well
            return 0;
        }
    
        for ( start = s; *s; ++s) // start points to token start
            if (isspace((unsigned char)*s) || strchr( delim, *s))
                break;
    
        if (start == s) ++s; // if one char, inc s
        last_delim = *s;     // save delimiter
        *s = 0;              //   and set it to null
        return start;        // return start of token
    }
    
    int main()
    {
        char *res, str[256];
    
    //    strcpy( str, "start: RED X 12");
        strcpy( str, "  add :  x, y \n ;jmp: 1234 ");
        cout << '|' << str << "|\n";
    
        res = mystrtok( str, ",;:");
        while (res) {
            cout << '|' << res << "|\n";
            res = mystrtok( NULL, ",;:");
        }
    }

  10. #10
    بابلی ریکا Masterx's Avatar
    Join Date
    Nov 2007
    Location
    Somewhere nearby,Who Cares?
    Posts
    497
    Quote: Originally posted by nucleon
    Here's a strtok replacement I wrote that returns non-space delimiters as tokens also. Spaces are automatically considered delimiters but are not returned.
    Many thanks, really.


  11. #11
    بابلی ریکا Masterx's Avatar
    Join Date
    Nov 2007
    Location
    Somewhere nearby,Who Cares?
    Posts
    497
    Hello, I have a couple of questions. Would anyone do me a favor and answer them?
    Is there a built-in function in standard C++ that checks whether a string contains characters or is just an integer (it only includes digits)?
    If not, how can we tell whether a character is a digit? Checking each character against 0~9 would do it, but is there a better, quicker way to find out?

    How can we copy a string (char * ptr) into an array of char?

    How can we tell whether a number in decimal form is octal? For example:
    1 2 3 4 5 6 7 9 10 (8 is octal and omitted)
    12 13 14 15 17 19 20 (16 and 18 are octal)


  12. #12
    بابلی ریکا Masterx's Avatar
    Join Date
    Nov 2007
    Location
    Somewhere nearby,Who Cares?
    Posts
    497
    double post! edited
    Last edited by Masterx; 11-13-2008 at 03:32 AM.


  13. #13
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    >> is there any built in function in standard C++ that checks whether a string is a character based string or is just integer( it only includes digits)? <<
    If you really have no idea what data you need, you aren't ready to code a solution. OTOH, if you're expecting an integer, try converting it from text with one of the snippets in the FAQ. By the way, you really ought to try reading the FAQ by now or finding some relevant threads on your own with search.

    >> how can we copy the string (char * ptr) into an array of char? <<
    One guess...

    >> how can we understand if a number in decimal form is octal? <<
    Octal numbers are written with an initial zero, always. 031 is octal for 25, as the lame pun goes.
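
    For readers who want something concrete, here is a hedged sketch of the three points above using standard functions only (isdigit, strtol, strncpy). The helper names isAllDigits, parseNumber and copyFirstN are hypothetical, and this is not the FAQ snippet referred to above.

```cpp
#include <cctype>
#include <cstdlib>
#include <cstring>

// True when `s` is non-empty and every character is a decimal digit.
bool isAllDigits(const char *s)
{
    if (*s == '\0') return false;
    for (; *s != '\0'; ++s)
        if (!std::isdigit(static_cast<unsigned char>(*s)))
            return false;
    return true;
}

// Converts `s` with strtol. Base 0 means a leading "0" selects octal and a
// leading "0x" selects hex, as in C source code. Returns false when `s` is
// not entirely a number (overflow is not checked in this sketch).
bool parseNumber(const char *s, long &value)
{
    char *end = NULL;
    value = std::strtol(s, &end, 0);
    return end != s && *end == '\0';
}

// Copies at most n characters of `src` into `dest`, which must have room
// for n + 1 chars. strncpy alone does not always terminate the result,
// so the terminator is written explicitly.
void copyFirstN(char *dest, const char *src, std::size_t n)
{
    std::strncpy(dest, src, n);
    dest[n] = '\0';
}
```

    With these, parseNumber("031", v) stores 25 in v because of the leading zero, while parseNumber("RED", v) returns false, which is one way to tell a numeric operand from a symbol.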

  14. #14
    بابلی ریکا Masterx's Avatar
    Join Date
    Nov 2007
    Location
    Somewhere nearby,Who Cares?
    Posts
    497
    Quote: Originally posted by citizen
    >> is there any built in function in standard C++ that checks whether a string is a character based string or is just integer( it only includes digits)? <<
    If you really have no idea what data you need, you aren't ready to code a solution. OTOH, if you're expecting an integer, try converting it from text with one of the snippets in the FAQ. By the way, you really ought to try reading the FAQ by now or finding some relevant threads on your own with search.

    >> how can we copy the string (char * ptr) into an array of char? <<
    One guess...

    >> how can we understand if a number in decimal form is octal? <<
    Octal numbers are written with an initial zero, always. 031 is octal for 25, as the lame pun goes.
    Thanks for your reply.
    I asked because I'm working on a string that is a combination of both, and I'm trying to split them, so I have to know which is which.
    Comparing each character with 0~9 would do it; I think this can help me distinguish them and then do the appropriate operation. I just wanted to see whether there is a function that already does this job.

    About the second question: I'm trying to copy only up to n of the characters that ptr (a char * ptr) points to. I think strcpy() copies the whole string, and I don't need that. Are there other versions of this function?

    About the third question: maybe I'm the one who didn't understand you, but let me make it a bit clearer.
    Say there is a program that counts in decimal. How can we tell whether the current number in its decimal form (for example 8) is what it would be in an octal-based system?
    I mean, in an octal-based system, if we want to count, we count like this:
    1 2 3 4 5 6 7 9 10 ...
    To cut a long story short, I'm trying to simulate an octal base and use it in my project. Any help on this?


  15. #15
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Let's start with the "octal" bit. First of all, it's 1, 2, 3, 4, 5, 6, 7, 10, 11, ... There's no 8 OR 9 in the number system.

    Second, in the computer, numbers are stored in binary. Octal and hex are popular ways to represent binary numbers so that they can be human readable and still easy to translate back to binary. This works because in octal, 3 bits (binary digits) are represented by one octal digit (0..7) and in hex 4 bits are represented by one hex digit. So it's very easy for a human to work out that 031 in octal is 000 011 001. We can also do the same the other way, using hex: 0 0001 1001 becomes 19.
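
    The bit-grouping idea can be sketched in a few lines. The function name toPowerOfTwoBase is hypothetical; it peels off 3 bits per digit for octal and 4 bits per digit for hex, and is limited to 1 to 4 bits per digit in this sketch.

```cpp
#include <cassert>
#include <string>

// Renders `value` in base 2^bitsPerDigit by peeling off that many bits at
// a time: bitsPerDigit = 3 gives octal, 4 gives hex, 1 gives binary.
std::string toPowerOfTwoBase(unsigned value, unsigned bitsPerDigit)
{
    assert(bitsPerDigit >= 1 && bitsPerDigit <= 4);   // digit table has 16 entries
    const char digits[] = "0123456789abcdef";
    const unsigned mask = (1u << bitsPerDigit) - 1u;  // e.g. 7 for octal

    std::string out;
    do {
        // The lowest group of bits is the next digit, working right to left.
        out.insert(out.begin(), digits[value & mask]);
        value >>= bitsPerDigit;
    } while (value != 0);
    return out;
}
```

    For the value 25 this gives "31" with 3 bits per digit, "19" with 4, and "11001" with 1, matching the 031 and 19 in the explanation above. No division is needed, only masking and shifting.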

    Decimal is much harder to deal with, because the only way to do it is to divide by 2 or 10, depending on which way we're going, until there is nothing more to divide. This is because 10 is not a "nice" number of bits: each decimal digit carries approximately 3.3 bits (log2(10) is about 3.32)... It would have been MUCH easier if we humans were born like cartoon characters (e.g. the Simpsons), with one thumb and three fingers on each hand. Then we would have used octal as our number base in the first place, and we wouldn't have had this problem.
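
    By contrast, a decimal conversion has to divide (or multiply) by 10 for every digit. A minimal sketch, with hypothetical function names toDecimal and fromDecimal, and assuming fromDecimal is only given digit characters:

```cpp
#include <string>

// Binary value to decimal text: repeatedly divide by 10, collecting the
// remainders as digits from right to left.
std::string toDecimal(unsigned value)
{
    std::string out;
    do {
        out.insert(out.begin(), static_cast<char>('0' + value % 10));
        value /= 10;
    } while (value != 0);
    return out;
}

// Decimal text to binary value: multiply the running total by 10 for each
// new digit. Assumes `digits` contains only '0'..'9' and does not overflow.
unsigned fromDecimal(const std::string &digits)
{
    unsigned value = 0;
    for (std::string::size_type i = 0; i < digits.size(); ++i)
        value = value * 10 + static_cast<unsigned>(digits[i] - '0');
    return value;
}
```

    Each step touches the whole number rather than a fixed group of bits, which is why decimal does not line up with the binary representation the way octal and hex do.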

    As to knowing whether something is a number or not, you probably want to put that into a function that resolves the symbols into numbers too, since you don't really care which one it is when you see a particular instruction - you just want to translate it into a number, whichever it is. Note that for labels that are defined further on in the code, you will need a two-pass approach: you read the entire file once, storing away where each instruction belongs and what the value of each label is, and then pass through it again and fill in any "gaps".
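
    A heavily simplified sketch of the two-pass idea, under assumptions that are not from the thread: each input line is either a label definition of the form "name:" or a single operand (a decimal number or a label name), every operand occupies one address, and -1 marks an unknown symbol. The function name assemble is hypothetical.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Returns true when `line` is a label definition such as "loop:".
static bool isLabelDefinition(const std::string &line)
{
    return !line.empty() && line[line.size() - 1] == ':';
}

std::vector<long> assemble(const std::vector<std::string> &lines)
{
    // Pass 1: record the address that each label refers to.
    std::map<std::string, long> labels;
    long address = 0;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (isLabelDefinition(lines[i]))
            labels[lines[i].substr(0, lines[i].size() - 1)] = address;
        else
            ++address;                 // every operand takes one slot
    }

    // Pass 2: translate each operand into a number, whichever kind it is.
    std::vector<long> output;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::string &line = lines[i];
        if (isLabelDefinition(line))
            continue;
        char *end = NULL;
        long value = std::strtol(line.c_str(), &end, 10);
        if (end != line.c_str() && *end == '\0') {
            output.push_back(value);   // it was a plain number
        } else {
            std::map<std::string, long>::const_iterator it = labels.find(line);
            output.push_back(it != labels.end() ? it->second : -1);
        }
    }
    return output;
}
```

    Given the lines "end", "7", "end:", "3", the first pass learns that "end" is address 2, so the second pass can resolve the forward reference on the first line and produce 2, 7, 3.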

    --
    Mats
