Thread: strtok() help

  1. #1
    Registered User
    Join Date
    Aug 2012
    Location
    Utrecht, Netherlands
    Posts
    18

    strtok() help

    Hey guys,

    I'm trying to convert a big chunk of Java code into C, and ran into a little problem with a part of it (been a while since I touched C).

    I'm trying to tokenize a string, for example "a b c d e", using strtok(line, " "); however, I need to put the tokens of each line into their own array (for this line the result should be [a,b,c,d,e]). The problem is that I don't know how many tokens I will have per line, so I cannot declare a correctly sized array.
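For reference, the usual strtok() loop looks roughly like the sketch below. It assumes a fixed upper bound `max` on tokens per line, and the function name `split_tokens` is hypothetical. The key points are that the first call passes the string, later calls pass NULL, and strtok() writes '\0' characters into the line itself.

```c
#include <string.h>

/* Split `line` in place on spaces and newlines, storing up to `max`
 * token pointers in `tokens`. Returns the number of tokens stored.
 * strtok() modifies `line` and keeps hidden state between calls, so
 * `line` must be writable and the stored pointers point into it.
 * Tokens beyond `max` are silently ignored. */
int split_tokens(char *line, char *tokens[], int max)
{
    int n = 0;
    char *tok = strtok(line, " \n");   /* first call: pass the string */
    while (tok != NULL && n < max) {
        tokens[n++] = tok;
        tok = strtok(NULL, " \n");     /* later calls: pass NULL */
    }
    return n;
}
```

Calling it on a writable buffer holding "a b c d e\n" stores five pointers, to "a" through "e".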

    Is there a simple way to figure out how many tokens I'm looking at? (I'd prefer not to use dynamic arrays, seems like an overkill... and of course I could pass through every line twice, counting the tokens each time, but that's horribly ugly and inefficient.)

    Thanks for any input!
    ~Ota

  2. #2
    Registered User
    Join Date
    Aug 2012
    Location
    Utrecht, Netherlands
    Posts
    18
    Update:

    Well, I realized I can just count the spaces: that number + 1 is my number of tokens. It's better than tokenizing each line twice, but it still seems ugly.
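"Spaces + 1" is only correct when tokens are separated by exactly one space with nothing leading or trailing. A slightly more robust count, sketched below under the assumption that spaces, tabs, '\r' and '\n' are the only separators, counts the start of each token instead (the name `count_tokens` is hypothetical):

```c
/* Count whitespace-separated tokens in s without modifying it.
 * A new token starts wherever a non-separator character follows a
 * separator (or the start of the string), so repeated, leading or
 * trailing spaces and empty lines do not throw off the count the way
 * "spaces + 1" can. */
int count_tokens(const char *s)
{
    int count = 0;
    int in_token = 0;               /* are we currently inside a token? */

    for (; *s != '\0'; s++) {
        if (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r') {
            in_token = 0;           /* a separator ends any current token */
        } else if (!in_token) {
            in_token = 1;           /* first character of a new token */
            count++;
        }
    }
    return count;
}
```

It is still one pass over the characters, the same cost as counting spaces.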

  3. #3
    Registered User piyush.sharma
    Join Date
    Aug 2012
    Location
    Noida, India
    Posts
    9
    Quote Originally Posted by otaconBot
    I'd prefer not to use dynamic arrays, seems like an overkill
    What is your plan after counting the number of spaces? If you don't want dynamic arrays, how will you manage this with static ones (that is, if you have a larger number of tokens)?

  4. #4
    Registered User
    Join Date
    Aug 2012
    Location
    Utrecht, Netherlands
    Posts
    18
    Let's say I'm reading in a file:

    a b c d
    e f
    g h i

    I go through it line by line, getting the strings "a b c d", "e f" and "g h i" from successive fgets() calls.

    If I know the number of spaces + 1 (4, 2 and 3 respectively), I can initialize static arrays to hold the tokens, resulting in a final nested array:

    [ [a,b,c,d] , [e,f] , [g,h,i] ], which is what I need for further processing.


    So I don't think I need dynamic arrays for that, but with the counting I'm still iterating through everything twice, and in my case it's a huge amount of data, so that's quite inefficient.

    EDIT:

    Basically my goal is to translate a file of form

    a b c d
    e f
    g h i

    into an array structure

    [
    [a,b,c,d]
    [e,f]
    [g,h,i]
    ]

    I'm doing it by going through each line of the file and tokenizing it on the space delimiter. The problem is that since I don't know how many tokens each line has, I cannot initialize the inner arrays. (I know how many lines I have, just not how many tokens per line, so I can initialize the 'large' outer array.)
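One note on "static arrays": an ordinary C array needs its size at compile time. C99 variable-length arrays can take a run-time size, but they only live until the enclosing block ends, so for a nested result you keep around, each row is usually malloc()ed. The sketch below builds one exactly-sized row per line; the outer array can stay fixed because the line count is known. The name `make_row` and the separator set `SEPS` are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define SEPS " \t\r\n"   /* assumed token separators */

/* Turn one line into an exactly-sized row of token pointers.
 * Pass 1 counts tokens with strspn()/strcspn(); pass 2 fills the row
 * with strtok(). The tokens point into `line` (strtok() writes '\0's
 * into it), so `line` must stay alive as long as the row is used.
 * Returns the row (free() it when done) or NULL if malloc() fails. */
char **make_row(char *line, int *len_out)
{
    int n = 0;
    for (const char *p = line + strspn(line, SEPS); *p != '\0';
         p += strspn(p, SEPS)) {
        p += strcspn(p, SEPS);      /* skip over one token */
        n++;
    }

    /* malloc(0) may return NULL, so always ask for at least one slot */
    char **row = malloc((n > 0 ? (size_t)n : 1) * sizeof *row);
    if (row == NULL)
        return NULL;

    int i = 0;
    for (char *tok = strtok(line, SEPS); tok != NULL && i < n;
         tok = strtok(NULL, SEPS))
        row[i++] = tok;

    *len_out = i;
    return row;
}
```

With `char **rows[3]; int lens[3];` as the outer structure, calling make_row() on "a b c d", "e f" and "g h i" gives rows of length 4, 2 and 3.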
    Last edited by otaconBot; 08-07-2012 at 06:31 AM. Reason: Clearing up what I'm trying to do better

  5. #5
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    A bit more elegant:

    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main(void) {
       int tokenNum = 0, len;
       char line[] = "a b c d e";
    
       len = (int)strlen(line);
       /* one-char tokens separated by single spaces: len == 2*tokens - 1 */
       tokenNum = (len + 1) / 2;
       
       printf("Line: %s, length: %d, tokenNum: %d\n", line, len, tokenNum);
          
       return 0;
    }
    Note that the above requires a string that is strictly formatted as you've described: alternating one char and one space, with no newlines or other punctuation present.

    If you are using fgets(), it keeps the newline in the string. Remove it by adding this code:

    Code:
    if (len > 0 && line[len-1] == '\n')
       line[--len] = '\0';
    This overwrites the newline (and shortens len by one), so the calculation works correctly again. Put it right after the "len = strlen(line);" line of code.
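A common alternative for stripping the newline uses strcspn(), which returns the index of the first character from a given set (or the string length if none is found), so it is safe on empty strings and on lines without a newline. The sketch below also treats '\r' as a line ending, an assumption that helps with Windows-style files; the name `chomp` is hypothetical.

```c
#include <string.h>

/* Cut the string at the first '\r' or '\n' (if any) and return the
 * new length. Safe for empty strings and for lines with no newline. */
size_t chomp(char *line)
{
    size_t len = strcspn(line, "\r\n");
    line[len] = '\0';
    return len;
}
```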
    Last edited by Adak; 08-07-2012 at 06:38 AM.

  6. #6
    Registered User
    Join Date
    Aug 2012
    Location
    Utrecht, Netherlands
    Posts
    18
    Thank you Adak, your solution is quite clever, but I oversimplified the problem a bit: while the space delimiter is always there, I cannot assume the length of each token, so the little math trick won't work. More specifically, here is part of the data I'm going through:

    -29 -26 -48 0
    58 -10 46 -51 0
    -22 45 33 -36 10 0
    -39 5 -37 4 0
    -10 33 22 36 -45 0
    23 -11 62 -29 -12 0
    -42 -59 63 -15 36 0
    14 8 -18 -31 0
    43 6 -57 56 0
    -40 54 1 11 0
    -18 -3 -9 -41 -19 0
    4 44 46 -16 0
    -55 47 7 -43 -20 0

    and ultimately I need an int[][] array (absolute values, without each line's terminating 0):

    [
    [29,26,48]
    [58,10,46,51]
    ...
    [55,47,7,43,20]
    ]

    Of course I can handle the string-to-int conversion, abs(), etc.; it's getting the data into a correctly structured array that I'm having issues with.

    (If anyone is interested, it's part of a CNF file containing a SAT problem benchmark, but that's irrelevant to my issue.)
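For lines like the clause data above, strtol() can do the tokenizing and the conversion in one pass, since it skips leading whitespace and reports where it stopped. The sketch below stores absolute values and stops at the terminating 0, matching the desired output. Its name and the caller-supplied bound `max` are assumptions; note that DIMACS CNF files also contain `c` comment lines and a `p cnf` header line, which this sketch rejects with -1 rather than handling.

```c
#include <stdlib.h>

/* Parse one clause line such as "-29 -26 -48 0" into out[], storing
 * the absolute value of each literal and stopping at the terminating 0.
 * Returns the number of values stored (at most max; extras are
 * dropped), or -1 if a non-numeric token or end of line comes first. */
int parse_clause(const char *line, int out[], int max)
{
    int n = 0;
    const char *p = line;
    for (;;) {
        char *end;
        long v = strtol(p, &end, 10);   /* skips leading whitespace */
        if (end == p)                   /* no number: junk or no final 0 */
            return -1;
        if (v == 0)                     /* 0 terminates the clause */
            return n;
        if (n < max)
            out[n++] = (int)labs(v);
        p = end;                        /* continue after this number */
    }
}
```

Combined with an up-front count (or a growable buffer), this fills each int row directly.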
    Last edited by otaconBot; 08-07-2012 at 06:45 AM.

  7. #7
    Registered User piyush.sharma
    Join Date
    Aug 2012
    Location
    Noida, India
    Posts
    9
    What if you use a large buffer that is big enough to store the tokens temporarily?

    My idea is :
    Code:
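    #include <string.h>
    
    #define MAX_TOKENS 1024           /* assumed upper bound per line */
    static char *buffer[MAX_TOKENS];  /* global scratch buffer */
    
    /* Store the tokens of str in buffer and return how many there are,
       which tells you how big the exact-sized array needs to be. */
    int functionToTokenize(char *str)
    {
        int count = 0;
        char *tok = strtok(str, " \n");
        while (tok != NULL && count < MAX_TOKENS)
        {
            buffer[count++] = tok;
            tok = strtok(NULL, " \n");
        }
        return count;
    }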
    This is only a rough idea, but it might help you, and it should be efficient.
    If it is hard to estimate the size of the buffer, make it dynamic: allocate space with malloc(), and if you need more space partway through, simply realloc() it at that moment.
    What do you think?
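The realloc() variant of this idea can be sketched as a single pass: start with a small array, double it when it fills up, and shrink it to the exact count at the end. The function name `tokens_to_ints` and the starting capacity of 4 are assumptions; atoi() is used for brevity even though it does no error checking (strtol() is the careful choice).

```c
#include <stdlib.h>
#include <string.h>

/* Single-pass tokenizing into a growable int array. Returns the array
 * (caller frees) and stores the count in *n_out, or returns NULL on
 * allocation failure. Modifies `line` via strtok(). */
int *tokens_to_ints(char *line, int *n_out)
{
    int cap = 4, n = 0;
    int *arr = malloc(cap * sizeof *arr);
    if (arr == NULL)
        return NULL;

    for (char *tok = strtok(line, " \n"); tok != NULL;
         tok = strtok(NULL, " \n")) {
        if (n == cap) {                 /* full: double the capacity */
            int *bigger = realloc(arr, 2 * cap * sizeof *arr);
            if (bigger == NULL) {
                free(arr);
                return NULL;
            }
            arr = bigger;
            cap *= 2;
        }
        arr[n++] = atoi(tok);
    }

    /* Shrink to fit; keep the larger block if shrinking fails. */
    if (n > 0) {
        int *exact = realloc(arr, n * sizeof *arr);
        if (exact != NULL)
            arr = exact;
    }
    *n_out = n;
    return arr;
}
```

Doubling keeps the number of realloc() calls logarithmic in the token count, so the line is still only scanned once.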

  8. #8
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    I have a function that counts commas in a string; you can adapt it (for example, to count spaces) if you want:
    Code:
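    /* Count the commas in s. The counter is an int rather than a char,
       which would overflow on long strings. */
    int number_of_commas(const char *s)
    {
        int count;
    
        for (count = 0; *s != '\0'; s++)
        {
            if (*s == ',') count++;
        }
    
        return count;
    }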

  9. #9
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    sscanf() returns the number of input items it successfully assigned. That could be used to your advantage here.
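One way to use that return value, sketched here with a hypothetical `count_ints` helper: read one integer at a time with "%d%n", where %n records how many characters were consumed so far (it does not count toward sscanf()'s return value), and advance through the line until sscanf() no longer returns 1.

```c
#include <stdio.h>

/* Count the integers at the start of `line` by walking it with
 * sscanf(). Each call converts one %d; "%n" stores the number of
 * characters consumed so the next call resumes after that number.
 * Stops at end of string (sscanf returns EOF) or at a non-number. */
int count_ints(const char *line)
{
    int count = 0, value, used;
    while (sscanf(line, "%d%n", &value, &used) == 1) {
        count++;
        line += used;
    }
    return count;
}
```

Storing `value` into an array inside the loop turns the same walk into a parser.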

    Click_here: That's the same type of logic he already has. He's looking for something that doesn't require a counting loop first.

    Edit:
    I don't believe you can get your final int array[][] sized "just right" without either making it a dynamic array or counting (somehow) beforehand. Barring a great crystal ball.
    Last edited by Adak; 08-07-2012 at 07:12 AM.

  10. #10
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    Correct me if I'm wrong, but isn't strlen() a function that incorporates looping? And you wouldn't think twice about calling that

    I don't think that you should worry about inefficiencies introduced by looping.

  11. #11
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by Click_here
    Correct me if I'm wrong, but isn't strlen() a function that incorporates looping? And you wouldn't think twice about calling that

    I don't think that you should worry about inefficiencies introduced by looping.
    Yes, it is. I don't believe it's possible to avoid counting IF he wants a perfectly sized int array[][], especially with rows that have different numbers of ints in them.

    Anything you can do to avoid going through a large data set twice is worth exploring, imo. He hasn't mentioned what the run-time constraints are for this program. You may be spot on, and it's nothing to worry about.

  12. #12
    Registered User
    Join Date
    Aug 2012
    Location
    Utrecht, Netherlands
    Posts
    18
    Thank you guys!

    Yeah, counting the spaces (token separators) is not very expensive, and probably just as efficient as the Java equivalent (StringTokenizer). I just need to get used to the fact that C doesn't provide as many built-in structures as Java.

    I originally didn't think of just counting characters, and considered tokenizing twice, which seemed inefficient, or at least not the right way to do it. Using a StringTokenizer twice in Java would just be silly, which is why I posted the question.

    Got it running properly now. Thanks again. I'm sure you'll see me around here a lot more, as all my research has to move from Java to C... it'll be a painful few weeks for me.


Similar Threads

  1. something like strtok
    By alzar in forum C Programming
    Replies: 21
    Last Post: 01-19-2008, 01:47 AM
  2. strtok
    By jduke44 in forum C Programming
    Replies: 2
    Last Post: 09-28-2005, 08:00 PM
  3. How to use strtok?
    By Turbo02 in forum C++ Programming
    Replies: 10
    Last Post: 04-22-2004, 09:42 AM
  4. strtok again
    By AmazingRando in forum C Programming
    Replies: 2
    Last Post: 12-08-2003, 09:22 AM
  5. help with strtok
    By requiem in forum C Programming
    Replies: 5
    Last Post: 04-29-2003, 04:10 PM
