Thread: Problems with String Tokens

  1. #1
    Paul Adams (Registered User; joined Feb 2012; 9 posts)

    Problems with String Tokens

    I am having an issue with strtok in my program. It doesn't seem to be matching the delimiters correctly.

    My program is supposed to count the number of words (working) and the number of sentences in a text file. My text file has 271 words and 5 sentences. A sentence is defined as ending with a single period '.' in the file. A word is simply anything followed by a blank space ' ' in the file.

    My text file has 5 instances of '...' (yes, three periods in a row). Even if I count each '...' as 3, I should only have a count of 20 sentences in my file. The program keeps showing 35 sentences.

    As you can see in my code I played with the idea of strstr to find '...' and reduce my count by 3 each time, but this didn't work out either.

    Here is my code:

    Code:
    #include <stdio.h>
    #include <string.h>
     
    int main(int argc, char* argv[])
    {
        int words = 0; 
        int    sentences = 0;
        char str[200];
        char * test;
        FILE *fp;
        
        //Read in File
        fp = fopen("test.txt", "r");
        if(!fp) return 1; // ends program if file is not found
        while(fgets(str,sizeof(str),fp) != NULL){
            
            // gets rid of the trailing '\n' (if it is there)
            int len = strlen(str)-1;
            if(str[len] == '\n')
                str[len] = 0;
    
    
            // Split string into tokens (found on cplusplus.com website)
            // used to count number of words
            test = strtok (str," .");
            while (test != NULL) {
                test = strtok (NULL, " ");
                ++words;
            }
            test = strtok (str,".");
            while (test != NULL) {
                test = strtok (NULL, ".");// still need to skip '...' ???
                ++sentences;
                if(test == "..."){
                sentences = sentences - 3;
                }
            }
            //test = strstr (str,"...");
            //while (test != NULL) {
            //    test = strstr (str,"...");
            //    sentences = sentences - 3;
            //}
        }
    
    
        //Print out Results
            printf("Results:\n");
            printf("\n\nNumber of Words: %d\n", words);
            printf("Number of Sentences: %d\n", sentences);
    
    
        fclose(fp);
        getchar();
    }
    Thanks for any help on this!!

  2. #2
    Registered User (joined Nov 2011; 5 posts)
    Your issue is here:

    Code:
    if(test == "...")
    You can't use the == operator to compare strings in C: it compares the two pointers, not the characters they point to. Use strcmp() instead.
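A minimal sketch of the strcmp version of that check, using a hypothetical helper is_ellipsis. strcmp returns 0 when the two strings contain the same characters, which is the test the original == was trying to express.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if token is exactly "...", else 0.
   strcmp compares characters; token == "..." would only compare the
   address of the token with the address of the string literal. */
int is_ellipsis(const char *token)
{
    return token != NULL && strcmp(token, "...") == 0;
}
```

Note that in the original loop '.' is itself a delimiter, so strtok can never hand back "..." as a token; even with strcmp that comparison would never be true there.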

  3. #3
    Paul Adams (Registered User; joined Feb 2012; 9 posts)
    That part should have been commented out, sorry about that. I am tinkering with it to see if I can get it to work properly.

    Even with that commented out, I still get 35 as my sentence count, even though I only have 20 periods in my text file.

    Any other ideas on how I can approach this?

    Thanks again!

  4. #4
    Registered User, Long Beach, CA (joined Nov 2010; 5,909 posts)
    Could you provide us the input file you're using? I think you're getting a little overzealous with the use of strtok. It modifies the original string, writing a null character over the delimiter that ends each token. Also, if you have more than one delimiter in a row, they are treated as a single delimiter, so "..." acts the same as ".". On top of that, once you strtok a buffer, each call to strtok(NULL, ...) moves you farther down the string until it returns a NULL pointer.
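Both effects can be seen in a small sketch, assuming a hypothetical helper split_on_periods that collects the tokens strtok returns when splitting on '.'.

```c
#include <string.h>

/* Hypothetical helper: splits s IN PLACE on '.', storing at most max
   token pointers in out, and returns how many tokens were stored. */
int split_on_periods(char *s, char *out[], int max)
{
    int n = 0;
    char *tok = strtok(s, ".");    /* first call: pass the buffer */

    while (tok != NULL && n < max) {
        out[n++] = tok;            /* tok points into s itself, not a copy */
        tok = strtok(NULL, ".");   /* later calls: NULL means "keep going" */
    }
    return n;
}
```

Given the buffer "Caesar...The noble Brutus.", this returns 2 tokens, "Caesar" and "The noble Brutus": the run "..." produced only one split. Afterwards the buffer itself reads "Caesar", because strtok wrote '\0' over the first '.'.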

    Perhaps you would be better off if you simply iterated through the string:
    Code:
    int i, count = 0;
    for (i = 0; i < len; i++) {
        if (str[i] == '.')
            ++sentences;   /* each '.' counts as a sentence */
        ++count;           /* incremented for every character */
    }
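A minimal sketch of the same per-character idea applied to the whole file, assuming a hypothetical helper count_periods: reading with fgetc needs no line buffer, no newline stripping and no strtok. Like the loop above, it counts every '.', so each "..." still contributes 3.

```c
#include <stdio.h>

/* Hypothetical helper: counts every '.' in the stream, one character
   at a time, until end of file. */
int count_periods(FILE *fp)
{
    int ch;            /* int, not char, so EOF can be told apart */
    int periods = 0;

    while ((ch = fgetc(fp)) != EOF)
        if (ch == '.')
            periods++;
    return periods;
}
```

In the original program this would replace the whole fgets/strtok loop: open the file, call count_periods(fp), then fclose(fp).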

  5. #5
    Paul Adams (Registered User; joined Feb 2012; 9 posts)
    This is a sample of the text file:

    Friends, Romans, countrymen, lend me your ears.
    I come to bury Caesar, not to praise him.
    The evil that men do lives after them.
    The good is oft interred with their bones.
    So let it be with Caesar … The noble Brutus.


    I see what you are saying about how the original string is modified. I may take another approach to performing the search for strings. But I am still working on it.

    Your approach is the simpler one, but I've already done that earlier in my C course, so I was trying to push myself and learn another way of doing this. We haven't discussed strtok in class, so all I know is what I'm reading online.

    Thanks for the suggestion though, I will give it a try.

  6. #6
    Registered User (joined Mar 2011; 546 posts)
    You are not counting any of the periods in your file. You have a couple of problems:

    1. strtok modifies the string (it writes '\0' in place of delimiters), so after your first strtok(str, " .") you have butchered the string, and the next strtok(str, ".") won't work right. If you want to tokenize a string twice, you need to make a copy for the second pass. What happens here is that your second pass of strtok counts exactly 1 sentence per line (the first word in the modified string) and then returns NULL.

    2. Your loop on strtok counts an extra word and sentence because you are counting the final NULL returned at the end. You need to increment first, then do test = strtok(...), so your loop quits without a count when it gets a NULL.
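A minimal sketch of the copy-before-tokenizing fix from point 1, assuming a hypothetical helper count_tokens and an arbitrary 255-character line limit. It uses the common for-loop form, in which each token is tested against NULL before it is counted.

```c
#include <string.h>

/* Hypothetical helper: counts the tokens in line when split on any
   character in delims. strtok writes '\0' into the buffer it scans,
   so it works on a private copy and leaves line untouched.
   Lines longer than 255 characters (assumed limit) are truncated. */
int count_tokens(const char *line, const char *delims)
{
    char copy[256];
    int n = 0;

    strncpy(copy, line, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims))
        n++;    /* only reached when strtok returned a real token */
    return n;
}
```

Because line is never modified, count_tokens(line, " ") and count_tokens(line, ".") can be called one after the other on the same buffer.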

    Suggestion: print each token as you get it from strtok (with some markers to show which strtok call it came from). This will show you what you are really doing.
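One way to do that, sketched with a hypothetical helper dump_tokens that tags each token with a label for the pass that produced it and returns the token count. Like any strtok loop, it modifies the buffer it is given.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical debugging aid: prints every token strtok returns,
   tagged with label, and returns how many tokens there were. */
int dump_tokens(const char *label, char *buf, const char *delims)
{
    int n = 0;

    for (char *tok = strtok(buf, delims); tok != NULL;
         tok = strtok(NULL, delims))
        printf("[%s] token %d: \"%s\"\n", label, ++n, tok);
    return n;
}
```

Running a first pass with " ." and then a second pass with "." on the same buffer "lend me your ears." prints four tokens for pass 1 but only one, "lend", for pass 2, which is exactly the butchered-string effect from point 1.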

    All that said, as anduril points out, a better approach to this type of parsing is to go one character at a time and use a simple state machine to find your delimiters, but only if you know how to structure a state machine (a.k.a. finite state automaton).
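A minimal sketch of that state-machine approach, assuming the definitions from the first post: a word is a run of non-whitespace text, and a sentence ends at a single '.', while a run such as "..." is ignored. It also assumes '.' never starts a word, so a free-standing "..." is not counted as a word. The names count_text, struct text_counts and the state names are hypothetical.

```c
#include <ctype.h>

/* Hypothetical result type for this sketch. */
struct text_counts {
    int words;
    int sentences;
};

/* The two states of the word scanner. */
enum scan_state { BETWEEN_WORDS, IN_WORD };

/*
 * Scans s one character at a time (assumed rules):
 *  - a word begins at a character that is neither whitespace nor '.'
 *    while the scanner is BETWEEN_WORDS; whitespace ends the word;
 *  - a run of '.' ends a sentence only if the run is exactly one '.'
 *    long, so "..." is an ellipsis, not three sentence ends.
 */
struct text_counts count_text(const char *s)
{
    struct text_counts c = {0, 0};
    enum scan_state state = BETWEEN_WORDS;
    int period_run = 0;              /* length of the current run of '.' */

    for (const char *p = s; ; p++) {
        unsigned char ch = (unsigned char)*p;

        if (ch == '.') {
            period_run++;            /* extend the run; decide when it ends */
        } else {
            if (period_run == 1)     /* a lone '.' just ended */
                c.sentences++;
            period_run = 0;
        }

        if (ch == '\0')              /* end of text; any run is now closed */
            break;

        if (isspace(ch)) {
            state = BETWEEN_WORDS;
        } else if (ch != '.' && state == BETWEEN_WORDS) {
            state = IN_WORD;
            c.words++;
        }
    }
    return c;
}
```

On the five-line sample from post #5 (with "..." written as three ASCII periods) this reports 41 words and 5 sentences. If the file really contains the single Unicode character '…', this byte-oriented scanner sees neither a '.' nor whitespace there and counts it as a word.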

