Split sentence into words


  1. #1
    Registered User
    Join Date
    Apr 2007
    Location
    Sweden
    Posts
    12

    Split sentence into words

    I want to split a sentence into words so "foo bar" is split into the two parts "foo" and "bar". I cannot use any standard functions like strtok(...) so I have to do this manually...

    I've tried several implementations of this already, but the same bug occurs every time: the last word always comes out wrong. If I type four words, it works for the first three. If I type three words, the first two work. If I type two words, only the first one works. By "not working" I don't mean that the last word isn't stored. Instead, some unknown extra characters are added at the end, so the word appears correct but isn't.

    For debugging, I've added three dots "..." after each word when printing the values to see if anything was wrong. Here's an example of the output:
    Sentence: "hello wide world"
    Word 1: "hello..."
    Word 2: "wide..."
    Word 3: "world
    ..."

    As you can see, the last word itself is correct, but some kind of junk data is added at the end.
    I don't understand what I'm doing wrong... I would appreciate some help.

    Code looks like this right now:
    Code:
    char sentence[70];
    char words[3][25];
    int a = 0;
    int i = 0;
    int index = 0;
    while (sentence[a] != '\0') {
          if (sentence[a] == ' ') { words[index][i] = '\0'; i=0; a++; index++; } // SPACE detected
    
          words[index][i++] = sentence[a++];
    }
          words[index][i] = '\0';

  2. #2
    kermit
    Join Date
    Jan 2003
    Posts
    1,528
    I added to your code so I could compile it, and it worked fine.

    The complete program is hardly longer than what you posted here, so how about posting your complete attempt so we can see what the problem is?
    Last edited by kermit; 07-05-2010 at 07:21 AM.

  3. #3
    Registered User
    Join Date
    Apr 2007
    Location
    Sweden
    Posts
    12
    Quote Originally Posted by kermit
    I added to your code so I could compile it, and it worked fine.
    Are you sure? It also worked for me, except for that "almost invisible" bug.

    The complete program is hardly longer than what you posted here, so how about posting your complete attempt to see what the problem was.
    Well, people usually prefer getting just the relevant part of the code, so I left out the unrelated stuff. But here's the whole program if it helps.

    Code:
    /*
    Test cases for string splitting code
    */
    #include <stdio.h>
    
    int main(int argc, char *argv[]) {
       char input[100];
       fgets(input, 100, stdin);
    
       char args[3][15];
       int a = 0;
       int i = 0;
       int index = 0;
       while (input[a] != '\0') {
          if (input[a] == ' ') { args[index][i] = '\0'; i=0; a++; index++; } // SPACE detected
    
          args[index][i++] = input[a++];
          printf("%c.",args[index][i-1]);
       }
       args[index][i] = '\0';
    
       printf("\nInput: %s...\n",input);
       printf("Arg 1: %s...\n",args[0]);
       printf("Arg 2: %s...\n",args[1]);
       printf("Arg 3: %s...\n",args[2]);
    
       return 0;
    }
    If you type for example "aaa bbb ccc", then Arg1 and Arg2 will be correct. Arg3 will appear correct but it's not directly followed by the three dots so it's not actually what it appears to be.

  4. #4
    kermit
    Join Date
    Jan 2003
    Posts
    1,528
    You forgot to deal with the newline, which fgets also reads in if there is room in the buffer. Hence you get something like this:

    Code:
    Input: aaa bbb ccc...
    Arg 1: aaa...
    Arg 2: bbb...
    Arg 3: ccc'\n'
    ...

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    int main(void)
    {
        char input[100];
        char *p;
    
    // Do some error checking on the call to fgets
    
        if (fgets(input, 100, stdin) == NULL) {
            perror("fgets");
            exit(EXIT_FAILURE);
        }
    
    // Check if there is a newline in the buffer.  If there is, get rid of it.
    
        if ((p = strchr(input, '\n')) != NULL) {
            *p = '\0';
        }
    
        char args[3][15];
        int a = 0;
        int i = 0;
        int index = 0;
        while (input[a] != '\0') {
            if (input[a] == ' ') {
                args[index][i] = '\0';
                i = 0;
                a++;
                index++;
            }                       // SPACE detected
    
            args[index][i++] = input[a++];
            printf("%c.", args[index][i - 1]);
        }
        args[index][i] = '\0';
    
        printf("\nInput: %s...\n", input);
        printf("Arg 1: %s...\n", args[0]);
        printf("Arg 2: %s...\n", args[1]);
        printf("Arg 3: %s...\n", args[2]);
    
        return 0;
    }
    Now of course I cheated, because I am allowed to use standard functions. I am confident that you will be able to adjust your code to deal with the newline without using them, though.
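
    For reference, here is one possible way to handle the newline without any standard string functions: treat '\n' as a separator, just like a space. This is only a sketch, not the original poster's final code (which was never posted). The names split_words, is_separator, MAX_WORDS and MAX_LEN are made up for illustration, and the bounds checks and the skipping of repeated spaces are additions that the thread does not discuss.

```c
#include <stddef.h>   /* size_t only; no string functions are used */

#define MAX_WORDS 3   /* assumed limits, matching args[3][15] above */
#define MAX_LEN   15

/* Returns 1 for the characters treated as word separators here. */
static int is_separator(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Copy the words of `input` into `words` and return how many were stored.
 * Words beyond MAX_WORDS are dropped; over-long words are truncated.
 */
static int split_words(const char *input, char words[MAX_WORDS][MAX_LEN])
{
    int count = 0;
    size_t a = 0;

    while (input[a] != '\0' && count < MAX_WORDS) {
        /* Skip any run of separators, including the newline from fgets. */
        while (is_separator(input[a])) {
            a++;
        }
        if (input[a] == '\0') {
            break;
        }

        /* Copy one word, always leaving room for the terminator. */
        size_t i = 0;
        while (input[a] != '\0' && !is_separator(input[a])) {
            if (i < MAX_LEN - 1) {
                words[count][i++] = input[a];
            }
            a++;
        }
        words[count][i] = '\0';
        count++;
    }
    return count;
}
```

    In the program above, the while loop could then be replaced by a call such as split_words(input, args) right after fgets, printing only as many args as the returned count says were filled in.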
    Last edited by kermit; 07-05-2010 at 08:11 AM.

  5. #5
    Registered User
    Join Date
    Apr 2007
    Location
    Sweden
    Posts
    12
    It seems to work now, thanks.
