Thread: Tokenizer in C

  1. #1
    Registered User
    Join Date
    Aug 2004
    Posts
    8

    Tokenizer in C

    Hi, I just want to know how I can tokenize a string and display only the words longer than 3 chars. E.g. input: My name is Tarik. Output: name Tarik

    char s1[ ] = "this is an example of how to use token";
    char s2[ ] = " ";
    char *p;

    IN HERE: how do I use the tokens to get only the words longer than 3 chars?

  2. #2
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    You need to set up a for loop that goes through the first string, checks whether each character is a space (' '), and keeps a count of the letters since the last space. When you reach a space or the end of the string and that count is greater than three, call a separate function that backtracks that many letters and reads the word into a separate array.
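
    The counting loop described above could be sketched roughly like this. The helper name words_longer_than and its parameters are made up for illustration, and it assumes words are separated only by spaces and that the caller supplies an output buffer of at least strlen(s) + 1 bytes.

```c
#include <string.h>

/* Hypothetical helper: copies every word of `s` longer than `min_len`
   characters into `out`, separated by single spaces, and returns the
   number of words copied. `out` needs room for strlen(s) + 1 bytes. */
size_t words_longer_than(const char *s, size_t min_len, char *out)
{
    size_t count = 0;   /* words copied so far */
    size_t run = 0;     /* letters seen since the last space */
    size_t used = 0;    /* bytes written to out */
    size_t i;

    for (i = 0; ; i++) {
        if (s[i] != ' ' && s[i] != '\0') {
            run++;                      /* still inside a word */
            continue;
        }
        if (run > min_len) {            /* a word just ended: long enough? */
            if (used > 0)
                out[used++] = ' ';
            memcpy(out + used, s + i - run, run);   /* backtrack `run` letters */
            used += run;
            count++;
        }
        run = 0;
        if (s[i] == '\0')
            break;                      /* the last word has been handled */
    }
    out[used] = '\0';
    return count;
}
```

    Called with "My name is Tarik" and a min_len of 3, it should leave "name Tarik" in out. Unlike strtok, it never writes to the input string.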

  3. #3
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by Tarik
    Hi, I just want to know how I can tokenize a string and display only the words longer than 3 chars.
    This is one way.
    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main(void)
    {
       char text[] = "this is an example of how to use token";
       char *token = strtok(text, " ");
       while ( token )
       {
          if ( strlen(token) > 3 )
          {
             puts(token);
          }
          token = strtok(NULL, " ");
       }
       return 0;
    }
    
    /* my output
    this
    example
    token
    */
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  4. #4
    Im a Capricorn vsriharsha's Avatar
    Join Date
    Feb 2002
    Posts
    192

    Be More Specific

    Quote Originally Posted by Tarik
    Hi, I just want to know how I can tokenize a string and display only the words longer than 3 chars. E.g. input: My name is Tarik. Output: name Tarik

    char s1[ ] = "this is an example of how to use token";
    char s2[ ] = " ";
    char *p;

    IN HERE: how do I use the tokens to get only the words longer than 3 chars?
    Are you counting SPACES as characters? In your above example, would the output be...
    1. Display string s1 and not string s2 or
    2. "this example how use token" (only the words >= 3 chars of the entire string)

    ??

    -Harsha
    Help everyone you can

  5. #5
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by Tarik
    Hi, I just want to know how I can tokenize a string and display only the words longer than 3 chars. E.g. input: My name is Tarik. Output: name Tarik

    char s1[ ] = "this is an example of how to use token";
    char s2[ ] = " ";
    char *p;

    IN HERE: how do I use the tokens to get only the words longer than 3 chars?
    For future reference, read this Announcement, and this Announcement also.

    Quzah.
    Hope is the first step on the road to disappointment.

  6. #6
    Registered User
    Join Date
    Aug 2004
    Posts
    8
    What's wrong with this version? Why am I getting a segmentation fault?
    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main(void)
    {
       char *text = malloc ( 21 *sizeof( char ));    /*<--- How can I use this properly */
       text =  "this is an example of how to use token";
       char *token = strtok(text, " ");
                         printf("\t\t Stopped String: ");
       while ( token )
       {
          if ( strlen(token) > 3 )
          {
                      printf(" %s", token);
    	 /*puts(token);*/
          }
          token = strtok(NULL, " ");
    
       }
             printf("\n");
       return 0;
    }

  7. #7
    Registered User linuxdude's Avatar
    Join Date
    Mar 2003
    Location
    Louisiana
    Posts
    926
    1. You didn't include stdlib.h
    2. You didn't check the return value of malloc
    3. You didn't free the memory you allocated
    4. Did you ever read the man page for strtok? It says:
    Never use these functions. If you do, note that:
    These functions modify their first argument.
    These functions cannot be used on constant strings.
    The identity of the delimiting character is lost.
    The strtok() function uses a static buffer while parsing, so
    it's not thread safe. Use strtok_r() if this matters to you.
    5. sizeof(char) is always 1
    6. 21 is far fewer characters than you try to put in the text memory block. The string has 38 characters, so you need to malloc at least 39 chars: 38 plus one for the terminating '\0'.
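
    Since the man page warns that strtok modifies its argument and cannot be used on constant strings, one alternative is to walk the string with the standard strspn and strcspn functions, which only read it. This is a minimal sketch, not something posted in the thread; the helper name print_long_words and its parameters are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each word of `s` longer than `min_len`
   characters, one per line, without writing to `s`.
   Returns the number of words printed. */
int print_long_words(const char *s, const char *delims, size_t min_len)
{
    int printed = 0;

    for (;;) {
        size_t len;

        s += strspn(s, delims);      /* skip any leading delimiters */
        if (*s == '\0')
            break;                   /* nothing left to scan */
        len = strcspn(s, delims);    /* length of the word starting at s */
        if (len > min_len) {
            printf("%.*s\n", (int)len, s);
            printed++;
        }
        s += len;                    /* move past the word */
    }
    return printed;
}
```

    Because nothing is written to the input, a call such as print_long_words("this is an example of how to use token", " ", 3) is safe even though the argument is a string literal.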

  8. #8
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Code:
    char *text = malloc ( 21 *sizeof( char ));    /*<--- How can I use this properly */
       text =  "this is an example of how to use token";
    In the first line, you malloc 21 characters.
    In the second line, you make the pointer point to a string literal, and as such, you've just lost whatever you allocated.
    You get a segfault because you're trying to modify your string literal.

    Try using something like strcpy to copy text into your allocated space. Also, be sure to free your allocated memory when you're done.
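
    A minimal sketch of that suggestion follows. The helper name copy_string is made up for illustration; it sizes the allocation from the string itself rather than from a hard-coded number.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, writable copy of `src`,
   or NULL if malloc fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    char *copy = malloc(strlen(src) + 1);   /* +1 for the terminating '\0' */

    if (copy != NULL)
        strcpy(copy, src);                  /* destination first, then source */
    return copy;
}
```

    The returned pointer refers to memory the program owns, so passing it to strtok is fine. Call free on it once tokenizing is finished.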

    [edit]Curses, foiled again.[/edit]

    Quzah.
    Hope is the first step on the road to disappointment.

  9. #9
    Registered User
    Join Date
    Aug 2004
    Posts
    8
    Can you show me how you could fix that, please? I am quite new to C.
    Can we still use the string stored in this line?
    char *text = malloc ( 21 *sizeof( char ));

  10. #10
    Registered User
    Join Date
    Aug 2004
    Posts
    8
    I made these changes. It's not giving me a segmentation fault any more, but it's not working either.

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    int main(void)
    {
       char *text = malloc ( 40 *sizeof( char ));
       char mytext[] =  "this is an example of how to use token";
      strcpy( mytext, text);
      
     /* char text[]="this is an example of how to use token";*/
     
       char *token = strtok(text, " ");
                         printf("\t\t Stopped String: ");
       while ( token )
       {
          if ( strlen(token) > 3 )
          {
                      printf(" %s", token);
    	 /*puts(token);*/
          }
          token = strtok(NULL, " ");
    
       }
             printf("\n");
       return 0;
       free(text);
    }
    Also, it's complaining about this line --> char *token = strtok(text, " ");
    warning: ISO C90 forbids mixed declarations and code

  11. #11
    ... kermit's Avatar
    Join Date
    Jan 2003
    Posts
    1,534
    dude - you need to re-read the above posts and work out some of the issues already pointed out... unless you have no intention of becoming a better C programmer that is.

  12. #12
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    In C90, you have to put all of your variable definitions at the beginning of a code block (or outside of all functions). You're violating this by calling strcpy() and then defining another char * below it. C99 and later allow mixed declarations and code, which is why the compiler only warns.

    Forget malloc() altogether...you don't need it for this. Drop malloc() and leave your mytext declaration...it's perfect.

    See if you can get it from there.
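
    For reference, the advice above (declarations first, no malloc, tokenize the array directly) could look roughly like the sketch below. The helper name count_long_tokens is made up for illustration. Note also that strcpy takes the destination first, so the strcpy( mytext, text) call in post #10 copies from the uninitialised malloc'd block into mytext rather than the other way round.

```c
#include <string.h>

/* Hypothetical helper: tokenizes `text` in place with strtok and returns
   how many tokens are longer than `min_len` characters.
   `text` must be a writable array, not a string literal. */
size_t count_long_tokens(char *text, size_t min_len)
{
    /* C90 style: every declaration comes before the first statement. */
    size_t count = 0;
    char *token;

    token = strtok(text, " ");
    while (token != NULL) {
        if (strlen(token) > min_len)
            count++;
        token = strtok(NULL, " ");
    }
    return count;
}
```

    With char mytext[] = "this is an example of how to use token"; the call count_long_tokens(mytext, 3) needs no heap allocation at all, because the array is already writable.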
    If you understand what you're doing, you're not learning anything.

  13. #13
    Registered User linuxdude's Avatar
    Join Date
    Mar 2003
    Location
    Louisiana
    Posts
    926
    Declarations go at the beginning of blocks (e.g. at the top of main). I can see you are trying. This is what you want, right? Does this help you?
    P.S. I prefer you not use strtok though for reasons I stated earlier
    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    int main (void){
      char mytext[] =  "this is an example of how to use token";
      char *ptr=malloc(40),*token;
      if(!ptr){
        printf("Not Enough Memory\n");
        return EXIT_FAILURE;
      }
      strcpy(ptr,mytext);
      token=strtok(ptr," ");
      printf("Stopped String:  \n");
      while(token){
            printf("%s\n",token);
            token=strtok(NULL," ");
      }
      return 0;
      free (ptr);
    }

  14. #14
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > return 0;
    > free (ptr);
    Yeah, that'll work
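
    The point of the quoted lines is that nothing after return runs, so free is never called. A sketch with the two statements in the right order is below. The helper name print_stopped_string is made up for illustration, and the length filter from the original question is added back in.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: prints the words of `src` longer than three characters.
   Returns EXIT_SUCCESS, or EXIT_FAILURE if the working copy cannot be allocated. */
int print_stopped_string(const char *src)
{
    char *copy = malloc(strlen(src) + 1);   /* room for the text plus '\0' */
    char *token;

    if (copy == NULL) {
        fprintf(stderr, "Not enough memory\n");
        return EXIT_FAILURE;
    }
    strcpy(copy, src);                      /* strtok needs a writable copy */

    printf("Stopped String:");
    for (token = strtok(copy, " "); token != NULL; token = strtok(NULL, " ")) {
        if (strlen(token) > 3)
            printf(" %s", token);
    }
    printf("\n");

    free(copy);                             /* release the memory BEFORE returning */
    return EXIT_SUCCESS;
}
```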
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  15. #15
    Registered User
    Join Date
    Jun 2004
    Posts
    201
    Code:
    char *text = malloc ( 21 *sizeof( char ));
    text =  "this is an example of how to use token";
    Every C programmer has to deal with this issue some day.

    The problem is that if you do this:
    char *s = "hello hello";
    s[0] = 'b';
    You have a bug in your program: undefined behaviour. On some systems/compilers this works, but on others it crashes.

    Your program crashes because strtok writes to the string when you call it.

    char *token = strtok(s, " ");
    strtok overwrites the space that ends each token with a '\0' character.

    You can fix this in a few ways:
    Code:
    char *text = malloc ( 128 *sizeof( char ));
    if ( text )
       strcpy(text, "this is an example of how to use token");
    Here you malloc space and then COPY the string into it.

    or you can do

    char text[] = "this is an example of how to use token";

    Here the compiler generates a big enough array that is modifiable.
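
    To see the writes that strtok makes, the sketch below tokenizes a buffer and then counts how many bytes of the original text have become '\0'. The helper name count_overwritten is made up for illustration, and it assumes the caller passes the length the string had before tokenizing.

```c
#include <string.h>

/* Hypothetical helper: tokenizes `buf` on spaces, then returns how many
   bytes inside the original text strtok replaced with '\0'.
   `len` is the length of the string before tokenizing. */
size_t count_overwritten(char *buf, size_t len)
{
    size_t i;
    size_t zeros = 0;
    char *token = strtok(buf, " ");

    while (token != NULL)               /* walk through every token */
        token = strtok(NULL, " ");

    for (i = 0; i < len; i++) {         /* scan the original extent of the text */
        if (buf[i] == '\0')
            zeros++;
    }
    return zeros;
}
```

    For the modifiable array above, every space that ended a token has been overwritten, and strlen(text) now reports only the length of the first word. Running the same code on a string literal attempts those writes on memory that must not be modified, which is where the crash comes from.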
