Thread: Parsing and Tokens (strtok)

  1. #1
    Registered User
    Join Date
    Sep 2001
    Posts
    12

    Parsing and Tokens (strtok)

    I am having difficulty correctly tokenizing a file that is read in.

    I am taking an Assembler class, but we are to write a scanner program in C. Each terminal needs to be recognized and printed on a separate line. Terminals can include ( : ) , ; + - * / as well as words. I have been able to tokenize the file using the space or newline (\n) as a delimiter, but cannot figure out how to use the above symbols as delimiters as well. If I add them to the "seps[]" definition, they do not print.

    How can I instruct the program to identify the terminals above, print them and go on to the next part of the file?

    Any assistance or a point in the right direction would be greatly appreciated!

    I have attached a text file that shows the program, the output and the original text file.

    Thanks!

    Sandy

  2. #2
    Registered User Nutshell's Avatar
    Join Date
    Jan 2002
    Posts
    1,020
    Can you tell us the 'exact' text (original) version of your problem?

  3. #3
    Registered User
    Join Date
    Sep 2001
    Posts
    12
    Here is the document that I didn't see attached. It shows the C program, the output and the original text file that was read. What I haven't been able to figure out was a way to print the tokens like ( or ) or , etc.

    /* Illustrates tokenizing with strtok */

    #include <string.h>
    #include <stdio.h>

    int main(void)
    {
        char string[1000], seps[] = " \n,( )";
        char *p;
        FILE *f;

        f = fopen("Test.dat", "r");
        if (!f)
            return 1;

        /* fgets already leaves room for the terminating '\0' */
        while (fgets(string, sizeof(string), f) != NULL)
        {
            /* Break the line into tokens. */
            p = strtok(string, seps);    /* Find first token */

            while (p != NULL)
            {
                printf("Token: %s\n", p);
                p = strtok(NULL, seps);  /* Find next token */
            }
        }
        fclose(f);
        return 0;
    }
    ------------------------------------------------------------
    Output

    Token: PROGRAM
    Token: STATS
    Token: VAR
    Token: SUM
    Token: SUMSQ
    Token: I
    Token: VALUE
    Token: MEAN
    Token: VARIANCE
    Token: :
    Token: INTEGER
    Token: BEGIN
    Token: SUM
    Token: :=
    Token: 0;
    Token: SUMSQ
    Token: :=
    Token: 0;
    Token: FOR
    Token: i
    Token: :=
    Token: 1
    Token: TO
    Token: 100
    Token: DO
    Token: BEGIN
    Token: READ
    Token: VALUE
    Token: ;
    Token: SUM
    Token: :=
    Token: SUM
    Token: +
    Token: VALUE;
    Token: SUMSQ
    Token: :=
    Token: SUMSQ
    Token: +
    Token: VALUE
    Token: *
    Token: VALUE
    Token: END;
    Token: MEAN
    Token: :=
    Token: SUM
    Token: DIV
    Token: 100
    Token: VARIANCE
    Token: :=
    Token: SUMSQ
    Token: DIV
    Token: 100
    Token: -
    Token: MEAN
    Token: *
    Token: MEAN;
    Token: WRITE
    Token: MEAN
    Token: VARIANCE
    ---------------------------------------------------------------
    Actual File

    PROGRAM STATS
    VAR
    SUM, SUMSQ, I, VALUE, MEAN, VARIANCE : INTEGER
    BEGIN
    SUM := 0;
    SUMSQ := 0;
    FOR i := 1 TO 100 DO
    BEGIN
    READ (VALUE);
    SUM := SUM + VALUE;
    SUMSQ := SUMSQ + VALUE * VALUE
    END;
    MEAN := SUM DIV 100
    VARIANCE := SUMSQ DIV 100 - MEAN * MEAN;
    WRITE (MEAN, VARIANCE)

  4. #4
    Registered User Nutshell's Avatar
    Join Date
    Jan 2002
    Posts
    1,020
    So that's the answer for you? It's pretty easy, right?

  5. #5
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    You would probably be better off using only a space and newline as the delimiters for strtok, then analyzing each token you've taken from the string for ';', '(', etc. If you find a special character, print it, remove it, and continue parsing the rest of the token. You'll have to make some special cases of course, such as for :=.
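
    A minimal sketch of that approach, assuming each token has already been split off on whitespace. The names `emit_terminals` and `SPECIALS` are made up for this illustration, the character set is taken from the first post, and treating := as a single terminal is an assumption based on the special case mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Characters that are terminals on their own (list from the first post). */
static const char SPECIALS[] = "():,;+-*/";

/* Prints every terminal found in one whitespace-free token and returns
   how many were printed. emit_terminals is a hypothetical helper. */
static int emit_terminals(const char *tok)
{
    int count = 0;

    assert(tok != NULL);
    while (*tok != '\0') {
        size_t len;

        if (tok[0] == ':' && tok[1] == '=')
            len = 2;                       /* assumed special case: ":=" is one terminal */
        else if (strchr(SPECIALS, *tok) != NULL)
            len = 1;                       /* single-character terminal */
        else
            len = strcspn(tok, SPECIALS);  /* a word runs up to the next special */

        printf("Token: %.*s\n", (int)len, tok);
        tok += len;
        count++;
    }
    return count;
}
```

    Called from the strtok loop in place of the printf, with seps reduced to " \n", a token such as (VALUE); would come out as four lines: (, VALUE, ) and ;. Nothing is written into the token, so the buffer strtok is working on stays untouched.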

    -Prelude
    My best code is written with the delete key.

  6. #6
    Registered User
    Join Date
    Sep 2001
    Posts
    12

    Parsing & Tokens

    Thanks Prelude. I did end up just using the space and the new line as the delimiter, because the tokenizing deletes the delimiter.
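
    The effect described here can be checked with a small sketch. strtok never returns delimiter characters: leading ones are skipped and the one that ends a token is overwritten with '\0'. `split_line` is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits line in place with strtok and stores up to max token pointers
   in out. Returns the number of tokens stored. Hypothetical helper. */
static size_t split_line(char *line, const char *seps, char **out, size_t max)
{
    size_t n = 0;
    char *p;

    assert(line != NULL && seps != NULL && out != NULL);
    for (p = strtok(line, seps); p != NULL && n < max; p = strtok(NULL, seps))
        out[n++] = p;   /* each pointer refers into the modified line */
    return n;
}
```

    With seps set to " \n,( )", the line READ (VALUE); yields READ, VALUE and ; with the parentheses gone. With seps set to " \n" it yields READ and (VALUE); so the punctuation survives inside the token and can be examined afterwards.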

    I have tried sscanf, but have only been able to get it to work on one of the "special characters". This is what I have added:

    while(p!=NULL)
    {
        if(sscanf(p, "( %s ",p)) printf("Token: (\n"); //to print (
        printf("Token: %s\n",p); // to print the rest of this token
        p = strtok(NULL, seps); //Find the next token
    }

    This prints the output okay, except the ) and , and ; are printed right next to the word in front of it.

    I haven't been able to get any of the other versions of sscanf to work. I've tried
    sscanf(p,"%s;",p) printf("Token: ;\n");
    but it just prints the ; on every other line. I'm not sure what I'm doing wrong.
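
    A likely explanation, offered as a sketch rather than a diagnosis of the full program: sscanf returns the number of conversions it completed, and %s reads everything up to the next whitespace, so with "%s;" the ; has already been consumed by %s and the call still returns 1 for every token. Passing p as both the input and the output buffer is also undefined behavior. Checking the last character directly avoids both problems. `split_trailing_semicolon` is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies tok into word (at most size - 1 characters), dropping one
   trailing ';' if present. Returns 1 if a ';' was dropped, else 0.
   Hypothetical helper for illustration. */
static int split_trailing_semicolon(const char *tok, char *word, size_t size)
{
    size_t len;
    int found;

    assert(tok != NULL && word != NULL && size > 0);
    len = strlen(tok);
    found = (len > 0 && tok[len - 1] == ';');
    if (found)
        len--;              /* leave the ';' out of the copy */
    if (len >= size)
        len = size - 1;     /* truncate rather than overflow word */
    memcpy(word, tok, len);
    word[len] = '\0';
    return found;
}
```

    In the loop this could be used as: if the function returns 1, print the word and then a separate "Token: ;" line, otherwise print the word alone. The same idea extends to the other special characters by scanning the whole token instead of only its last character.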

    You mentioned analyzing the string. Is there another way to analyze each token?

    Thanks for any help in advance.
    Sandy

  7. #7
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >You mentioned analyzing the string. Is there another way to analyze each token?
    By analyzing I mean looking in detail at the token. Loop through it and check for the special characters. If you find them, double check the token to make sure that they are indeed in the context that you want them. If so, you can either print them right away or save them to an array of strings and print that array in order later. Here's a very ugly implementation of that, sorry but I don't have time to come up with something more elegant. It should work though.
    Code:
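    static void analyze ( const char *a )
    {
      char buf[10][40] = {{'\0'}};  /* up to 10 pieces of up to 39 chars each */
      int i, j = 0, k = 0;
      for ( i = 0; a[i] != '\0' && j < 10; i++ ) {
        if ( a[i] == ',' || a[i] == ';' || a[i] == '(' || a[i] == ')' ) {
          if ( k > 0 )          /* close the word collected so far */
            j++, k = 0;
          if ( j < 10 )
            buf[j++][0] = a[i]; /* the special character is its own token */
        }
        else if ( k < 39 )
          buf[j][k++] = a[i];   /* extra characters of an overlong word are dropped */
      }
      for ( i = 0; i < 10; i++ )
        if ( buf[i][0] != '\0' )
          printf ( "Token: %s\n", buf[i] );
    }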
    -Prelude
    My best code is written with the delete key.
