Thread: Storing tokenized strings into different variables

  1. #1
    Registered User spendotw's Avatar
    Join Date
    Dec 2011
    Location
    England
    Posts
    40

    Question Storing tokenized strings into different variables

    Hi there guys, I'm relatively new to C and have a bit of a dilemma.

    So far in my program I have used strtok() to tokenize the buffer that holds the data read from a simple input file. The input file has this basic layout:

    Processid[space]Qaunta[space]Priority[newline]:
    Process0 56 1
    Process1 189 23
    Process 2 1 942

    Once I have extracted these strings I can go on to process them in my process scheduler. Oh, and another note: I am only using core system functions of the C standard, so please no fputs, fgets, fopen etc...

    Many thanks for any available advice

    Code:
    if (status > 0)
        {
            for ( i = 0; i < status; i++)
            {
                if (buffer[i] == '\n')
                {
                memcpy(tmpbuff, buffer,BUFFSIZE);
                 }
            }
        } else { eof = 1;
            }
    /* Tokenize 'process string' attributes */
        procstr = strtok(tmpbuff,space);
    while ( procstr != NULL ) {
        proc[i].id = procstr;    
        procstr = strtok( NULL, space);
        printf("%s\n", proc[i].id);
        }

  2. #2
    Registered User
    Join Date
    Jan 2009
    Location
    Australia
    Posts
    375
    Please make your intentions clearer. You haven't said there is anything wrong, and at first glance there doesn't appear to be anything wrong. If there is a problem, please post the expected output, the actual output, and/or any messages that your compiler is giving you.

    I find it difficult to believe that file input/output is not a core system function whereas string manipulation is.

  3. #3
    Registered User spendotw's Avatar
    Join Date
    Dec 2011
    Location
    England
    Posts
    40
    Thank you for your reply. Basically, what is wrong is that I need to extract the data from each line into separate variables: proc[i].id to store all the "Process0" strings, proc[i].qaunta to store the Qaunta values, and proc[i].Priority to store all the priority values. But as you may have seen, at this point the program stores every token from the line in proc[i].id, and I am finding it difficult to store Process, Qaunta and Priority each in its own variable. If you could give me any advice on a suitable approach to identifying each element and storing them in sequential order, I would very much appreciate it.

    Here's the output of my program:
    Process834
    35
    14
    Process835
    74
    12
    Process836
    76
    9
    Process837
    175
    10
    Process838
    186
    9
    Process839
    82
    14
    Process840
    49
    2

    This is only the end of the output of my program; the output is extremely long and continues in the same pattern.
    What I want my output to show is the same, but from three different variables rather than all from proc[i].id,
    so proc[i].Qaunta = procstr and proc[i].Priority = procstr would be used at a suitable point in the program using a suitable mechanism (a loop of some sort, maybe?).

    Sorry for the confusion; yes, all my syntax and system functions are core. I mean to say that any advice telling me to use any unformatted functions won't be relevant.

    Thank you again for any advice, and for what you have given.

    So in short, I need a way to extract the three elements from the input file and store them in their own variables.

  4. #4
    Registered User
    Join Date
    Jan 2009
    Location
    Australia
    Posts
    375
    Why don't you just do exactly what you're doing right now, except add two more member variables to the structure 'proc' and then store the other two parts in those variables?

  5. #5
    Registered User spendotw's Avatar
    Join Date
    Dec 2011
    Location
    England
    Posts
    40

    Question

    This is my exact problem... I have attempted to do this, but cannot seem to extract the data separately. I am unsure whether I am using the strtok function correctly, and am starting to believe that I have muddled it up a bit in my code.
    Here's what I have added to the code:
    Code:
    if (status > 0)
        {
            for ( i = 0; i < status; i++)
            {
                if (buffer[i] == '\n')
                {
                memcpy(tmpbuff, buffer,BUFFSIZE);
                 }
            }
        } else { eof = 1;
            }
    /* Tokenize 'process string' attributes */
        procstr = strtok(tmpbuff,space);
    while ( procstr != NULL ) {
        proc[i].id = procstr;    
        procstr = strtok( NULL, space);
        printf("%s\n", proc[i].id);
        proc[i].qaunta = procstr;
        procstr = strtok(NULL , nl); /
        printf("%s\n", proc[i].qaunta); 
        proc[i].priority = procstr;
        procstr = strtok( NULL , nl); 
        }
    }
    close(fd); 
    }
    and I'm getting this output:
    Process29
    149
    89
    14
    Process31
    17
    Process32
    89
    177
    16
    Process34
    14
    Process35
    61
    36
    5
    Process37
    1
    Proces
    Segmentation fault

    As you suggested, I already added the two members to the structure proc. But I'm finding it difficult to store the correct data in those members.

    Thanks DeadPlanet

  6. #6
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    In your code space is defined as ... char space[] = " "; ... that is not a single character, it's a 2 byte string consisting of a space and the trailing 0... if that doesn't exist in your string strtok will fail.

    Try this...
    Code:
    /* Tokenize 'process string' attributes */
        procstr = strtok(tmpbuff,' ');  <--- note single quotes!
    while ( procstr != NULL ) {
        proc[i].id = procstr;    
        procstr = strtok( NULL, ' ');
        printf("%s\n", proc[i].id);
        proc[i].qaunta = procstr;
        procstr = strtok(NULL , '\n'); /
        printf("%s\n", proc[i].qaunta); 
        proc[i].priority = procstr;
        procstr = strtok( NULL , '\n'); 
        }
    Hint: Clever tricks like defining space or newline as strings instead of just typing 3 characters, very seldom turn out to be even half as clever as we think they are.

  7. #7
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Reread your docs Tater
    Quote Originally Posted by man strtok
    SYNOPSIS

    #include <string.h>
    char *strtok(char *s1, const char *s2);
    Note that s2 is a char *, meaning it needs double quotes " ". strtok will not split on a null character because of a null terminator in s2. Still, defining a variable called space that is a string of just a space character is ridiculous. If you do have a number of delimiters, it's common to do something like:
    Code:
    char *delims = " ,;.()";
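
    As a rough sketch of the point above: the second argument to strtok is a string, and every character in it counts as a delimiter. The helper name split_fields and its parameters are made up for this illustration and do not come from the thread's program.

```c
#include <stddef.h>
#include <string.h>

/* Splits line in place on any character found in delims.
   Stores up to max token pointers in out and returns how many were stored.
   The pointers refer to memory inside line; nothing is copied. */
size_t split_fields(char *line, const char *delims, char **out, size_t max)
{
    size_t count = 0;
    char *tok = strtok(line, delims);

    while (tok != NULL && count < max) {
        out[count++] = tok;
        tok = strtok(NULL, delims);
    }
    return count;
}
```

    Called on "Process1 189 23\n" with the delimiter string " \n", this should yield the three tokens Process1, 189 and 23. With " " alone, the newline is not a delimiter, so a multi-line buffer ends up with tokens such as "10\nProcess1" that span two lines.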

  8. #8
    Registered User spendotw's Avatar
    Join Date
    Dec 2011
    Location
    England
    Posts
    40
    Quote Originally Posted by CommonTater View Post
    In your code space is defined as ... char space[] = " "; ... that is not a single character, it's a 2 byte string consisting of a space and the trailing 0... if that doesn't exist in your string strtok will fail.

    Try this...
    Code:
    /* Tokenize 'process string' attributes */
        procstr = strtok(tmpbuff,' ');  <--- note single quotes!
    while ( procstr != NULL ) {
        proc[i].id = procstr;    
        procstr = strtok( NULL, ' ');
        printf("%s\n", proc[i].id);
        proc[i].qaunta = procstr;
        procstr = strtok(NULL , '\n'); /
        printf("%s\n", proc[i].qaunta); 
        proc[i].priority = procstr;
        procstr = strtok( NULL , '\n'); 
        }
    Hint: Clever tricks like defining space or newline as strings instead of just typing 3 characters, very seldom turn out to be even half as clever as we think they are.
    Okay, so I tried applying the changes you suggested, but I receive these messages from the compiler:

    sched.c:78:2: warning: passing argument 2 of ‘strtok’ makes pointer from integer without a cast [enabled by default]
    /usr/include/string.h:348:14: note: expected ‘const char * __restrict__’ but argument is of type ‘int’

  9. #9
    Registered User spendotw's Avatar
    Join Date
    Dec 2011
    Location
    England
    Posts
    40

    Question

    CommonTater & anduril462, before you posted your replies I compiled the following program. It did extract the data into the three members, but there was a seg fault, which I'm guessing is something to do with my buffer, right?

    Code:
    #include <stdio.h> 
    #include <stdlib.h>
    #include <sys/types.h> 
    #include <sys/stat.h> 
    #include <fcntl.h> 
    #include <string.h> /* strtok() */
    #include <unistd.h>  
    #define BUFFSIZE 200
    #define MAXPROCS 1000
    
    struct process {
    	char *id; 
    	int state; 
    	char *priority;
    	char *qaunta;
    	int working; 
    	int waiting;
    	struct process *next;
    };
    
    main (int argc, char **argv)
    { 
    int fd, i;
    int status; 
    char buffer[BUFFSIZE];
    char tmpbuff[BUFFSIZE]; /* Temporary buffer to store read lines */
    char space[] = " "; 	 /* Space delimeter */
    char nl[] = "\n";
    char *procstr = NULL; /* String to store processes */
    int eof = 0;
    int ticks; /* Number of ticks for process */
    int ticktime; 
    
    status = 99; 
    
    fd = open(argv[1], O_RDONLY);
    
    	if ( fd == -1 ) 
    		{
    		printf("There was an error opening the file \n"); 
    		}
    while (!eof)
    {
    	status = read(fd, buffer,sizeof(buffer));
    	
    	if (status == -1)
    	
    	printf("There was an error reading the file \n");
    	
    if (status > 0)
    	{
    		for ( i = 0; i < status; i++)
    		{
    			if (buffer[i] == '\n')
    			{
    			memcpy(tmpbuff, buffer,BUFFSIZE);
    	 		}
    		}
    	} else { eof = 1;
    		}
    /* Tokenize 'process string' attributes */
    	procstr = strtok(tmpbuff,' ');
    while ( procstr != NULL ) {
    	proc[i].id = procstr;	
    	procstr = strtok( NULL, ' ');
    	printf("%s\n", proc[i].id);
    	
    	proc[i].qaunta = procstr;
    	procstr = strtok(NULL , '  ');
    	printf("%s\n", proc[i].qaunta); 
    	
    	proc[i].priority = procstr;
    	procstr = strtok( NULL ,'\n'); 
    	printf("%s\n", proc[i].priority); 
    	}
    	
    }
    close(fd); 
    }
    Here's my output:
    Process0
    7
    10
    Process1
    194 13
    Process2
    180
    2
    Process3
    47
    19
    Process4
    179
    0
    Process5
    174
    16
    Process6
    171
    14
    Process7
    113
    11
    Process8
    77
    14
    Process9
    15
    6
    Process10
    136
    20
    Process11
    147
    4
    Process12
    50
    5
    Segmentation fault

    Bear in mind there are hundreds of processes that I need to process from each file. Is it possible for me to carry on declaring my two delimiters as they are?

    Thanks for any advice

  10. #10
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Code:
    $ make sched
    gcc -Wall -g  -lm  sched.c   -o sched
    sched.c:22: warning: return type defaults to ‘int’
    sched.c: In function ‘main’:
    sched.c:62: warning: passing argument 2 of ‘strtok’ makes pointer from integer without a cast
    /usr/include/string.h:346: note: expected ‘const char * __restrict__’ but argument is of type ‘int’
    sched.c:64: error: ‘proc’ undeclared (first use in this function)
    sched.c:64: error: (Each undeclared identifier is reported only once
    sched.c:64: error: for each function it appears in.)
    sched.c:65: warning: passing argument 2 of ‘strtok’ makes pointer from integer without a cast
    /usr/include/string.h:346: note: expected ‘const char * __restrict__’ but argument is of type ‘int’
    sched.c:69:28: warning: multi-character character constant
    sched.c:69: warning: passing argument 2 of ‘strtok’ makes pointer from integer without a cast
    /usr/include/string.h:346: note: expected ‘const char * __restrict__’ but argument is of type ‘int’
    sched.c:73: warning: passing argument 2 of ‘strtok’ makes pointer from integer without a cast
    /usr/include/string.h:346: note: expected ‘const char * __restrict__’ but argument is of type ‘int’
    sched.c:32: warning: unused variable ‘ticktime’
    sched.c:31: warning: unused variable ‘ticks’
    sched.c:28: warning: unused variable ‘nl’
    sched.c:27: warning: unused variable ‘space’
    You have some bugs and warnings to clean up. As I said in my post, Tater made a little mistake and you do need double " " quotes for the second argument to strtok. Also, you should be explicit when declaring main: int main(int argc, char **argv), and return an int at the end, usually 0 for success.

    You have a few major problems. First, read will read up to 200 bytes, if it's available, regardless of new lines. That means you may get a partial line (almost certainly will with a big input file), which will really screw up your program.
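
    One possible way around the partial-line problem while still using only read is to pull the file in one byte at a time until a newline shows up. This is a minimal sketch, not a tuned implementation: the function name read_line is invented here, and a real program would probably read larger chunks and carry leftover bytes over to the next call.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads one line from fd into line (capacity cap), dropping the '\n'.
   Returns the line length, or -1 on a read error or at end of file
   when no characters are pending. Characters that do not fit are dropped. */
ssize_t read_line(int fd, char *line, size_t cap)
{
    size_t len = 0;
    char c;
    ssize_t n;

    if (cap == 0)
        return -1;

    while ((n = read(fd, &c, 1)) == 1) {
        if (c == '\n') {
            line[len] = '\0';
            return (ssize_t)len;
        }
        if (len + 1 < cap)          /* keep room for the terminator */
            line[len++] = c;
    }
    if (n == -1 || len == 0)
        return -1;                  /* read error, or EOF with nothing pending */
    line[len] = '\0';               /* last line had no trailing newline */
    return (ssize_t)len;
}
```

    Each successful call leaves exactly one complete line in the caller's buffer, so the tokenizing code never sees half a record.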

    As it happens, fopen and fgets are "core functions of the C standard", as is strtok. open and read, however, are not standard C. They are specific to *nix systems. Why can't you use fopen and fgets, when you can use strtok?

    Your debug output should be a little better. Put a prefix before you print out the result of strtok: printf("id: %s\n", proc[i].id); It will help show you how you're improperly tokenizing your buffer. Your buffer after your call to read contains something like "Process0 7 10\nProcess1 194 13\nProcess2 180 2...". Notice that the first call to strtok will correctly split off Process0 (space after it), the second call will correctly split off 7 (space after it), but the 3rd call will actually split off "10\nProcess1", because a lone space delimiter does not stop at the newline. You have 3 pieces of data, so you need 3 calls to strtok in the loop (which you have), but not the one outside the loop. Something more like:
    Code:
    while (read one line into tmpbuf)  // fgets would work great here, otherwise it will be a royal pain using read
        ptr = strtok(tmpbuf, " ");
        id = allocate mem and copy contents of ptr -- look into strdup
        ptr = strtok(NULL, " ");
        quanta = strdup(ptr);
        ...
    You need to allocate memory and make a copy yourself, since strtok doesn't make a copy of the token for you. It just returns a pointer to somewhere inside tmpbuf, which means that a large dataset (i.e. one that fills up tmpbuff more than once) will result in garbled data for proc[0], proc[1], etc.
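
    The pseudocode above could be fleshed out roughly as follows. This is a sketch under assumptions: struct fields is a cut-down stand-in for the three string members of the thread's struct process, and copy_token is a hand-written equivalent of POSIX strdup so the example builds as plain standard C. It also checks every strtok result for NULL, since passing NULL on to a copy routine is a likely source of a segmentation fault on a short line.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for this sketch only. */
struct fields {
    char *id;
    char *qaunta;
    char *priority;
};

/* Returns a heap copy of s (what POSIX strdup does), or NULL on failure. */
static char *copy_token(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Tokenizes one line in place and stores copies of its three fields in out.
   Returns 0 on success, -1 if a field is missing or allocation fails. */
int parse_line(char *line, struct fields *out)
{
    char *id = strtok(line, " \n");
    char *qaunta = strtok(NULL, " \n");
    char *priority = strtok(NULL, " \n");

    if (id == NULL || qaunta == NULL || priority == NULL)
        return -1;                  /* short or partial line */

    out->id = copy_token(id);
    out->qaunta = copy_token(qaunta);
    out->priority = copy_token(priority);
    if (out->id == NULL || out->qaunta == NULL || out->priority == NULL) {
        free(out->id);
        free(out->qaunta);
        free(out->priority);
        out->id = out->qaunta = out->priority = NULL;
        return -1;                  /* out of memory */
    }
    return 0;
}
```

    Because the fields are copies, the line buffer can be reused for the next record without disturbing records stored earlier. The caller is responsible for calling free on each member when a record is no longer needed.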

  11. #11
    Registered User spendotw's Avatar
    Join Date
    Dec 2011
    Location
    England
    Posts
    40

    Question

    Thanks for your reply anduril462, it was very informative and useful...

    The reason I am using open, read and close rather than fopen, fread and fclose is that this is how I am required to practice programming at university. So I am left with only open, read and close.

    I had a look at strdup, which will be of use in my program even though it's still leaving me with seg faults.

    I added the prefixes like you said and it did exactly what you said it would. But I don't quite understand why the second line "Process1 194 30" doesn't store "Process1" in any variable, which leads the rest of the program to store each data element in the wrong variable.

    Code:
    while ( tmpbuff[i] = status ) {
    	procstr = strtok (tmpbuff, " ");
    	proc[i].id = strdup(procstr);
    	procstr = strtok( NULL, " ");
    	printf("id: %s\n", proc[i].id);
    	
    proc[i].qaunta = strdup(procstr);
    	procstr = strtok (NULL, " ");
    	printf("Qaunta: %s\n", proc[i].qaunta); 
    	
    proc[i].priority = strdup(procstr);
    	procstr = strtok (NULL, "\n");
    	printf("Priority: %s\n", proc[i].priority); 
    	}
    Okay, so this is how I tried to implement what you suggested, and yes, it compiles fine, but qaunta and priority have nulls in them, and it is nothing but an infinite loop.

    Thanks for your time, sorry to be a pain.

  12. #12
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Do you really mean tmpbuff[i] = status, or do you mean tmpbuff[i] == status? Either way, I'm not sure what's up with that. Could you post your whole program?

  13. #13
    Registered User spendotw's Avatar
    Join Date
    Dec 2011
    Location
    England
    Posts
    40
    Quote Originally Posted by oogabooga View Post
    Do you really mean tmpbuff[i] = status, or do you mean tmpbuff[i] == status? Either way, I'm not sure what's up with that. Could you post your whole program?
    Code:
     #include <stdio.h> 
    #include <stdlib.h>
    #include <sys/types.h> 
    #include <sys/stat.h> 
    #include <fcntl.h> 
    #include <string.h> /* strtok() */
    #include <unistd.h>  
    #define BUFFSIZE 200
    #define MAXPROCS 1000
    
    struct process {
        char *id; 
        int state; 
        char *priority;
        char *qaunta;
        int working; 
        int waiting;
        struct process *next;
    };
    
    int main (int argc, char **argv)
    { 
    int fd, i;
    int status; 
     char buffer[BUFFSIZE];
    char tmpbuff[BUFFSIZE]; /* Temporary buffer to store read lines */
    char *procstr = NULL; /* String to store processes */
    char *delim = " \n"; /* Added delimeter variable */ 
    int eof = 0; 
    
    status = 99; 
    
    fd = open(argv[1], O_RDONLY);
    
        if ( fd == -1 ) 
            {
            printf("There was an error opening the file \n"); 
            }
    while (!eof)
    {
        status = read(fd, buffer,sizeof(buffer));
        
        if (status == -1)
        
        printf("There was an error reading the file \n");
        
    if (status > 0)
        {
            for ( i = 0; i < status; i++)
            {
                if (buffer[i] == '\n')
                {
                memcpy(tmpbuff, buffer,BUFFSIZE);
                 }
            }
        
        /* Tokenize 'process string' attributes */
        
    procstr = strtok(tmpbuff, delim);
    while ( procstr != NULL ) {
    proc[i].id = strdup(procstr); 
        procstr = strtok( NULL,delim);
        printf("id: %s\n", proc[i].id);
        
    proc[i].qaunta = strdup(procstr);
        procstr = strtok( NULL ,delim);
        printf("Qaunta: %s\n", proc[i].qaunta); 
        
        proc[i].priority = strdup(procstr);
    procstr = strtok( NULL ,delim); 
        printf("Priority: %s\n", proc[i].priority); 
            } }
        }
        else { eof = 1; }
    
        
    }
    close(fd);
    return 0; 
    }
    Thanks for your reply, oogabooga. What I was meaning to do was to read one line into tmpbuff, as suggested previously...

    As this wasn't really working for me, I changed it back and added a delimiter variable, which seemed to successfully separate the variables from the input file. But only to a certain extent: I am now faced with a segmentation fault, like so:
    id: Process0
    Qaunta: 7
    Priority: 10
    id: Process1
    Qaunta: 194
    Priority: 13
    id: Process2
    Qaunta: 180
    Priority: 2
    id: Process3
    Qaunta: 47
    Priority: 19
    ............... Continuous output
    ...............
    id: Process23
    Qaunta: 136
    Priority: 17
    id: Process24
    Qaunta: 24
    Priority: 8
    id: Process Until here <<<<
    Segmentation fault
    Last edited by spendotw; 12-31-2011 at 11:47 AM. Reason: Program re-revised

  14. #14
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    But this can't be your whole program. There are undeclared variables (delim, proc). I can't even compile it!

    BTW, do you really have to do this with strtok? It seems unusual. Also, do you really want to store your priority and quanta as strings or would you rather store them as ints?
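
    If the quanta and priority were stored as ints, each token would need converting after strtok hands it back. A minimal sketch using strtol follows; the name token_to_int is invented for this example and nothing here is taken from the original program.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Converts a decimal token such as "189" to an int.
   Returns 0 and stores the value in *out on success; returns -1 if the
   token has no digits, has trailing characters, or does not fit in an int. */
int token_to_int(const char *tok, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(tok, &end, 10);
    if (end == tok || *end != '\0')
        return -1;                  /* no digits, or extra characters */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;                  /* out of range for int */
    *out = (int)v;
    return 0;
}
```

    Unlike atoi, this reports a bad token instead of quietly returning 0, which makes a mis-tokenized field (for instance one that still has a newline attached) easier to spot.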

  15. #15
    Registered User spendotw's Avatar
    Join Date
    Dec 2011
    Location
    England
    Posts
    40
    Code:
     #include <stdio.h> 
    #include <stdlib.h>
    #include <sys/types.h> 
    #include <sys/stat.h> 
    #include <fcntl.h> 
    #include <string.h> /* strtok() */
    #include <unistd.h>  
    #define BUFFSIZE 400
    #define MAXPROCS 1000
    
    struct process {
    	char *id; 
    	int state; 
    	char *priority;
    	char *qaunta;
    	int working; 
    	int waiting;
    	struct process *next;
    };
    struct process proc[MAXPROCS]; /* Current process */
    struct process* nextInQ = &proc[0];
    struct process *nextInLine = NULL;
    struct process *lastInLine = NULL;
    
    struct CPU { 
    	struct process *onCPU; 
    	int qauntum;
    }; 
    struct CPU CPU; 
    
    /* Decleration of function prototypes */ 
    void initproc();
    void checkForQJoin();
    void joinReadyQ();
    void checkDone();
    
    int main (int argc, char **argv)
    { 
    int fd, i;
    int status; 
    char buffer[BUFFSIZE];
    char tmpbuff[BUFFSIZE]; /* Temporary buffer to store read lines */
    char *procstr = NULL; /* String to store processes */
    char *delim = " \n";
    int eof = 0;
    int ticks; /* Number of ticks for process */
    int ticktime; 
    
    status = 99; 
    
    fd = open(argv[1], O_RDONLY);
    
    	if ( fd == -1 ) 
    		{
    		printf("There was an error opening the file \n"); 
    		}
    while (!eof)
    {
    	status = read(fd, buffer,sizeof(buffer));
    	
    	if (status == -1)
    	
    	printf("There was an error reading the file \n");
    	
    if (status > 0)
    	{
    		for ( i = 0; i < status; i++)
    		{
    			if (buffer[i] == '\n')
    			{
    			memcpy(tmpbuff, buffer,BUFFSIZE);
    	 		}
    		}
    	
     /* Tokenize 'process string' attributes */
    	
    	procstr = strtok(tmpbuff, delim);
    while ( procstr != NULL ) {
    	proc[i].id = strdup(procstr);
     procstr = strtok( NULL,delim);
    	printf("id: %s\n", proc[i].id);
    	
     proc[i].qaunta = strdup(procstr);
    	procstr = strtok( NULL ,delim);
    	printf("Qaunta: %s\n", proc[i].qaunta); 
    	
    	proc[i].priority = strdup(procstr);
     procstr = strtok( NULL ,delim); 
    	printf("Priority: %s\n", proc[i].priority); 
    		} 
    	}
    	else { eof = 1; }
    }
    close(fd);
    return 0; 
    }
    This program does have delim, but I accidentally removed proc from the earlier post.

    Yes, I do want to do this with strtok; I have been advised by my lecturer that this function should be used.
    How else would I be able to separate the data attributes in the file and process them?

    I would preferably like to store them as ints, but when I declared the variable proc[i].id as an int and tried to assign it the value from the buffer, I was having problems.
