Storing tokenized strings into different variables

**~~CommonTater~~** · 12-31-2011

Yes, I do want to do this with strtok, and my lecturer has advised me that this function should be used.
How else would I be able to separate the data attributes in the file and process them?

You might find that sscanf() is more helpful here than strtok...

sscanf - C++ Reference
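Here is a minimal sketch of the sscanf() approach. It assumes each line holds an id followed by two integers (quanta, then priority), which is the field order the code later in this thread reads. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical record type; the thread's struct process stores id as a
   strdup()'d char * instead of a fixed array. */
struct entry {
    char id[64];
    int quanta;
    int priority;
};

/* Parses one line of the assumed form "id quanta priority", e.g. "Process0 5 3".
   Returns 1 if all three fields were converted, 0 otherwise. */
int parse_entry(const char *line, struct entry *e)
{
    /* %63s stops at whitespace and writes at most 63 chars plus '\0';
       %d skips leading whitespace, and any trailing '\n' is simply ignored. */
    return sscanf(line, "%63s %d %d", e->id, &e->quanta, &e->priority) == 3;
}
```

Because sscanf() reports how many fields it converted, a short or malformed line is detected without any NULL checks.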

**oogabooga** · 12-31-2011

As tater said, it's probably more common in this case to use sscanf. But you can stick with strtok if you wish.

You weren't updating your i variable, so I changed your outer loop to increment i, and simplified it by removing the eof variable. You don't seem to need tmpbuff, so I got rid of it.

You can turn the strings read with strtok into integers by using atoi as below.
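One caveat with atoi(): it cannot report a missing or malformed field, and calling it with NULL is undefined behaviour, which matters here because strtok() returns NULL once a line runs out of tokens. A minimal sketch of a checked alternative using the standard strtol(); the helper name token_to_int is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts one strtok() token to an int. Unlike atoi(),
   it rejects a NULL token (no field left on the line), empty or non-numeric
   text, trailing garbage and values outside the int range.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int token_to_int(const char *tok, int *out)
{
    char *end;
    long v;

    if (tok == NULL)
        return 0;                 /* atoi(NULL) would be undefined behaviour */
    errno = 0;
    v = strtol(tok, &end, 10);
    if (end == tok || *end != '\0')
        return 0;                 /* no digits, or extra characters after them */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                 /* does not fit in an int */
    *out = (int)v;
    return 1;
}
```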

But the code still reads one more time than it should. I'm not sure why.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> 
#include <sys/stat.h> 
#include <fcntl.h> 
#include <string.h> /* strtok() */
#include <unistd.h>  
#define BUFFSIZE 400
#define MAXPROCS 1000

struct process {
    char *id; 
    int state; 
    int priority;
    int qaunta;
    int working; 
    int waiting;
    struct process *next;
};
struct process proc[MAXPROCS]; /* Current process */

int main (int argc, char **argv)
{ 
    int fd, i;
    char buffer[BUFFSIZE];
    char *procstr = NULL;
    char *delim = " \n";
    int ticks; /* Number of ticks for process */
    int ticktime; 
    int status;

    fd = open(argv[1], O_RDONLY);
    if ( fd == -1 ) 
    {
        printf("There was an error opening the file \n"); 
        exit(1);
    }

    for (i = 0; i < MAXPROCS; i++) /* quit loop if more than MAXPROCS */
    {
        status = read(fd, buffer, sizeof(buffer));
        if (status <= 0)
            break; /* quit loop if EOF or other error */
    
        /* Tokenize 'process string' attributes */
        procstr = strtok(buffer, delim);
        while ( procstr != NULL )
        {
            proc[i].id = strdup(procstr);
            printf("id: %s\n", proc[i].id);

            procstr = strtok( NULL,delim);
            proc[i].qaunta = atoi(procstr);
            printf("Qaunta: %d\n", proc[i].qaunta); 

            procstr = strtok( NULL ,delim);
            proc[i].priority = atoi(procstr);
            printf("Priority: %d\n", proc[i].priority); 

            procstr = strtok( NULL ,delim); 
        }
    }
    close(fd);
    return 0; 
}

**oogabooga** · 12-31-2011

Forget the above code; it doesn't count processes properly. See the code below instead.

Also, I figured out what was wrong with how strtok was behaving. read() does not null-terminate the buffer, so you must do it yourself, using read()'s return value (the number of bytes read) as the index for the terminator.

Also note that I've fixed your spacing, which is very important.

The variable named i below should probably be changed to numProcs or some such.

Code:

#include <stdio.h> 
#include <stdlib.h>
#include <sys/types.h> 
#include <sys/stat.h> 
#include <fcntl.h> 
#include <string.h> /* strtok() */
#include <unistd.h>  
#define BUFFSIZE 400
#define MAXPROCS 1000

struct process {
    char *id; 
    int state; 
    int priority;
    int qaunta;
    int working; 
    int waiting;
    struct process *next;
};
struct process proc[MAXPROCS]; /* Current process */

int main (int argc, char **argv)
{ 
    int fd, i, j;
    char buffer[BUFFSIZE+1]; /* extra byte for null */
    char *procstr = NULL;
    char *delim = " \n";
    int ticks; /* Number of ticks for process */
    int ticktime; 
    int status;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        exit(1);
    }
    fd = open(argv[1], O_RDONLY);
    if ( fd == -1 ) 
    {
        printf("There was an error opening the file \n"); 
        exit(1);
    }

    i = 0;
    while ((status = read(fd, buffer, BUFFSIZE)) > 0) /* read until EOF or error */
    {
        /* null terminate buffer */
        buffer[status] = 0;
 
        /* Tokenize 'process string' attributes */
        procstr = strtok(buffer, delim);
        while (procstr)
        {
            proc[i].id = strdup(procstr);
            //printf("id: %s\n", proc[i].id);

            procstr = strtok(NULL, delim);
            proc[i].qaunta = atoi(procstr);
            //printf("Qaunta: %d\n", proc[i].qaunta); 

            procstr = strtok(NULL, delim);
            proc[i].priority = atoi(procstr);
            //printf("Priority: %d\n", proc[i].priority); 

            procstr = strtok(NULL, delim); 
            if (++i >= MAXPROCS)
                goto break_outer; /* quit loop if more than MAXPROCS */
        }
    }
break_outer:

    printf("Size: %d\n", i);
    for (j = 0; j < i; j++)
    {
        printf("%s %3d %3d\n", proc[j].id, proc[j].qaunta, proc[j].priority);
    }

    close(fd);
    return 0; 
}

**spendotw** · 12-31-2011

Thanks oogabooga

This seems to have cleared my program up, and thank you for resolving the whitespace problem... though I still get the same segmentation fault, and the program only reads up to process24, which is far from the last.

**oogabooga** · 12-31-2011

This is more difficult than I thought. Your problem probably occurs after the first buffer is used up and you need to read the next one. But the buffer can end at any point, not just at the end of a line. So you need to check for end-of-buffer after every strtok. Worse than that, it's very possible that a field will start at the end of one buffer and continue at the beginning of the next!
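To make the boundary problem concrete, here is a small demonstration that needs no file: it copies fixed-size chunks out of an in-memory string (standing in for successive read() calls) and tokenizes each chunk on its own, as the read/strtok loop above does. The function name is hypothetical, and the sample data assumes the "id quanta priority" line layout used in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Demonstration only: tokenizes `data` one fixed-size chunk at a time, the way
   a read() loop with strtok() does, and prints every token it finds.
   memcpy() stands in for read(). Returns the token count, or -1 if `chunk`
   is 0 or too large for the local buffer. */
int tokens_per_chunk(const char *data, size_t chunk)
{
    char buf[64];
    size_t len = strlen(data), pos = 0;
    int count = 0;

    if (chunk == 0 || chunk >= sizeof buf)
        return -1;

    while (pos < len) {
        size_t n = (len - pos < chunk) ? len - pos : chunk;
        memcpy(buf, data + pos, n);   /* like read(fd, buf, chunk) */
        buf[n] = '\0';                /* terminate, since read() does not */
        pos += n;
        /* Each chunk is tokenized on its own, so a field cut by the chunk
           boundary comes out as two tokens. */
        for (char *t = strtok(buf, " \n"); t != NULL; t = strtok(NULL, " \n")) {
            printf("[%s] ", t);
            count++;
        }
    }
    putchar('\n');
    return count;
}
```

With the input "Process0 5 3\nProcess1 10 2\n" and a chunk size of 10, the second chunk ends in the middle of the second id, so the function finds seven tokens instead of six: Process1 comes out as Process and 1.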

**anduril462** · 12-31-2011

You might consider reading one character at a time and doing the "end of line" control yourself. You read one char at a time, putting each one in subsequent spots in your temporary buffer. When you read a newline character, you have one full entry. Stop calling read for a moment, null terminate the buffer and strtok the temp buf with just a space as a delimiter, extracting your id, quanta and priority. When you're all done there, go back to reading another line. It's a bit painful, but perhaps less painful than dealing with partial line/field reads and gluing the next bit onto your temporary buffer.
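A minimal sketch of that one-byte-at-a-time approach, assuming lines of the form "id quanta priority" separated by single spaces. Both function names are hypothetical; read_line_bytewise does the "end of line" control, and split_line does the strtok step on the finished line.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>   /* atoi() */
#include <string.h>   /* strtok() */
#include <unistd.h>   /* read(), ssize_t */

/* Reads one line from fd one byte at a time. Stores the characters (without
   the '\n') in `line`, null-terminates it and returns its length. Returns -1
   at EOF when nothing was read, or on a read error. Bytes that do not fit in
   cap-1 are dropped; cap must be at least 1. */
int read_line_bytewise(int fd, char *line, size_t cap)
{
    size_t len = 0, seen = 0;
    ssize_t r;
    char c;

    while ((r = read(fd, &c, 1)) == 1) {
        seen++;
        if (c == '\n')
            break;                 /* one full entry */
        if (len + 1 < cap)
            line[len++] = c;
    }
    if (r < 0 || (r == 0 && seen == 0))
        return -1;
    line[len] = '\0';
    return (int)len;
}

/* Splits a finished line on spaces into id, quanta and priority.
   Returns 1 if all three fields were present, 0 otherwise. *id points into
   `line`, so strdup() it before the line buffer is reused. */
int split_line(char *line, char **id, int *quanta, int *priority)
{
    char *q, *p;

    if ((*id = strtok(line, " ")) == NULL)
        return 0;
    if ((q = strtok(NULL, " ")) == NULL)
        return 0;                  /* never pass a NULL token to atoi() */
    if ((p = strtok(NULL, " ")) == NULL)
        return 0;
    *quanta = atoi(q);
    *priority = atoi(p);
    return 1;
}
```

Calling read() once per byte costs one system call per character, which is slow for large files but keeps line handling trivial, since a line can never be cut in half.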

**spendotw** · 01-01-2012

Originally Posted by oogabooga

This is more difficult than I thought. Your problem probably occurs after the first buffer is used up and you need to read the next one. But the buffer can end at any point, not just at the end of a line. So you need to check for end-of-buffer after every strtok. Worse than that, it's very possible that a field will start at the end of one buffer and continue at the beginning of the next!

Yes, it's becoming a real pain to fix :\

How is it possible that the buffer can end at any point, when it should be null-terminated once it meets a delimiter?

When you say end of buffer, do you mean each time it reaches the end-of-line character?
Starting from the code you provided previously, I added a few 'if' statements to test whether the buffer had a newline character, but when I run it, the messages never appear, so it seems there's no newline character.

Thanks

Code:

        {
            proc[i].id = strdup(procstr);
            printf("id: %s\n", proc[i].id);
            if (buffer[i] == '\n')
                printf("End of line detected! \n");

            procstr = strtok(NULL, delim);
            proc[i].qaunta = atoi(procstr);
            printf("Qaunta: %d\n", proc[i].qaunta);
            if (buffer[i] == '\n')
                printf("End of line detected! \n");

            procstr = strtok(NULL, delim);
            proc[i].priority = atoi(procstr);
            printf("Priority: %d\n", proc[i].priority);
            if (buffer[i] == '\n')
                printf("End of line detected! \n");

            procstr = strtok(NULL, delim);
            if (++i >= MAXPROCS)
                goto break_outer; /* quit loop if more than MAXPROCS */
        }
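One reason these checks never fire is that strtok() itself writes '\0' over the delimiter that ends each token, so by the time the checks run, a '\n' that directly followed a token is already gone (delimiters skipped at the start of a token are left untouched). The checks also index buffer with i, which counts processes rather than positions in the buffer. The small demonstration below, with hypothetical function names, counts the newlines in a sample buffer before and after tokenizing it with the " \n" delimiters used in this thread.

```c
#include <string.h>

/* Counts the '\n' bytes in the first `len` bytes of buf, looking past any
   '\0' bytes that strtok() may have written. */
size_t count_newlines(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t k = 0; k < len; k++)
        if (buf[k] == '\n')
            n++;
    return n;
}

/* Tokenizes `buf` completely and reports how many newlines it held
   before and after. */
void strtok_newline_demo(char *buf, size_t *before, size_t *after)
{
    size_t len = strlen(buf);          /* measure before strtok shortens it */

    *before = count_newlines(buf, len);
    for (char *t = strtok(buf, " \n"); t != NULL; t = strtok(NULL, " \n"))
        ;                              /* consume every token */
    *after = count_newlines(buf, len); /* scan the original length */
}
```

For "Process0 5 3\nProcess1 10 2\n", both newlines end a token, so the count drops from 2 to 0 after tokenizing. To detect line ends, the newline has to be found before strtok runs, or lines have to be split up before tokenizing.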

**spendotw** · 01-01-2012

Originally Posted by anduril462

You might consider reading one character at a time and doing the "end of line" control yourself. You read one char at a time, putting each one in subsequent spots in your temporary buffer. When you read a new line character, you have one full entry. Stop calling read for a moment, null terminate the buffer and strtok the temp buf with just a space as a delimiter, extracting your id, quanta and priority. When you're all done there, go back to reading another line. It's a bit painful, but perhaps less painful than dealing with partial line/field reads and gluing the next bit onto your temporary buffer.

Well, I went back to my old code for your suggestion, since it copies all the data read so far into the buffer each time it reaches the newline character:

Code:

 
            if (buffer[i] == '\n')
            {
                memcpy(tmpbuff, buffer, BUFFSIZE);
            }

And I used only a space as the delimiter, but this only extracted the first "Process0" and its quanta and priority, and shortly afterwards it ended in a seg fault.

Code:

        procstr = strtok(tmpbuff, " ");
        while (procstr)
        {
            proc[i].id = strdup(procstr);
            printf("id: %s\n", proc[i].id);

            procstr = strtok(NULL, " ");
            proc[i].qaunta = atoi(procstr);
            printf("Qaunta: %d\n", proc[i].qaunta);

            procstr = strtok(NULL, " ");
            proc[i].priority = atoi(procstr);
            printf("Priority: %d\n", proc[i].priority);

            procstr = strtok(NULL, delim);
        }

**oogabooga** · 01-01-2012

How is it possible that the buffer can end at any point?

Your question shows that you do not understand buffers. Since that is apparently what your assignment is about, you should probably have learned about it, unless you're confused about having to use "open" and "read" instead of the more usual "fopen" and "fread" (or fgets or fscanf), which handle the buffering for you.

The code you've given is not even close to what you'd need. What I would do in this situation is write my own function similar to fgets, but one that takes a file descriptor instead of a FILE*. It will either need to be passed "buffer" and "buffer_pos" variables (or a struct holding them) or keep them as static locals. This is not too difficult, but not entirely trivial either, since lines will occasionally span buffer boundaries. With such a function, I would write my main program to read the file line by line, and use sscanf (if you're allowed to!?) to extract the data from each line.
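A minimal sketch of such a function, using the static-locals variant described above. The name fd_gets, the FD_BUFSIZE constant and the count_valid_lines driver are hypothetical, and the driver assumes the "id quanta priority" line layout used in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>    /* sscanf() */
#include <unistd.h>   /* read(), ssize_t */

#define FD_BUFSIZE 400   /* hypothetical; same size as the thread's BUFFSIZE */

/* Hypothetical fd_gets(): like fgets(), but reads from a file descriptor.
   The buffer and read position live in static locals, so it serves one
   descriptor at a time and is not thread-safe. Copies bytes up to and
   including '\n' (at most size-1 of them) into `line`, null-terminates it and
   returns `line`. Returns NULL at EOF or on a read error when nothing was
   copied, or if size < 2. */
char *fd_gets(char *line, size_t size, int fd)
{
    static char buf[FD_BUFSIZE];
    static size_t pos = 0, len = 0;   /* next unread byte, bytes held in buf */
    size_t out = 0;

    if (size < 2)
        return NULL;
    while (out + 1 < size) {
        if (pos == len) {                       /* buffer used up: refill */
            ssize_t n = read(fd, buf, sizeof buf);
            if (n <= 0)
                break;                          /* EOF or error */
            len = (size_t)n;
            pos = 0;
        }
        char c = buf[pos++];
        line[out++] = c;                        /* a line may span refills */
        if (c == '\n')
            break;
    }
    if (out == 0)
        return NULL;
    line[out] = '\0';
    return line;
}

/* Example driver: reads the assumed "id quanta priority" lines and counts
   the ones sscanf() can fully parse. */
int count_valid_lines(int fd)
{
    char line[128], id[64];
    int quanta, priority, count = 0;

    while (fd_gets(line, sizeof line, fd) != NULL)
        if (sscanf(line, "%63s %d %d", id, &quanta, &priority) == 3)
            count++;
    return count;
}
```

Because the copy loop simply refills the buffer whenever it runs dry, a line that straddles two read() calls is assembled without any special case.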

As for "how can the buffer end at any point", suppose the buffer were 10 chars long and you read a file containing the chars "hello world\n":

Code:

0123456789
hello worl

Notice that the d and newline won't fit. The buffer doesn't know anything about line endings; it's just a fixed chunk of bytes. The next bufferful of the file has to be read before the line can be completed. So we'd copy what we have so far in the buffer to our line string, then read the next buffer:

Code:

0 123456789
d\n........

and append everything up to and including the newline to the line string.

BTW, a bandaid solution to your problem would be to make your buffer size bigger than your file size. But that's cheating if this assignment is all about implementing your own buffer handling.
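If the bandaid route is acceptable, the buffer does not have to be a fixed guess. A minimal sketch, assuming the input is a regular file: ask fstat() for the file size, allocate that much plus one byte, and call read() until everything is in memory. The function name slurp_fd is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>     /* malloc(), free() */
#include <sys/stat.h>   /* fstat(), struct stat */
#include <unistd.h>     /* read(), ssize_t */

/* Reads the whole regular file behind fd into a malloc'd, null-terminated
   buffer that the caller must free(). Stores the byte count in *out_len if
   out_len is not NULL. Returns NULL on failure. st_size is not meaningful
   for pipes or terminals, so this only suits regular files. */
char *slurp_fd(int fd, size_t *out_len)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < 0)
        return NULL;

    size_t size = (size_t)st.st_size;
    char *data = malloc(size + 1);          /* +1 for the terminator */
    if (data == NULL)
        return NULL;

    size_t got = 0;
    while (got < size) {                    /* read() may return less than asked */
        ssize_t n = read(fd, data + got, size - got);
        if (n < 0) {
            free(data);
            return NULL;
        }
        if (n == 0)
            break;                          /* file shrank while reading */
        got += (size_t)n;
    }
    data[got] = '\0';
    if (out_len != NULL)
        *out_len = got;
    return data;
}
```

With the whole file in one null-terminated string, the existing strtok loop sees every line intact, at the cost of holding the entire file in memory.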

**spendotw** · 01-01-2012

Originally Posted by oogabooga

Your question shows that you do not understand buffers. Since that is what your assignment is apparently about you should probably have learned about it, unless you're confused about having to use "open" and "read" instead of the more usual "fopen" and "fread" (or fgets or fscanf), which handle the buffer for you.

BTW, a bandaid solution to your problem would be to make your buffer size bigger than your file size. But that's cheating if this assignment is all about implementing your own buffer handling.

Thanks for your reply!

Well, my assignment is to create a process scheduler using only the system functions available, but nothing specifically about buffers.

I used your bandaid solution, which worked.

Although I probably won't learn the most from this method, it works well enough! Thanks oogabooga!

Similar Threads

Help: Storing tokenized 2-digit char (infix->postfix converter)

Storing Changing Variables in Array

Storing classes in variables?

storing variables permanentely

storing alot of variables...
