Thread: strtok

  1. #1
    Registered User
    Join Date
    Apr 2010
    Posts
    4

    strtok

    Hi there. I'm trying to extract words from a file and print them out one by one. E.g. if the file contained "The flying machine", it would print to stdout on separate lines:
    "The"
    "flying"
    "machine"
    So I assumed I would use the library function strtok, since it splits a string into tokens using delimiters. I have written some code that opens the file for reading and, if the user wants, converts the strings to lower case. With the splitString function I want to read in the file, take the words in it, and print them all out separately in either retained or lower case. I know there is something seriously wrong with it at the moment. I'm still not really sure how to write it. Should I read each word into a char* buffer that can hold only buffSize-1 chars, print it in retained or lower case, and then continue to the next word? Or do you do it all in one big array holding all the words from the file, somehow make sure that none is longer than buffSize-1 chars (10-1), and print them on separate lines? Any help would be appreciated.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    
    #define BUFFER_SIZE 10
    #define MAX_WORD_LENGTH 256
    #define USE "Usage: strings [-p] [file]\n"
    char strings[MAX_WORD_LENGTH] = {'\0'};
    
    int splitString(FILE *file, char *buffer, int bufferSize, int retainCase);
    int convertStringToLower(FILE *file); 
    
    int main(int argc, char *argv[])
    {
    	char *result = NULL;
    	FILE *file = NULL;
    	int preserveCase = 0;
    	int error = 0;
    	if((argc > 3) || ((strcmp(argv[1], "-c") != 0) && (argc > 2)) || (argc < 2))
    	{
    		error++;
    	}
    	if(strcmp(argv[1], "-p") == 0)
    	{
    		preserveCase = 1;
    		file = fopen(argv[2], "r");
    	}
    	if(strcmp(argv[1], "-p") != 0)
    	{
    		file = fopen(argv[1], "r");
    	}
    	if(error)
    	{
    		fprintf(stderr, USE, argv[0]);
    		return EXIT_FAILURE;
    	}
    	if(!preserveCase)
    	{
    		splitString(file, result, BUFFER_SIZE, 0);
    	}
    	if(preserveCase)
    	{
    		while(fgets(strings, MAX_WORD_LENGTH, file) != 0)
    		{
    			splitString(file, result, BUFFER_SIZE, 1);
    		}
    	}
    	fclose(file);
    	return EXIT_SUCCESS;
    }
    
    int splitString(FILE *file, char *buffer, int buffSize, int retainCase)
    {
    	char delims[] = " ,().";
    	if(retainCase)
    	{
    		while(fgets(buffer, buffSize - 1, fp))
    		{
    			buffer = strtok(string, delims);
    			while(buffer != NULL)
    			{
    				printf("%s\n", buffer);
    				buffer = strtok(NULL, delims);
    			}
    		}
    	}
    	if(!retainCase)
    	{
    		convertStringToLower(file);
    		while(fgets(buffer, buffSize - 1, fp))
    		{
    			buffer = strtok(strings, delims);
    			while(buffer != NULL)
    			{
    				printf("%s\n", buffer);
    				buffer = strtok(NULL, delims);
    			}
    		}
    	}
    	return EXIT_SUCCESS;
    }
    
    int convertStringToLower(FILE *file)
    {
    	int i = 0;
    	while(fgets(strings, MAX_WORD_LENGTH - 1, file) != 0)
    	{
    		for(i = 0; i < MAX_WORD_LENGTH - 1; i++)
    		{
    			strings[i] = tolower(strings[i]);
    		}	
    	}
    	return EXIT_SUCCESS;
    }

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    All you need is a paltry number of lines of code for this.

    Once you get your string into a char array, just

    Code:
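    /* your char array is initialized to the EOS char: */
    char array[100] = { '\0' };
    size_t i, j;
    
    /* ... get your string into array here ... */
    
    /* back up from the end of the array, past the unused EOS chars */
    j = sizeof(array);
    while(j > 0 && array[j - 1] == '\0')
      --j;
    
    /* print the chars, starting a new line at each space */
    for(i = 0; i < j; i++) {
      if(array[i] == ' ')
         putchar('\n');
      else
         putchar(array[i]);
    }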
    I know you can use strtok() for this, but this seems natural and simple, imo?

    I haven't run this code, and it's quite late, but it's close.
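
    For comparison, the strtok() route could look something like the sketch below. It assumes the line is already in a writable char array (strtok() modifies the string it scans). splitWords, printWords, the 64-word limit, and the delimiter set are illustrative choices, not anything from the code above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits line in place and stores up to maxWords
   token pointers in words[]. Returns the number of tokens stored. */
size_t splitWords(char *line, char *words[], size_t maxWords)
{
    const char delims[] = " ,().\t\n";   /* assumed delimiter set */
    size_t count = 0;
    char *token;

    assert(line != NULL && words != NULL);

    /* The first call passes the string; later calls pass NULL to continue. */
    token = strtok(line, delims);
    while (token != NULL && count < maxWords) {
        words[count++] = token;
        token = strtok(NULL, delims);
    }
    return count;
}

/* Hypothetical helper: prints each word of one line on its own line. */
void printWords(char *line)
{
    char *words[64];
    size_t i, n = splitWords(line, words, 64);

    for (i = 0; i < n; i++)
        printf("%s\n", words[i]);
}
```

    Calling printWords() on each line returned by fgets() would print that line's words one per line. The "\n" in the delimiter set is there because fgets() keeps the newline.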

  3. #3
    Registered User
    Join Date
    Apr 2010
    Posts
    4
    So I need to get the strings in from the file using fgets? And each word can only be 9 chars long. How do I incorporate that? I get what that code is doing: it goes through until it hits a null byte, then loops over the length of the whole string from the file and prints out the separate words, because the if(array[i] == ' ') inserts a new line whenever it hits a space. But I want to read the next word from the given file into the given buffer, which is buffSize bytes in size, so each word's length is buffSize-1 maximum. Chars such as spaces are used to separate the words. That's why I thought strtok would be the best idea. I'm just confused about how to do what I said above: read each word into the buffer using fgets or something, convert it to lower case if the user wants, print it, then move on to the next word. Argh!!!

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by Smithy View Post
    So I need to get the strings in from the file using fgets? And each word can only be 9 chars long. How do I incorporate that? I get what that code is doing: it goes through until it hits a null byte, then loops over the length of the whole string from the file and prints out the separate words, because the if(array[i] == ' ') inserts a new line whenever it hits a space. But I want to read the next word from the given file into the given buffer, which is buffSize bytes in size, so each word's length is buffSize-1 maximum. Chars such as spaces are used to separate the words. That's why I thought strtok would be the best idea. I'm just confused about how to do what I said above: read each word into the buffer using fgets or something, convert it to lower case if the user wants, print it, then move on to the next word. Argh!!!
    You do not need to use fgets(), sorry for the confusion. You can do this right from the file, without any char array. You can also do this from any char array, no matter how you get the chars into the array. The idea is the same:

    Code:
    start loop
    read a char - from stdin, a file, a string, whatever the input is that you want
    
    if the char is not a space or punctuation that ends a word, or newline
      print that char
    else
      print a newline
    
    end loop when you have what you want
    Your buffer (if you use a char array as a buffer) just has to be big enough to hold the longest word AND an end-of-string char: '\0'

    strtok can't match this algorithm for simplicity and transparency, and those are very important concepts. Google up a strtok program that does this, or read your strtok man pages, or C book, and see if you don't agree.
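
    The loop above might be sketched in C roughly like this. printWordsFromStream and the inWord flag are made-up names, and the delimiter set is an assumption borrowed from the first post plus whitespace. The flag keeps a run of delimiters such as ", " from printing blank lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies in to out, one word per line.
   The delimiter set is an assumption; adjust it to taste. */
void printWordsFromStream(FILE *in, FILE *out)
{
    const char delims[] = " ,().\t\r\n";
    int c;
    int inWord = 0;   /* 1 while we are in the middle of a word */

    assert(in != NULL && out != NULL);

    while ((c = getc(in)) != EOF) {
        if (c != '\0' && strchr(delims, c) == NULL) {
            putc(c, out);          /* ordinary char: print it */
            inWord = 1;
        } else if (inWord) {
            putc('\n', out);       /* first delimiter after a word */
            inWord = 0;
        }
    }
    if (inWord)
        putc('\n', out);           /* input ended in the middle of a word */
}
```

    With a file opened by fopen(), the call would be printWordsFromStream(file, stdout). No char array is involved at all.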

  5. #5
    Registered User
    Join Date
    Apr 2010
    Posts
    4
    OK. But how do I make sure that a word is not over 9 chars and, if it is, print the rest on a new line as a separate word? I.e. fantastable prints as fantastab and le. Just a simple if statement? Also, this is probably going to sound stupid, but in my main I have initialised char *buffer to NULL, as I thought this was good practice. I then pass it into splitString, where it is still NULL of course. I guess I should use the buffer to store the contents of the file, yes? And then do a strncpy into the new char array[] and loop over it using the code below.

    Code:
    int splitString(FILE *fp, char *buffer, int buffSize, int retainCase)
    {
    	int j = 0;
    	int len = 0;
    	int i = 0;
    	
    	if(buffer == NULL)
    	{
    		return EXIT_FAILURE;
    	}
    	
    	char array[MAX_WORD_LENGTH];
    	strncpy(array, buffer, MAX_WORD_LENGTH); /* at the moment this is just copying NULL so it segmentation faults  but i put that if statement above to stop it*/
    	
            /*so i guess i want to read in the file into the buffer, stuck at how to do it*/
    
    	len = sizeof(array);
    	j = len;
    	
    	if(retainCase)
    	{
    		while(array[i++] != '\0')
    		{
    			for(i = 0; i < j; i++) 
    			{
    				printf("%c", array[i]);
    				if(array[i] == ' ')
    				{
    					putchar('\n');
    				}
    			}
    			printf("\n");
    		}
    	}
    	
    	if(!retainCase)
    	{
    		convertStringToLower(fp);
    		while(array[i++] != '\0')
    		{
    			for(i = 0; i < j; i++) 
    			{
    				printf("%c", array[i]);
    				if(array[i] == ' ')
    				{
    					putchar('\n');
    				}
    			}
                            printf("\n");
    		}
    	}
    	return EXIT_SUCCESS;
    }
    Last edited by Smithy; 04-09-2010 at 06:16 PM.

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I don't understand your 9 char limit. Your char string should be larger if you intend to put unknown words into it.

    You can see how long the word is by #including <string.h> and using wordLen = strlen(word)

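    You don't need to assign anything to NULL here, and you shouldn't: fgets() and strncpy() need real storage to write into, so pass them a char array rather than a NULL pointer.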

    Put all the details of this aside for now. Get the main flow of the program execution right FIRST, THEN add the rest to it. That's top-down design, in a (very small) nutshell. Keep that in mind when you're starting up programs and you'll save yourself a HUGE amount of time.
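
    On the strlen() and NULL points, a minimal sketch, assuming lines are read with fgets() into a real char array. readLine and LINE_SIZE are made-up names for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 256   /* assumed upper bound on line length */

/* Hypothetical helper: reads one line into buffer, strips the trailing
   newline, and returns the length, or -1 at end of file. */
long readLine(FILE *fp, char *buffer, int size)
{
    size_t len;

    /* buffer must point at real storage, never NULL */
    assert(fp != NULL && buffer != NULL && size > 1);

    if (fgets(buffer, size, fp) == NULL)
        return -1;

    len = strlen(buffer);                 /* chars before the '\0' */
    if (len > 0 && buffer[len - 1] == '\n')
        buffer[--len] = '\0';             /* drop the newline fgets() keeps */
    return (long)len;
}
```

    In main, the buffer would then be declared as char line[LINE_SIZE]; and passed as readLine(file, line, sizeof line), instead of a char * set to NULL.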
    Last edited by Adak; 04-09-2010 at 06:47 PM.

  7. #7
    Registered User
    Join Date
    Apr 2010
    Posts
    4
    I just want the maximum word length to be 9. And I meant that in the main function I have char *buffer = NULL, which I then pass into splitString. (In my first post it is called char *result = NULL.) At the moment I have a file called file.txt containing the sentence "The triumphant fox jumped over the lazy dog". It has a word in it that is 10 chars long, so the 10th char will be printed as a separate word. So the way I thought to do it was to have a BUFFER_SIZE of 10 and read each word from this file into the char *buffer, which holds buffSize-1 chars, print it (retained or lowercase), and then move on to the next word once a space is detected or the buffer is full because the word was 9 chars long.
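
    One way the word-at-a-time idea described here could be sketched, assuming chars are read with getc() rather than fgets() or strtok(). nextWord and the delimiter set are hypothetical; the parameters mirror the splitString prototype from the first post. A word longer than buffSize-1 chars is cut there, and the rest comes back on the next call as a separate word.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the next word from fp into buffer, storing
   at most buffSize - 1 chars plus '\0'. Returns 1 if a word was read,
   0 at end of file. A longer word is continued on the next call. */
int nextWord(FILE *fp, char *buffer, int buffSize, int retainCase)
{
    const char delims[] = " ,().\t\r\n";   /* assumed delimiter set */
    int len = 0;
    int c;

    assert(fp != NULL && buffer != NULL && buffSize > 1);

    /* Stop before reading another char once the buffer is full. */
    while (len < buffSize - 1 && (c = getc(fp)) != EOF) {
        if (c == '\0' || strchr(delims, c) != NULL) {
            if (len > 0)
                break;        /* delimiter ends the current word */
            continue;         /* skip delimiters before a word */
        }
        buffer[len++] = (char)(retainCase ? c : tolower(c));
    }
    buffer[len] = '\0';
    return len > 0;
}
```

    In main this would be driven by something like char word[BUFFER_SIZE]; while(nextWord(file, word, BUFFER_SIZE, preserveCase)) printf("%s\n", word); so "triumphant" comes out as "triumphan" and then "t".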
