Thread: reading text-and-numbers file word by word

  1. #1
    Registered User
    Join Date
    Nov 2008
    Posts
    45

    reading text-and-numbers file word by word

    hi,

    i have a 3-page text file which contains both text and numbers. i wish to jump to the 3rd page (to save computation time), read the words in that page word by word, and each word that i read i have to compare with a word that i have in mind. once the read word matches the desired word, i wish to extract it, and a number that is near (but not beside/adjacent to) it, then write both items to another text file.

    does anyone have any idea how i can go about doing it?

    thanks.

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    You'd have to use your knowledge of the data file to do this. You say the text is full of numbers, so how can you tell which one is THE number?

    A page can hold a lot of words, or very few. You can use fseek() to jump ahead in the file, but you'd have to know how far ahead to seek, in bytes, or count the page-break chars, or something similar.

    Since you already know the "word" you're looking for, there's no need to "extract" it, but a simple copy into a char word[] array would do that for you, if you wanted to.

    In summary, this is a simple thing to do, but you need to know the file name, the word you're seeking, and how to find this number, within the text.

    In practical terms, you wouldn't gain much by jumping to the 3rd page, because all three pages will probably fit in one large char buffer[], and the word search can be done very quickly in memory, using strstr(). You'll definitely want to include string.h for that function.

  3. #3
    Registered User
    Join Date
    Nov 2008
    Posts
    45
    yes, the file is sorted in such a way that it is like a table, in the sense that each row starts with a word (a scientific quantity, really), and then this is followed by 3 numbers, separated by tabs. what i intend to do is to go to the row which has the desired quantity, then retrieve a particular number among those three.

    in this case, then, can i read the file row by row, then use strstr() to match the word i want, and then zoom in on the row number and get the number that i want?

    is there any other way?

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    When you say "go to the row which has the desired quantity", you know what you're saying, right? The computer will have to read each desired quantity to "go to the row".

    Yes, go line by line with fgets(). You may want strstr(), but you can do this other ways as well (always more than one way to get a job done with C).

    This is a simple task, and the way I suggested is a simple solution. Other ways are basically the same, with slight wrinkles. For instance, you could do most of this just using fscanf(), but fscanf() can be troublesome, depending on the data. Using strstr() is much easier.

    As the name of the function indicates, it looks for a string, inside a larger string. Perfect for what you want to do.
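A minimal sketch of the line-by-line approach, assuming the row layout described in post #3 (one word, then three tab-separated numbers). The helper name `find_row` and the 1024-byte line limit are assumptions for illustration, not something from the thread:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan `fp` line by line for a row whose first field
 * is `name`, and store that row's three numbers in `out`.
 * Returns 1 if the row was found and parsed, 0 otherwise. */
int find_row(FILE *fp, const char *name, double out[3])
{
    char line[1024];   /* assumed to be longer than any row in the file */
    char word[256];

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, name) == NULL)
            continue;                      /* word is not on this line */

        /* %255s reads the leading word; each %lf skips the tab before it. */
        if (sscanf(line, "%255s %lf %lf %lf",
                   word, &out[0], &out[1], &out[2]) == 4
            && strcmp(word, name) == 0)
            return 1;
    }
    return 0;
}
```

After a successful call, `out[1]` (for example) holds the second number on the matching row. The extra strcmp() is there because strstr() alone would also match a longer word that merely contains `name`.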

  5. #5
    Registered User
    Join Date
    Nov 2008
    Posts
    45
    i have written the following:

    Code:
    	FILE *infile;
    	infile = fopen("some_text_file", "r");
    	char str[10000];
    	fgets(str, 10000, infile);
    	printf("this is the string: %s\n", str);
    	fclose(infile);
    however, this only returns the top line in the file and nothing else, presumably because the function fgets() stops when it meets a newline character. i tried to include an end-of-file check by

    Code:
    	while (!feof(infile)) {
    	fgets(str, 10000, infile);
    	}
    but then this returns only the last line in the file. how can i change this to make str[] contain all the lines in the infile?

    thx.

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Ding, ding, ding!

    Yes, fgets() stops when it gets to the end of the line. Usually, you'd want to use it in a loop like this:

    Code:
    //where buffer is your char array, and fp is your FILE *fp (file pointer).
    
    while (fgets(buffer, sizeof(buffer), fp) != NULL) { //EDIT: NULL, not EOF - thanks Brewbuck for that correction.
       //your code to work with the buffer 
        //char's in here
    }
    Of course, it's OK to take in the whole file, into a big char array, but your description pointed toward a line based logic as best, IMO. This also avoids using large amounts of memory, and the possible problem of crashing the program because either:

    1) There wasn't enough contiguous memory available

    or

    2) The file was larger than the char array.

    Your memory request is being made from a rather small part of your total memory (the stack, where local arrays live), not the much larger dynamic memory (the heap).
    And like I mentioned, I think the code can just be easier, as well.
    Last edited by Adak; 10-16-2009 at 05:08 PM.

  7. #7
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    To read the whole file in at once, you can't use "string" based I/O (i.e., fgets - the "s" means "string"); you'd use binary I/O instead (i.e., fread).

    However, since you are reading a text file, it would be easier to read it a line at a time, and when you get to the line you want, just parse it out.
    Mainframe assembler programmer by trade. C coder when I can.
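For comparison, a rough sketch of the whole-file approach with fread(). The helper name `slurp` and the 4096-byte starting size are assumptions; the buffer comes from the heap and grows as needed, so the file size does not have to be known in advance:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything from `fp` into one heap buffer.
 * Returns a NUL-terminated string the caller must free(), or NULL on failure. */
char *slurp(FILE *fp, size_t *len_out)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Always leave one byte spare for the terminator. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (cap - len - 1 == 0) {          /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {                      /* stopped on an error, not EOF */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

The returned string can then be searched in one go with strstr(). For a small table-like file, the line-at-a-time loop is usually less code.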

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Originally posted by Adak:
    Code:
    //where buffer is your char array, and fp is your FILE *fp (file pointer).
    
    while((fgets(buffer, sizeof(buffer), fp) != EOF) { //EOF = end of file marker
       //your code to work with the buffer 
        //char's in here
    }
    fgets() never returns EOF. It returns NULL if it reaches end-of-file without reading anything.

  9. #9
    Registered User
    Join Date
    Nov 2008
    Posts
    45
    thanks for all the suggestions. however, i have not really grasped the concept of pointers, and since my text files (well, some are binary) are rather straightforward and i am only extracting a few parameters, i think i will avoid pointers for the time being. so, i have written the following to pick out the lines that contain the desired parameter:

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <string.h>
    
    int main()
    {
    	
    	FILE *infile;
    	FILE *outfile;
    	
    	infile = fopen("/media/Data/some_text_file", "rb");
    	outfile = fopen("/media/Data/output", "w");
    	
    	long int length = 1000;      //an arbitrary maximum length for each line
    	char str[length];
    	char str1[] = "desired_parameter";
    	int i = 1;
    	
    	while (!feof(infile)) {
    	fgets(str, 1000, infile);
    	if(strstr(str, str1) == NULL) {}
    	else {
    		fprintf(outfile, "string%i found: %s\n", i, str);
    		i++;
    	}
    	
    }
    	if(i==1) fprintf(outfile, "your search for %s is not found anywhere in the document\n", str1);
    	
    	fclose(infile);
    	fclose(outfile);
    	
      return 0;
    }
    this works just fine, and returns what i want.

    however, the next part i want to do is to break down each retrieved line into separate components and retrieve just 1. each line is basically made up of 1 to 2 words and 2-3 numbers (floating point), ie each line has about 3-5 components, each separated by a space/tab. how should i go about doing that?

  10. #10
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> while (!feof(infile))

    You're checking for end-of-file too early. It should be done immediately after the call to fgets. Incidentally, it isn't even necessary: just check the return value of fgets - if it's NULL then you've hit EOF (or else some other error occurred).

    >> if(strstr(str, str1) == NULL)

    Why not just eliminate the following 'else' by reversing the sense of the comparison (eg : !=)?

    >> fgets(str, 1000, infile);

    Avoid magic numbers. You've already got 'length', so use it.

    >> however, the next part i want to do is to break down each retrieved line into separate components and retrieve just 1. each line is basically made up of 1 to 2 words and 2-3 numbers (floating point), ie each line has about 3-5 components, each separated by a space/tab. how should i go about doing that?

    Initialize a flag to 'clear' (eg: 'false'). Read a character. If it's not whitespace, then add it to a buffer, and 'set' the flag (eg: 'true'). Otherwise, if the flag is set, copy the buffer to your array (or what have you), clear the buffer and the flag, and repeat until no characters are left.
    Last edited by Sebastiani; 10-18-2009 at 08:23 PM. Reason: Should have been "Initialize to 'clear'", not 'set'
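The flag-based splitting described above could be sketched roughly as follows. It works on a line already read with fgets() rather than character by character from the file; the name `split_fields` and both size limits are assumptions chosen for the 3-5 field lines mentioned in post #9:

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_FIELDS 8     /* assumed upper bound on fields per line */
#define FIELD_LEN  64    /* assumed upper bound on one field's length */

/* Hypothetical helper: split `line` on whitespace into `fields`.
 * Returns the number of fields stored. */
int split_fields(const char *line, char fields[MAX_FIELDS][FIELD_LEN])
{
    int count = 0;        /* fields completed so far */
    size_t len = 0;       /* characters in the field being built */
    int in_field = 0;     /* the flag: set while inside a field */

    for (;; line++) {
        unsigned char c = (unsigned char)*line;

        if (c != '\0' && !isspace(c)) {
            /* Non-whitespace: append to the current field, set the flag. */
            if (count < MAX_FIELDS && len < FIELD_LEN - 1)
                fields[count][len++] = (char)c;
            in_field = 1;
        } else if (in_field) {
            /* Whitespace or end of string after a field: finish it. */
            if (count < MAX_FIELDS)
                fields[count++][len] = '\0';
            len = 0;
            in_field = 0;
        }
        if (c == '\0')
            break;
    }
    return count;
}
```

Once the line is split, the wanted field can be converted with strtod() or atof(). Over-long fields are truncated and fields beyond MAX_FIELDS are dropped; that is a simplification in this sketch, not a recommendation.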

  11. #11
    Registered User
    Join Date
    Nov 2008
    Posts
    45
    i used the following code to read a file (infile) and write every read line to outfile as floating point numbers. everything goes smoothly except that the last line of infile is always written to outfile twice. what is the reason for that?

    Code:
    #define LENGTH 100
           
            char str[];
    
    	while (!feof(infile)) {
    		fgets(str, LENGTH, infile);
    		float f = atof(str);
    		fprintf(outfile, "%f\n", f);
    	}
    @sebastiani: i tried what you suggested, putting fgets before the while loop. but that was a disaster because the loop never ended and my outfile became exponentially big -- i had to kill the program at the ~100mb stage

  12. #12
    Registered User
    Join Date
    Oct 2009
    Location
    While(1)
    Posts
    377
    Code:
    #include <stdio.h>
    #include <limits.h>   /* LINE_MAX (POSIX) */
    
    char line[LINE_MAX];
    while (fgets(line, LINE_MAX, fp) != NULL) {
        /* convert the line and fprintf() it here */
    }

    Did you compile your code? I have my doubts about it.

  13. #13
    Registered User
    Join Date
    Nov 2008
    Posts
    45
    hey thanks loads rockymarrone, that did it!

    yes, i compiled on command line using gcc -Wall. no compile- or run-time errors/warnings whatsoever.

    which part of my code caused the last line of infile to be fprinted twice? is it something to do with how fgets reacts to an end-of-line character?

  14. #14
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Did you read the FAQ on why not to use feof to control a loop?
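The short version: feof() only becomes true after a read has already failed. In a `while (!feof(infile))` loop, the end-of-file flag is typically still clear after the last line has been read, so the body runs once more. That final fgets() fails and leaves the buffer unchanged, and the stale last line is converted and written a second time. A minimal sketch of the loop from post #11 with the return value of fgets() as the loop condition (the function name `copy_numbers` is made up for illustration):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: convert each line of `in` to a number and write it
 * to `out`. Returns the number of lines written. */
int copy_numbers(FILE *in, FILE *out)
{
    char line[100];
    int count = 0;

    /* fgets() returns NULL on end-of-file or error, so the body only runs
     * for lines that were actually read. */
    while (fgets(line, sizeof line, in) != NULL) {
        fprintf(out, "%f\n", strtod(line, NULL));
        count++;
    }
    return count;
}
```

If you need to know why the loop stopped, call feof() or ferror() after it ends.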

  15. #15
    Registered User
    Join Date
    Nov 2008
    Posts
    45
    oh i see. thanks for the pointer!
