Thread: Reading words and analyzing words from file

  1. #1
    Registered User
    Join Date
    Dec 2007
    Posts
    30

    Reading words and analyzing words from file

    I'm trying to analyze a file, given with a structure like this:

    Key1 param1 param2 param3
    Key2 param4 param5 param6 param7
    Key3 param8 param9

    The file has an unknown number of lines. Each line begins with a key followed by an unknown number of parameters.

    I want to compare one key's parameters with another key's parameters column by column (in the example above, param1 with param4, param2 with param5, and so on).

    I started by opening the file and looping through all the lines, like this:
    Code:
    char row[30];	
    while (fscanf(fp, "%s\n", &row) == 1)
    {
       ...
    }
    But I already don't like that I'm defining a fixed-size array, as I'll be screwed if the row's length is greater than 30.

    Ignoring that, inside the loop, I tried to split the row into parts (using strtok) to get the keys and parameters. I did it like this:
    Code:
    typedef struct
    {
    	char *name;
    	char params[20][6];
    	int n;
    } rows;
    
    void splitrows(const char *row, rows *frows)
    {
    	int i = 0;	
    	char *tbc = "	"; // Horizontal tab character
    	char *frow = malloc(sizeof *row); 
    	frow = strdup(row);
    	char params[10][20];	 
    		
    	char *t = strtok(frow, tbc); (*frows).name = t;
    	while (t != NULL)
    	{
          		strcpy((*frows).params[i++], t);
          		t = strtok(NULL, tbc);
          		(*frows).n = i;
    	}  	
         	
      	free(frow);			
    }
    I would then call the function in the while loop to fill an array of rows (the structure 'rows') and then do the math and everything else I really need to do. Like this:

    Code:
    	char row[30]; int rnr = 0, i = 0;
    	while (fscanf(fp, "%s\n", &row) == 1) rnr++; // Get the number of lines in the file
    	rows frows[rnr]; // Make an array of rows.
    	
    	while (fscanf(fp, "%s\n", &row) == 1)
    	{
    		splitrows(row, &frows[i]); i++;
    	}
    But unfortunately it doesn't work. And it's really messy. I'm using fixed-size arrays where I shouldn't (as I don't have a clue how to do it otherwise).

    Is there another way to do this? Could anyone please at least point me in the right direction?

    Best wishes and many thanks,
    Desmond

  2. #2
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    scanf's %s stops at whitespace, so it reads one word at a time, not one line.
    Use fgets to read the whole line.

    Note that row already decays to a pointer to char, so there is no need to use & with it.
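
    A minimal sketch of this advice, assuming fields are separated by spaces or tabs as in the sample file. The names split_fields and dump_keys, the MAX_FIELDS limit and the 256-byte row buffer are illustrative assumptions, not something from the original post.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 16  /* illustrative limit, not from the original post */

/* Split `line` in place on spaces, tabs and line endings.
 * Stores up to `max` token pointers in `fields` and returns the count.
 * The pointers refer into `line`, so `line` must outlive them. */
size_t split_fields(char *line, char *fields[], size_t max)
{
    size_t n = 0;
    char *tok = strtok(line, " \t\r\n");
    while (tok != NULL && n < max) {
        fields[n++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}

/* Read `fp` line by line with fgets and print each key with its
 * parameter count. Returns the number of non-blank lines seen.
 * Lines longer than the buffer are split across reads; this sketch
 * does not handle that case. */
size_t dump_keys(FILE *fp)
{
    char row[256];               /* no & needed: row decays to char * */
    char *fields[MAX_FIELDS];
    size_t lines = 0;

    while (fgets(row, sizeof row, fp) != NULL) {
        size_t n = split_fields(row, fields, MAX_FIELDS);
        if (n == 0)
            continue;            /* blank line */
        printf("%s has %zu parameter(s)\n", fields[0], n - 1);
        lines++;
    }
    return lines;
}
```

    Here fields[0] is the key and fields[1] onward are the parameters, so comparing column k of two rows means comparing fields[k] of each, after checking that both rows have at least k + 1 fields.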
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  3. #3
    Registered User
    Join Date
    Dec 2007
    Posts
    30
    Any pointers on how to solve this puzzle in general (without fixed-size arrays and so on)?

  4. #4
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    1. Read the FAQ about reading lines from the user (the same goes for files).

    You will need some fixed-size array to read a line anyway, so make it big enough to store any line you expect.

    2. In any case, fgets will prevent a buffer overrun, and you can check the buffer for the presence of '\n'. If fgets did not store one, the line was probably truncated: parse what is available, then read again to parse what is left.

    3. Another approach: use fseek and ftell to determine the file size, allocate a buffer of that size, and read the whole file into it. Afterwards, work with this buffer only.
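
    The third option could look roughly like the sketch below. The function name slurp is hypothetical, and the code assumes a seekable regular file opened in binary mode ("rb"), since ftell on a text-mode stream is not guaranteed to return a byte count.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read the whole of `fp` into a freshly allocated, NUL-terminated buffer.
 * On success returns the buffer (the caller frees it) and stores the
 * byte count in *len_out if len_out is not NULL. Returns NULL on error.
 * Assumes `fp` is a seekable regular file opened in binary mode. */
char *slurp(FILE *fp, size_t *len_out)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);
    if (size < 0)
        return NULL;
    rewind(fp);

    char *buf = malloc((size_t)size + 1);   /* +1 for the terminator */
    if (buf == NULL)
        return NULL;

    size_t got = fread(buf, 1, (size_t)size, fp);
    if (got != (size_t)size) {
        free(buf);
        return NULL;
    }
    buf[got] = '\0';
    if (len_out != NULL)
        *len_out = got;
    return buf;
}
```

    Once the file is in memory, lines can be found by scanning the buffer for '\n', with no per-line length limit to worry about.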

  5. #5
    uint64_t...think positive xuftugulus's Avatar
    Join Date
    Feb 2008
    Location
    Pacem
    Posts
    355
    Quote Originally Posted by desmond5
    Any pointers on how to solve this puzzle in general (without fixed-size arrays and so on)?
    Go ahead and use fixed-size arrays. Computers have so much memory, so why not use a fixed-size array of, say, 1000000 bytes? It would rarely overflow when reading a line from a text file.

    Well, OK, actually I am joking. The best way to do it is to have a reasonably sized buffer for a line. I am not going to state how many bytes make a reasonably sized buffer, as that depends on the application. Then you can use fgets to read a line and check the last character of the string you read. If it is '\n', the line was read entirely. If not, the line was longer than the buffer and you must take special action according to your application.

    My approach would use a combination of realloc and strcat on a char * variable that represents a line of input, together with a fixed-size buffer populated using fgets.
    Code:
    ...
        goto johny_walker_red_label;
    johny_walker_blue_label: exit(-149$);
    johny_walker_red_label : exit( -22$);
    A typical example of ...cheap programming practices.

  6. #6
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463
    Open the file and read it character by character into an array. What's wrong with a fixed-size array?

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by xuftugulus
    Go ahead and use fixed-size arrays. Computers have so much memory, so why not use a fixed-size array of, say, 1000000 bytes?
    I agree that using fixed-size buffers isn't a bad idea, as long as their size is reasonable. And 1M bytes isn't out of the ordinary. Just make sure that such large variables are not "auto" variables, meaning they should not be non-static local variables in a function. Putting large arrays on the stack as local variables is a good way to run out of stack space and crash the application in a completely unrecoverable way [as in, the OS has no other option than to kill the entire application].

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  8. #8
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Quote:
    Then you can use fgets to read a line and check the last character of the string you read. If it is '\n', the line was read entirely. If not, the line was longer than the buffer and you must take special action according to your application.
    Note that the last line in the file might not contain a terminating newline -- so if the last character is not '\n', you also need to use feof() or something similar to see whether the end of the file has been reached.
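
    A sketch of that check, assuming a caller-supplied fixed-size buffer. The enum and the function name read_line are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_COMPLETE, LINE_TRUNCATED, LINE_EOF };

/* Read at most size-1 characters of one line into buf.
 * LINE_COMPLETE : a whole line was read (newline stripped), or the last
 *                 line of the file, which had no trailing newline.
 * LINE_TRUNCATED: the buffer filled before the newline; the rest of the
 *                 line is still waiting in the stream.
 * LINE_EOF      : nothing was read (end of file or read error). */
enum line_status read_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return LINE_EOF;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return LINE_COMPLETE;
    }
    /* No newline: either the buffer was too small or the file ended. */
    return feof(fp) ? LINE_COMPLETE : LINE_TRUNCATED;
}
```

    One corner case remains: a final unterminated line that exactly fills the buffer is reported as LINE_TRUNCATED, and the following call returns LINE_EOF, because fgets stops before it ever sees the end of the file.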

    Quote:
    Just make sure that such large variables are not "auto" variables, meaning they should not be non-static local variables in a function. Putting large arrays on the stack as local variables is a good way to run out of stack space and crash the application in a completely unrecoverable way [as in, the OS has no other option than to kill the entire application].
    Global (gasp) or static variables, including local static variables, would work in this case -- all of the mentioned storage types have static storage duration, so they are stored in the program's static data area, not on the stack (and not on the heap either).
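
    For illustration, a large line buffer declared static inside a function might look like the sketch below. The name read_big_line and the BIG_LINE_MAX constant are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

#define BIG_LINE_MAX (1024 * 1024)  /* 1 MiB, roughly the size floated above */

/* Read one line into a large buffer that has static storage duration.
 * The buffer is reserved once for the life of the program and takes no
 * stack space. The returned pointer is overwritten by the next call, and
 * the function is neither reentrant nor thread-safe.
 * Returns NULL at end of file or on a read error. */
char *read_big_line(FILE *fp)
{
    static char line[BIG_LINE_MAX];

    if (fgets(line, sizeof line, fp) == NULL)
        return NULL;
    line[strcspn(line, "\n")] = '\0';  /* strip the newline, if present */
    return line;
}
```

    The price of the shared buffer is that every caller gets the same pointer back, so a line that must be kept has to be copied before the next call.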

    Quote:
    My approach would use a combination of realloc and strcat on a char * variable that represents a line of input, together with a fixed-size buffer populated using fgets.
    Me too -- except you don't have to use strcat(), because you probably need to keep track of how many characters you've already read, in which case you can just go
    Code:
    strcpy(buffer + length, newchars);
    and save strcat() from iterating over the string to find its terminating null character.

    Note: I wrote a function like this a while ago, for codeform. (Grab codeform's source and search for "get_string".) It's not very well written -- a structure would improve things a lot -- but it should give you a general idea of what we're talking about.

    Two hints when you're writing this kind of function: don't free any memory until you're done reading. Allocating and freeing memory all the time is wasteful and inefficient. And double the size of the allocated memory each time you run out -- this helps when dealing with very, very long lines. Adding, say, BUFSIZ each time is okay, but can get slow.

    That's another thing -- BUFSIZ is a good initial value. Disk reading and writing is supposed to be efficient in chunks of BUFSIZ -- and of course, doubling BUFSIZ will give you a multiple of BUFSIZ, and so on. The C standard requires BUFSIZ to be at least 256; typical values are 512 or 8192, depending on the platform. It is defined in <stdio.h>.
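
    Putting these hints together, a growing line reader might look something like the sketch below. It is not the get_string function from codeform; the name read_long_line and its interface are assumptions for this example. It starts at BUFSIZ, doubles when full, reuses the allocation between calls, and lets fgets write at the tracked offset so that no strcat or strcpy is needed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line of any length from fp into *buf, growing it as needed.
 * *buf and *cap persist between calls so the allocation is reused; start
 * with *buf == NULL and *cap == 0, and free(*buf) once when finished.
 * Returns the line length (newline stripped), or -1 at end of file, on a
 * read error, or if memory runs out. */
long read_long_line(FILE *fp, char **buf, size_t *cap)
{
    size_t len = 0;

    if (*buf == NULL) {
        *cap = BUFSIZ;
        *buf = malloc(*cap);
        if (*buf == NULL)
            return -1;
    }

    /* fgets writes directly at *buf + len, the tracked end of the data. */
    while (fgets(*buf + len, (int)(*cap - len), fp) != NULL) {
        len += strlen(*buf + len);
        if (len > 0 && (*buf)[len - 1] == '\n') {
            (*buf)[--len] = '\0';
            return (long)len;
        }
        if (len + 1 == *cap) {       /* buffer full: double it */
            char *tmp = realloc(*buf, *cap * 2);
            if (tmp == NULL)
                return -1;           /* *buf is still valid to free */
            *buf = tmp;
            *cap *= 2;
        }
    }
    if (ferror(fp))
        return -1;
    /* End of file: return a final line that had no trailing newline. */
    return len > 0 ? (long)len : -1;
}
```

    A caller would loop with while (read_long_line(fp, &line, &cap) >= 0) and call free(line) once after the loop. An empty line returns 0, which is how it is told apart from end of file.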
    Last edited by dwks; 02-26-2008 at 03:58 PM.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.
