Thread: Reading a file

  1. #1
    Registered User
    Join Date
    May 2010
    Posts
    3

    Reading a file

    Hello,

    I am trying to read the entire contents of a file into some type of variable that will allow manipulation of the entire string. The file length will vary over time, so I am unsure how to proceed with a fixed array size.

    The file uses tags to specify its content, like so (simple example):

    <person>
    <name>Gary</name>
    <text>...Some text describing person...</text>
    </person>
    <person>
    <name>Susan</name>
    <text>...Some text describing person...</text>
    </person>
    ...etc...

    In another language I could just put it into one string variable and then parse out each piece into its appropriate object property, then rinse and repeat. Thus I would end up with a bunch of person objects with the correct information.

    I have managed to figure out how to do pseudo-OO with structs and functions, but the nature of I/O in C is kicking my tail.

    Any help would be appreciated, as I have done days of reading trying to grasp things like fscanf and still don't know how to get the entire file read, let alone put it in one variable (array) and then parse it into its appropriate pieces.

    Thanks.

  2. #2
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Since you seem to be parsing XML, how about using a pre-existing XML parser library, such as the Expat XML Parser?

    It's a bit of extra work up front, but at least you won't have to worry too much about feature creep or minor changes in the text file which might otherwise break a simpler parser.

    Here are 3 quick hacks for reading files:
    Code:
    // Good - file is stored as a linked list of lines
    struct linelist {
      struct linelist *next;
      char    buff[BUFSIZ];
    };
    
    struct linelist *head = NULL, *tail = NULL;
    char buff[BUFSIZ];
    
    while ( fgets ( buff, sizeof buff, fp ) != NULL ) {
      struct linelist *temp = malloc( sizeof *temp );
      if ( temp ) {
        strcpy( temp->buff, buff );
        temp->next = NULL;
        if ( !head ) {
          head = tail = temp;
        } else {
          buff->next = tail;   // BUG (see post #3): buff is a char array; should be tail->next = temp;
          tail = temp;
        }
      }
    }
    
    
    // Bad - file is all in a single buffer, but expensive to create for long files
    char *file = malloc(1);
    *file = '\0';
    char buff[BUFSIZ];
    
    while ( fgets ( buff, sizeof buff, fp ) != NULL ) {
      size_t  newlen = strlen(file) + strlen(buff) + 1;
      void *temp = realloc( file, newlen );
      if ( temp ) {
        file = temp;
        strcat( file, buff );
      }
    }
    
    // Ugly - cheap hack to store the whole file at once
    // Makes some assumptions about how text files are read (see "rt" mode)
    // Probably best not to use this one...
    fseek( fp, 0, SEEK_END );
    long len = ftell( fp );
    rewind( fp );
    char *file = malloc( len + 1 );
    fread( file, 1, len, fp );
    file[len] = '\0';
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    vart
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    Code:
          buff->next = tail;
          tail = temp;
    Hmmm. buff has no next

    more like
    Code:
    tail->next = temp;
    tail = temp;
    I suppose?
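
    For anyone landing here later, this is a minimal sketch of the "Good" approach with the tail fix applied. The function names read_lines and free_lines are made up for illustration, and each node holds whatever one fgets call returned, so a very long line may span more than one node.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node per chunk returned by fgets (normally one line). */
struct linelist {
    struct linelist *next;
    char buff[BUFSIZ];
};

/* Read fp to EOF and return the chunks in file order.
   Returns NULL for an empty file or if the first allocation fails.
   (read_lines is a hypothetical name, not from the thread.) */
struct linelist *read_lines(FILE *fp)
{
    struct linelist *head = NULL, *tail = NULL;
    char buff[BUFSIZ];

    assert(fp != NULL);
    while (fgets(buff, sizeof buff, fp) != NULL) {
        struct linelist *temp = malloc(sizeof *temp);
        if (temp == NULL)
            break;                  /* out of memory: keep what was read so far */
        strcpy(temp->buff, buff);
        temp->next = NULL;
        if (head == NULL) {
            head = tail = temp;     /* first node */
        } else {
            tail->next = temp;      /* link after the current tail */
            tail = temp;
        }
    }
    return head;
}

/* Release every node of a list built by read_lines. */
void free_lines(struct linelist *head)
{
    while (head != NULL) {
        struct linelist *next = head->next;
        free(head);
        head = next;
    }
}
```

    Typical use would be to fopen the file, call read_lines(fp), walk the list through ->next, then call free_lines(head) and fclose(fp).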
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  4. #4
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Yeah, something like that (probably).
    It's just idea code.

  5. #5
    Registered User
    Join Date
    May 2010
    Posts
    3
    Thanks for the help, Salem. I have looked over your code for good and ugly so far. Ugly worked like a charm, and I'm wondering what makes it a bad idea other than it making assumptions about how the file is read? In my case only the program will be opening that file, and only in "r" mode, but maybe there is a whole lotta bad that can happen that I'm unaware of. Thoughts?

    As for good, it worked great as well, except for the following code which confused me-

    Originally posted by Salem:
    Code:
    if ( temp ) {
      strcpy( temp->buff, buff );
      temp->next = NULL;
      if ( !head ) {
        head = tail = temp;
      } else {
        buff->next = tail;
        tail = temp;
      }
    }
    I ended up using-

    Code:
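    while ( fgets ( buff, sizeof buff, fp ) != NULL ) {
    	line = malloc(sizeof *line);
    	if ( line == NULL ) break;   // out of memory: stop reading
    	strcpy( line->buff, buff );
    	line->next = head;           // prepend: the list ends up in reverse file order
    	head = line;
    }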
    But other than that it worked perfectly. I'm more familiar with five functions because of your code and I got away from fscanf. YES!

    As for the XML parser I will definitely check it out and thanks for the heads up.

  6. #6
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Originally posted by c99:
    7.19.9.2 The fseek function
    Synopsis
    1 #include <stdio.h>
    int fseek(FILE *stream, long int offset, int whence);
    Description
    2 The fseek function sets the file position indicator for the stream pointed to by stream.
    If a read or write error occurs, the error indicator for the stream is set and fseek fails.

    3 For a binary stream, the new position, measured in characters from the beginning of the
    file, is obtained by adding offset to the position specified by whence. The specified
    position is the beginning of the file if whence is SEEK_SET, the current value of the file
    position indicator if SEEK_CUR, or end-of-file if SEEK_END. A binary stream need not
    meaningfully support fseek calls with a whence value of SEEK_END.

    4 For a text stream, either offset shall be zero, or offset shall be a value returned by
    an earlier successful call to the ftell function on a stream associated with the same file
    and whence shall be SEEK_SET.
    Seeking the end of a file can be problematic. The only sure way is to read the whole file until you get EOF, then use ftell().

    If you're on a modern POSIX system such as Linux, then "r" and "rb" are the same thing; the two modes behave identically.

    The most obvious difference is seen on DOS/Windows platforms, where the end-of-line marker "\r\n" (two characters) is read by your program as just "\n" (one character) when you read the file in text mode. The seek may tell you the number of bytes, whereas the read performs the character mapping. In this case the buffer size would not be an issue, as the read would deliver fewer characters than bytes (because all \r's are effectively stripped), but the terminating '\0' should then go at the count returned by fread() rather than at len.

    Other still rarer systems could conceivably deliver MORE characters than bytes, so a simple seek-malloc strategy would fail horribly.

    The other downside is that the stream may not be seekable at all (say stdin, a pipe or a socket), so a "read and allocate as you go" approach is the only way to deal with the problem.
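
    As a rough illustration of "read and allocate as you go", here is one possible sketch that never seeks and doubles the buffer whenever it fills up. The name read_all and the 4096-byte starting size are arbitrary choices for this example, not anything from the standard.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read everything remaining on fp into one NUL-terminated heap buffer.
   Works on non-seekable streams because it only ever calls fread.
   Returns NULL on allocation failure or read error; the caller frees.
   (read_all is a hypothetical name chosen for this sketch.) */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *data = malloc(cap);

    assert(fp != NULL);
    if (data == NULL)
        return NULL;

    for (;;) {
        size_t want = cap - len - 1;        /* keep one byte for '\0' */
        size_t got = fread(data + len, 1, want, fp);
        len += got;
        if (got < want)
            break;                          /* short count: EOF or error */

        /* Buffer is full: double it and keep reading. */
        char *bigger = realloc(data, cap * 2);
        if (bigger == NULL) {
            free(data);
            return NULL;
        }
        data = bigger;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(data);
        return NULL;
    }
    data[len] = '\0';                       /* terminate at the count actually read */
    if (out_len != NULL)
        *out_len = len;
    return data;
}
```

    Doubling keeps the number of realloc calls small for long files, which sidesteps the cost problem of the "Bad" version, and the terminator is placed using the count fread actually delivered, so text-mode translation does not matter.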

  7. #7
    Registered User
    Join Date
    May 2010
    Posts
    3
    I wanted to come back to this thread to post an easy solution I came across. The GNU library GLib has a set of functions for file reading. In particular I used this function:

    Code:
    gboolean g_file_get_contents (const gchar  *filename,
                                  gchar       **contents,
                                  gsize        *length,
                                  GError      **error);
    which worked like a charm. Also, it has a simple XML parser which I am currently figuring out how to use. So...heads up.
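
    Once the whole file is in one string, pulling out fields as simple as the example in the first post could be sketched with strstr alone. This is not an XML parser: it assumes no attributes, no entities and no nesting of the same tag, and next_tag_text is a made-up helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of the text between the next <tag> and </tag>
   at or after *cursor, and move *cursor past the closing tag.
   Returns NULL if no such pair is found or on allocation failure.
   (next_tag_text is a hypothetical helper for illustration only.) */
char *next_tag_text(const char **cursor, const char *tag)
{
    char open_tag[64], close_tag[64];
    const char *start, *end;
    char *text;
    size_t n;

    assert(cursor != NULL && *cursor != NULL && tag != NULL);
    if (strlen(tag) + 4 > sizeof close_tag)
        return NULL;                        /* tag name too long for the buffers */
    snprintf(open_tag, sizeof open_tag, "<%s>", tag);
    snprintf(close_tag, sizeof close_tag, "</%s>", tag);

    start = strstr(*cursor, open_tag);
    if (start == NULL)
        return NULL;
    start += strlen(open_tag);              /* first character of the content */

    end = strstr(start, close_tag);
    if (end == NULL)
        return NULL;

    n = (size_t)(end - start);
    text = malloc(n + 1);
    if (text == NULL)
        return NULL;
    memcpy(text, start, n);
    text[n] = '\0';

    *cursor = end + strlen(close_tag);      /* continue after the closing tag */
    return text;
}
```

    Calling it in a loop, first with "name" and then with "text", until it returns NULL would yield one pair of strings per person, each of which the caller frees after copying into a struct. A real parser such as Expat or the GLib one remains the safer route if the format grows.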
