Thread: Reading textfile

  1. #1
    C Beginner
    Join Date
    Dec 2011
    Location
    Portugal
    Posts
    187

    Reading textfile

    Hello everyone,

    I have a textfile that looks like this :

    Code:
    64	.
    16	./.DS_Store
    8	./a
    8	./merdas.txt
    8	./o
    24	./so
    8	./so.c

    I need to read this into a structure in which the values (64,16,8,24) will be the keys and ". , ./.DS_Store, etc ..." will be the info.

    Would this be a proper structure for that ending ?

    Code:
    typedef struct Info {
        char key;
        int value;
        struct Info *next;
    }Info;
    Thanks

  2. #2
    Registered User migf1's Avatar
    Join Date
    May 2013
    Location
    Athens, Greece
    Posts
    385
    Assuming you intend to use a linked list for storing the processed file in memory, I would say something like the following...

    Code:
    struct Node {
        int key;
        char *info;
        struct Node *next;
    };
    or if info cannot exceed a known maximum size, then...

    Code:
    struct Node {
        int key;
        char info[MAX_INFOSIZ];
        struct Node *next;
    };
    which it may overall use more memory but it can considerably simplify the code.

  3. #3
    C Beginner
    Join Date
    Dec 2011
    Location
    Portugal
    Posts
    187
    Quote Originally Posted by migf1 View Post
    Assuming you intend to use a linked list for storing the processed file in memory, I would say something like the following...

    Code:
    struct Node {
        int key;
        char *info;
        struct Node *next;
    };
    or if info cannot exceed a known maximum size, then...

    Code:
    struct Node {
        int key;
        char info[MAX_INFOSIZ];
        struct Node *next;
    };
    which it may overall use more memory but it can considerably simplify the code.
    I'm gonna stick with the first one, thanks.

    How should I be doing the reading to key and info now ?

    Thanks.

  4. #4
    Registered User migf1's Avatar
    Join Date
    May 2013
    Location
    Athens, Greece
    Posts
    385
    Quote Originally Posted by DeanWinchester View Post
    I'm gonna stick with the first one, thanks.

    How should I be doing the reading to key and info now ?

    Thanks.
    If there's a maximum expected length for each line in the text file (say MAX_LNLEN), you could read lines one by one using for example...

    Code:
    fgets(buffer, MAX_LNLEN+1, fp)
    with buffer declared as an array of MAX_LNLEN+1 characters (+1 is for '\0').

    Now, if you use the second struct, you could simply...
    Code:
    sscanf(buffer, "%d %s", &node->key, node->info);
    By using the first struct, things get a bit more complicated because you need to tokenize the buffer yourself (for example using strtok) and then count the length of the 2nd token, so you can allocate the required memory for node->info and copy the 2nd token into it.

    Also, when freeing up your linked list, then for every node you should first free up node->info before freeing up the node itself. This wouldn't be necessary if you had chosen then 2nd struct.

    Finally, if there is no expected maximum length for the lines in the file, then using fgets is not really an option. In that case you should read the file char by char and do all the housekeeping manually. However, if that's just a homework, I doubt this will be necessary.
    Last edited by migf1; 05-25-2013 at 06:12 PM.

  5. #5
    C Beginner
    Join Date
    Dec 2011
    Location
    Portugal
    Posts
    187
    Quote Originally Posted by migf1 View Post
    If there's a maximum expected length for each line in the text file (say MAX_LNLEN), you could read lines one by one using for example...

    Code:
    fgets(buffer, MAX_LNLEN+1, fp)
    with buffer declared as an array of MAX_LNLEN+1 characters (+1 is for '\0').

    Now, if you use the second struct, you could simply...
    Code:
    sscanf(buffer, "%d %s", &node->key, node->info);
    By using the first struct, things get a bit more complicated because you need to tokenize the buffer yourself (for example using strtok) and then count the length of the 2nd token, so you can allocate the required memory for node->info and copy the 2nd token into it.

    Also, when freeing up your linked list, then for every node you should first free up node->info before freeing up the node itself. This wouldn't be necessary if you had chosen then 2nd struct.

    Finally, if there is no expected maximum length for the lines in the file, then using fgets is not really an option. In that case you should read the file char by char and do all the housekeeping manually. However, if that's just a homework, I doubt this will be necessary.
    Thanks for your awesome replies.
    And no, it's not homework, it's something I need to deliver for a class in University and it's very very important, should be done to the perfection.

    I email'ed the Professor and he replied saying my struct shouldn't be as shown, that I should use an Hash Table since I'll have the same key a few times.

    What would be the typical struct for an hash table ?

  6. #6
    Registered User migf1's Avatar
    Join Date
    May 2013
    Location
    Athens, Greece
    Posts
    385
    Quote Originally Posted by DeanWinchester View Post
    Thanks for your awesome replies.
    You are very welcome
    And no, it's not homework, it's something I need to deliver for a class in University and it's very very important, should be done to the perfection.

    I email'ed the Professor and he replied saying my struct shouldn't be as shown, that I should use an Hash Table since I'll have the same key a few times.

    What would be the typical struct for an hash table ?
    In its simplest form, and judging from the sample data you provided, I can only guess that you have a relative small and finite set of keys (say from 0 to 128), then a simple hash table would be an array of 129 pointers, with each one pointing to a linked list. Each of those linked-lists will contain info-strings that are hashed under the same key. For example, in your sample data, 4 different info-strings will be hashed under the key 8.

    So, something like this should get you started...

    Code:
    #define NKEYS 129
    
    struct Node {
        char *info;
        struct Node *next;
    };
    
    Node hashtable[NKEYS];
    Note that this a rather naive implementation. There are several sources for much more efficient implementations (your textbook most probably is one of them ).
    Last edited by migf1; 05-25-2013 at 06:40 PM.

  7. #7
    C Beginner
    Join Date
    Dec 2011
    Location
    Portugal
    Posts
    187
    Quote Originally Posted by migf1 View Post
    You are very welcome

    In its simplest form, and judging from the sample data you provided, I can only guess that you have a relative small and finite set of keys (say from 0 to 128), then a simple hash table would be an array of 129 pointers, with each one pointing to a linked list. Each of those linked-lists will contain info-strings that are hashed under the same key. For example, in your sample data, 4 different info-strings will be hashed under the key 8.

    So, something like this should get you started...

    Code:
    #define NKEYS 129
    
    struct Node {
        char *info;
        struct Node *next;
    };
    
    Node hashtable[NKEYS];
    Note that this a rather naive implementation. There are several sources for much more efficient implementations (your textbook most probably is one of them ).
    I understood, this looks like a very good implementation by you.

    Thanks so much.

    I can always raise NKEYS depending on what Professor might say but that looks like a perfect hashtable, I'm gonna use it.

    I'll be trying to read the textfile now and update this with doubts I might find.

    Thanks so much once again!
    Last edited by DeanWinchester; 05-25-2013 at 07:07 PM.

  8. #8
    C Beginner
    Join Date
    Dec 2011
    Location
    Portugal
    Posts
    187
    Now that I'm working with an hash table what would be a way of reading from the text file to it ?

  9. #9
    Registered User migf1's Avatar
    Join Date
    May 2013
    Location
    Athens, Greece
    Posts
    385
    Quote Originally Posted by DeanWinchester View Post
    Now that I'm working with an hash table what would be a way of reading from the text file to it ?
    More or less like before. But instead of having the key part of the node you can use a local variable for it (say: key) so you can insert the node containing the info-string into the hashtable[key] linked list.

  10. #10
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    If you happen to have a C99 compiler and a POSIX.1-2008 -compatible C library (basically all except Microsoft ones are), the code for reading hashtableable lines from any standard C stream is pretty straightforward:
    Code:
    #define _ISO_C99_SOURCE
    #define _POSIX_C_SOURCE 200809L
    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>
    #include <ctype.h>
    #include <errno.h>
    
    struct entry {
        struct entry    *next;
        long             key;
        size_t           size;
        char             data[];
    };
    
    struct entry *fentry(char **const dataptr, size_t *const sizeptr,
                         FILE *const input)
    {
        ssize_t         len;
        size_t          datalen;
        long            key;
        int             keyend, head;
        struct entry   *ent;
    
        len = getline(dataptr, sizeptr, input);
        if (len < (ssize_t)1) {
            if (ferror(input))
                errno = EIO;    /* Read error */
            else
            if (!feof(input))
                errno = ENOMEM; /* Out of memory */
            else
                errno = 0;      /* End of input */
            return NULL;
        }
    
        /* Parse key. */
        keyend = head = -1;
        (void)sscanf(*dataptr, " %ld%n %n", &key, &keyend, &head);
        if (head == keyend) {
            errno = EDOM; /* Input formatting error */
            return NULL;
        }
    
        /* Trim trailing whitespace off from the line. */
        while (len > (ssize_t)0 && isspace((*dataptr)[len - 1]))
            len--;
    
        /* Data is all that follows head, up till newline. */
        if ((ssize_t)head < len)
            datalen = (size_t)len - (size_t)head;
        else
            datalen = 0;
    
        /* Allocate memory for the new entry, data, and end-of-string mark. */
        ent = malloc(datalen + 1 + sizeof *ent);
        if (!ent) {
            errno = ENOMEM;
            return NULL;
        }
    
        /* Fill in the entry structure. */
        ent->next = NULL;
        ent->key  = key;
        ent->size = datalen;
        if (datalen > 0)
            memmove(ent->data, (*dataptr) + head, datalen);
        ent->data[datalen] = '\0';
    
        return ent;
    }
    The above code can be made C89 compliant, but I like C99 and POSIX.1-2008 better. It ignores leading and trailing whitespace on each line, and handles embedded NUL bytes (\0) correctly.

    The
    " %ld%n %n"
    scanning pattern may require a bit of an explanation. It basically reads: After any number of whitespace (or none), convert and assign a long int integer, store the number of characters consumed thus far into an int, skip any number of whitespace (or none), and store the number of characters consumed thus far.

    Because different implementations of sscanf() may or may not count %n in the result, I cannot rely on the sscanf() return value. I instead preset keyend (the variable corresponding to the first %n) and head (the variable corresponding to the second %n) to -1. If these are unchanged, the conversion failed.

    If you look closely, I actually only verify that head > keyend. Because %n stores a nonnegative value, that test actually verifies that the initial long int was matched and assigned (as otherwise both would still be -1), and that at least one whitespace character followed it (as otherwise head would be -1, and then head <= keyend).

    I did this because I didn't want to accept data such as '1234abc'; I require some kind of space in between the key and the value.

    To read entries (lines) from say standard input, you can first try the following main():
    Code:
    int main(void)
    {
        char         *cache_data = NULL;
        size_t        cache_size = 0;
    
        struct entry *curr = NULL;
    
        while ((curr = fentry(&cache_data, &cache_size, stdin))) {
            printf("Key %ld, data '%s' (length %lu)\n", curr->key, curr->data, (unsigned long)curr->size);
            free(curr);
        }
        if (errno) {
            const char *const errmsg = strerror(errno);
            fprintf(stderr, "Error reading standard input: %s.\n", errmsg);
            return 1;
        }
    
        free(cache_data);
        cache_data = NULL;
        cache_size = 0;
    
        return 0;
    }
    To create a hash table, first allocate an array of (hash table entries) of pointers to struct entry:
    Code:
    struct entry **hashtable_entry; /* Array of pointers */
    unsigned long  hashtable_size; /* number of pointers allocated */
    Initialize all the pointers to NULL, when you create the (empty) hash table.

    Because your hash keys are already integer numbers, your hash function can be as simple as
    Code:
    ((unsigned long)key) % hashtable_size
    in other words, positive key modulo hash table size. All possible key values will map to [0, hashtable_size-1] this way.

    Inserting a new entry curr into the hash table is as simple as
    Code:
    {
        const unsigned long  i = ((unsigned long)key) % hashtable_size;
        curr->next = hashtable_entry[i];
        hashtable_entry[i] = curr;
    }
    Note that the above actually prepends the new entry in front of existing hashed entries, because it's simpler. (It turns out that in practice, it does not affect performance.)

    A loop over all hash table entries having key key is as simple as
    Code:
    {
        struct entry *ent = hashtable_entry[((unsigned long)key) % hashtable_size];
        while (ent) {
            if (ent->key == key) {
    
                /* Yes, ent is one of the entries with the desired key */
    
            }
            ent = ent->next;
        }
    }
    When the number of entries is not known beforehand, the hash table size is basically a guess. Since the actual hash table does not use that much memory (4+4N bytes on 32-bit architectures, and 8+8N bytes on 64-bit architectures), it makes sense to start with a large hash table, say one with 4093 or 8191 entries.

    Having a hash table "too large" is never an issue; the only cost is the memory used (not much!) and the CPU time used to initialize the pointers to NULL (even less!).

    When you start having collisions (multiple entries chained on the same hash), you can recreate the hash table at a larger size. You simply create a new array, and insert each entry from the old array to the new one.

    The point where rehashing is needed/useful, depends heavily on the data. In this case, I suspect it would be an acceptable starting point to increase the hash table size by say 50% whenever there are more total entries than hash table entries. That approach should scale well to millions of entries.

    If the order of the entries in the file was also meaningful, you could either add an index variable, or a second pointer, to the entry structure. I'd prefer a second pointer. When reading new entries, I'd prepend the new entry into the singly-linked list via those additional pointers. (If you have to, you can always reverse the list afterwards in a single pass, if you need them in the original order; prepending causes the list to be in reverse order.)


    So, as you can see, the main thing here is to have a robust, reliable function that reads your input correctly, and into an useful data structure. Test it first, so you know it works. You should test it not only for correctness (that it parses your data correctly), but also that it is fast enough for your needs. I/O is surprisingly often the bottleneck!

    Adding the hash table stuff, and manipulating the data read, is pretty simple afterwards.

  11. #11
    C Beginner
    Join Date
    Dec 2011
    Location
    Portugal
    Posts
    187
    Thanks, I'll study everything you said upfront.
    Right now, I'm just reading the file, not into the hash table yet.

    My code is the following :

    Code:
     char buffer[500];
        FILE *fp=NULL;
        fp = fopen ("merdas.txt", "r");
        while(!feof(fp)){
            fgets(buffer,200,fp);
            printf("%s",buffer);
        }
        fclose(fp);
        printf("%s",buffer);
    It's not printing anything though.
    You know what's the error ?

  12. #12
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Well...
    • Check that the file was opened correctly.
    • Do not use feof to control a loop like that; rather, use the return value of fgets.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  13. #13
    C Beginner
    Join Date
    Dec 2011
    Location
    Portugal
    Posts
    187
    Quote Originally Posted by laserlight View Post
    Well...
    • Check that the file was opened correctly.
    • Do not use feof to control a loop like that; rather, use the return value of fgets.
    Followed your advice and I made it like this :

    Code:
    fp = fopen("/Users/machd/Desktop/SistemasOperativos2013/GuardaLinhas/merdas.txt","r");
        if (!fp) {
            printf("sucess");
        }
    Sucess isn't being printed.
    Guess that means the file isn't being opened correctly right ?

  14. #14
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Yes (though I was confused for a moment because I think of a file not opening as "failure", not "success" ). If the file path is correct, check file permissions.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  15. #15
    C Beginner
    Join Date
    Dec 2011
    Location
    Portugal
    Posts
    187
    Quote Originally Posted by laserlight View Post
    Yes (though I was confused for a moment because I think of a file not opening as "failure", not "success" ). If the file path is correct, check file permissions.
    Haha.
    I have, every user has reading permissions

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Reading from a textfile with unknown length?
    By chickenlittle in forum C++ Programming
    Replies: 4
    Last Post: 09-10-2011, 11:59 PM
  2. Reading textfile into array
    By a.mlw.walker in forum C Programming
    Replies: 9
    Last Post: 08-14-2011, 07:39 PM
  3. Replies: 2
    Last Post: 01-29-2011, 12:58 PM
  4. Replies: 9
    Last Post: 08-09-2008, 10:47 AM
  5. Reading a TextFile into a Linked List
    By cisco_leon in forum C Programming
    Replies: 13
    Last Post: 10-30-2006, 07:11 PM