Thread: Store file to char double pointer?

  1. #1
    Registered User
    Join Date
    Nov 2021
    Posts
    5

    Store file to char double pointer?

    Hello!

    I'm fairly new to c programming and how to work with somewhat low level concepts.

    I'm writing a function where I want the user to input a file (let's assume that it already has been inputted), have that file read, and store it to memory, but instead of storing the whole contents into one big string, a char double pointer is being used for each string array split by the newline character.

    main.c
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    typedef struct {
        char *file;
        char **inMem;
        int inMemCount;
    } FileDataToMemory;
    FileDataToMemory fd;
    
    void fileToMem(char *file) {
        FILE *fp;
        fp = fopen(file, "r");
        if (!fp) exit(1);
    
        fd.inMemCount = 1;
        fd.inMem = (char **)malloc(fd.inMemCount * sizeof(char *));
        if (!fd.inMem) exit(1);
        char ch;
        char *stringbuild = (char *)malloc(sizeof (char));
        if (!stringbuild) exit(1);
        while ((ch = fgetc(fp)) != EOF) {
            if (ch != '\n') {
                *(stringbuild + strlen(stringbuild)) = ch;
                stringbuild = (char *)realloc(stringbuild, strlen(stringbuild));
                if (!stringbuild) exit(1);
            } else {
                if (strlen(stringbuild) != 0) {
                    fd.inMemCount++;
                    fd.inMem = (char **)realloc(fd.inMem, fd.inMemCount * sizeof (char*));
                    if (!fd.inMem) exit(1);
                    *(fd.inMem + fd.inMemCount) = (char *)malloc(strlen(stringbuild) * sizeof (char));
                    if (!(fd.inMem + fd.inMemCount)) exit(1);
                    strcpy(*(fd.inMem + fd.inMemCount), stringbuild);
                    printf("%s\n", *(fd.inMem + fd.inMemCount));
                }
                memset(stringbuild, 0, strlen(stringbuild));
                stringbuild = (char *)realloc(stringbuild, sizeof (char));
                if (!stringbuild) exit(1);
            }
        }
        free(stringbuild);
        fclose(fp);
    
        fd.file = file;
    }
    
    int main() {
        char *input = "test.txt";
        fileToMem(input);
    
        printf("File Name: %s - Line Count: %d - Lines:\n", fd.file, fd.inMemCount);
        for (int i = 0; i < fd.inMemCount; i++) {
            printf("%s\n", *(fd.inMem + i));
            free(*(fd.inMem + i));
        }
        free(fd.inMem);
    
        return 0;
    }
    test.txt
    Code:
    THis is
    a test
    file
    Hello, World!
    With all of this, in my mind, the output would be something like this:
    - *(fd.inMem + 0) = THis is
    - *(fd.inMem + 1) = a test
    - *(fd.inMem + 2) = file
    - *(fd.inMem + 3) = Hello, World!

    The basic synopsis for stringbuild and fd.inMem allocating memory and being a dynamic array rather than being a static char array is that there is a possibility where a line can have more data than the fixed length is able to take in so some of that data will be lost. Dynamically allocating and reallocating memory can be the work around that where every time a new character is inputted and it's not a newline character, it will add to stringbuild and allocate more memory for the next character. When we do finally hit a newline character, we allocate memory to fd.inMem to include one more string/char pointer, then allocate the length of stringbuild to *(fd.inMem + the current count) and do a strcpy() with stringbuild to *(fd.inMem). Once all said is done, we clear stringbuild and reset it's allocated memory back to 1, then repeat the whole process again until we hit the end of file.

    This sadly doesn't work and from what I've been able to deduce using the compiler's warning message is that realloc says that the old size is invalid which I'm guessing means I'm doing something wrong when I'm allocating memory, and Valgrind states that I'm losing bytes especially when reallocating stringbuild and fd.inMem.

    May I know what I'm doing wrong and how I can prevent this from happening in the future?
    Last edited by Salem; 11-22-2021 at 08:43 AM. Reason: Removed spurious font tag

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    It looks like wherever you copy/pasted this code from has removed a bunch of critical whitespace from your code.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Nov 2021
    Posts
    5
    Whoops, my bad. I guess VSCodium isn't as reliable. Should be fixed now!

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > *(stringbuild + strlen(stringbuild)) = ch;
    > stringbuild = (char *)realloc(stringbuild, strlen(stringbuild));
    1. You don't need to cast the result of malloc / calloc / realloc in a C program.
    If you're getting errors about casting void*, then you need to use your C compiler, not your C++ compiler.

    2. Neither malloc nor realloc are going to guarantee having a \0 in just the right place for all your str... functions to work properly.
    So at some point, strlen is going to see something which isn't \0, in a place you expected a \0, and you're off in the weeds wondering what valgrind is moaning about.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Registered User
    Join Date
    Nov 2021
    Posts
    5
    I decided to look more into how the data was being allocated. I was able to fix it with the new updated code:

    Code:
    while ((ch = fgetc(fp)) != EOF) {
            if (ch != '\n') {
                *(stringbuild + strlen(stringbuild)) = ch;
                if (!stringbuild) exit(1);
            } else {
                if (strlen(stringbuild) != 0) {
                    *(stringbuild + strlen(stringbuild)) = '\0';
                    fd.inMem = realloc(fd.inMem, fd.inMemCount * sizeof (char*) + 1);
                    if (!fd.inMem) exit(1);
                    *(fd.inMem + fd.inMemCount) = malloc(strlen(stringbuild) * sizeof (char) + 1);
                    if (!(fd.inMem + fd.inMemCount)) exit(1);
                    strcpy(*(fd.inMem + fd.inMemCount), stringbuild);
                    fd.inMemCount++;
                }
                memset(stringbuild, 0, strlen(stringbuild));
                stringbuild = realloc(stringbuild, sizeof (char));
                if (!stringbuild) exit(1);
            }
        }
    Thanks a lot! I felt a bit stupid about the mistake but now I learnt that apparently there's a null character after every string. That's pretty interesting the more I think of it.

    One more question that has me confused about what you said is casting when malloc-ing or realloc-ing. Should I not cast a char pointer or any data type pointer when malloc-ating or realloc-ating? I've seen many people and video/web-tutorials where they do cast it, so does it make any difference if I do or not? Would it be bad practice if I do cast it? Maybe it helps with readability?

    Thanks again! I spent too long trying to fix the malloc/realloc issues but I hope I'm getting the hang of it
    Last edited by Salem; 11-22-2021 at 09:32 PM. Reason: more spurious font tags in code

  6. #6
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    This is a meaningless statement:
    Code:
    *(stringbuild + strlen(stringbuild)) = '\0';
    If it works, then there must already be a '\0' character at that exact position, otherwise strlen wouldn't know where to stop. strlen is basically something like:
    Code:
    size_t strlen(const char *str) {
        const char *p = str;
        while (*p != '\0') ++p;  // loop until you hit a '\0'
        return p - str;          // return how far you had to search
    }
    You shouldn't be using strlen at all. Instead, just keep a len variable to keep track of the length.

    Also, simply pointing the file (filename) pointer to the given input file name is not a good idea since the input may have come from a function's local variable that may cease to exist before you are done with your data structure. It's best to make a copy.

    And there's no reason for fd to be global.

    Here's a rewrite that seems to work, although I didn't test it much.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
     
    typedef struct {
        char  *file;
        char **lines;
        int    lineCount;
    } FileData;
     
    void *xrealloc(void *ptr, size_t size) {
        void *newptr = realloc(ptr, size);
        if (!newptr) { perror("xrealloc"); exit(EXIT_FAILURE); }
        return newptr;
    }
     
    FileData fileToMem(const char *file) {
        FILE *fp = fopen(file, "r");
        if (!fp) { perror(file); exit(EXIT_FAILURE); }
     
        FileData fd;
        fd.file = strcpy(xrealloc(NULL, strlen(file) + 1), file);
        fd.lines = NULL;
        fd.lineCount = 0;
     
        char *line = xrealloc(NULL, 1);
        line[0] = '\0';
        int len = 1;
     
        for (char ch; (ch = fgetc(fp)) != EOF; ) {
            if (ch != '\n') {
                line[len - 1] = ch;
                line = xrealloc(line, ++len);
                line[len - 1] = '\0';
            } else {
                fd.lines = xrealloc(fd.lines, ++fd.lineCount * sizeof(char*));
                fd.lines[fd.lineCount - 1] = xrealloc(NULL, len);
                strcpy(fd.lines[fd.lineCount - 1], line);
                line = xrealloc(line, 1);
                line[0] = '\0';
                len = 1;
            }
        }
     
        free(line);
        fclose(fp);
     
        return fd;
    }
     
    int main() {
        const char *input = "test.txt";
        FileData fd = fileToMem(input);
     
        printf("File Name: %s - Line Count: %d - Lines:\n", fd.file, fd.lineCount);
        for (int i = 0; i < fd.lineCount; i++) {
            printf("%s\n", fd.lines[i]);
            free(fd.lines[i]);
        }
        free(fd.lines);
        free(fd.file);
     
        return 0;
    }
    It's actually insanely inefficient to realloc for every character! I would use fgets to read into a non-dynamic buffer and transfer the string to the lines array. A caveat is that if the string in the buffer doesn't end in a newline character, that means that (unless it's end-of-file) there is more of that line in the file, so you'd need to handle that possibility. I may write up an example and post it too.

    As for casting malloc/calloc/realloc, since they return a void*, you don't need to cast the result in C (although you do in C++). One reason we don't cast it is to catch the error of forgetting to include stdlib.h. In that case the default return value is int, which would cause an error if you don't cast it, but the cast would cover it up.

    It doesn't help with readability since you already know the type that it's being stored to.

    Also it's just another meaningless thing that needs to change if you change the type you're storing to. If you write it like this then you don't even need to change the sizeof expression to change the type:
    Code:
    // Only the initial int needs to be changed to change the type:
    int *values = malloc(numValues * sizeof *values);
    EDIT: It just occurred to me that if the last line of the file happens to lack a newline character then that line will not be stored. To fix that you could add the following after the loop, before the free:
    Code:
        if  (len > 1) {
            fd.lines = xrealloc(fd.lines, ++fd.lineCount * sizeof(char*));
            fd.lines[fd.lineCount - 1] = xrealloc(NULL, len);
            strcpy(fd.lines[fd.lineCount - 1], line);
        }
    Last edited by john.c; 11-22-2021 at 04:30 PM.
    A little inaccuracy saves tons of explanation. - H.H. Munro

  7. #7
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    Here's an example of using fgets. It's still inefficient to allocate the line pointers one-at-a-time, though. To improve that you would store not just the number of lines currently stored but a separate number of line pointers currently allocated, increasing that (usually by doubling or perhaps increasing by a factor of 3/2) when you run out of allocated pointers. You could then realloc it down to the actual number of used line pointers before returning.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdbool.h>
     
    typedef struct {
        char  *filename;
        char **lines;
        int    numLines;
    } Data;
     
    void *xrealloc(void *ptr, size_t size) {
        void *newptr = realloc(ptr, size);
        if (!newptr) { perror("xrealloc"); exit(EXIT_FAILURE); }
        return newptr;
    }
     
    Data initData(const char *filename) {
        Data data = {NULL, NULL, 0};
        if (filename) {
            data.filename = xrealloc(NULL, strlen(filename) + 1);
            strcpy(data.filename, filename);
        }
        return data;
    }
     
    Data readFile(const char *filename) {
        Data data = initData(filename);
        FILE *fin = fopen(filename, "r");
        if (!fin) { perror(filename); exit(EXIT_FAILURE); }
        char line[256];
        bool more = false;
        while (fgets(line, sizeof line, fin) != NULL) {
            size_t len = strlen(line);
            bool still_more = false;
            if (len && line[len - 1] == '\n')
                line[--len] = '\0';
            else
                still_more = true;
            if (!more) {
                data.lines = xrealloc(data.lines, ++data.numLines * sizeof(char*));
                const int n = data.numLines - 1;
                data.lines[n] = xrealloc(NULL, len + 1);
                strcpy(data.lines[n], line);
            }
            else {
                const int n = data.numLines - 1;
                const size_t oldlen = strlen(data.lines[n]);
                data.lines[n] = xrealloc(data.lines[n], oldlen + len + 1);
                strcpy(&data.lines[n][oldlen], line);
            }
            more = still_more;
        }
        fclose(fin);
        return data;
    }
     
    int main() {
        const char *input = "test.txt";
        Data data = readFile(input);
     
        printf("File Name: %s - Line Count: %d - Lines:\n", data.filename, data.numLines);
        for (int i = 0; i < data.numLines; i++) {
            printf("%s\n", data.lines[i]);
            free(data.lines[i]);
            data.lines[i] = NULL;
        }
     
        free(data.filename);
        free(data.lines);
        data = initData(NULL);
     
        return 0;
    }
    A little inaccuracy saves tons of explanation. - H.H. Munro

  8. #8
    Registered User
    Join Date
    Apr 2021
    Posts
    138
    Two things:


    1. john.c has shown you a great technique for simplifying almost all of your C code. If you are coding on Linux, you can make use of a library called libexplain. The same library might be available on Windows, but you may have to download and install it, first. libexplain is a library that defines functions just like the xrealloc() function john.c provided.

      If you can't make use of libexplain, then I recommend that you write the functions yourself. Create a "libx.h" header file and one or more source files, and write functions like xmalloc, xrealloc, xstrdup, or better still write explain_malloc_or_die, explain_realloc_or_die, etc. and then provide short x-names using #define. Your code will "flow" better if you don't have to keep adding
      Code:
      if (...) { exit(1); }
      everywhere.
    2. You can use the ftell function in conjunction with the fseek function to determine the length of a file. fseek takes three parameters: a file, an offset, and a symbolic location that can be one of "the beginning", "the current position", or "the end." So you can call fseek and say "move the file position in this file to this offset places relative to this location."

      A not-totally-obvious fact is that this means you can say "move to zero bytes from the end of file". Then you can say, "ftell me what my position is" and get an answer that is always relative to the start of the file -- this is effectively the length of the file (in bytes - fseek and ftell don't know about character encodings).

      Obviously, if you know the length of the file, you know how many bytes to malloc to store the file in memory. You can then scan all those bytes looking for newlines and set the line pointers. (Change the newlines to zeros, too!)

  9. #9
    Registered User
    Join Date
    Nov 2021
    Posts
    5
    Sorry for the lack thereof a reply. I later realized how broken my code was and how much potential leakage it could cause. I have been studying your code for the past few days, making sense of it in my head, and I think I fully understand it. I definitely should've prioritized fgets() over fgetc() but I was not confident using it, and I've learnt a lot of new things thanks to you.

    I also really like how you approached all of the problems as any attempts I've at optimizing has failed, probably because I lack creativity to do so unless it's wasting lines/using up CPU load time when allocating.

    Thanks a lot!

  10. #10
    Registered User
    Join Date
    Nov 2021
    Posts
    5
    This is great information. Since I'm on Linux, thankfully I could use and look more into xrealloc, xmalloc, etc. I'll definitely look into libexplain!
    Last edited by thisisausername; 11-24-2021 at 02:33 PM.

  11. #11
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    aghast has suggested another possibility, to read the entire file into a single allocated buffer, count the newlines, allocate the line pointers, point them to the beginnings of the lines and replace the newlines with '\0' to terminate the individual lines.

    The file must be opened in binary mode for ftell to work properly. This means that Windows line endings will be left as is, which is a '\r' followed by a '\n'. The following code is written to work on linux (which just uses '\n') and windows (or even the old Mac endings which just used '\r').
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
     
    typedef struct {
        char  *filename;
        char  *text;
        char **lines;
        int    lineCount;
    } File;
     
    void *xmalloc(size_t size) {
        void *ptr = malloc(size);
        if (!ptr) { perror("xmalloc"); exit(EXIT_FAILURE); }
        return ptr;
    }
     
    File readFile(const char *filename) {
        // Must open in binary mode for ftell to work properly.
        FILE *fin = fopen(filename, "rb");
        if (!fin) {
            perror(filename);
            exit(EXIT_FAILURE);
        }
     
        File file = {NULL, NULL, NULL, 0};
        
        strcpy(file.filename = xmalloc(strlen(filename) + 1), filename);
     
        fseek(fin, 0, SEEK_END);
        long size = ftell(fin);
        rewind(fin);
     
        if (size > 0) {
            file.text = xmalloc(size + 1);
            fread(file.text, 1, size, fin);
            file.text[size] = '\0';
     
            // Count lines.
            char *p = file.text;
            for ( ; *p; ++p) {
                if (*p == '\n' || *p == '\r') ++file.lineCount;
                if (*p == '\r' && p[1] == '\n') ++p;
            }
            // last line may not end in a newline
            if (p[-1] != '\n' && p[-1] != '\r') ++file.lineCount;
     
            // Allocate and set line pointers. 
            file.lines = xmalloc(file.lineCount * sizeof *file.lines);
            *file.lines = file.text;
            int i = 1;
            for (p = file.text; *p; ++p)
                if (*p == '\n' || *p == '\r') {
                    if (*p == '\r' && p[1] == '\n') *p++ = '\0';
                    else                            *p   = '\0';
                    if (i < file.lineCount) file.lines[i++] = p + 1;
                }
        }
     
        fclose(fin);
        return file;
    }
     
    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "Usage: readfile FILENAME\n");
            exit(EXIT_FAILURE);
        }
     
        File file = readFile(argv[1]);
     
        printf("Filename: %s    Lines: %d\n", file.filename, file.lineCount); 
        for (int i = 0; i < file.lineCount; ++i)
            printf("%s\n", file.lines[i]);
     
        free(file.filename);
        free(file.lines);
        free(file.text);
     
        return 0;
    }
    A little inaccuracy saves tons of explanation. - H.H. Munro

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 3
    Last Post: 10-30-2009, 04:41 PM
  2. Store only non-double elements
    By simone.marras in forum C Programming
    Replies: 8
    Last Post: 10-29-2008, 09:29 AM
  3. Help - opening a file with a char pointer as the name
    By Theoren in forum C Programming
    Replies: 14
    Last Post: 07-14-2008, 02:01 AM
  4. Segment fault on double whammy char pointer
    By Kleid-0 in forum C Programming
    Replies: 18
    Last Post: 12-20-2004, 05:51 PM
  5. Replies: 7
    Last Post: 09-05-2002, 08:01 AM

Tags for this Thread