Thread: Combining files together

  1. #1
    Registered User
    Join Date
    Mar 2011
    Posts
    216

    Reading 2 files, writing out 1

    Hi,

    The objective of the following code is to read in 2 files, and write them both out to a single .bin file. Eventually another progam would read the .bin file, and re-write the files so they are back to their original state. Here is the code for packing the files into a single .bin file.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    long int GetFileSize(long int fileSize, FILE *fp)
    {
        fseek(fp, 0, SEEK_END);
        fileSize = ftell(fp);
        rewind(fp);
        return fileSize;
    }
    
    int main()
    {
        FILE* fp;
        FILE* out;
        char* buffer;
        long int fileSize;
        size_t result;
        
        // Open file
        fp = fopen("Files.rar", "rb");
        if(fp == NULL) printf("Error reading file\n");
        
        // obtain file size
        fileSize = GetFileSize(fileSize, fp);
        printf("%i\n", fileSize);
        
        // allocate memory for the whole file
        buffer = (char*) malloc (sizeof(char)*fileSize);
        if(buffer == NULL) printf("Memory allocation error\n");
        
        // copy file into the buffer
        result = fread(buffer, 1, fileSize, fp);
        if(result != fileSize) printf("Could not load into memory\n");
        
        // terminate
        fclose(fp);
        
        /* the whole file is now loaded in the memory buffer */
        
        // write data to outfile
        out = fopen("packed.bin", "wb");
        
        fwrite(buffer, sizeof(buffer), 1, fp);
        
        free(buffer);
        fclose(out);
        
        /* Read data from next file */
        
        // Open file
        fp = fopen("file-2.zip", "rb");
        if(fp == NULL) printf("Error reading file\n");
        
        // obtain file size
        fileSize = GetFileSize(fileSize, fp);
        printf("%i\n", fileSize);
        
        // allocate memory for the whole file
        buffer = (char*) malloc (sizeof(char)*fileSize);
        if(buffer == NULL) printf("Memory allocation error\n");
        
        // copy file into the buffer
        result = fread(buffer, 1, fileSize, fp);
        if(result != fileSize) printf("Could not load into memory\n");
        
        /* the whole file is now loaded in the memory buffer */
        
        // terminate
        fclose(fp);
        
        // write data to outfile
        out = fopen("packed.bin", "ab");
        fwrite(buffer, sizeof(buffer), 1, fp);
        
        free(buffer);
        fclose(out);
        
        sleep(3000);
        return 0;
    }
    And here is the output from the .bin file generated.

    Code:
    Rar!PK
    I don't see what I'm doing wrong. Help is appreciated, and I hope it's easy to read. I have tried to keep it neat and commented it.

    Thanks
    Last edited by binks; 04-25-2012 at 03:39 PM.

  2. #2
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Code:
    fwrite(buffer, sizeof(buffer), 1, fp);
    A few problems:

    • You don't check this for success.
    • sizeof(buffer) is sizeof(char *), not the total bytes you read. You should be writing fileSize bytes instead.
    • Technically, you have the size and count parameters backwards. You are writing fileSize elements, each one has a size of 1*.
    • And generally, if you can't open a file, can't read/write all the bytes, etc, you need to exit the program immediately.

    Code:
    if (fwrite(buffer, 1, fileSize, fp) != fileSize)
        print error
        exit
    * It's possible fwrite only writes some of the data. Because of the way fwrite works, and what it returns, if you only try to write 1 element, and it doesn't write the whole thing, you can't tell how much was actually written. In production code, what you did would not work, you would need to put fwrite in a loop with the sizeof and count elements in the right place to make sure it all gets written properly and you can detect and recover from errors as best as possible.
    Last edited by anduril462; 04-25-2012 at 04:15 PM.

  3. #3
    Registered User
    Join Date
    Mar 2011
    Posts
    216
    Well, that helped a lot. It works now (or so it seems!). And what is the most efficient way to read the bin file, and re-write the files so they are in original state? Would I just have to have the filename and filesize, then I read the data, and write it to the filename (and so forth for additional files)?

  4. #4
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Yeah, I would store a "table" of file names and sizes, in the order they will exist in your packed.bin. Since everything else is in binary, you will have to do this with fread, so you may want a "file name length" stored as well, to make it easy on you. That is, if you want to pack the files "/home/user/foo.rar" (12345 bytes) and "/home/user/folder/bar.zip" (6789 bytes), the header would look like:
    Code:
    <12345><18>/home/user/foo.rar\0
    <6789><25>/home/user/folder/bar.zip\0
    Note, I put the numbers in < > to represent the fact that they are stored in binary format, i.e. using a call to fwrite/fread (versus text, using fprintf/fscanf). The first number is the size of the file, the second is the length of the file name, not including the trailing null. It would look something like this:
    Code:
    char *fileName = "/home/user/foo.rar";
    int fileSize = GetFileSize(...);
    int fileNameLen = strlen(fileName);
    fwrite(&fileSize, sizeof(int), 1, fp)
    fwrite(&fileNameLen, sizeof(int), 1, fp)
    fwrite(fileName, sizeof(char), fileNameLen, fp)
    // similar for writing other file names and reading them back
    Obviously you should do all your error checking, etc, to make this robust.

  5. #5
    Registered User
    Join Date
    Mar 2011
    Posts
    216
    Yes, of course that makes sense. But how would the program not count < and > as part of the integers? What's the reason for writing the string length of the filename?

    Also, how would you limit the program to say writing 5mb a file, then spliting into another file, and so forth? Would you calculate all the filesizes, and determine when to split to the second file?

  6. #6
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    I would put the number of files first, like this:
    Code:
    <2>
    <12345><18>/home/user/foo.rar
    <6789><25>/home/user/folder/bar.zip
    <bytes of 1st file>
    <bytes of 2nd file>
    The < and > are NOT PART OF THE FILE AT ALL (and neither are the newlines indicated above). It's just a notation used above to indicate that the values are stored in binary (do you know what that means?).
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  7. #7
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Quote Originally Posted by binks View Post
    Yes, of course that makes sense. But how would the program not count < and > as part of the integers?
    The < and > aren't actually in the file. Neither are the ASCII digits '1', '2', '3', '4', '5'. It is binary data. As an example, make a quick program that opens a file in binary mode, sets an int variable to some random value like 42, writes that int (using fwrite) and closes the file. Then open that file in a text editor, and you will see what I mean.
    What's the reason for writing the string length of the filename?
    To know how long the filename is. This is not a text file, it's binary, so reading big chunks of data out and using strlen can be problematic. There is not actually a newline in there (I did that to make it easy to read), so you can't just "read until the end of the line". Binary files don't really have lines, and even if there were a '\n' in there, it's not significant because it's binary. Lots of data in your file could have a byte of information that has the same value as a '\n', but is not meant to be a new line. '\n' has the ASCII value 10, so an integer with a value of 10 would look like some null characters and a new line, but it isn't.

    The alternative is to write the string in with the null terminator, then read one char at a time until you get a null, to read in the file name, but that is a pain in the butt if you need to malloc space for the string. You would have to read to the null to get the length, malloc enough space for the string +1 for the null, rewind that many bytes in the file, then read the whole thing again into your malloc'ed buffer. This way, you read an int, malloc that many bytes +1 for the null, and fread that many bytes into the buffer. No rewinding. It's simpler and faster, and only takes up a few more bytes in your file.

    Also, how would you limit the program to say writing 5mb a file, then spliting into another file, and so forth? Would you calculate all the filesizes, and determine when to split to the second file?
    One thing at a time turbo. Get this working flawlessly with one big file, then worry about splitting it up. Then, you need to think about it. How do you track how many bytes have been written to a file so far? Once you hit your 5mb limit, how do you close that file and open the next? Plan it all out on paper, and implement in small steps, compiling often and fixing all errors each time.

  8. #8
    Registered User
    Join Date
    Mar 2011
    Posts
    216
    The < and > aren't actually in the file. Neither are the ASCII digits '1', '2', '3', '4', '5'. It is binary data. As an example, make a quick program that opens a file in binary mode, sets an int variable to some random value like 42, writes that int (using fwrite) and closes the file. Then open that file in a text editor, and you will see what I mean.
    Sorry, I didn't catch that. But yes, I have seen binary it's close to gibberish. I thought you had manually added 'dividers' so that it was organized.

    And for the table, would a structure work like so?

    Code:
    int numfiles = 2;
    
    struct file
    {
         char filename[128];
         int filenameLength;
         int fileSize;
    }file[numfiles - 1]; // so it would be file[0] and file[1]
    Last edited by binks; 04-25-2012 at 05:53 PM.

  9. #9
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    You would just declare the array to be file[numfiles]. If you do -1, it's like saying file[1], which is an array with one element, namely file[0]. Arrays in C always go from 0 to size-1, so if you want elements 0 and 1, you need file[2] or file[numfiles].

    You have a few options if you're defining a struct for the file table. You can always just write the full struct. If that's the case, you don't need the filenameLength field. You will always write all 128 bytes of the file name, so the null character, will be written in there, along with any garbage after it should your filename be less than 127 characters. It may waste some space, but if that's not an issue, go ahead. You would write it like so:
    Code:
    fwrite(&file[i], sizeof(file[i]), 1, fp);  // write the file at index i to fp
    If you don't write the full struct, and you don't want to write all 128 bytes of the filename field, then that struct is fine, then you have to write each piece of the struct separately:
    Code:
    fwrite(&file[i].fileSize, sizeof(file[i].fileSize), 1, fp);  // first write the file size
    fwrite(&file[i].filenameLength, sizeof(file[i].filenameLength), 1, fp);  // then write the file name length
    fwrite(file[i].filename, 1, file[i].filenameLength, fp);  // last write the file name, with no extra data after it
    Of course, in either case, you would actually check the return value of fwrite to make sure it worked.

  10. #10
    Registered User
    Join Date
    Mar 2011
    Posts
    216
    Like so?

    Code:
    void GetFileInfo(char *filename, int fileNameLen, int fileSize)
    {
        FILE* fp; 
        fp = fopen(filename, "rb");
        
        fileNameLen = strlen(filename);
        fileSize = GetFileSize(fp);
        
        fclose(fp);
    }
    
    int main()
    {
        FILE* fp;
        FILE* out;
        char* buffer;
        long int fileSize;
        size_t result;
        
        int i;
        int numfiles = 2;
        
        struct file 
        {      
               char *fileName;      
               int fileNameLen;      
               int fileSize; 
        }file[numfiles];
        
        /* Get information regarding files to pack */
        
        file[0].fileName = "Files.rar";
        file[1].fileName = "c-steven.zip";
        
        for(i = 0; i < numfiles; i++)
              GetFileInfo(file[i].fileName, file[i].fileNameLen, file[i].fileSize);
              
        // Now write header information to the file
        
        out = fopen("packed.bin", "wb");
        
        for(i = 0; i < numfiles; i++)
        {
              
              if( fwrite(file[i].fileName, sizeof(char), file[i].fileNameLen, fp) != file[i].fileSize)
              {
                    printf("Error writing filename header %i\n", i);
                    sleep(3000);
                    exit(0);
              }
              
              if( fwrite(&file[i].fileNameLen, sizeof(int), 1, fp) != file[i].fileNameLen)
              {
                    printf("Error writing filename length header %i\n", i);
                    sleep(3000);
                    exit(0);
              }
              
              if( fwrite(&file[i].fileSize, sizeof(int), 1, fp) != file[i].fileSize)
              {
                    printf("Error writing file size header %i\n", i);
                    sleep(3000);
                    exit(0);
              }
              
              printf("Finished writing file header %i\n", i);
              
        }
        
        fclose(out);
    Is using structs efficient for a project like this?
    Last edited by binks; 04-25-2012 at 06:36 PM.

  11. #11
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Define "efficient". Actually, don't. What you are doing is called "preoptimization", which being concerned with making your code as fast or small as possible before you know that it is too slow/big. You should always focus on making your code clear, easy to follow, easy to debug and maintain, and only optimize later, when you know you really, really, really need it.

    As for problems with your code:

    1. For GetFileInfo, fileNameLen and fileSize need to be int *, and you need to pass their addresses to the function when you call: &file[i].fileNameLen.
    2. You need to make sure all your calls to fopen worked. If any of them fail, print an error and exit your program. That pretty much goes for all file IO, for every program you ever write, for the rest of your life. If you can't access critical data, you can not carry on with your program.
    3. Read the documentation for fwrite to see what it returns on success. You are not checking correctly when writing fileNameLen and fileSize.
    4. You need to write fileSize and fileNameLen before you write fileName. Otherwise, you have no idea where fileName ends and at what offset fileSize and fileNameLen are stored.
    5. What's up with all those sleep calls? If your terminal is closing before you can read anything, find a better way to run your program. Read this FAQ article: FAQ > Stop my Windows Console from disappearing everytime I run my program? - Cprogramming.com.


    You should be taking time to think through and plan your code before you write it (I'm not saying you didn't give any thought). Then you should read the documentation for the functions you're using and make sure you're using it correctly, checking return codes correctly, and that you cover all possible errors and strange cases. Then, write in small chunks (say, 5-10 lines), and test as you go. It's easy to test the success scenario and assume it works, but you need to test what happens if a file is missing, if you enter bad input, etc.

  12. #12
    Registered User
    Join Date
    Mar 2011
    Posts
    216
    This is what i changed in GetFileInfo from what I understand.

    Code:
    void GetFileInfo(char *filename, int *fileNameLen, int *fileSize)
    {
        FILE* fp; 
        fp = fopen(filename, "rb");
        if(fp == NULL)
        {
              printf("No file handle\n");
              sleep(3000);
              exit(0);
        }
        
        fileNameLen = (int*)strlen(filename);
        fileSize = (int*)GetFileSize(fp);
        
        fclose(fp);
    }
    fwrite returns a size_t, so for checking the outcome, you could do either

    Code:
    if( fwrite(&file[i].fileSize, sizeof(file[i].fileSize), 1, out) != file[i].fileSize)
    
    /* or */
    
    result = fwrite(&file[i].fileSize, sizeof(file[i].fileSize);
    if(result != file[i].fileSize)
    {
         printf("Error writing\n");
         exit(0);
    }
    You said this was incorrect

    Code:
        for(i = 0; i < numfiles; i++)
        {
              printf("Entered loop %i\n", i);
              // write filesize
              if( fwrite(&file[i].fileSize, sizeof(file[i].fileSize), 1, out) != file[i].fileSize)
              {
                    printf("Error writing file size header %i\n", i);
                    sleep(3000);
                    exit(0);
              }
              
              printf("Wrote file size header %i\n", i);
              // write filename length
              if( fwrite(&file[i].fileNameLen, sizeof(file[i].fileNameLen), 1, out) != file[i].fileNameLen)
              {
                    printf("Error writing filename length header %i\n", i);
                    sleep(3000);
                    exit(0);
              }
              
              printf("Wrote filename length header %i\n", i);
              // write filename 
              if( fwrite(file[i].fileName, 1, file[i].fileNameLen, out) != file[i].fileNameLen)
              {
                    printf("Error writing filename header %i\n", i);
                    sleep(3000);
                    exit(0);
              }
              printf("Wrote filename header %i\n", i);
              
              printf("Finished writing file header %i\n", i);
              
        }
    Well if your writing the fileSize, shouldn't you be checking to make sure its return equils fileSize. And if you writing fileNameLen, shouldn't you check to make sure the return is fileNameLen?
    Last edited by binks; 04-25-2012 at 07:49 PM.

  13. #13
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Regarding GetFileInfo, you're close, but since you pass in pointers to (you did change the call to GetFileInfo, right?), you need to dereference them when you want to refer to the integer they point to:
    Code:
    *fileNameLen = strlen(fileName);
    // similar for fileSize
    You also need to check and make sure fopen worked when you open packed.bin, i.e. check if out == NULL.

    Regarding fwrite:
    size_t is just a type suitable for describing the size of something, it's generally some sort of unsigned integer/long. What that size represents depends on the context. From the documentation of fread/fwrite:
    Quote Originally Posted by man fwrite
    size_t fwrite(const void *ptr, size_t size, size_t nmemb,
    FILE *stream);
    ...
    RETURN VALUE
    fread() and fwrite() return the number of items successfully read or written (i.e., not
    the number of characters). If an error occurs, or the end-of-file is reached, the return
    value is a short item count (or zero).
    So you need to check the return value against how many things you asked it to write, i.e. the nmemb parameter. That is 1 in the case of fileSize and fileNameLen.

  14. #14
    Registered User
    Join Date
    Mar 2011
    Posts
    216
    Alright, so by making changes according to what you said, this seems to work well now. (it says it wrote everything fine)

    Here is the code:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    long int GetFileSize(FILE *fp)
    {
        long int fileSize; 
         
        fseek(fp, 0, SEEK_END);
        fileSize = ftell(fp);
        rewind(fp);
        return fileSize;
    }
    
    void GetFileInfo(char *filename, int *fileNameLen, int *fileSize)
    {
        FILE* fp; 
        fp = fopen(filename, "rb");
        if(fp == NULL)
        {
              printf("No file handle\n");
              sleep(3000);
              exit(0);
        }
        
        *fileNameLen = strlen(filename);
        *fileSize = GetFileSize(fp);
        
        fclose(fp);
    }
    
    int main()
    {
        FILE* fp;
        FILE* out;
        char* buffer;
        long int fileSize;
        size_t result;
        
        int i;
        int numfiles = 2;
        
        struct file 
        {      
               char *fileName;      
               int fileNameLen;      
               int fileSize; 
        }file[numfiles];
        
        /* Get information regarding files to pack */
        
        file[0].fileName = "Files.rar";
        file[1].fileName = "files2.zip";
        
        printf("Set filenames\n");
        
        for(i = 0; i < numfiles; i++)
              GetFileInfo(file[i].fileName, &file[i].fileNameLen, &file[i].fileSize);
              
        printf("Gathered required information\n");  
            
        // Now write header information to the file
        
        out = fopen("packed.bin", "wb");
        if(out == NULL)
        {
              printf("No file handle\n");
              sleep(3000);
              exit(0);
        }
        
        printf("Open bin file\n");
        
        for(i = 0; i < numfiles; i++)
        {
              printf("Entered loop %i\n", i);
              // write filesize
              if( fwrite(&file[i].fileSize, sizeof(file[i].fileSize), 1, out) != 1)
              {
                    printf("Error writing file size header %i\n", i);
                    sleep(3000);
                    exit(0);
              }
              
              printf("Wrote file size header %i\n", i);
              // write filename length
              if( fwrite(&file[i].fileNameLen, sizeof(file[i].fileNameLen), 1, out) != 1)
              {
                    printf("Error writing filename length header %i\n", i);
                    sleep(3000);
                    exit(0);
              }
              
              printf("Wrote filename length header %i\n", i);
              // write filename 
              if( fwrite(file[i].fileName, 1, file[i].fileNameLen, out) != file[i].fileNameLen)
              {
                    printf("Error writing filename header %i\n", i);
                    sleep(3000);
                    exit(0);
              }
              printf("Wrote filename header %i\n", i);
              
              printf("Finished writing file header %i\n", i);
              
        }
        
        fclose(out);
        
        // Open file
        fp = fopen("Files.rar", "rb");
        
        if(fp == NULL) 
        {
              printf("Error reading file\n");
              sleep(3000);
              exit(0);
        }
        
        // obtain file size
        fileSize = GetFileSize(fp);
        printf("%i\n", fileSize);
        
        // allocate memory for the whole file
        buffer = (char*) malloc (sizeof(char)*fileSize);
        if(buffer == NULL)  
        {
              printf("Memory allocation error\n");
              sleep(3000);
              exit(0);
        }
        
        // copy file into the buffer
        result = fread(buffer, 1, fileSize, fp);
        if(result != fileSize)  
        {
              printf("Could not load into memory\n");
              sleep(3000);
              exit(0);
        }
        
        // terminate
        fclose(fp);
        
        /* the whole file is now loaded in the memory buffer */
        
        // write data to outfile
        out = fopen("packed.bin", "ab");
        if(out == NULL)
        {
              printf("No file handle\n");
              sleep(3000);
              exit(0);
        }
        
        if( fwrite(buffer, 1, fileSize, fp) != fileSize)
        {
              printf("Error writing file 1");
              sleep(3000);
              exit(0);
        }
        
        free(buffer);
        fclose(out);
        
        /* Read data from next file */
        
        // Open file
        fp = fopen("files2.zip", "rb");
        if(fp == NULL)  
        {
              printf("Error reading file\n");
              sleep(3000);
              exit(0);
        }
        
        // obtain file size
        fileSize = GetFileSize(fp);
        printf("%i\n", fileSize);
        
        // allocate memory for the whole file
        buffer = (char*) malloc (sizeof(char)*fileSize);
        if(buffer == NULL)  
        {
              printf("Memory allocation error\n");
              sleep(3000);
              exit(0);
        }
        
        // copy file into the buffer
        result = fread(buffer, 1, fileSize, fp);
        if(result != fileSize)  
        {
              printf("Could not load into memory\n");
              sleep(3000);
              exit(0);
        }
        
        /* the whole file is now loaded in the memory buffer */
        
        // terminate
        fclose(fp);
        
        // write data to outfile
        out = fopen("packed.bin", "ab");
        if( fwrite(buffer, 1, fileSize, fp) != fileSize)
        {
              printf("Error writing file 2");
              sleep(3000);
              exit(0);
        }
        
        free(buffer);
        fclose(out);
        
        sleep(3000);
        return 0;
    }
    Now for reading in the data to re-write the files, or spliting into volumes, where to go from here?
    Last edited by binks; 04-26-2012 at 03:03 PM.

  15. #15
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Your struct should be declared outside of any of your functions, otherwise only main knows what that struct looks like, and you can't pass such structs to other functions. Usually struct definitions go just below all the #include statements, above all your functions and prototypes.

    Not to be rude, but you are seriously lacking in the design portion of this, as though you did all the coding before you planned out how this will work. Your GetFileSize and GetFileInfo functions are fine, and a decent attempt at splitting up your code, but it's not enough. You need more functions, currently you have a lot of duplicated code. Good organization and program structure is critical to being able to extend your program to do other things, like split up volumes. Frankly, I would restart this from scratch. You'll be able to reuse much of what you already wrote, it just wont all be in a main function.

    You may need to reconsider your header design. Will every volume have the entire header stored in it or just the portion relevant to that volume? How will a volume know whether it's the first, second, third volume, etc? How will your program know what parts of which files are in what volumes?

    You've figured out the basic file handling stuff now, and have that down pretty solid. I think the next step for you is to plan this out on paper, and go through all the use cases you can think of:

    • storing one file with no max package size (single volume)
    • storing several files with no max package size (single volume)
    • storing one file smaller than the max package size
    • storing several files smaller that combined are smaller than the max package size
    • storing one file larger than the max package size (multiple volumes)
    • extracting a file from the beginning/middle/end of your archive
    • extracting a file that spans multiple volumes
    • etc



    Make sure your design covers all possible cases. Walk through the steps by hand. Then begin coding in small chunks (say 10 lines), compile, fix all errors, then write 10 more lines, etc.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. New to C : combining lines in text files
    By play_fetch in forum C Programming
    Replies: 55
    Last Post: 03-11-2011, 04:01 PM
  2. GUIs and combining files
    By sharrakor in forum C Programming
    Replies: 2
    Last Post: 03-22-2009, 07:00 AM
  3. Combining files
    By TheDan in forum C++ Programming
    Replies: 5
    Last Post: 04-07-2006, 07:18 AM
  4. Combining multiple wav files into one
    By eam in forum Tech Board
    Replies: 3
    Last Post: 01-17-2005, 11:08 AM
  5. combining C and C++
    By volk in forum C Programming
    Replies: 7
    Last Post: 03-18-2003, 08:41 PM