Thread: appending binary files

  1. #1
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242

    appending binary files

    Hi all.

    I am trying to write a program that can combine video files. Can someone please explain why the number of characters written is not equal to the number of characters read? I am getting the "File combine error" message.

    Thanks in advance.

    Code:
    // joinvideofiles.c: combines video files by appending them
    // 2 functions: openfile(), combine()
    
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define FILENAMELENGTH 40
    #define BUFFERLENGTH 4096
    
    FILE *fp1, *fp2;
    
    int openfile(const char *file1, const char *file2);
    void combine(const char *file1, const char *file2);
    
    int main(void)
    {
        int x; // return value of openfile()
        char file1[FILENAMELENGTH], file2[FILENAMELENGTH];
    
        do
        {
            printf("Enter file 1: ");
            fgets(file1, FILENAMELENGTH, stdin);
            printf("Enter file 2: ");
            fgets(file2, FILENAMELENGTH, stdin);
            file1[strlen(file1) - 1] = '\0'; // known filenames, no overflow problems
            file2[strlen(file2) - 1] = '\0'; // known filenames, no overflow problems
        } while((x = openfile(file1, file2)) == 1);
    
        combine(file1, file2);
    
        return 0;
    }
    
    int openfile(const char *file1, const char *file2)
    {
        if(((fp1 = fopen(file1, "ab")) == NULL) || ((fp2 = fopen(file2, "rb")) == NULL))
        {
            fprintf(stderr, "Error opening file.\n");
            return 1;
        }
    
        return 0;
    }
    
    void combine(const char *file1, const char *file2)
    {
        char buffer[BUFFERLENGTH];
        size_t x; // number of characters read by fread()
    
        while((x = fread(buffer, 1, BUFFERLENGTH, fp2)) > 0)  // read file2 into buffer
        {
            if((fwrite(buffer, 1, BUFFERLENGTH, fp1)) != x)  // write buffer to file 1
            {
                fprintf(stderr, "File combine error.\n");
                exit(EXIT_FAILURE);
            }
        }
    
        fclose(fp1);
        fclose(fp2);
    }
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  2. #2
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Your code will always report a "file combine error" if the length of the input file is not an exact multiple of BUFFERLENGTH.

    You need to check the return values of fread() and fwrite() more completely. Use ferror() and feof() to do that. fread() can return a value that is both non-zero and not equal to the count (BUFFERLENGTH in your code) if it encounters end of file.
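
    For illustration, a minimal sketch of that kind of checking (the name combine_checked is made up, and it reuses the OP's globals and buffer size, declared here only so the snippet stands alone):

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    #define BUFFERLENGTH 4096
    extern FILE *fp1, *fp2;   /* already opened by openfile(), as in the OP's program */
    
    /* Sketch only: write exactly the number of bytes fread() returned, then use
     * ferror()/feof() after the loop to tell a normal end of file from a real error. */
    void combine_checked(void)
    {
        char buffer[BUFFERLENGTH];
        size_t nread;
    
        while((nread = fread(buffer, 1, BUFFERLENGTH, fp2)) > 0)
        {
            if(fwrite(buffer, 1, nread, fp1) != nread)  /* nread, not BUFFERLENGTH */
            {
                fprintf(stderr, "File combine error.\n");
                exit(EXIT_FAILURE);
            }
        }
    
        if(ferror(fp2))  /* the loop stopped because a read failed */
        {
            fprintf(stderr, "File combine error.\n");
            exit(EXIT_FAILURE);
        }
        /* otherwise feof(fp2) is set and the whole file was copied */
    
        fclose(fp1);
        fclose(fp2);
    }
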
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  3. #3
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    You have the order mixed up in fread/fwrite, you are reading BUFFERLENGTH items of size 1. And if you assume that fread decided to not read all BUFFERLENGTH items, you still attempt to write BUFFERLENGTH items instead of x. Just some ideas.

  4. #4
    Registered User
    Join Date
    Mar 2010
    Posts
    583
    What grumpy said.

    If there are only 1000 characters left in the file, fread will return 1000.
    fwrite will write however much you ask it to - you fwrite the whole 4096 buffer, which is not the amount you just read.

    ferror and feof are essential!

    Couple of general thoughts:
    If file1 and file2 name the same existing file, you could run into trouble: data appended with fwrite can reach the file at any time, so the read loop could end up reading back what it just wrote.

    Also, are you sure you really want to directly append to the first file? If something goes wrong in combine(), the first file will likely be corrupted with partial data from the second. If it were me I'd output the first and second to a new file. I guess with big files this might be painful, but less painful than corrupted videos...
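
    As a rough sketch of that approach (illustration only; the helper names copy_stream and combine_to_new and the third-filename parameter are made up), combining into a brand-new output file could look something like this:

    Code:
    #include <stdio.h>
    
    #define BUFFERLENGTH 4096
    
    /* Sketch: append src onto dst, writing exactly what was read each time. */
    static int copy_stream(FILE *dst, FILE *src)
    {
        char buffer[BUFFERLENGTH];
        size_t n;
    
        while((n = fread(buffer, 1, BUFFERLENGTH, src)) > 0)
            if(fwrite(buffer, 1, n, dst) != n)
                return -1;                       /* write error */
    
        return ferror(src) ? -1 : 0;             /* read error vs. normal end of file */
    }
    
    /* Combine file1 and file2 into a new output file, leaving both inputs untouched. */
    static int combine_to_new(const char *file1, const char *file2, const char *outname)
    {
        FILE *in1 = fopen(file1, "rb");
        FILE *in2 = fopen(file2, "rb");
        FILE *out = fopen(outname, "wb");
        int ok = in1 && in2 && out
              && copy_stream(out, in1) == 0
              && copy_stream(out, in2) == 0;
    
        if(in1) fclose(in1);
        if(in2) fclose(in2);
        if(out) fclose(out);
        return ok ? 0 : -1;
    }
    If everything succeeds you keep the new file, and the originals are never touched.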

    Unrelated and picky:

    Code:
    if(((fp1 = fopen(file1, "ab")) == NULL) || ((fp2 = fopen(file2, "rb")) == NULL))
    If file1 and file2 are the same name and the file doesn't exist, the first fopen will create it and the second will reopen the same empty file.
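
    A crude guard against the most obvious case (a sketch only; the function name same_name is made up, and comparing the strings will not catch two different spellings of the same path):

    Code:
    #include <string.h>
    
    /* Sketch: report whether the two filename strings are literally identical.
     * "./a.mpg" vs "a.mpg" still slips through, so this only catches the obvious mistake. */
    static int same_name(const char *file1, const char *file2)
    {
        return strcmp(file1, file2) == 0;
    }
    openfile() could call this before either fopen() and refuse to continue with its usual error message.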



    Quote Originally Posted by Subsonics View Post
    You have the order mixed up in fread/fwrite, you are reading BUFFERLENGTH items of size 1.
    Hmm? Are you saying that he should be reading 1 element of size BUFFERLENGTH? I don't agree - I think reading BUFFERLENGTH items of size 1 is the right thing to do. Reading 1 item of length BUFFERLENGTH would mean the fread call would return 0 when there were fewer than BUFFERLENGTH chars left, which isn't what the OP wants.
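
    To see the difference concretely, here is a throwaway sketch (the file name "test.bin" and its assumed size of about 1000 bytes are made up for the illustration):

    Code:
    #include <stdio.h>
    
    #define BUFFERLENGTH 4096
    
    int main(void)
    {
        char buffer[BUFFERLENGTH];
        size_t a, b;
        FILE *fp = fopen("test.bin", "rb");  /* hypothetical file of ~1000 bytes */
    
        if(fp == NULL)
            return 1;
    
        /* element size 1, up to BUFFERLENGTH elements: returns 1000, the bytes read */
        a = fread(buffer, 1, BUFFERLENGTH, fp);
    
        rewind(fp);
    
        /* one element of size BUFFERLENGTH: returns 0, because no complete
         * BUFFERLENGTH-sized element could be read */
        b = fread(buffer, BUFFERLENGTH, 1, fp);
    
        printf("size-1 call returned %lu, size-BUFFERLENGTH call returned %lu\n",
               (unsigned long)a, (unsigned long)b);
    
        fclose(fp);
        return 0;
    }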

  5. #5
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Quote Originally Posted by Subsonics View Post
    You have the order mixed up in fread/fwrite, you are reading BUFFERLENGTH items of size 1. And if you assume that fread decided to not read all BUFFERLENGTH items, you still attempt to write BUFFERLENGTH items instead of x. Just some ideas.
    It does produce working code. Here are 2 ways, although I suppose the second way is more correct, since it makes sure that whatever number of characters is read is also written.

    Code:
    void combine(const char *file1, const char *file2)
    {
        char buffer[BUFFERLENGTH];
        size_t x; // number of characters read by fread()
    
        while((x = fread(buffer, BUFFERLENGTH, 1, fp2)) > 0)  // read file2 into buffer
        {
            if((fwrite(buffer, BUFFERLENGTH, 1, fp1)) != x)  // write buffer to file1
            {
                fprintf(stderr, "File combine error.\n");
                exit(EXIT_FAILURE);
            }
        }
    
        puts("Files successfully combined.");
    
        fclose(fp1);
        fclose(fp2);
    }
    Code:
    void combine(const char *file1, const char *file2)
    {
        char buffer[BUFFERLENGTH];
        size_t x; // number of characters read by fread()
    
        while((x = fread(buffer, 1, BUFFERLENGTH, fp2)) > 0)  // read file2 into buffer
        {
            if((fwrite(buffer, 1, x, fp1)) != x)  // write buffer to file1
            {
                fprintf(stderr, "File combine error.\n");
                exit(EXIT_FAILURE);
            }
        }
    
        puts("Files successfully combined.");
    
        fclose(fp1);
        fclose(fp2);
    }
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  6. #6
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by smokeyangel View Post
    Hmm? Are you saying that he should be reading 1 element of size BUFFERLENGTH? I don't agree - I think reading BUFFERLENGTH items of size 1 is the right thing to do. Reading 1 item of length BUFFERLENGTH would mean the fread call would return 0 when there were fewer than BUFFERLENGTH chars left, which isn't what the OP wants.
    Let's look at the prototype of fread.

    Code:
         size_t
         fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);
    As you can see from the prototype, the 2nd argument is size and the 3rd item amount. However the important point that I made was that the amount that is written needs to be the amount returned by fread, not some pre-defined constant.

  7. #7
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Strangely enough, this doesn't work, with BUFFERLENGTH and x passed as the size argument. No error message though. Looks like the if() statement doesn't get executed at all. Too much programming makes me C sick.

    I'll try using feof() and ferror() later.

    Code:
    void combine(const char *file1, const char *file2)
    {
        char buffer[BUFFERLENGTH];
        size_t x; // number of characters read by fread()
    
        while((x = fread(buffer, BUFFERLENGTH, 1, fp2)) > 0)  // read file2 into buffer
        {
            if((fwrite(buffer, x, 1, fp1)) != x)  // write buffer to file1
            {
                fprintf(stderr, "File combine error.\n");
                exit(EXIT_FAILURE);
            }
        }
    
        puts("Files successfully combined.");
    
        fclose(fp1);
        fclose(fp2);
    }
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  8. #8
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    Too much programming makes me C sick.
    Spend a few hours with Intercal, you'll soon find C a glorious paradise.

    Soma

  9. #9
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Your function is... in a word... bad.
    You are providing two const char* arguments which you are then ignoring and then using some global FILE handles instead.
    I would suspect that you are never opening the files and that it never goes into the loop.

    Fix it to remove the dependence on globals.
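
    For what it's worth, one way to start on that (a sketch only, not iMalc's code): have openfile() hand the streams back through out-parameters instead of setting globals, and give combine() a FILE* signature.

    Code:
    #include <stdio.h>
    
    /* Sketch: openfile() returns the streams through out-parameters, so nothing
     * touches globals. It also closes whichever stream did open if the other fails. */
    static int openfile(const char *file1, const char *file2, FILE **out_dst, FILE **out_src)
    {
        FILE *dst = fopen(file1, "ab");
        FILE *src = fopen(file2, "rb");
    
        if(dst == NULL || src == NULL)
        {
            fprintf(stderr, "Error opening file.\n");
            if(dst) fclose(dst);   /* don't leak the one that did open */
            if(src) fclose(src);
            return 1;
        }
    
        *out_dst = dst;
        *out_src = src;
        return 0;
    }
    main() would then own both FILE pointers, pass them to a combine(FILE *dst, FILE *src), and close them when it is done.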
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    I wonder, is reading and writing one character at a time inefficient compared to reading 4096 (BUFFERLENGTH) characters at a time?

    Code:
    // joinvideofiles.c: combines video files by appending them
    // 2 functions: openfile(), combine()
    
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define FILENAMELENGTH 40
    #define BUFFERLENGTH 4096
    
    FILE *fp1, *fp2;
    
    int openfile( char *file1,  char *file2);
    void combine( char *file1,  char *file2);
    
    int main(void)
    {
        int x; // return value of openfile()
        char file1[FILENAMELENGTH], file2[FILENAMELENGTH];
    
        do
        {
            printf("Enter file 1: ");
            fgets(file1, FILENAMELENGTH, stdin);
            printf("Enter file 2: ");
            fgets(file2, FILENAMELENGTH, stdin);
            file1[strlen(file1) - 1] = '\0'; // known filenames, no overflow problems
            file2[strlen(file2) - 1] = '\0'; // known filenames, no overflow problems
        } while((x = openfile(file1, file2)) == 1);
    
        combine(file1, file2);
    
        return 0;
    }
    
    int openfile( char *file1,  char *file2)
    {
        if(((fp1 = fopen(file1, "ab")) == NULL) || ((fp2 = fopen(file2, "rb")) == NULL))
        {
            fprintf(stderr, "Error opening file.\n");
            return 1;
        }
    
        return 0;
    }
    
    void combine(char *file1, char *file2)
    {
        char buffer[BUFFERLENGTH];
    
        while(fread(buffer, sizeof(char), 1, fp2), !feof(fp2) && !ferror(fp2))  // read file2 into buffer
        {
             fwrite(buffer, sizeof(char), 1, fp1); // write buffer to file1
        }
    
        if(feof(fp2))
        {
            puts("Files combined successfully.");
        }
    
        if(ferror(fp2))
        {
            puts("An error has occured");
        }
    
        fclose(fp1);
        fclose(fp2);
    }
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  11. #11
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by cfanatic View Post
    I wonder, is reading and writing one character at a time inefficient compared to reading 4096 (BUFFERLENGTH) characters at a time?
    I suggest you would be better off pondering techniques to get your code functioning as intended, rather than wasting your time with questions of "efficiency". Increasing efficiency does not take code that is wrong and make it right.

    To answer your question, however, there is rarely any practical difference. I/O tends to be the slowest part of any computer program, because it involves reading data from or writing data to relatively slow devices. For example, hard drive I/O is several orders of magnitude slower than moving data between processor registers and memory. Even modern devices (say, an SSD) are significantly slower than a processor or RAM. With modern buffering and caching, the differences you can make by switching between reading a large buffer versus one character at a time are a few percent - if you are lucky. And that difference is insignificant.

    You are therefore better off picking a technique that is easiest to code correctly, to solve your problem at hand.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  12. #12
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Here you go grumpy.

    Code:
    // joinvideofiles.c: combines video files by appending them
    // 2 functions: openfile(), combine()
    
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define FILENAMELENGTH 40
    #define BUFFERLENGTH 4096
    
    FILE *fp1, *fp2;
    
    int openfile(char *file1, char *file2);
    void combine(char *file1, char *file2);
    
    int main(void)
    {
        int x; // return value of openfile()
        char file1[FILENAMELENGTH], file2[FILENAMELENGTH];
    
        do
        {
            printf("Enter file 1: ");
            fgets(file1, FILENAMELENGTH, stdin);
            printf("Enter file 2: ");
            fgets(file2, FILENAMELENGTH, stdin);
            file1[strlen(file1) - 1] = '\0'; // known filenames, no overflow problems
            file2[strlen(file2) - 1] = '\0'; // known filenames, no overflow problems
        } while((x = openfile(file1, file2)) == 1);
    
        combine(file1, file2);
    
        return 0;
    }
    
    int openfile(char *file1, char *file2)
    {
        if(((fp1 = fopen(file1, "ab")) == NULL) || ((fp2 = fopen(file2, "rb")) == NULL))
        {
            fprintf(stderr, "Error opening file.\n");
            return 1;
        }
    
        return 0;
    }
    
    void combine(char *file1, char *file2)
    {
        char buffer[BUFFERLENGTH];
    
        while((fread(buffer, sizeof(char), 1, fp2)) > 0)  // read file2 into buffer
        {
            fwrite(buffer, sizeof(char), 1, fp1);  // write buffer to file 1
        }
    
        puts("Files combined successfully.");
    
        fclose(fp1);
        fclose(fp2);
    }
    Last edited by cfanatic; 09-09-2012 at 07:35 AM. Reason: got rid of all the const
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  13. #13
    Registered User
    Join Date
    Mar 2010
    Posts
    583
    Quote Originally Posted by Subsonics View Post
    Let's look at the prototype of fread.

    Code:
         size_t
         fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);

    As you can see from the prototype, the 2nd argument is size and the 3rd item amount.
    Well, yes, but isn't that what the OP had already?

    Code:
        while((x = fread(buffer, 1, BUFFERLENGTH, fp2)) > 0)
    I think "size" is meant to be the size of a single element, not the size of the whole buffer. Like if you were reading an array of structs, I think size would be the size of a single struct.
    That said, either is valid and legal. Nothing wrong with having a single, large element.

    Quote Originally Posted by Subsonics View Post
    However the important point that I made was that the amount that is written needs to be the amount returned by fread, not some pre-defined constant.
    Yep, definitely. Didn't mean to detract from your point, sorry.

    Quote Originally Posted by cfanatic
    Strangely enough, this doesn't work, with BUFFERLENGTH and x passed as the size argument. No error message though. Looks like the if() statement doesn't get executed at all. Too much programming makes me C sick.

    I'll try using feof() and ferror() later.
    I'd put my money on it being EOF.

    That said, there are some bits of your code that could easily lead to problems. Having an if statement with 2 fopen calls, in a loop..... is probably not a great idea. It can call fopen multiple times on maybe-different maybe-same files and throw away the file handles on the next call.
    It is legal C and probably not the cause of your problems, but it'd definitely help avoid future pain if you looked after your file handles better.

    Quote Originally Posted by cfanatic
    I wonder, is reading and writing one character at a time inefficient compared to reading 4096 (BUFFERLENGTH) characters at a time?
    The C standard is oddly specific about how these functions work:
    Quote Originally Posted by C11 draft
    The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. For each object, size calls are made to the fgetc function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read.
    So that says that fgetc is called size * num times whichever way you pitch it. In reality, I expect it'll be cached and buffered and speculatively read and blah blah blah.

    Of course, in a program with a tight loop being repeated many, many times, any inefficiency will probably be noticeable, and of course there is an overhead of the extra calls to fread and fwrite. And the C library may be optimised differently for big buffers and little buffers.

    Speculation aside, I modified your program a bit and did some experiments with different buffer sizes and different buffer placement (stack or heap). I copied a 237MB file to another file, then combined the two. I found:
    • 4096 seems to be the best buffer size (I tried 2048 and 8192). This may be because I think MSVC has an internal 4096 buffer
    • Putting the buffer on the stack was faster than on the heap. Not sure why that'd be
    • Writing a character at a time is significantly slower - it took about 148 seconds, compared to 66 with the buffered version.
    • But using a larger buffer really hammered my PC. It slowed to a crawl, and the transfer rate was very lumpy with 5-10 second pauses, presumably while I/O was flushed. Transfer rate was getting gradually slower during the test
    • Conversely, doing 1 char at a time was smoother and seemed more predictable
    • I added some code to measure the processing rate (how many bytes it was managing to read and write in a second); a rough sketch of that kind of timing loop is below, after this list. A 2048-sized buffer peaked at processing (read and write) 16MB of data a second, with an apparent low of 2MB/s. Writing byte by byte was a steady 3MB/s the whole time.
    • I tried this on MSVC 2010 express edition
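
    As mentioned above, a rough sketch of that kind of timing harness (illustration only, not the code actually used for these numbers; the function name timed_copy is made up):

    Code:
    #include <stdio.h>
    #include <time.h>
    
    #define MAXBUF 8192
    
    /* Sketch: copy src onto dst with the given buffer size and print an overall rate.
     * clock() is coarse, but good enough for comparing buffer sizes against each other. */
    static void timed_copy(FILE *dst, FILE *src, size_t bufsize)
    {
        char buffer[MAXBUF];
        size_t n;
        unsigned long total = 0;
        double secs;
        clock_t start = clock();
    
        if(bufsize > sizeof buffer)
            bufsize = sizeof buffer;
    
        while((n = fread(buffer, 1, bufsize, src)) > 0)
        {
            if(fwrite(buffer, 1, n, dst) != n)
                break;                        /* write error: stop and report what we got */
            total += (unsigned long)n;
        }
    
        secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("copied %lu bytes in %.2f s (%.2f MB/s)\n",
               total, secs, secs > 0.0 ? total / (1024.0 * 1024.0) / secs : 0.0);
    }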


    So yes, it's slower. It's probably worth pursuing efficiency - but better to get it working first. Then you have a baseline to compare against and improve.

    Wow, more answer than anyone could possibly have wanted. I do get a bit carried away sometimes........


    edit - cfanatic -- style issues aside, that looks like it'll work. Does it?
    Last edited by smokeyangel; 09-09-2012 at 08:48 AM.

  14. #14
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I have always used 4096 as the buffer size for this sort of thing.
    I believe it's the fastest because it's the disk cluster size.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  15. #15
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Quote Originally Posted by smokeyangel View Post

    Wow, more answer than anyone could possibly have wanted. I do get a bit carried away sometimes........
    I do appreciate lengthy replies.


    Quote Originally Posted by smokeyangel View Post
    edit - cfanatic -- style issues aside, that looks like it'll work. Does it?
    Yes, it works perfectly for combining binary files: .mpg files can be combined and the entire combined file can be played back; .mp4 files will combine, but only the first video will play back.

    I think it's the .mpg format that allows it to work. I wonder what other media formats can be combined so that the combined file plays back in its entirety.
    Last edited by cfanatic; 09-09-2012 at 04:57 PM.
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

