Thread: appending binary files

  1. #1
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242

    appending binary files

    Hi all.

    I am trying to write a program that can combine video files. Can someone please explain why the number of characters written is not equal to the number of characters read? I am getting the "File combine error" message.

    Thanks in advance.

    Code:
    // joinvideofiles.c: combines video files by appending them
    // 2 functions: openfile(), combine()
    
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define FILENAMELENGTH 40
    #define BUFFERLENGTH 4096
    
    FILE *fp1, *fp2;
    
    int openfile(const char *file1, const char *file2);
    void combine(const char *file1, const char *file2);
    
    int main(void)
    {
        int x; // return value of openfile()
        char file1[FILENAMELENGTH], file2[FILENAMELENGTH];
    
        do
        {
            printf("Enter file 1: ");
            fgets(file1, FILENAMELENGTH, stdin);
            printf("Enter file 2: ");
            fgets(file2, FILENAMELENGTH, stdin);
            file1[strlen(file1) - 1] = '\0'; // known filenames, no overflow problems
            file2[strlen(file2) - 1] = '\0'; // known filenames, no overflow problems
        } while((x = openfile(file1, file2)) == 1);
    
        combine(file1, file2);
    
        return 0;
    }
    
    int openfile(const char *file1, const char *file2)
    {
        if(((fp1 = fopen(file1, "ab")) == NULL) || ((fp2 = fopen(file2, "rb")) == NULL))
        {
            fprintf(stderr, "Error opening file.\n");
            return 1;
        }
    
        return 0;
    }
    
    void combine(const char *file1, const char *file2)
    {
        char buffer[BUFFERLENGTH];
        size_t x; // number of characters read by fread()
    
        while((x = fread(buffer, 1, BUFFERLENGTH, fp2)) > 0)  // read file2 into buffer
        {
            if((fwrite(buffer, 1, BUFFERLENGTH, fp1)) != x)  // write buffer to file 1
            {
                fprintf(stderr, "File combine error.\n");
                exit(EXIT_FAILURE);
            }
        }
    
        fclose(fp1);
        fclose(fp2);
    }
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  2. #2
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Your code will always report a "file combine error" if the length of the input file is not an exact multiple of BUFFERLENGTH.

    You need to check the return values of fread() and fwrite() more completely. Use ferror() and feof() to do that. fread() can return a value that is both non-zero and not equal to the count (BUFFERLENGTH in your code) if it encounters end of file.
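
    For illustration, a minimal sketch of that kind of checking (the name combine_checked is made up, and it reuses the OP's globals and buffer size, declared here only so the snippet stands alone):

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    #define BUFFERLENGTH 4096
    extern FILE *fp1, *fp2;   /* already opened by openfile(), as in the OP's program */
    
    /* Sketch only: write exactly the number of bytes fread() returned, then use
     * ferror()/feof() after the loop to tell a normal end of file from a real error. */
    void combine_checked(void)
    {
        char buffer[BUFFERLENGTH];
        size_t nread;
    
        while((nread = fread(buffer, 1, BUFFERLENGTH, fp2)) > 0)
        {
            if(fwrite(buffer, 1, nread, fp1) != nread)  /* nread, not BUFFERLENGTH */
            {
                fprintf(stderr, "File combine error.\n");
                exit(EXIT_FAILURE);
            }
        }
    
        if(ferror(fp2))  /* the loop stopped because a read failed */
        {
            fprintf(stderr, "File combine error.\n");
            exit(EXIT_FAILURE);
        }
        /* otherwise feof(fp2) is set and the whole file was copied */
    
        fclose(fp1);
        fclose(fp2);
    }
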
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  3. #3
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    You have the order mixed up in fread/fwrite, you are reading BUFFERLENGTH items of size 1. And if you assume that fread decided to not read all BUFFERLENGTH items, you still attempt to write BUFFERLENGTH items instead of x. Just some ideas.

  4. #4
    Registered User
    Join Date
    Mar 2010
    Posts
    583
    What grumpy said.

    If there are only 1000 characters left in the file, fread will return 1000.
    fwrite will write however much you ask it to - you fwrite the whole 4096 buffer, which is not the amount you just read.

    ferror and feof are essential!

    Couple of general thoughts:
    If file1 and file2 name the same existing file, you could run into trouble: data appended with fwrite can reach the file at any time, so the read loop could end up reading back what it just wrote.

    Also, are you sure you really want to directly append to the first file? If something goes wrong in combine(), the first file will likely be corrupted with partial data from the second. If it were me I'd output the first and second to a new file. I guess with big files this might be painful, but less painful than corrupted videos...
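
    As a rough sketch of that approach (illustration only; the helper names copy_stream and combine_to_new and the third-filename parameter are made up), combining into a brand-new output file could look something like this:

    Code:
    #include <stdio.h>
    
    #define BUFFERLENGTH 4096
    
    /* Sketch: append src onto dst, writing exactly what was read each time. */
    static int copy_stream(FILE *dst, FILE *src)
    {
        char buffer[BUFFERLENGTH];
        size_t n;
    
        while((n = fread(buffer, 1, BUFFERLENGTH, src)) > 0)
            if(fwrite(buffer, 1, n, dst) != n)
                return -1;                       /* write error */
    
        return ferror(src) ? -1 : 0;             /* read error vs. normal end of file */
    }
    
    /* Combine file1 and file2 into a new output file, leaving both inputs untouched. */
    static int combine_to_new(const char *file1, const char *file2, const char *outname)
    {
        FILE *in1 = fopen(file1, "rb");
        FILE *in2 = fopen(file2, "rb");
        FILE *out = fopen(outname, "wb");
        int ok = in1 && in2 && out
              && copy_stream(out, in1) == 0
              && copy_stream(out, in2) == 0;
    
        if(in1) fclose(in1);
        if(in2) fclose(in2);
        if(out) fclose(out);
        return ok ? 0 : -1;
    }
    If everything succeeds you keep the new file, and the originals are never touched.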

    Unrelated and picky:

    Code:
    if(((fp1 = fopen(file1, "ab")) == NULL) || ((fp2 = fopen(file2, "rb")) == NULL))
    If file1 and file2 are the same name and the file doesn't exist, the first fopen will create it and the second will reopen the same empty file.
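
    A crude guard against the most obvious case (a sketch only; the function name same_name is made up, and comparing the strings will not catch two different spellings of the same path):

    Code:
    #include <string.h>
    
    /* Sketch: report whether the two filename strings are literally identical.
     * "./a.mpg" vs "a.mpg" still slips through, so this only catches the obvious mistake. */
    static int same_name(const char *file1, const char *file2)
    {
        return strcmp(file1, file2) == 0;
    }
    openfile() could call this before either fopen() and refuse to continue with its usual error message.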



    Quote Originally Posted by Subsonics View Post
    You have the order mixed up in fread/fwrite, you are reading BUFFERLENGTH items of size 1.
    Hmm? Are you saying that he should be reading 1 element of size BUFFERLENGTH? I don't agree - I think reading BUFFERLENGTH items of size 1 is the right thing to do. Reading 1 item of length BUFFERLENGTH would mean the fread call would return 0 when there were fewer than BUFFERLENGTH chars left, which isn't what the OP wants.
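
    To see the difference concretely, here is a throwaway sketch (the file name "test.bin" and its assumed size of about 1000 bytes are made up for the illustration):

    Code:
    #include <stdio.h>
    
    #define BUFFERLENGTH 4096
    
    int main(void)
    {
        char buffer[BUFFERLENGTH];
        size_t a, b;
        FILE *fp = fopen("test.bin", "rb");  /* hypothetical file of ~1000 bytes */
    
        if(fp == NULL)
            return 1;
    
        /* element size 1, up to BUFFERLENGTH elements: returns 1000, the bytes read */
        a = fread(buffer, 1, BUFFERLENGTH, fp);
    
        rewind(fp);
    
        /* one element of size BUFFERLENGTH: returns 0, because no complete
         * BUFFERLENGTH-sized element could be read */
        b = fread(buffer, BUFFERLENGTH, 1, fp);
    
        printf("size-1 call returned %lu, size-BUFFERLENGTH call returned %lu\n",
               (unsigned long)a, (unsigned long)b);
    
        fclose(fp);
        return 0;
    }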

  5. #5
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Quote Originally Posted by Subsonics View Post
    You have the order mixed up in fread/fwrite, you are reading BUFFERLENGTH items of size 1. And if you assume that fread decided to not read all BUFFERLENGTH items, you still attempt to write BUFFERLENGTH items instead of x. Just some ideas.
    It does produce working code. Here are 2 ways, although I suppose the second way is more correct, since it makes sure that whatever number of characters is read is also written.

    Code:
    void combine(const char *file1, const char *file2)
    {
        char buffer[BUFFERLENGTH];
        size_t x; // number of characters read by fread()
    
        while((x = fread(buffer, BUFFERLENGTH, 1, fp2)) > 0)  // read file2 into buffer
        {
            if((fwrite(buffer, BUFFERLENGTH, 1, fp1)) != x)  // write buffer to file1
            {
                fprintf(stderr, "File combine error.\n");
                exit(EXIT_FAILURE);
            }
        }
    
        puts("Files successfully combined.");
    
        fclose(fp1);
        fclose(fp2);
    }
    Code:
    void combine(const char *file1, const char *file2)
    {
        char buffer[BUFFERLENGTH];
        size_t x; // number of characters read by fread()
    
        while((x = fread(buffer, 1, BUFFERLENGTH, fp2)) > 0)  // read file2 into buffer
        {
            if((fwrite(buffer, 1, x, fp1)) != x)  // write buffer to file1
            {
                fprintf(stderr, "File combine error.\n");
                exit(EXIT_FAILURE);
            }
        }
    
        puts("Files successfully combined.");
    
        fclose(fp1);
        fclose(fp2);
    }
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  6. #6
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by smokeyangel View Post
    Hmm? Are you saying that he should be reading 1 element of size BUFFERLENGTH? I don't agree - I think reading BUFFERLENGTH items of size 1 is the right thing to do. Reading 1 item of length BUFFERLENGTH would mean the fread call would return 0 when there were fewer than BUFFERLENGTH chars left, which isn't what the OP wants.
    Let's look at the prototype of fread.

    Code:
         size_t
         fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);
    As you can see from the prototype, the 2nd argument is size and the 3rd item amount. However the important point that I made was that the amount that is written needs to be the amount returned by fread, not some pre-defined constant.

  7. #7
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Strangely enough, this doesn't work, with BUFFERLENGTH and x passed as the size argument. No error message though. Looks like the if() statement doesn't get executed at all. Too much programming makes me C sick.

    I'll try using feof() and ferror() later.

    Code:
    void combine(const char *file1, const char *file2)
    {
        char buffer[BUFFERLENGTH];
        size_t x; // number of characters read by fread()
    
        while((x = fread(buffer, BUFFERLENGTH, 1, fp2)) > 0)  // read file2 into buffer
        {
            if((fwrite(buffer, x, 1, fp1)) != x)  // write buffer to file1
            {
                fprintf(stderr, "File combine error.\n");
                exit(EXIT_FAILURE);
            }
        }
    
        puts("Files successfully combined.");
    
        fclose(fp1);
        fclose(fp2);
    }
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  8. #8
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    Too much programming makes me C sick.
    Spend a few hours with Intercal, you'll soon find C a glorious paradise.

    Soma

  9. #9
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Your function is... in a word... bad.
    You are providing two const char* arguments which you are then ignoring and then using some global FILE handles instead.
    I would suspect that you are never opening the files and that it never goes into the loop.

    Fix it to remove the dependence on globals.
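
    For what it's worth, one way to start on that (a sketch only, not iMalc's code): have openfile() hand the streams back through out-parameters instead of setting globals, and give combine() a FILE* signature.

    Code:
    #include <stdio.h>
    
    /* Sketch: openfile() returns the streams through out-parameters, so nothing
     * touches globals. It also closes whichever stream did open if the other fails. */
    static int openfile(const char *file1, const char *file2, FILE **out_dst, FILE **out_src)
    {
        FILE *dst = fopen(file1, "ab");
        FILE *src = fopen(file2, "rb");
    
        if(dst == NULL || src == NULL)
        {
            fprintf(stderr, "Error opening file.\n");
            if(dst) fclose(dst);   /* don't leak the one that did open */
            if(src) fclose(src);
            return 1;
        }
    
        *out_dst = dst;
        *out_src = src;
        return 0;
    }
    main() would then own both FILE pointers, pass them to a combine(FILE *dst, FILE *src), and close them when it is done.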
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    I wonder, is reading and writing one character at a time inefficient compared to reading 4096 (BUFFERLENGTH) characters at a time?

    Code:
    // joinvideofiles.c: combines video files by appending them
    // 2 functions: openfile(), combine()
    
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define FILENAMELENGTH 40
    #define BUFFERLENGTH 4096
    
    FILE *fp1, *fp2;
    
    int openfile( char *file1,  char *file2);
    void combine( char *file1,  char *file2);
    
    int main(void)
    {
        int x; // return value of openfile()
        char file1[FILENAMELENGTH], file2[FILENAMELENGTH];
    
        do
        {
            printf("Enter file 1: ");
            fgets(file1, FILENAMELENGTH, stdin);
            printf("Enter file 2: ");
            fgets(file2, FILENAMELENGTH, stdin);
            file1[strlen(file1) - 1] = '\0'; // known filenames, no overflow problems
            file2[strlen(file2) - 1] = '\0'; // known filenames, no overflow problems
        } while((x = openfile(file1, file2)) == 1);
    
        combine(file1, file2);
    
        return 0;
    }
    
    int openfile( char *file1,  char *file2)
    {
        if(((fp1 = fopen(file1, "ab")) == NULL) || ((fp2 = fopen(file2, "rb")) == NULL))
        {
            fprintf(stderr, "Error opening file.\n");
            return 1;
        }
    
        return 0;
    }
    
    void combine(char *file1, char *file2)
    {
        char buffer[BUFFERLENGTH];
    
        while(fread(buffer, sizeof(char), 1, fp2), !feof(fp2) && !ferror(fp2))  // read file2 into buffer
        {
             fwrite(buffer, sizeof(char), 1, fp1); // write buffer to file1
        }
    
        if(feof(fp2))
        {
            puts("Files combined successfully.");
        }
    
        if(ferror(fp2))
        {
            puts("An error has occured");
        }
    
        fclose(fp1);
        fclose(fp2);
    }
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  11. #11
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by cfanatic View Post
    I wonder, is reading and writing one character at a time inefficient compared to reading 4096 (BUFFERLENGTH) characters at a time?
    I suggest you would be better off pondering techniques to get your code functioning as intended, rather than wasting your time with questions of "efficiency". Increasing efficiency does not take code that is wrong and make it right.

    To answer your question, however, there is rarely any practical difference. I/O tends to be the slowest part of any computer program, because it involves reading data from or writing data to relatively slow devices. For example, hard drive I/O is several orders of magnitude slower than moving data between processor registers and memory. Even modern devices (say, an SSD) are significantly slower than a processor or RAM. With modern buffering and caching, the differences you can make by switching between reading a large buffer versus one character at a time are a few percent - if you are lucky. And that difference is insignificant.

    You are therefore better off picking a technique that is easiest to code correctly, to solve your problem at hand.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  12. #12
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Here you go grumpy.

    Code:
    // joinvideofiles.c: combines video files by appending them
    // 2 functions: openfile(), combine()
    
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define FILENAMELENGTH 40
    #define BUFFERLENGTH 4096
    
    FILE *fp1, *fp2;
    
    int openfile(char *file1, char *file2);
    void combine(char *file1, char *file2);
    
    int main(void)
    {
        int x; // return value of openfile()
        char file1[FILENAMELENGTH], file2[FILENAMELENGTH];
    
        do
        {
            printf("Enter file 1: ");
            fgets(file1, FILENAMELENGTH, stdin);
            printf("Enter file 2: ");
            fgets(file2, FILENAMELENGTH, stdin);
            file1[strlen(file1) - 1] = '\0'; // known filenames, no overflow problems
            file2[strlen(file2) - 1] = '\0'; // known filenames, no overflow problems
        } while((x = openfile(file1, file2)) == 1);
    
        combine(file1, file2);
    
        return 0;
    }
    
    int openfile(char *file1, char *file2)
    {
        if(((fp1 = fopen(file1, "ab")) == NULL) || ((fp2 = fopen(file2, "rb")) == NULL))
        {
            fprintf(stderr, "Error opening file.\n");
            return 1;
        }
    
        return 0;
    }
    
    void combine(char *file1, char *file2)
    {
        char buffer[BUFFERLENGTH];
    
        while((fread(buffer, sizeof(char), 1, fp2)) > 0)  // read file2 into buffer
        {
            fwrite(buffer, sizeof(char), 1, fp1);  // write buffer to file 1
        }
    
        puts("Files combined successfully.");
    
        fclose(fp1);
        fclose(fp2);
    }
    Last edited by cfanatic; 09-09-2012 at 07:35 AM. Reason: got rid of all the const
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

  13. #13
    Registered User
    Join Date
    Mar 2010
    Posts
    583
    Quote Originally Posted by Subsonics View Post
    Let's look at the prototype of fread.

    Code:
         size_t
         fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);

    As you can see from the prototype, the 2nd argument is size and the 3rd item amount.
    Well, yes, but isn't that what the OP had already?

    Code:
        while((x = fread(buffer, 1, BUFFERLENGTH, fp2)) > 0)
    I think "size" is meant to be the size of a single element, not the size of the whole buffer. Like if you were reading an array of structs, I think size would be the size of a single struct.
    That said, either is valid and legal. Nothing wrong with having a single, large element.

    Quote Originally Posted by Subsonics View Post
    However the important point that I made was that the amount that is written needs to be the amount returned by fread, not some pre-defined constant.
    Yep, definitely. Didn't mean to detract from your point, sorry.

    Quote Originally Posted by cfanatic
    Strangely enough, this doesn't work, with BUFFERLENGTH and x passed as the size argument. No error message though. Looks like the if() statement doesn't get executed at all. Too much programming makes me C sick.

    I'll try using feof() and ferror() later.
    I'd put my money on it being EOF.

    That said, there are some bits of your code that could easily lead to problems. Having an if statement with 2 fopen calls, in a loop..... is probably not a great idea. It can call fopen multiple times on maybe-different maybe-same files and throw away the file handles on the next call.
    It is legal C and probably not the cause of your problems, but it'd definitely help avoid future pain if you looked after your file handles better.

    Quote Originally Posted by cfanatic
    I wonder, is reading and writing one character at a time inefficient compared to reading 4096 (BUFFERLENGTH) characters at a time?
    The C standard is oddly specific about how these functions work:
    Quote Originally Posted by C11 draft
    The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. For each object, size calls are made to the fgetc function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read.
    So that says that fgetc is called size * num times whichever way you pitch it. In reality, I expect it'll be cached and buffered and speculatively read and blah blah blah.

    Of course, in a program with a tight loop being repeated many, many times, any inefficiency will probably be noticeable, and of course there is an overhead of the extra calls to fread and fwrite. And the C library may be optimised differently for big buffers and little buffers.

    Speculation aside, I modified your program a bit and did some experiments with different buffer sizes and different buffer placement (stack or heap). I copied a 237MB file to another file, then combined the two. I found:
    • 4096 seems to be the best buffer size (I tried 2048 and 8192). This may be because I think MSVC has an internal 4096 buffer
    • Putting the buffer on the stack was faster than on the heap. Not sure why that'd be
    • Writing a character at a time is significantly slower - it took about 148 seconds, compared to 66 with the buffered version.
    • But using a larger buffer really hammered my PC. It slowed to a crawl, and the transfer rate was very lumpy with 5-10 second pauses, presumably while I/O was flushed. Transfer rate was getting gradually slower during the test
    • Conversely, doing 1 char at a time was smoother and seemed more predictable
    • I added some code to measure the processing rate (how many bytes it was managing to read and write in a second); a rough sketch of that kind of timing loop is below, after this list. A 2048-sized buffer peaked at processing (read and write) 16MB of data a second, with an apparent low of 2MB/s. Writing byte by byte was a steady 3MB/s the whole time.
    • I tried this on MSVC 2010 express edition
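
    As mentioned above, a rough sketch of that kind of timing harness (illustration only, not the code actually used for these numbers; the function name timed_copy is made up):

    Code:
    #include <stdio.h>
    #include <time.h>
    
    #define MAXBUF 8192
    
    /* Sketch: copy src onto dst with the given buffer size and print an overall rate.
     * clock() is coarse, but good enough for comparing buffer sizes against each other. */
    static void timed_copy(FILE *dst, FILE *src, size_t bufsize)
    {
        char buffer[MAXBUF];
        size_t n;
        unsigned long total = 0;
        double secs;
        clock_t start = clock();
    
        if(bufsize > sizeof buffer)
            bufsize = sizeof buffer;
    
        while((n = fread(buffer, 1, bufsize, src)) > 0)
        {
            if(fwrite(buffer, 1, n, dst) != n)
                break;                        /* write error: stop and report what we got */
            total += (unsigned long)n;
        }
    
        secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("copied %lu bytes in %.2f s (%.2f MB/s)\n",
               total, secs, secs > 0.0 ? total / (1024.0 * 1024.0) / secs : 0.0);
    }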


    So yes, it's slower. It's probably worth pursuing efficiency - but better to get it working first. Then you have a baseline to compare against and improve.

    Wow, more answer than anyone could possibly have wanted. I do get a bit carried away sometimes........


    edit - cfanatic -- style issues aside, that looks like it'll work. Does it?
    Last edited by smokeyangel; 09-09-2012 at 08:48 AM.

  14. #14
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I have always used 4096 as the buffer size for this sort of thing.
    I believe it's the fastest because it's the disk cluster size.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  15. #15
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Quote Originally Posted by smokeyangel View Post

    Wow, more answer than anyone could possibly have wanted. I do get a bit carried away sometimes........
    I do appreciate lengthy replies.


    Quote Originally Posted by smokeyangel View Post
    edit - cfanatic -- style issues aside, that looks like it'll work. Does it?
    Yes, it works perfectly for combining binary files: .mpg files can be combined and the entire combined file can be played back; .mp4 files will combine, but only the first video will play back.

    I think it's the .mpg format that allows it to work. I wonder what other media formats can be combined so that the combined file plays back in its entirety.
    Last edited by cfanatic; 09-09-2012 at 04:57 PM.
    IDE: Code::Blocks | Compiler Suite for Windows: TDM-GCC (MingW, gdb)

