Thread: how to copy binary files using Unix API's

  1. #1
    Registered User
    Join Date
    May 2008
    Posts
    31

    how to copy binary files using Unix API's

    Hello everyone,
    As part of a data backup project, I need to use the Unix APIs to copy files from source to destination. I have used open(), read(), write() and close() for this. It works fine for regular files but not for binary files (like audio or video files). The newly created binary file is partially corrupted (I found out when VLC media player said: "the AVI file is broken"). How should I use these APIs to copy binary files? Also, I don't want to copy the files byte by byte, for efficiency reasons. I am using a buffer of size 1MB.

    Thank You,
    Rohan

  2. #2
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Unless I'm missing something not-so-obvious, if you are opening both input and output in binary mode, and copying character by character, nothing should change, therefore nothing should break.

    Todd
    Mainframe assembler programmer by trade. C coder when I can.

  3. #3
    Registered User
    Join Date
    May 2008
    Posts
    31
    For the sake of efficiency I don't want to do a byte-by-byte copy. It would be better to do a block read and block write.

  4. #4
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    The sendfile() call can be used to copy bytes directly from one descriptor to another, avoiding all user-space buffering. The only caveat is needing to know how many bytes will be copied up front. If you can seek on the input, there's no problem:

    Code:
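    /* needs <fcntl.h>, <unistd.h> and <sys/sendfile.h> (Linux) */
    int input, output;
    off_t size;
    off_t offset;
    
    input = open(input_file, O_RDONLY);
    /* O_TRUNC so an existing, longer output file does not keep stale bytes */
    output = open(output_file, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    size = lseek(input, 0, SEEK_END);   /* total number of bytes to copy */
    offset = 0;
    /* sendfile() may transfer fewer bytes than requested, so loop;
       it advances offset itself on each call */
    while (offset < size) {
        if (sendfile(output, input, &offset, (size_t)(size - offset)) <= 0)
            break;  /* error, or input ended early */
    }
    close(output);
    close(input);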
    This is the most efficient way to copy bytes from one descriptor to another on UNIX^H^H^H^HLinux, because the data never enters userspace. (Oops -- thought sendfile() was POSIX for some reason. It's widely available, though)
    Last edited by brewbuck; 05-06-2008 at 12:33 PM.

  5. #5
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Quote Originally Posted by rohan_ak1
    For the sake of efficiency I don't want to do a byte-by-byte copy. It would be better to do a block read and block write.
    I apologize - you didn't say you were copying byte by byte, and I implied you were. The same applies, though: if you are reading a block and writing the same-sized block, in binary, no alteration should occur.


    Todd

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    Are you writing the size of the buffer, or the answer returned by read ?

    while ( (n=read(in_fd,buff,BUFSIZ)) > 0 ) write(out_fd,buff,n);
    kinda thing
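    The idea above, written out as a minimal sketch rather than a definitive implementation: write only as many bytes as read() actually returned, and keep calling write() until that whole chunk is out. The names copy_file, write_all and COPY_BUF_SIZE are made up for this example, and the 64KB buffer is an arbitrary choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define COPY_BUF_SIZE (64 * 1024)  /* hypothetical size; not from the thread */

/* Write exactly len bytes, retrying after short writes and EINTR.
   Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Copy src to dst in blocks. Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst)
{
    static char buf[COPY_BUF_SIZE];
    int result = -1;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (out < 0) {
        close(in);
        return -1;
    }

    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0) {            /* end of file: everything was copied */
            result = 0;
            break;
        }
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;               /* read error */
        }
        /* write n bytes (what read returned), not the buffer size */
        if (write_all(out, buf, (size_t)n) < 0)
            break;
    }

    close(in);
    if (close(out) < 0)
        result = -1;
    return result;
}
```

    Nothing here depends on knowing the file size in advance, so it does not matter whether the file is text or binary.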
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    May 2008
    Posts
    31
    This is the code I have written for copying files from source to destination:
    Code:
    fps=open(source,O_RDONLY);
    		//printf("dest = %s\n\n",dest);
    		fpd=open(dest,O_WRONLY|O_CREAT,0777);
    		if(fps==-1)
    		{
    			printf("source = %s\n",source);
    			printf("error : fps\n");
    			break;
    		}
    		if(fpd==-1)
    			printf("error : fpd\n");
    		fstat(fps,&buf);
    		bzero(buffer,BSIZE);
    		filesize=buf.st_size;
    		while(filesize)
    		{
    			if(filesize>BSIZE-1)
    			{
    				filesize=filesize-BSIZE-1;
    				read(fps,buffer,BSIZE-1);
    				write(fpd,buffer,BSIZE-1);
    				bzero(buffer,BSIZE);
    			}
    			else
    			{
    				read(fps,buffer,filesize);
    				write(fpd,buffer,filesize);
    				filesize=0;
    				bzero(buffer,BSIZE);
    			}
    		}
    This works fine for text files but has problems with binary files.
    Can anyone suggest changes?

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    I don't see anything directly wrong with it, but it can be made simpler, with less repeated code, by adding a variable for the block size:
    Code:
    int blocksize;
    ...
    while(filesize)
    {
        if(filesize>BSIZE)
        {
            blocksize = BSIZE;
        }
        else
        {
            blocksize = filesize;
        }
        filesize -= blocksize;
        read(fps,buffer,blocksize);
        write(fpd,buffer,blocksize);
    }
    I removed the bzero - there should be no need for that.
    Also no need to use BSIZE-1 [assuming your buffer is defined as char buffer[BSIZE] - otherwise your bzero is wrong and overwrites one byte outside buffer].
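    One observation from reading the code in post #7 (it is not stated anywhere in the thread): `filesize=filesize-BSIZE-1` parses as `(filesize-BSIZE)-1`, so each full iteration takes BSIZE+1 off the counter while only BSIZE-1 bytes are copied. The sketch below simulates just that counter arithmetic, with no I/O. The function name is made up, and `> 0` is used instead of the original `while(filesize)` so the simulation also stops if the counter lands on -1.

```c
/* Simulates only the counter arithmetic of the loop in post #7 and
   returns how many bytes that loop would copy for a given file size. */
long bytes_copied_by_original_loop(long filesize, long bsize)
{
    long copied = 0;

    while (filesize > 0) {
        if (filesize > bsize - 1) {
            filesize = filesize - bsize - 1;  /* counter drops by bsize + 1 ... */
            copied += bsize - 1;              /* ... but only bsize - 1 bytes move */
        } else {
            copied += filesize;               /* final partial block */
            filesize = 0;
        }
    }
    return copied;
}
```

    With a 1MB buffer (1048576) and a 10MB file (10485760 bytes), this returns 10485742, i.e. 18 bytes short: two bytes for each of the nine full iterations. A file smaller than the buffer never takes the first branch, so it comes out complete, which would be consistent with small text files copying fine and a large AVI losing its tail. The version with the blocksize variable subtracts exactly what it copies, so the mismatch goes away.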

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  9. #9
    Registered User
    Join Date
    May 2008
    Posts
    31
    In this code, what should I set as the size of the buffer used for read/write? Currently I am using a 1MB buffer. This code works well for text files but not for binary files. Using it to copy audio/video files leaves the new copy broken (VLC player reports the video file as broken). How should I make it work for binary files?
    Last edited by rohan_ak1; 05-07-2008 at 04:22 AM. Reason: more information

  10. #10
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by rohan_ak1
    In this code, what should I set as the size of the buffer used for read/write? Currently I am using a 1MB buffer size. Also, can I use mmap and memcpy to improve performance? If so, how can I use them for large files?
    The buffer size of 1MB should be "reasonable". I don't know what gives the best performance on any given system, never mind a generic solution that gives good average performance on a wide range of systems. It's safe to assume that it should be smaller than large disk caches (they are often several megabytes), and it must be larger than 4KB to be reasonably efficient given the overhead of the kernel mapping the page(s) that the data is in, etc. Something on the order of 64KB to a few MB would be the range I'd "try out" if given the task of finding the best one.

    The biggest factor in performance would be the actual reading of the data, and since you are reading sequentially, mmap/memcpy shouldn't make any difference, since that's essentially the same as what you're already doing: read() memcpy's the data to you, and write() memcpy's the data into the kernel to be written. mmap() is much better if you are trying to read/write in many different locations (sparsely) in a large file.
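    For comparison, here is roughly what the mmap variant asked about could look like. This is a sketch under assumptions, not a recommendation: copy_file_mmap is a made-up name, the whole source file is mapped in one go (which can fail for very large files on a 32-bit system), and EINTR is not retried.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy src to dst by mapping the source and writing from the mapping.
   Returns 0 on success, -1 on error. */
int copy_file_mmap(const char *src, const char *dst)
{
    struct stat st;
    int result = 0;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (out < 0) {
        close(in);
        return -1;
    }

    if (st.st_size > 0) {                 /* mmap() rejects a zero length */
        size_t len = (size_t)st.st_size;
        char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
        if (map == MAP_FAILED) {
            result = -1;
        } else {
            size_t done = 0;
            while (done < len) {          /* write() may be short, so loop */
                ssize_t w = write(out, map + done, len - done);
                if (w <= 0) {
                    result = -1;
                    break;
                }
                done += (size_t)w;
            }
            munmap(map, len);
        }
    }

    close(in);
    if (close(out) < 0)
        result = -1;
    return result;
}
```

    The data still has to be read from disk once and handed to the kernel once, which is why no big win is expected for a straight sequential copy.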

    Not calling bzero would have some impact.

    --
    Mats

  11. #11
    Registered User
    Join Date
    May 2008
    Posts
    31
    I have now set the buffer size to 1MB and removed the unwanted bzero statements, but what should I do about the binary files being copied improperly (they are broken)?

  12. #12
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by rohan_ak1
    I have now set the buffer size to 1MB and removed the unwanted bzero statements, but what should I do about the binary files being copied improperly (they are broken)?
    Well, I would figure that out using something like "od" to see what the binary file looks like in a more human-readable form [Note: I didn't say easily readable!], and then use diff to see the difference.

    Are the files the same size? Where do the differences start?
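    If od and diff are awkward on a large file, the "where do the differences start" question can also be answered with a few lines of C (the standard cmp utility reports the same thing). A minimal sketch; first_difference is a made-up name and the return-value convention is an assumption of this example.

```c
#include <stdio.h>

/* Returns the offset of the first byte at which the two files differ,
   -1 if they are identical, or -2 if either file cannot be opened.
   If one file is a prefix of the other, the result is the length of
   the shorter file. */
long long first_difference(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long long offset = 0;
    long long result = -1;

    if (a == NULL || b == NULL) {
        if (a) fclose(a);
        if (b) fclose(b);
        return -2;
    }

    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb) {          /* differing byte, or one file ended first */
            result = offset;
            break;
        }
        if (ca == EOF)           /* both ended together: identical */
            break;
        offset++;
    }

    fclose(a);
    fclose(b);
    return result;
}
```

    A result equal to the size of the copy would mean the copy is simply a truncated version of the original.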

    --
    Mats

  13. #13
    Registered User
    Join Date
    May 2008
    Posts
    31
    On changing the code to:
    Code:
    while(filesize)
    {
        if(filesize>BSIZE)
        {
            blocksize = BSIZE;
        }
        else
        {
            blocksize = filesize;
        }
        filesize -= blocksize;
        read(fps,buffer,blocksize);
        write(fpd,buffer,blocksize);
    }
    and removing all the bzero calls, I am able to copy binary files perfectly. The destination file is no longer broken; I tried with an AVI file. However, when I back up a set of 56 mp3 files with a total size of 478.1 MB in 3 sub-directories, the destination backup directory has all 56 files and 3 sub-folders, but its size is 478 MB. One of the directories is 0.1 MB smaller than the original directory. I am copying from a FAT32 filesystem to an ext2 filesystem.

  14. #14
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    You really should check EACH file's size, not the total size. Different OS's count file-sizes differently.
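    A per-file check could be sketched like this. Both function names are made up for the example. The logical size is st_size; the space actually allocated on disk is a separate number, and the 512-byte unit for st_blocks is the Linux convention, assumed here rather than guaranteed everywhere.

```c
#include <sys/stat.h>

/* Returns 1 if both files report the same st_size, 0 if they differ,
   or -1 if either stat() call fails. */
int same_logical_size(const char *path_a, const char *path_b)
{
    struct stat sa, sb;

    if (stat(path_a, &sa) < 0 || stat(path_b, &sb) < 0)
        return -1;
    return sa.st_size == sb.st_size;
}

/* Returns the space allocated for the file in bytes, or -1 on error.
   Assumes st_blocks counts 512-byte units, as on Linux. */
long long allocated_bytes(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

    Two filesystems can allocate different amounts of space for files (and for the directories themselves) even when every st_size matches, so a small difference in a directory total is possible without any data being lost.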

    --
    Mats

  15. #15
    Registered User
    Join Date
    May 2008
    Posts
    31
    I have checked the file sizes on Fedora 8 for both the source and destination directories. I checked every file's size and they are all the same, but the total size is 0.1MB less.
