Thread: using mmap for copying large files

  1. #1
    Registered User; Join Date: May 2008; Posts: 31


    Hello everyone,
    I tried using mmap to copy very large files in the following way:

    Code:
    if((fps = open(source, O_RDONLY)) == -1)
    	printf("error : can't open source file for reading\n");
    if((fpd = open(dest, O_RDWR | O_CREAT | O_TRUNC, 0777)) == -1)
    	printf("error : cant open destination file for writing\n");
    printf("\nsource = %s\n",source);

    fstat(fps,&statbuf);
    filesize=statbuf.st_size;
    lseek(fpd,filesize-1,SEEK_SET);
    write(fpd,"",1);
    lseek(fpd,0,SEEK_SET);
    bytes=2097152;
    while(filesize > 0)
    {
    	if(filesize < bytes)
    	{
    		bytes=filesize;
    		filesize=0;
    	}
    	else
    		filesize-=bytes;
    	if((src=mmap((caddr_t)0,bytes,PROT_READ,MAP_SHARED,fps,0))==MAP_FAILED)
    	{
    		printf("mmap error : fps\n");
    		exit(1);
    	}
    	if((dst=mmap((caddr_t)0,bytes,PROT_READ | PROT_WRITE,MAP_SHARED,fpd,0))==MAP_FAILED)
    	{
    		printf("mmap error : fpd\n");
    		exit(1);
    	}
    	memcpy(dst,src,bytes);
    	if((munmap(src,bytes))==-1)
    	{
    		printf("munmap error : src\n");
    		exit(1);
    	}
    	if((munmap(dst,bytes)==-1))
    	{
    		printf("munmap error : dst\n");
    		exit(1);
    	}
    	lseek(fps,bytes,SEEK_SET);
    	lseek(fpd,bytes,SEEK_SET);
    }
    I used the variable bytes instead of filesize as the mmap length, as I guess mapping won't work for very large files. This works for text files but not for binary files.
    The problem with copying binary files, especially video files, using mmap is that the destination file is broken (as reported by VLC media player). How can I solve this?
    Last edited by rohan_ak1; 05-12-2008 at 10:56 AM. Reason: typing mistake

  2. #2
    Codeplug (Registered User); Join Date: Mar 2003; Posts: 4,981
    First, I wouldn't rely on a media player to tell you whether your copy worked. Compute an MD5 checksum of the source and destination to see if the copy was successful.

    You need to enable the LFS (Large File Support) interface in glibc by defining _FILE_OFFSET_BITS as 64. Or you can define _LARGEFILE64_SOURCE and use the 64-bit data types and APIs explicitly.

    http://www.gnu.org/software/libc/man...st-Macros.html
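
    A minimal sketch of the first option, assuming glibc: the macro has to be defined before any system header is included (or passed as -D_FILE_OFFSET_BITS=64 on the compiler command line), so that off_t and the file functions switch to their 64-bit variants on a 32-bit build. On a 64-bit build off_t is already 64 bits wide, so the macro is harmless there. The helper name lfs_enabled is hypothetical, just for illustration.

```c
/* Must appear before ANY #include so that glibc maps off_t, lseek(),
   fstat(), mmap(), ... to their 64-bit (LFS) variants on 32-bit builds. */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>

/* Hypothetical helper: returns 1 if off_t can represent offsets beyond 2 GB. */
int lfs_enabled(void)
{
    return sizeof(off_t) >= 8;
}
```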

    gg

  3. #3
    Kernel hacker; Join Date: Jul 2007; Location: Farncombe, Surrey, England; Posts: 15,677
    You should not use lseek() to tell mmap() where to start copying from: lseek() only moves the file position used by read() and write(), and mmap() ignores it. Instead, use the off_t offset parameter, the last argument of mmap(), to tell Linux where the mapping should start, relative to the start of the file.

    I bet if you look at the file, you will find the same content repeated every 2MB [or perhaps it's only producing a 2MB file in the destination?]

    I agree with Codeplug: you should not use a media player to check the files. Use md5sum, or perhaps this sequence:
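
    A minimal sketch of that fix, assuming POSIX mmap() on Linux. It maps the same window of both files at each offset. The chunk size must be a multiple of the page size, because mmap() rejects offsets that are not page-aligned. It also swaps in ftruncate() to size the destination instead of the seek-and-write-one-byte trick. The function name mmap_copy and the CHUNK value are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed chunk size: 1 MiB, a multiple of common page sizes (4 KiB, 64 KiB),
   since the mmap() offset argument must be page-aligned. */
#define CHUNK (1024 * 1024)

/* Hypothetical helper: copy src to dst through mmap(), one chunk at a time,
   passing the current position as mmap()'s offset argument.
   Returns 0 on success, -1 on error. */
int mmap_copy(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(dst, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }

    struct stat st;
    int rc = -1;
    /* The destination must already be full size: a mapping cannot grow a file. */
    if (fstat(in, &st) == -1 || ftruncate(out, st.st_size) == -1)
        goto done;

    for (off_t offset = 0; offset < st.st_size; offset += CHUNK) {
        size_t len = (st.st_size - offset < CHUNK) ? (size_t)(st.st_size - offset)
                                                   : (size_t)CHUNK;
        void *s = mmap(NULL, len, PROT_READ, MAP_SHARED, in, offset);
        if (s == MAP_FAILED)
            goto done;
        void *d = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, out, offset);
        if (d == MAP_FAILED) {
            munmap(s, len);
            goto done;
        }
        memcpy(d, s, len);          /* copy this window */
        munmap(s, len);
        munmap(d, len);
    }
    rc = 0;
done:
    close(in);
    close(out);
    return rc;
}
```

    An empty source file is handled without calling mmap() at all, since a zero-length mapping is an error.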
    Code:
    $ od -x file1 > file1.txt; od -x file2 > file2.txt; diff file1.txt file2.txt
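
    If you would rather do the check from C, a byte-for-byte comparison does the same job as the md5sum or od/diff approach (and as cmp(1)). A minimal sketch using only stdio; the name files_identical is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the two files have identical contents,
   0 if they differ (in any byte or in length), -1 if either can't be opened. */
int files_identical(const char *path1, const char *path2)
{
    FILE *f1 = fopen(path1, "rb");
    FILE *f2 = fopen(path2, "rb");
    int result = -1;
    if (f1 && f2) {
        int c1, c2;
        do {
            c1 = getc(f1);
            c2 = getc(f2);
        } while (c1 == c2 && c1 != EOF);
        result = (c1 == c2);    /* equal only if both reached EOF together */
    }
    if (f1)
        fclose(f1);
    if (f2)
        fclose(f2);
    return result;
}
```
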
    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    Registered User; Join Date: May 2008; Posts: 31
    I changed the code to this:

    Code:
    #define BSIZE 1048576	/* must be a multiple of the page size: mmap() offsets must be page-aligned */
    .......................

    if((fps = open(source, O_RDONLY)) == -1)
    {
    	perror("error : can't open source file for reading");
    	exit(1);
    }
    if((fpd = open(dest, O_RDWR | O_CREAT | O_TRUNC, 0777)) == -1)
    {
    	perror("error : can't open destination file for writing");
    	exit(1);
    }

    if(fstat(fps,&statbuf) == -1)
    {
    	perror("fstat");
    	exit(1);
    }
    filesize=statbuf.st_size;
    if(filesize > 0)
    {
    	/* extend the destination to full size so it can be mapped */
    	if(lseek(fpd,filesize-1,SEEK_SET) == -1 || write(fpd,"",1) != 1)
    	{
    		perror("error : can't extend destination file");
    		exit(1);
    	}
    }
    bytes=BSIZE;
    offset=0;
    while(filesize > 0)
    {
    	if(filesize < bytes)
    	{
    		bytes=filesize;
    		filesize=0;
    	}
    	else
    		filesize-=bytes;
    	/* map the same window of both files, starting at 'offset' */
    	if((src=mmap(NULL,bytes,PROT_READ,MAP_SHARED,fps,offset))==MAP_FAILED)
    	{
    		perror("mmap error : fps");
    		exit(1);
    	}
    	if((dst=mmap(NULL,bytes,PROT_READ | PROT_WRITE,MAP_SHARED,fpd,offset))==MAP_FAILED)
    	{
    		perror("mmap error : fpd");
    		exit(1);
    	}
    	memcpy(dst,src,bytes);
    	if(munmap(src,bytes)==-1)
    	{
    		perror("munmap error : src");
    		exit(1);
    	}
    	if(munmap(dst,bytes)==-1)
    	{
    		perror("munmap error : dst");
    		exit(1);
    	}
    	offset=offset+bytes;	/* advance both windows */
    }
    This works fine for all kinds of files... However, it is much slower than read/write.
    First I used mmap to copy all files in a directory totalling 3GB. It took 5m48s, while read/write took only 4m48s.
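
    For reference, a plain read()/write() copy loop of the kind being compared against might look like the sketch below. It is not the code used for the timings above; the function name rw_copy and the 64 KiB buffer size are assumptions. Since write() may write fewer bytes than requested, the inner loop retries the remainder.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE (64 * 1024)   /* assumed buffer size; tune for your system */

/* Hypothetical helper: sequential read()/write() copy.
   Retries on EINTR and on short writes. Returns 0 on success, -1 on error. */
int rw_copy(const char *src, const char *dst)
{
    char buf[BUF_SIZE];
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }
    int rc = 0;
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            break;                      /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the read */
            rc = -1;
            break;
        }
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w == -1) {
                if (errno == EINTR)
                    continue;           /* interrupted: retry the write */
                rc = -1;
                break;
            }
            done += w;                  /* handle short writes */
        }
        if (rc == -1)
            break;
    }
    if (close(out) == -1)
        rc = -1;
    close(in);
    return rc;
}
```
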

  5. #5
    Kernel hacker; Join Date: Jul 2007; Location: Farncombe, Surrey, England; Posts: 15,677
    Quote (originally posted by rohan_ak1):
    I changed the code to this:

    [snip]
    This works fine for all kinds of files... However, it is much slower than read/write.
    First I used mmap to copy all files in a directory totalling 3GB. It took 5m48s, while read/write took only 4m48s.
    Yes, didn't I say something like that when we discussed the copy function using read/write?

    read and write are VERY important functions in the OS, so they are optimized to be as close to ideal as they can be. I'm not saying that mmap is unoptimized, but it works through a different mechanism [it uses page faults to request the data], and it's not meant for sequential reading. The ideal scenario for mmap() is randomly updating small sections of a larger file, and even better, updating the same section several times.
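
    To illustrate the kind of access described above, here is a minimal sketch that patches a few bytes in the middle of an existing file through a shared mapping. Only the page-aligned window around the target bytes is mapped, so only those pages are faulted in and written back. The function name mmap_patch is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite len bytes at byte position pos of an existing
   file, in place, through a MAP_SHARED mapping of just the affected pages.
   Returns 0 on success, -1 on error (including a patch past end of file). */
int mmap_patch(const char *path, off_t pos, const void *data, size_t len)
{
    if (len == 0)
        return 0;                           /* nothing to write */
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    struct stat st;
    if (fstat(fd, &st) == -1 || pos < 0 || (off_t)len > st.st_size - pos) {
        close(fd);
        return -1;                          /* a mapping cannot extend the file */
    }
    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    off_t start = pos - pos % page;         /* mmap() offset must be page-aligned */
    size_t maplen = (size_t)(pos - start) + len;
    char *map = mmap(NULL, maplen, PROT_READ | PROT_WRITE, MAP_SHARED, fd, start);
    close(fd);                              /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;
    memcpy(map + (pos - start), data, len); /* touches only these pages */
    int rc = msync(map, maplen, MS_SYNC);   /* write the dirty pages back */
    munmap(map, maplen);
    return rc;
}
```
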

    --
    Mats

  6. #6
    Registered User; Join Date: May 2008; Posts: 31
    Yes, we have discussed mmap earlier... However, I thought of trying it out just to see if it is any better, as I needed to implement a fast file copy. Now I think I will stick with read/write.

  7. #7
    Kernel hacker; Join Date: Jul 2007; Location: Farncombe, Surrey, England; Posts: 15,677
    Quote (originally posted by rohan_ak1):
    Yes, we have discussed mmap earlier... However, I thought of trying it out just to see if it is any better, as I needed to implement a fast file copy. Now I think I will stick with read/write.
    If you have a way to see page faults on your system (top will probably do that), you will probably find that you get 1 PF per 4KB read and 4KB written in your mmap copy, and nearly no page faults when using the regular copy function.
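
    Besides top, a program can read its own fault counters with getrusage(). A minimal sketch, assuming Linux, where ru_minflt counts minor faults (the page is already in memory but not yet mapped into the process). The demo touches fresh anonymous pages rather than running the copy, so the numbers are only illustrative; for file-backed mappings the kernel may map several pages per fault, so a real copy can show fewer than one fault per 4KB. To measure the copy itself, call minor_faults() before and after it and subtract. Both function names are hypothetical.

```c
#define _DEFAULT_SOURCE             /* for MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: minor page faults taken so far by this process. */
long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == -1)
        return -1;
    return ru.ru_minflt;
}

/* Hypothetical demo: map `pages` fresh anonymous pages, write one byte to
   each, and return how many minor faults that caused (roughly one per page). */
long faults_for_touching(size_t pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    long before = minor_faults();
    for (size_t i = 0; i < len; i += page)
        p[i] = 1;                   /* first touch of each page faults it in */
    long after = minor_faults();
    munmap(p, len);
    return after - before;
}
```
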

    --
    Mats


Similar Threads

  1. fopen vs. _open (for BIG image files)
    By reversaflex in forum C Programming
    Replies: 3
    Last Post: 04-01-2007, 12:52 AM
  2. Platform-independent large files?
    By Cat in forum C++ Programming
    Replies: 2
    Last Post: 08-21-2006, 12:04 AM
  3. Saving large files faster than light!
    By ZapoTex in forum C Programming
    Replies: 19
    Last Post: 01-09-2005, 05:47 PM
  4. Replies: 0
    Last Post: 07-12-2002, 01:40 PM
  5. Reading Large Files!!!
    By jon in forum Windows Programming
    Replies: 1
    Last Post: 09-09-2001, 11:20 PM