Using splice() to copy 1 file to another



Angus
01-27-2009, 08:43 AM
According to http://en.wikipedia.org/wiki/Splice_(system_call) the proper way to use splice() to copy one file to another involves clever use of the pipe() system call. (This is one way to get around the requirement that one of the file descriptors given to splice() be a pipe.) What I find strange is that the code example on that page has one while loop writing the entire file to the pipe, and then another loop reading the entire file back out of the pipe. I don't know exactly what operations are going on in the kernel. Perhaps all the data of the file is being written to the pipe, filling up as much kernel memory as the file has data. Or perhaps the pipe buffer just contains a series of references to the appropriate data in the file, which would make the pipe buffer an unmanageably large number of references to a file rather than a ridiculously unmanageable amount of raw data.

In any case, wouldn't it be better that each splice() call that writes to the pipe be followed by a call that reads from the pipe, rather than packaging each type of splice call to its own while loop?
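The interleaving suggested above could look roughly like this. It is a minimal sketch, not the Wikipedia code: `splice_copy` and its parameters are made-up names, it assumes Linux with glibc (hence `_GNU_SOURCE`), and it assumes the output descriptor was not opened with O_APPEND. Each splice() into the pipe is drained into the output file before the next one, so the pipe is emptied before it is filled again.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: copy up to len bytes from in_fd to out_fd through a
 * pipe, alternating "fill" and "drain" splice() calls.
 * Returns 0 on success, -1 on error with errno set. */
int splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    int rc = 0;

    if (pipe(p) < 0)
        return -1;

    while (len > 0 && rc == 0) {
        /* Fill: file -> pipe. May move fewer bytes than requested. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n < 0) {
            rc = -1;
            break;
        }
        if (n == 0)
            break;              /* end of input before len bytes */
        len -= (size_t)n;

        /* Drain: pipe -> file, until everything just queued is written. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE);
            if (m <= 0) {
                if (m == 0)
                    errno = EIO;    /* unexpected: the pipe should hold data */
                rc = -1;
                break;
            }
            n -= m;
        }
    }

    int saved = errno;          /* keep the errno of the failing call */
    close(p[0]);
    close(p[1]);
    errno = saved;
    return rc;
}
```

Because the pipe is drained after every fill, at most one pipe buffer's worth of data is ever queued, whatever the file size.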

Another question I have on the side is, why is the writing splice() call not handling a return value of -1 followed by an errno value of ENOMEM ("Out of memory")--something that seems a very likely event?

Salem
01-27-2009, 11:12 AM
The example code would appear to bail out when the pipe is full.

brewbuck
01-27-2009, 12:59 PM
This is the wrong use of splice(). As you've realized, splice() requires one of its two descriptors to be a pipe. You don't want to create a pipe just to copy a file.

To copy bytes from one open fd to another, use the sendfile() function.
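A minimal sketch of a sendfile() copy, with one important assumption: it needs Linux 2.6.33 or later, where the output descriptor may be a regular file. On older kernels the output had to be a socket and the call typically fails with EINVAL. The name `sendfile_copy` and the path-based interface are invented for illustration.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy the file at src to dst using sendfile().
 * Returns 0 on success, -1 on failure. */
int sendfile_copy(const char *src, const char *dst)
{
    struct stat st;
    off_t remaining;
    int out_fd = -1;
    int rc = -1;

    int in_fd = open(src, O_RDONLY);
    if (in_fd < 0)
        return -1;
    if (fstat(in_fd, &st) < 0)
        goto done;

    out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0)
        goto done;

    remaining = st.st_size;
    while (remaining > 0) {
        /* A NULL offset means "use and advance in_fd's file position".
         * One call may transfer less than requested, hence the loop. */
        ssize_t n = sendfile(out_fd, in_fd, NULL, (size_t)remaining);
        if (n <= 0)
            goto done;      /* error, or input shorter than fstat() said */
        remaining -= n;
    }
    rc = 0;

done:
    if (out_fd >= 0)
        close(out_fd);
    close(in_fd);
    return rc;
}
```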

Angus
01-27-2009, 01:30 PM
But then the output file descriptor passed to sendfile() must be a socket.

brewbuck
01-27-2009, 02:40 PM
> Then output file descriptor to sendfile() must be a socket.

I do see what you are talking about on the man page. But that isn't by design. The kernel is apparently buggy, or somebody is too lazy to make it work.

Angus
01-28-2009, 08:51 AM
> I do see what you are talking about on the man page. But that isn't by design. The kernel is apparently buggy, or somebody is too lazy to make it work.

http://ilia.ws/archives/13-sendfile-syscall-and-why-the-2.6-linux-kernel-sucks!.html gives some slightly different reasons for that. What about memory-mapping the input and output files then memcpy()ing between them?
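For reference, the mmap()/memcpy() idea could be sketched as follows. This is only an illustration under stated assumptions: `mmap_copy` is a made-up name, the whole file is mapped at once (so it must fit in the address space), and no claim is made here about how fast it is.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst by mapping both files and
 * memcpy()ing between the mappings. Returns 0 on success, -1 on failure. */
int mmap_copy(const char *src, const char *dst)
{
    struct stat st;
    size_t len = 0;
    void *in_map = MAP_FAILED;
    void *out_map = MAP_FAILED;
    int out_fd = -1;
    int rc = -1;

    int in_fd = open(src, O_RDONLY);
    if (in_fd < 0)
        return -1;
    if (fstat(in_fd, &st) < 0)
        goto done;
    len = (size_t)st.st_size;

    /* A writable MAP_SHARED mapping needs the file opened read-write. */
    out_fd = open(dst, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0)
        goto done;

    /* Give the output its final size before mapping it; touching mapped
     * pages beyond end-of-file raises SIGBUS. */
    if (ftruncate(out_fd, st.st_size) < 0)
        goto done;
    if (len == 0) {
        rc = 0;             /* mmap() rejects zero-length mappings */
        goto done;
    }

    in_map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in_fd, 0);
    if (in_map == MAP_FAILED)
        goto done;
    out_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, out_fd, 0);
    if (out_map == MAP_FAILED)
        goto done;

    memcpy(out_map, in_map, len);
    rc = 0;

done:
    if (out_map != MAP_FAILED)
        munmap(out_map, len);
    if (in_map != MAP_FAILED)
        munmap(in_map, len);
    if (out_fd >= 0)
        close(out_fd);
    close(in_fd);
    return rc;
}
```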

matsp
01-28-2009, 09:03 AM
> http://ilia.ws/archives/13-sendfile-syscall-and-why-the-2.6-linux-kernel-sucks!.html gives some slightly different reasons for that. What about memory-mapping the input and output files then memcpy()ing between them?

In previous testing, it was found that simply reading the file with fread (or read) and writing with fwrite (or write) is faster than mmap'ing the file.
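The plain read()/write() approach mentioned here is short enough to sketch. The helper name `rw_copy` and the 64 KiB buffer size are arbitrary choices for illustration, not figures taken from that testing.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd through a
 * user-space buffer. Returns 0 on success, -1 on failure. */
int rw_copy(int in_fd, int out_fd)
{
    char buf[65536];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;               /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }

        /* write() may accept fewer bytes than asked, so loop until the
         * whole chunk has been written. */
        char *p = buf;
        while (n > 0) {
            ssize_t w = write(out_fd, p, (size_t)n);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            p += w;
            n -= w;
        }
    }
}
```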

--
Mats

Angus
01-29-2009, 08:24 AM
> In previous testing, it was found that simply reading the file with fread (or read) and writing with fwrite (or write) is faster than mmap'ing the file.


What about using the pipe as a buffer for the splice calls, as in the Wikipedia code example?

MK27
01-29-2009, 08:57 AM
> What I find strange is that the code example on that page has a while loop writing the entire file to the pipe, and then having another loop read the entire file from the pipe.
>
> In any case, wouldn't it be better that each splice() call that writes to the pipe be followed by a call that reads from the pipe, rather than packaging each type of splice call to its own while loop?

Have you actually tried this? Chances are each loop only iterates once anyway.


> Another question I have on the side is, why is the writing splice() call not handling a return value of -1 followed by an errno value of ENOMEM ("Out of memory")--something that seems a very likely event?

It is:


        if (ret < 0)
                goto pipe;
pipe:
        close (filedes [0]);
        close (filedes [1]);
out:
        if (ret < 0)
                return -errno;
        return 0;