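You're still calling clearerr() before ferror() in one instance. clearerr() resets the stream's error indicator, so a ferror() call that comes after it can never report the failure.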
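To make the ordering concrete, here is a minimal sketch of checking the error flag before resetting it. The helper name read_chunk and its failed out-parameter are made up for illustration and are not taken from your program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original program): read up to len
 * bytes from in. Returns the number of bytes read and sets *failed to 1
 * if the short read was caused by an error rather than end of file. */
static size_t read_chunk(FILE *in, char *buf, size_t len, int *failed)
{
    size_t n;

    assert(in != NULL && buf != NULL && failed != NULL);

    n = fread(buf, 1, len, in);
    *failed = 0;
    if (n < len && ferror(in)) {   /* inspect the error flag first ... */
        *failed = 1;
        perror("fread");
        clearerr(in);              /* ... and only then reset it */
    }
    return n;
}
```

With the two calls the other way round, the ferror() test would always see a cleared flag and the error branch would never run.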
Code:
/* Read from the input file descriptor and write to the output file descriptor
* while keeping the 8k within the CPUs L1 cache */
Your reason for choosing your buffer size is bogus.
The whole of buff is written to by the fread() call.
Each element of buff is read once (if zero), or twice (if non-zero).
You have other data as well (the rest of the local stack frame for instance), not to mention the effects of:
- calling other functions
- traps into the OS to physically read/write the disk
- the OS itself forcing context switches to other processes.
The elephants in the room are all those fread / fwrite / fseek calls.
Memory access times are measured in nanoseconds.
Hard disk head seek times are measured in milliseconds (that's about a million times slower).
Why bother worrying about whether something takes 1 or 2 seconds when you know there is a delay of a fortnight coming up real soon?
> char buf[8192]
You don't even use your #define value.
IMO, you would be better off starting with BUFSIZ, which is a constant defined in stdio.h. It is the stream buffer size chosen by the implementers of your standard C library, so it is a sensible default for file operations.
char buf[BUFSIZ];
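As a rough sketch of how that buffer would be used, here is a plain copy loop. The function name copy_stream is hypothetical, and the loop deliberately leaves out the zero-block detection and fseek() handling your program does; it only shows the BUFSIZ-sized buffer and the error checks.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original program): copy everything
 * from in to out through a BUFSIZ-sized buffer.
 * Returns 0 on success, -1 on a read or write error. */
static int copy_stream(FILE *in, FILE *out)
{
    char buf[BUFSIZ];
    size_t n;

    assert(in != NULL && out != NULL);

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;             /* short write */
    }
    return ferror(in) ? -1 : 0;    /* tell end of file apart from a read error */
}
```

Using sizeof buf in the fread() call means the loop keeps working if you later change the array size.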
Then there is the 'time' command.
time ./copysparse infile outfile
Depending on your system, this should print out how much time (real time, user time and system time) the process spent performing the given task.
Also try using the 'top' command (in another terminal window). It shows you the currently active processes.
You might be surprised by how little CPU time your copy program takes.