The only realistic way to get high precision for such a short operation (I'm assuming the data is not hundreds of kilobytes) is to use the processor's time-stamp counter.

You can read it with the RDTSC instruction, for example via the "rdtscll()" macro. I'm not quite sure which header file it lives in for user space; in the kernel it's in include/asm-i386/msr.h or include/asm-x86_64/msr.h.
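If that header isn't available to your program, a minimal user-space sketch is to issue RDTSC yourself with GCC/Clang inline assembly. It assumes an x86 or x86_64 CPU and a GCC-compatible compiler; "read_tsc", "measure_ticks" and "busy_loop" are hypothetical names, not part of any library. RDTSC returns the low half in EAX and the high half in EDX on both architectures, so the halves are combined by hand. Note that RDTSC is not a serializing instruction, so for very short sequences the readings can be skewed by out-of-order execution, and turning ticks into time requires knowing the TSC frequency.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime in the fallback */
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
/* Read the 64-bit time-stamp counter: RDTSC leaves the low 32 bits in EAX
 * and the high 32 bits in EDX on both i386 and x86_64. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
#else
/* Assumption: fallback so the sketch still builds on non-x86 targets.
 * This returns nanoseconds from CLOCK_MONOTONIC, NOT a cycle count. */
static inline uint64_t read_tsc(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical helper: ticks spent in fn(arg), keeping the minimum over
 * several runs to filter out interrupts and cold-cache effects. */
static uint64_t measure_ticks(void (*fn)(void *), void *arg, int runs)
{
    assert(fn != NULL && runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = read_tsc();
        fn(arg);
        uint64_t end = read_tsc();
        if (end - start < best)
            best = end - start;
    }
    return best;
}

/* Hypothetical workload standing in for "the short operation": counts a
 * volatile variable down so the compiler cannot remove the loop. */
static void busy_loop(void *arg)
{
    volatile unsigned long n = *(const unsigned long *)arg;
    while (n > 0)
        n--;
}
```

Taking the minimum over several runs is one common way to reduce noise; averaging is an alternative if you care about typical rather than best-case cost.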

--
Mats