Which one is faster?
Code:
int a;
for (a = 0; a < 100; a++)
    memory[a] = input[a];
Code:
CopyMemory(memory, input, 100 * sizeof(input[0]));
Probably the latter.
--
Mats
The latter. The compiler might have some tricks up its sleeve to optimize it.
Why not try both and find out?
Definitely the latter. Most compilers generate string instructions for block memory copies. The former solution makes the compiler compute the index into each array on every iteration, which costs clock cycles. If you have some control over the variables, there are inline assembly routines that are about as fast as possible.
Code:
DWORD ByteCount = 100 * sizeof(input[0]);
__asm {
mov esi, input
mov edi, memory
mov ecx, ByteCount
rep movsb
}
Surely you would want to use MOVSD at the very least. Something like this is what the compiler usually comes up with:
Code:
mov esi, input
mov edi, memory
mov ecx, ByteCount
mov edx, ecx
and edx, 3      ; edx = leftover bytes (ByteCount % 4)
shr ecx, 2      ; ecx = number of whole dwords (ByteCount / 4)
rep movsd       ; copy four bytes at a time
mov ecx, edx
rep movsb       ; copy the remaining 0 to 3 bytes
That would probably execute roughly four times faster than abachler's code for anything longer than a dozen bytes or so.
But let the compiler deal with it; that's by far the best option. If you REALLY want a fast memcpy, you need much more advanced techniques to make the most of the CPU, such as uncacheable writes (worthwhile only when the memory area is large, not for small copies, but since we know the size here that's easy to figure out) and SSE registers (except in kernel mode, where saving and restoring the SSE registers becomes a nuisance).
--
Mats
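For anyone who wants the "let the compiler deal with it" version in portable form, here is a minimal sketch using the standard memcpy in place of the Windows-specific CopyMemory. It assumes the arrays hold int, which the thread never states, and the names copy_loop, copy_block and copies_agree are made up for illustration. The main thing it shows is that the size argument is a byte count, not an element count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COUNT 100   /* element count from the original question */

/* Element-by-element copy, as in the first snippet of the thread. */
void copy_loop(int *dst, const int *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}

/* Block copy. memcpy takes a size in BYTES, so scale by the element size. */
void copy_block(int *dst, const int *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, count * sizeof src[0]);
}

/* Returns 1 if both methods produce the same result for COUNT ints. */
int copies_agree(void)
{
    int input[COUNT], a[COUNT], b[COUNT];

    for (int i = 0; i < COUNT; i++)
        input[i] = i * 3 + 1;          /* arbitrary test pattern */

    copy_loop(a, input, COUNT);
    copy_block(b, input, COUNT);

    return memcmp(a, b, sizeof a) == 0 && memcmp(a, input, sizeof a) == 0;
}
```

Whether the compiler turns either function into string instructions, an unrolled loop or a library call depends on the compiler, the optimization level and the target, so it is worth looking at the generated assembly rather than guessing.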
Okay, good to know.
Write a set of functions, each using different methods for solving the same problem.
Then make a loop that runs each method [1, 2, 3, etc] for X amount of time (or for X iterations), and calculate the number of loops per second. You probably want to use clock() to get a reasonably precise timing, and CLOCKS_PER_SEC to turn it into a useful measure. It's a good idea to run each method for at least a couple of seconds.
The method that completes the most loops per second is the fastest one.
In this case, I would also run variations with small and larger amounts of data.
--
Mats
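The timing approach described above could be sketched roughly like this. It is only an outline under a few assumptions: the copies work on unsigned char buffers, and the names copy_fn, copy_loop_bytes, copy_memcpy_bytes and loops_per_second are invented for this example. The call goes through a volatile function pointer so the optimizer cannot inline the copy and drop repeated work, and clock() is only checked once per batch of 1000 calls so that its own cost does not dominate small copies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(unsigned char *dst, const unsigned char *src, size_t n);

/* Method 1: plain byte loop. */
void copy_loop_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Method 2: library block copy. */
void copy_memcpy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Calls fn over and over for at least `seconds` of clock() time and
   returns the number of calls completed per second. */
double loops_per_second(copy_fn fn, unsigned char *dst,
                        const unsigned char *src, size_t n, double seconds)
{
    assert(fn != NULL && seconds > 0.0);

    copy_fn volatile call = fn;          /* reloaded on every call, blocks inlining */
    clock_t limit = (clock_t)(seconds * CLOCKS_PER_SEC);
    if (limit < 1)
        limit = 1;                       /* guarantees a nonzero elapsed time */

    unsigned long loops = 0;
    clock_t start = clock();
    clock_t now;

    do {
        for (int k = 0; k < 1000; k++)   /* batch to amortize the clock() call */
            call(dst, src, n);
        loops += 1000;
        now = clock();
    } while (now - start < limit);

    return (double)loops * (double)CLOCKS_PER_SEC / (double)(now - start);
}
```

A caller would allocate two buffers, run loops_per_second once per method with a duration of a couple of seconds, repeat for several buffer sizes, and compare the numbers. Note that clock() measures processor time on most systems but wall-clock time on Windows, so results are only comparable within one platform.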
And with both compile-time-known and runtime-known sizes. Also, see if VC++ supports profile-guided optimization.
And do not forget to profile the optimized build; otherwise the numbers are of no use.