Which one is faster?
Code:
int a;
for(a = 0; a < 100; a++)
    memory[a] = input[a];
Code:
CopyMemory(memory, input, a);
Probably the latter.
--
Mats
Compilers can produce warnings - make the compiler programmers happy: Use them!
Please don't PM me for help - and no, I don't do help over instant messengers.
The latter. The compiler might have some tricks up its sleeve to optimize it.
All the buzzt!
CornedBee
"There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
- Flon's Law
Why not try both and find out?
Definitely the latter. Most compilers generate string instructions for block memory copies. The former solution makes it compute the index into each array on every iteration, which costs clock cycles. If you have some control over the variables, there are inline assembly routines that are as fast as possible.
Code:
DWORD ByteCount = 100 * sizeof(input[0]);
__asm {
    mov esi, input
    mov edi, memory
    mov ecx, ByteCount
    rep movsb
}
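For reference, here is a minimal, portable sketch of the two approaches from the question. It assumes CopyMemory behaves like the standard memcpy (on Windows it is, as far as I know, a macro that ends up there), so memcpy is used instead. The function names and ELEMENT_COUNT are made up for illustration. Note that the length argument is a byte count, which is why ByteCount above multiplies by sizeof(input[0]).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ELEMENT_COUNT 100  /* assumption: the 100 elements from the question */

/* Element-by-element copy, as in the first snippet of the question. */
void copy_with_loop(int *memory, const int *input, size_t count)
{
    assert(memory != NULL && input != NULL);
    for (size_t a = 0; a < count; a++)
        memory[a] = input[a];
}

/* Block copy. memcpy takes a length in BYTES, so the element count
   has to be multiplied by the element size. */
void copy_with_memcpy(int *memory, const int *input, size_t count)
{
    assert(memory != NULL && input != NULL);
    memcpy(memory, input, count * sizeof input[0]);
}
```

Both functions leave the destination in the same state; the only open question is which one the compiler turns into faster code.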
Surely you would want to use MOVSD at the very least. Something like this is what the compiler usually comes up with:
Code:
mov esi, input
mov edi, memory
mov ecx, ByteCount
mov edx, ecx
and edx, 3
shr ecx, 2
rep movsd
mov ecx, edx
rep movsb
That would probably execute roughly four times faster than abachler's code for anything larger than about a dozen bytes.
But letting the compiler deal with it is by far the best option. If you REALLY want a fast memcpy, you need to do much more advanced things to make the most of the CPU: uncacheable writes (only if the memory area is large, not for small copies; since we know the size here, that is easy to decide) and SSE registers (except in kernel mode, where saving and restoring the SSE registers becomes a nuisance).
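To make the "four bytes at a time, then the leftovers" idea easier to follow, here is a rough C sketch of the same split. The helper name copy_dwords_then_bytes is hypothetical, and this is only meant to illustrate the arithmetic (count / 4 wide moves, then count % 4 single bytes), not to beat the library memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies byte_count bytes, four at a time where
   possible, then the remaining 0..3 bytes one at a time. */
void copy_dwords_then_bytes(void *dst, const void *src, size_t byte_count)
{
    assert((dst != NULL && src != NULL) || byte_count == 0);

    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = byte_count >> 2;  /* like: shr ecx, 2 */
    size_t tail   = byte_count & 3;   /* like: and edx, 3 */

    /* Like: rep movsd. The fixed-size memcpy calls avoid alignment and
       aliasing problems; compilers typically reduce them to plain
       4-byte loads and stores. */
    for (size_t i = 0; i < dwords; i++) {
        uint32_t chunk;
        memcpy(&chunk, s, sizeof chunk);
        memcpy(d, &chunk, sizeof chunk);
        s += sizeof chunk;
        d += sizeof chunk;
    }

    /* Like: rep movsb for the leftover bytes. */
    for (size_t i = 0; i < tail; i++)
        d[i] = s[i];
}
```

For a 103-byte buffer this does 25 four-byte moves followed by 3 single-byte moves.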
--
Mats
Write a set of functions, each using different methods for solving the same problem.
Then write a loop that runs each method [1, 2, 3, etc.] for X amount of time (or X number of iterations), and calculate the number of loops per second. You probably want to use clock() to get reasonably precise timing, and CLOCKS_PER_SEC to convert it into a useful measure. It's a good idea to run each method for at least a couple of seconds.
The one that completes the most loops per second is the fastest.
In this case, I would also run variations with small and larger amounts of data.
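A rough sketch of such a timing harness, assuming the two methods being compared are the loop and a memcpy-based copy. All names here (copy_fn, bench_loop_copy, bench_block_copy, calls_per_second) are made up for illustration, and the batch size of 10000 calls between clock() checks is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(int *dst, const int *src, size_t count);

/* Method 1: element-by-element loop. */
void bench_loop_copy(int *dst, const int *src, size_t count)
{
    for (size_t a = 0; a < count; a++)
        dst[a] = src[a];
}

/* Method 2: block copy (length is in bytes). */
void bench_block_copy(int *dst, const int *src, size_t count)
{
    memcpy(dst, src, count * sizeof src[0]);
}

/* Calls fn repeatedly for at least `seconds` (as measured by clock())
   and returns the number of calls per second. */
double calls_per_second(copy_fn fn, int *dst, const int *src,
                        size_t count, double seconds)
{
    assert(seconds > 0.0 && count > 0);

    const unsigned long batch = 10000;  /* calls between clock() checks */
    unsigned long calls = 0;
    volatile int sink = 0;              /* keeps the copied data "used" */
    double elapsed;
    clock_t start = clock();

    do {
        for (unsigned long i = 0; i < batch; i++)
            fn(dst, src, count);
        sink = dst[count - 1];
        calls += batch;
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < seconds);

    (void)sink;
    return (double)calls / elapsed;
}
```

Call calls_per_second once per method with something like 2.0 seconds, print the results, and repeat with a few different values of count. It is still worth looking at the generated code to make sure the optimizer has not thrown the copy away.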
--
Mats
And with compile-time-known sizes and runtime-known sizes. Also, see if VC++ supports profile-driven optimization.
All the buzzt!
CornedBee
And do not forget to profile the optimized build; otherwise the results are of no use.
All problems in computer science can be solved by another level of indirection,
except for the problem of too many layers of indirection.
– David J. Wheeler