Which is faster?

**Yarin** · 02-04-2008

Which one is faster?

Code:

int a;
for(a = 0; a < 100; a++)
   memory[a] = input[a];

Code:

CopyMemory(memory, input, 100 * sizeof(input[0]));  /* length is in bytes */
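For anyone not on Windows: CopyMemory is a Windows macro that ends up calling memcpy, and its third argument is a length in bytes, not an element count. Below is a minimal sketch of both variants in portable C. It assumes (the thread never says) that memory and input are int arrays of 100 elements, and copy_loop and copy_block are hypothetical names:

```c
#include <string.h>

#define COUNT 100  /* element count taken from the original loop */

/* Assumed declarations; the thread never shows them. */
int input[COUNT];
int memory[COUNT];

/* Hypothetical variant 1: copy one element per iteration. */
void copy_loop(void)
{
    for (int a = 0; a < COUNT; a++)
        memory[a] = input[a];
}

/* Hypothetical variant 2: one block copy. On Windows,
   CopyMemory(memory, input, n) expands to memcpy, with n in bytes. */
void copy_block(void)
{
    memcpy(memory, input, COUNT * sizeof input[0]);
}
```

Both functions leave memory holding the same contents. Which one runs faster depends on the compiler and the optimization settings.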

**matsp** · 02-04-2008

Probably the latter.

--
Mats

**CornedBee** · 02-04-2008

The latter. The compiler might have some tricks up its sleeve to optimize it.

**cpjust** · 02-04-2008

Why not try both and find out?

**abachler** · 02-04-2008

Definitely the latter. Most compilers generate string instructions for block memory copies. The former makes the code compute the index into each array on every iteration, which costs clock cycles. If you have some control over the variables, you can also use an inline assembly routine that is about as fast as it gets:

Code:
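
DWORD ByteCount = 100 * sizeof(input[0]);

__asm {
    ; assumes input and memory are pointers; for arrays, use LEA instead of MOV
    mov esi, input      ; source address
    mov edi, memory     ; destination address
    mov ecx, ByteCount  ; number of bytes to copy
    rep movsb           ; copy ECX bytes from [ESI] to [EDI], one byte at a time
}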

**matsp** · 02-04-2008

Originally Posted by abachler

Code:

REP MOVSB

Surely you would want to use MOVSD at the very least. Something like this is what the compiler usually comes up with:

Code:

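mov  esi, input
mov  edi, memory
mov  ecx, ByteCount
mov  edx, ecx       ; keep a copy of the byte count
and  edx, 3         ; edx = leftover bytes (ByteCount % 4)
shr  ecx, 2         ; ecx = number of dwords (ByteCount / 4)
rep  movsd          ; copy ecx dwords, four bytes at a time
mov  ecx, edx
rep  movsb          ; copy the 0-3 leftover bytes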

That would probably run roughly four times faster than abachler's code for anything longer than a dozen bytes or so.
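The same dword-then-byte split can be written in C, which makes the arithmetic visible. For 103 bytes, 103 >> 2 = 25 dwords (100 bytes) and 103 & 3 = 3 leftover bytes. The name copy_dwords_then_bytes is hypothetical. Using memcpy for each 4-byte chunk is an assumption made to avoid alignment and aliasing problems; compilers usually turn it into a single 32-bit load and store. This is a sketch of the idea, not what any particular compiler emits:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring the REP MOVSD / REP MOVSB sequence:
   copy n / 4 four-byte words, then the remaining n % 4 bytes. */
void copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = n >> 2;  /* like SHR ECX, 2 */
    size_t tail = n & 3;    /* like AND EDX, 3 */

    while (words--) {
        uint32_t w;
        memcpy(&w, s, sizeof w);  /* 4-byte load, no alignment assumption */
        memcpy(d, &w, sizeof w);  /* 4-byte store */
        s += 4;
        d += 4;
    }
    while (tail--)                /* 0-3 leftover bytes */
        *d++ = *s++;
}
```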

But let the compiler deal with it; that's by far the best option. If you REALLY want a fast memcpy, you need much more advanced techniques to make the most of the CPU, such as uncacheable (non-temporal) writes [only if the memory area is large, not on small copies; but we know the size, so that's easy to decide] and SSE registers [except in kernel mode, where saving and restoring the SSE registers makes a nuisance of itself].

--
Mats

**Yarin** · 02-04-2008

Okay, good to know.

**abh!shek** · 02-05-2008

Originally Posted by cpjust

Why not try both and find out?

How exactly do I run two pieces of code and find out which one is faster?

**matsp** · 02-05-2008

Originally Posted by abh!shek

How exactly do I run two pieces of code and find out which one is faster?

Write a set of functions, each using a different method to solve the same problem.
Then write a loop that runs method [1, 2, 3, etc.] for X amount of time (or X iterations), and calculate the number of loops per second. You probably want to use clock() to get reasonably precise timing, and CLOCKS_PER_SEC to convert clock ticks into seconds. It's a good idea to run each method for at least a couple of seconds.

The one that runs the most loops per second is the fastest.

In this case, I would also run variations with small and larger amounts of data.
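Here is a minimal sketch of the benchmark described above. The helper names (method_loop, method_memcpy, loops_per_second) and the 4096-byte buffer limit are assumptions for illustration. Note that clock() measures processor time used by the program, not wall-clock time. The harness calls each method in batches of 1000 between clock() checks so the timer overhead doesn't swamp very small copies:

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define MAX_BYTES 4096  /* assumed upper limit on the copy size */

/* Global buffers, so the copies are not optimized away as dead stores. */
unsigned char bench_src[MAX_BYTES];
unsigned char bench_dst[MAX_BYTES];

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical method 1: byte-by-byte loop. */
void method_loop(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Hypothetical method 2: library block copy. */
void method_memcpy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Run fn on n bytes (n <= MAX_BYTES) for at least `seconds` of CPU time,
   as measured by clock(), and return the number of loops per second. */
double loops_per_second(copy_fn fn, size_t n, double seconds)
{
    const int batch = 1000;  /* calls per clock() check */
    clock_t limit = (clock_t)(seconds * CLOCKS_PER_SEC);
    unsigned long long loops = 0;
    clock_t start, elapsed;

    if (limit < 1)
        limit = 1;  /* guarantees elapsed > 0, so no division by zero */

    start = clock();
    do {
        for (int i = 0; i < batch; i++)
            fn(bench_dst, bench_src, n);
        loops += batch;
        elapsed = clock() - start;
    } while (elapsed < limit);

    return (double)loops / ((double)elapsed / CLOCKS_PER_SEC);
}
```

To compare the methods, call loops_per_second(method_loop, n, 2.0) and loops_per_second(method_memcpy, n, 2.0) for a few sizes such as 16, 100 and 4096 bytes. The higher number wins for that size.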

--
Mats

**CornedBee** · 02-05-2008

And with compile-time-known sizes and runtime-known sizes. Also, see if VC++ supports profile-driven optimization.
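To illustrate the difference, compare a copy whose length is a compile-time constant with one whose length arrives as a parameter. Both function names are hypothetical. With the constant, an optimizing compiler may expand the copy inline. With the parameter, it usually has to emit a general-purpose copy. The assembly listings (for example /FA with VC++ or -S with GCC) show what your compiler actually does:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: size known at compile time. The compiler sees the
   constant 100 and may replace the call with a few wide moves. */
void copy_fixed(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, 100);
}

/* Hypothetical: size known only at run time. The compiler usually
   emits a general-purpose copy, often a call to the library memcpy. */
void copy_variable(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}
```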

**vart** · 02-05-2008

And don't forget to profile the optimized build; otherwise the results are meaningless.

