memcpy() faster?


  1. #1
    Registered User
    Join Date
    Sep 2003
    Posts
    224

    memcpy() faster?

    Hi,

    For copying data from a buffer to a struct, is it faster to do a memcpy() or to copy things manually? For example:
    Code:
    char buf[12];
    
    typedef struct A
    {
      int a;
      int b;
      int c;
    } A;
    
    A a;
    
    memcpy(&a, buf, sizeof(A));
    
    // or
    
    a.a = *(int *) buf;
    a.b = *(int *) (buf + 4);
    a.c = *(int *) (buf + 8);
    I think doing it manually would be faster, but I'm not sure. Moreover, when loading a 32-bit word, does the CPU do four fetches, or are the four bytes all sent in one shot?

    Thanks.

  2. #2
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,532
    It's certainly safer to do a memcpy.
    Casting a pointer might get you an alignment exception.
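
    A minimal sketch of the memcpy() route, for illustration only. The helper name load_A is made up here, and the sketch assumes the buffer holds the three ints in the machine's native byte order and that struct A has no padding (the usual case for three ints, but not guaranteed).

    ```c
    #include <string.h>

    typedef struct A {
        int a;
        int b;
        int c;
    } A;

    /* Hypothetical helper: copy sizeof(A) bytes out of a byte buffer that
       may start at any address. memcpy() places no alignment requirement
       on its arguments, so this stays well-defined even when src is not
       suitably aligned for an int, which is where the pointer cast can
       fault on some CPUs. */
    A load_A(const char *src)
    {
        A out;
        memcpy(&out, src, sizeof out);
        return out;
    }
    ```

    Calling load_A(buf + 1) is fine as long as sizeof(A) bytes are readable from that address, whereas *(int *)(buf + 1) is exactly the kind of access that can raise an alignment exception.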

    > does the CPU do four fetches, or are the four bytes all sent in one shot?
    Well that depends on how it handles potential mis-alignments.
    If it knows it's good, it should be a single bus transfer, but then you're assuming that the processor bus width is the same as the memory bus width.

    The memcpy() implies a function call, but again that might be optimised out by the compiler.

    In short, go for correctness first.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  3. #3
    Registered User
    Join Date
    Jun 2005
    Posts
    6,287
    memcpy() is one of those functions that is often inlined by an optimising compiler, which avoids the function call overhead. It is also one of those functions that is rarely (when you get down to machine code) implemented using a loop. Its implementation often makes use of dedicated machine instructions, as a lot of machines are able to copy memory from one location to another using a fixed number of machine instructions, regardless of how much memory is being copied (e.g. on some machines this amounts to loading the two addresses and the size into specific machine registers, and then executing a single instruction for the actual operation).

    As a result of all this, using memcpy() is often the fastest alternative, as well as the simplest correct one [as noted by Salem].

    The obvious caveat on this statement is that memcpy() can only be faster if the target machine supports suitable instructions -- i.e. it is not necessary to do the copying using a loop. In practice, most machines do support such instructions, but if performance matters [i.e. profiling has identified the function doing the memcpy() as a performance hotspot], you should test to be sure which approach is faster.
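
    One rough way to run such a test is sketched below. All names here (copy_with_memcpy, copy_with_casts, seconds_for) are invented for the example, clock() is a coarse timer, and an optimiser may still reshape the loop, so treat the numbers as a hint and check the generated code if it matters.

    ```c
    #include <string.h>
    #include <time.h>

    typedef struct A { int a, b, c; } A;

    typedef void (*copy_fn)(A *, const char *);

    /* Candidate 1: library memcpy(). */
    void copy_with_memcpy(A *dst, const char *src)
    {
        memcpy(dst, src, sizeof *dst);
    }

    /* Candidate 2: manual copy through an int pointer. Only valid when src
       really points at suitably aligned int storage. */
    void copy_with_casts(A *dst, const char *src)
    {
        const int *p = (const int *)src;
        dst->a = p[0];
        dst->b = p[1];
        dst->c = p[2];
    }

    /* Time `iterations` calls of fn and return the elapsed CPU seconds. */
    double seconds_for(copy_fn fn, const char *src, long iterations)
    {
        volatile int sink = 0;  /* discourages the compiler from dropping the copies */
        A dst;
        long i;
        clock_t start = clock();
        for (i = 0; i < iterations; ++i) {
            fn(&dst, src);
            sink = dst.a;
        }
        (void)sink;
        return (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    ```

    For a fair comparison, back the source buffer with real int storage (for example an int array passed as const char *) so that the cast version is well-defined, and run enough iterations for the timer resolution not to dominate.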

  4. #4
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,596
    Memcpy is accomplished via a REP MOVSB. Of course the operation will be faster if your memory chunk is a power of 2 so you don't leave any hanging bytes. AFAIK memcpy only copies bytes and therefore does not use REP MOVSW or REP MOVSD. It is possible that memcpy is smart enough to figure out when to use certain opcodes, however, I believe it only uses REP MOVSB.
    This means that any code using REP MOVSW would be, in theory, twice as fast as REP MOVSB and anything using MOVSD would be four times as fast as using REP MOVSB. It is unfortunate that we do not have a REP MOVSQ which would move 64-bits at a time - perhaps later we will have this.


    Memcpy does have some overhead associated with it. If you know assembly it is faster at times to inline the copy code using hand-coded optimized assembly. Memcpy takes into account a lot of situations that may never occur in your application. Since it is designed for correctness and robustness as Salem has pointed out, it is by nature inherently slower due to these checks. But in order to be universally functional in nearly every situation, memcpy must perform these checks.

    You would have to look at the final assembly to be sure which was fastest.

    Also I doubt using memcpy or using inline asm is going to make enough of a difference to be noticed, unless you are doing so in a time-critical loop such as the rendering function of a 3D engine. For routine copies from A to B, memcpy is definitely the way to go.
    Last edited by VirtualAce; 04-10-2006 at 11:22 PM.

  5. #5
    Dump Truck Internet valis's Avatar
    Join Date
    Jul 2005
    Posts
    357
    I know the IA-32 Linux version of memcpy only uses movsb if you don't have the right options (sadly) (or 3DNow! if you have it).
    There are much speedier methods of copying data using other extensions like MMX (I think Linux uses this as well) and the FPU.
    Last edited by valis; 04-11-2006 at 12:30 AM.

  6. #6
    Registered User joed's Avatar
    Join Date
    Mar 2004
    Posts
    59
    I'm stupid
    Last edited by joed; 04-11-2006 at 05:21 PM. Reason: wrong answer

  7. #7
    Registered User
    Join Date
    Sep 2003
    Posts
    224
    Thanks for all the responses.
    I'm not using GCC, but a compiler for pSOS on a Motorola CPU. I'm not sure how memcpy() is implemented or even what the compiler is called.

  8. #8
    Dump Truck Internet valis's Avatar
    Join Date
    Jul 2005
    Posts
    357
    What is that an excerpt of? Because that just saves the caller's stack, makes room for some locals and then restores the caller's stack.

  9. #9
    Registered User joed's Avatar
    Join Date
    Mar 2004
    Posts
    59
    Sorry, I did it wrong:

    Code:
    	movl	_buf, %eax
    	pushl	%ebp
    	movl	%eax, _a
    	movl	%esp, %ebp
    	movl	_buf+4, %eax
    	popl	%ebp
    	movl	%eax, _a+4
    	movl	_buf+8, %eax
    	movl	%eax, _a+8
    	ret
    Memcpy with GCC -O2 produces the same code as the non-memcpy version.

  10. #10
    Registered User
    Join Date
    Jun 2005
    Posts
    6,287
    Quote Originally Posted by Bubba
    Memcpy is accomplished via a REP MOVSB. Of course the operation will be faster if your memory chunk is a power of 2 so you don't leave any hanging bytes. AFAIK memcpy only copies bytes and therefore does not use REP MOVSW or REP MOVSD. It is possible that memcpy is smart enough to figure out when to use certain opcodes, however, I believe it only uses REP MOVSB.
    This means that any code using REP MOVSW would be, in theory, twice as fast as REP MOVSB and anything using MOVSD would be four times as fast as using REP MOVSB. It is unfortunate that we do not have a REP MOVSQ which would move 64-bits at a time - perhaps later we will have this.
    The above applies only to one particular combination (or a few particular combinations) of compiler, library, and target system. It is not universally true.

    One common mistake that people make is to assume that every programmer uses the same hardware, operating system, and compilers as they do.

  11. #11
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,006
    This thread reminded me of an article I'd read a while back, Optimizing Memcpy improves speed.
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  12. #12
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,532
    Nice link there Dave.
    I was thinking about "Premature Optimization Disease"

    I see Yasir_Malik hasn't said what put the data into buf in the first place. If it's from some external interface, there may be an endian problem which needs solving as well.
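
    If the byte order of the external data is known, decoding each field byte by byte sidesteps both the endian and the alignment question. The sketch below assumes the data arrives big-endian (most significant byte first), which the thread does not actually state, and read_be32 is a made-up name.

    ```c
    #include <stdint.h>

    /* Hypothetical helper: assemble a 32-bit value from four bytes stored
       most-significant byte first. The result is the same whatever the
       host's own byte order is, and p may be unaligned. */
    uint32_t read_be32(const unsigned char *p)
    {
        return ((uint32_t)p[0] << 24) |
               ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |
                (uint32_t)p[3];
    }
    ```

    Each struct member would then be filled with something like a.a = (int)read_be32(buf + 0), with buf treated as unsigned char. For little-endian input the shift amounts are simply reversed.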

  13. #13
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,596
    Very nice article. It closely matches what I've experienced in my own code, which is why I began writing my own memcpy functions in areas where that proved faster.

    Note, and this is important for the OP: coding your own memcpy is NOT always faster, as the article also states.

    Saving cycles through memcpy() is probably your last line of defense against performance issues if you ask me. There will be plenty more areas that need to be addressed long before this one.

    EDIT:

    MMX cannot copy from memory to memory so I don't see how this would speed anything up.

    Code:
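    void Copy32(DWORD *pSource,DWORD *pTarget,DWORD dwLength)
    {
    __asm {
      mov esi,pSource    ; source address
      mov edi,pTarget    ; destination address
      mov ecx,dwLength   ; repeat count for rep (one DWORD per iteration)
      rep movsd          ; copy ecx DWORDs from [esi] to [edi]
      }
    }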
    This is a simple 32-bit copy from pSource to pTarget for length dwLength. It assumes that dwLength is evenly divisible by 4, thus data alignment is crucial to the operation of this function.
    I have found this very simple memcpy, when coded right as pointed out by Fordy some time ago, is quite fast although lacking in robustness. It was simply hardcoded for what I needed at the time and worked quite well. Note that I cannot always rely on this function in all cases.
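
    For comparison, a portable C sketch of the same word-at-a-time idea. The name copy_words32 is invented, the argument order follows memcpy() (destination first) instead of Copy32, and the count is in 32-bit words, not bytes. Nothing here guarantees it is faster than memcpy(); the compiler is free to turn the loop into whatever instructions it likes.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: copy `count` 32-bit words from src to dst.
       The two regions must not overlap, and both pointers must be
       suitably aligned for uint32_t. */
    void copy_words32(uint32_t *dst, const uint32_t *src, size_t count)
    {
        while (count--) {
            *dst++ = *src++;
        }
    }
    ```

    A caller holding a byte length has to divide it by 4 first and deal with any leftover bytes separately, which is the kind of bookkeeping a general memcpy() has to do on every call.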

    Again, as the article states, the application and environment as well as context all play a part in the final outcome.
    Last edited by VirtualAce; 04-12-2006 at 12:35 AM.

  14. #14
    Registered Luser cwr's Avatar
    Join Date
    Jul 2005
    Location
    Sydney, Australia
    Posts
    868
    And note that rep movsd is often slower than other methods to copy a chunk of data from one place to another.

    Also, that function assumes the direction flag is cleared, but something else might have set it in some other code.

  15. #15
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,596
    > And note that rep movsd is often slower than other methods to copy a chunk of data from one place to another.
    Explain.
