memcpy with 128 bit registers


  1. #1
    Registered User grady's Avatar
    Join Date
    Oct 2003
    Posts
    27

    memcpy with 128 bit registers

    I wanted to know whether using the large SSE registers in a function like memcpy would be faster than the standard memcpy. It seems that it is, but not by much. I think my benchmark is too unscientific to mean much when the performance differences are this small; the results are all over the place. At worst the SSE memcpys can be 5% slower than standard memcpy, but this is rare. Usually the unaligned SSE memcpy is 10% faster and the aligned one is 30% faster. At best the SSE memcpy is ~225% faster, but this is very rare.

    Don't put much stock in this. This is mostly just a curiosity I wanted to post. All I can say for sure is that aligned memory moves are always faster than unaligned ones, as they should be, and that both functions are almost always faster than standard memcpy.
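For readers without the attachments, the two variants compared above can be sketched in C with SSE2 intrinsics. This is only an illustration, not the attached assembly: the function names are hypothetical, the use of SSE2 integer loads and stores is an assumption, and the code falls back to plain memcpy when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_load*_si128, _mm_store*_si128 */
#endif

/* Hypothetical name. Copies n bytes with unaligned 128-bit loads and stores. */
static void *memcpy_sse_unaligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);  /* no alignment requirement */
        _mm_storeu_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        n -= 16;
    }
#endif
    memcpy(d, s, n);  /* remaining tail, or the whole copy without SSE2 */
    return dst;
}

/* Hypothetical name. Same loop with aligned loads and stores;
   both pointers must be 16-byte aligned. */
static void *memcpy_sse_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
#if defined(__SSE2__)
    while (n >= 16) {
        __m128i v = _mm_load_si128((const __m128i *)s);  /* faults on a misaligned address */
        _mm_store_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        n -= 16;
    }
#endif
    memcpy(d, s, n);  /* remaining tail, or the whole copy without SSE2 */
    return dst;
}
```

The only difference between the two is the load/store pair, which is why the aligned version needs the caller to guarantee 16-byte alignment of both buffers.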

    The test program takes two command line arguments: the number of megabytes to allocate, and the number of times to call each function when computing the average time per call. There is an instruction, emms, at the bottom of both memcpysse routines that can be removed.
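The timing idea described above (allocate some megabytes, call the copy function a number of times, report the average) might look roughly like the following. This is a sketch and not the attached testmemcpy.c: the names `copy_fn` and `average_copy_seconds` are hypothetical, and the use of `clock_gettime` with `CLOCK_MONOTONIC` is an assumption about how one could time it on Linux.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical type: any function with memcpy's signature. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical helper. Returns the average seconds per call of fn when
   copying `megabytes` MiB, or -1.0 on bad arguments or allocation failure. */
static double average_copy_seconds(copy_fn fn, size_t megabytes, int calls)
{
    if (megabytes == 0 || calls <= 0)
        return -1.0;

    size_t bytes = megabytes * 1024 * 1024;
    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch every page first so page faults are not part of the timing. */
    memset(src, 0xA5, bytes);
    memset(dst, 0, bytes);

    /* Calling through a volatile pointer keeps the compiler from
       inlining or dropping the copies. */
    volatile copy_fn f = fn;
    f(dst, src, bytes);  /* warm-up call, not timed */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < calls; i++)
        f(dst, src, bytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double total = (double)(t1.tv_sec - t0.tv_sec)
                 + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(dst);
    return total / calls;
}
```

A call such as `average_copy_seconds(memcpy, 64, 100)` would give a baseline for the standard memcpy. Results from a loop like this are still noisy, since buffer size relative to the caches and other system activity affect each run.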

    testmemcpy.c
    memcpysseu.s
    memcpyssea.s
    Last edited by grady; 01-16-2004 at 01:48 PM.

  2. #2
    &TH of undefined behavior Fordy's Avatar
    Join Date
    Aug 2001
    Posts
    5,789
    One query would be the version of memcpy used. Is it a portable C implementation or an optimised assembler wrapper using the standard REPed memory moving instructions?

    I would think that the standard registers and standard instructions would be more optimised for memory movement than the SSE registers, but I don't know for sure, and I suppose it would depend on a lot of things.
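For comparison, a copy built on the REP-prefixed string instructions mentioned above could be sketched like this. It assumes GCC or Clang inline assembly on x86 or x86-64; the function name is hypothetical, and whether any particular C library's memcpy works this way is not something this thread establishes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name. Copies n bytes with a single "rep movsb". */
static void *memcpy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    void *ret = dst;
    /* rep movsb copies (E/R)CX bytes from [(E/R)SI] to [(E/R)DI],
       advancing both pointers; the constraints pin those registers. */
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
    return ret;
#else
    return memcpy(dst, src, n);  /* fallback on other compilers or targets */
#endif
}
```

Timing this against the SSE variants and the library memcpy on the same buffers is one way to see which approach wins on a given machine.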

  3. #3
    Registered User grady's Avatar
    Join Date
    Oct 2003
    Posts
    27
    I have been trying to make the test program better since I last posted, and I think you are right about the standard registers and standard instructions being more optimized. My function gets worse and worse as I find ways to make the test more reasonable.
