Thread: Copying constant amount of data

  1. #1
    Algorithm engineer
    Join Date
    Jun 2006
    Posts
    286

    Copying constant amount of data

    I don't really know how the CPU copies a number of bytes. Does it take as long to copy a whole word (4 bytes on a 32-bit system) as it takes to copy 4 bytes separately? Which sizes can the CPU copy at a time; can it, for example, copy exactly 3 or 5 bytes at a time? And does every copy take the same time?

    Here's an example: if n and m are known to the compiler during compilation, what is the fastest way to copy at least n and at most m bytes from one place in memory to another? In one special case (color channels using SDL), there are 3 bytes that need to be copied. One might think the fastest way is to copy byte by byte. But in this case there's a fourth byte in both the source and the destination (since the video mode is set to 32 bits/pixel), which makes it possible to copy 4 bytes at a time, which I guess should make the copy faster (3 times faster?). Here n = 3 and m = 4.

    The compiler won't be able to optimize this for me, will it? It probably only knows n, not m.
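A minimal sketch of the "at least n, at most m" idea, assuming C++14. The names pick_copy_size and copy_between are hypothetical, not part of SDL or the standard library: the helper rounds n up to a power of two, uses that size only if it does not exceed m, and passes the resulting constant to memcpy so the compiler can choose native moves where it sees fit.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: at least n bytes must be copied and at most m may be.
// Returns the smallest power of two >= n if it is <= m, otherwise n itself.
constexpr std::size_t pick_copy_size(std::size_t n, std::size_t m) {
    std::size_t k = 1;
    while (k < n) {
        k *= 2;  // round n up to a power of two
    }
    return k <= m ? k : n;
}

// Copies pick_copy_size(N, M) bytes. The size is a compile-time constant,
// so the compiler may turn the memcpy into one or two plain moves.
template <std::size_t N, std::size_t M>
void copy_between(void* dest, const void* source) {
    static_assert(N <= M, "N must not exceed M");
    constexpr std::size_t size = pick_copy_size(N, M);
    std::memcpy(dest, source, size);
}
```

With n = 3 and m = 4, as in the SDL case, this picks a 4-byte copy; with m = 3 it falls back to exactly 3 bytes.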
    Come on, you can do it! b( ~_')

  2. #2
    Deathray Engineer MacGyver
    Join Date
    Mar 2007
    Posts
    3,210
    In general, this question is more about the CPU than about C++, and as such there is no single answer to it. If you want to target specific CPUs, then yes, you can squeeze extra performance out of knowing such details. These days, however, I would not underestimate modern compilers and their ability to optimize code.

  3. #3
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Copying a single 32-bit integer is faster than 4 individual byte copies, yes. However, 32-bit pixels also take more memory than 24-bit ones, and with enough pixels that extra memory traffic affects speed too. When simply copying a row of pixels between bitmaps, both formats can be copied with memcpy, which may move several bytes at once. So in some cases 24-bit can be faster.
    Brief experiments with my own software 3D engine put 24-bit rendering just 15% below 32-bit, speedwise.
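To make the memory-size point concrete, here is a sketch of copying one row of pixels with memcpy. row_bytes and copy_row are hypothetical helpers, and real surfaces may pad each row (SDL calls the row length in bytes the pitch), which this sketch ignores by assuming tightly packed pixels.

```cpp
#include <cstddef>
#include <cstring>

// Bytes in one row of tightly packed pixels (no per-row padding).
constexpr std::size_t row_bytes(std::size_t width, std::size_t bytes_per_pixel) {
    return width * bytes_per_pixel;
}

// Copies one row between two bitmaps of the same format. memcpy chooses how
// many bytes to move per instruction, for 24-bit and 32-bit pixels alike.
void copy_row(unsigned char* dest, const unsigned char* source,
              std::size_t width, std::size_t bytes_per_pixel) {
    std::memcpy(dest, source, row_bytes(width, bytes_per_pixel));
}
```

A 640-pixel row is 1920 bytes at 24 bits per pixel and 2560 bytes at 32, so the 24-bit copy moves a quarter less data.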

    You haven't said what you're doing exactly so I can't advise you which way to go at this stage.
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote, originally posted by TriKri:
    I don't really know how the CPU copies a number of bytes. Does it take as long to copy a whole word (4 bytes on a 32-bit system) as it takes to copy 4 bytes separately? Which sizes can the CPU copy at a time; can it, for example, copy exactly 3 or 5 bytes at a time? And does every copy take the same time?
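    An x86 CPU can copy 1, 2, or 4 bytes at a time with its ordinary integer moves (x86-64 adds 8-byte moves, and SSE registers move 16 bytes); there is no 3- or 5-byte move. Which one is used depends on the type of the data: characters are copied 1 byte at a time, shorts 2 bytes at a time, and so on. There are also special instructions for copying large blocks of data (such as rep movs).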

    In one special case (color channels using SDL), there are 3 bytes that need to be copied. One might think the fastest way is to copy byte by byte. But in this case there's a fourth byte in both the source and the destination (since the video mode is set to 32 bits/pixel), which makes it possible to copy 4 bytes at a time, which I guess should make the copy faster (3 times faster?). Here n = 3 and m = 4.
    That depends on the function used to do the copy. If you reinterpret_cast the 4 bytes to int, then it should copy all 4 bytes at once (strictly speaking, though, reading a byte buffer through an int pointer breaks C++'s aliasing rules and may be misaligned). If you use memcpy or similar functions, it may copy one byte at a time, but it may also check whether a multi-byte operation is a better option. You can also use a POD struct for the three bytes and rely on the compiler to decide whether to add a padding byte. The padding makes it possible for the compiler to copy the struct with a single 4-byte move automatically.
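A sketch of the padded-struct suggestion, assuming C++11 or later. Rather than hoping the compiler pads a three-byte struct (mainstream compilers usually don't, since all its members are single bytes), alignas(4) requests the fourth byte explicitly. Rgb, PaddedRgb and copy_color are illustrative names; whether the assignment becomes a single 32-bit move is still up to the compiler.

```cpp
#include <cstdint>

// Three colour channels; usually 3 bytes with 1-byte alignment, so a copy
// cannot be a single aligned 32-bit move.
struct Rgb {
    std::uint8_t r, g, b;
};

// Same channels, but alignas(4) raises the alignment to 4, and since a
// struct's size is a multiple of its alignment, a padding byte is added.
struct alignas(4) PaddedRgb {
    std::uint8_t r, g, b;
};

static_assert(alignof(PaddedRgb) == 4, "padded struct is 4-byte aligned");
static_assert(sizeof(PaddedRgb) % 4 == 0, "size is a multiple of the alignment");

// Plain struct assignment: the compiler chooses how to copy it.
inline void copy_color(PaddedRgb& dest, const PaddedRgb& source) {
    dest = source;
}
```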

    Here's an example: if n and m are known to the compiler during compilation, what is the fastest way to copy at least n and at most m bytes from one place in memory to another?

    The compiler won't be able to optimize this for me, will it? It probably only knows n, not m.
    There is no built-in function to tell the compiler to perform such an operation. You can influence how the data is copied by changing its type. So if you want data to be copied four bytes at a time, you could cast to an int array and make sure the first element lies on an even address, preferably a multiple of 4 (x86 tolerates misaligned integer accesses, though they can be slower, and other CPUs may fault on them).
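The alignment condition can be checked at run time. This sketch assumes a flat address space (as on x86), where converting a pointer to std::uintptr_t yields its address; is_aligned_for and copy_words are hypothetical helpers. Copying through memcpy rather than an int* cast sidesteps the aliasing question while still allowing word-sized moves.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if p is suitably aligned for a T. Assumes a flat address space,
// where the integer value of a pointer is its address.
template <class T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Copies `count` 32-bit words between byte buffers. memcpy is valid for any
// alignment and avoids reading bytes through an int pointer; on aligned data
// the compiler is free to use word-sized (or wider) moves.
void copy_words(void* dest, const void* source, std::size_t count) {
    std::memcpy(dest, source, count * sizeof(std::uint32_t));
}
```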

    But the truth is that for small n it doesn't really matter, and for large n you want functions like memcpy that can take advantage of the CPU's ability to copy large blocks of data.
    Last edited by King Mir; 07-11-2008 at 10:44 PM.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  5. #5
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    Always use memcpy() for copying bytes. This little function is so essential that an incredible amount of optimization effort goes into it. Typically, compilers recognize a call to memcpy and use all the static flow information available to them (and profiling information, if you use profile-guided optimization) to select the best copying method for that particular CPU. It might decide to copy the data in 16-byte blocks using the SSE load and store instructions, for example. If the particular hardware has a special memory transfer engine, the compiler might decide to use that. Etc., etc.

    Trust the compiler.
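One common idiom in this spirit (not from this thread) expresses a single word-sized load or store as memcpy with a constant size. load_u32 and store_u32 are illustrative names; mainstream optimizing compilers typically emit one 32-bit move for each, but that is the compiler's decision, not a guarantee.

```cpp
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from any byte address, aligned or not, without
// breaking the aliasing rules.
inline std::uint32_t load_u32(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Writes a 32-bit value to any byte address in the same way.
inline void store_u32(void* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof v);
}
```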
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  6. #6
    Algorithm engineer
    Join Date
    Jun 2006
    Posts
    286
    Yes, but the compiler cannot know everything. So in this SDL case, for example, do you mean I should use memcpy(dest, source, 3); instead of *(long*)dest = *(long*)source? The original problem is to copy a specific color, a BPP-byte array containing the values of the different channels, into each pixel of an image, where each pixel also takes up BPP bytes of memory. The point is that it is allowed to copy BPP bytes (4 in this case), but it only has to copy NC*sizeof(T) bytes (3 in this case; the last byte is unused), where NC is the number of channels and T is the data type holding each channel's value.

    Currently I have this inline member function to do this for me:
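    Code:
    // uint and byte are assumed to be typedefs for unsigned int and unsigned char.
    template<class T, uint NC, size_t BPP>
    inline void mp_image<T, NC, BPP>::CopyColor(byte *dest, byte *source)
    {
        const size_t needed = NC * sizeof(T);  // bytes that must be copied
        size_t i = 0;
        // Copy int-sized chunks while a whole int lies within the needed bytes,
        // or while bytes are still needed and the int still fits in the pixel.
        while ( i < needed/sizeof(int)*sizeof(int)                  ||
                (i < needed && i < BPP/sizeof(int)*sizeof(int)) ) {
            *(int*)dest = *(int*)source;
            dest   += sizeof(int);
            source += sizeof(int);  // advance the source as well as the destination
            i      += sizeof(int);
        }
        // At most one short-sized chunk can remain, by the same rule.
        if ( i < needed/sizeof(short)*sizeof(short)                 ||
             (i < needed && i < BPP/sizeof(short)*sizeof(short)) ) {
            *(short*)dest = *(short*)source;
            dest   += sizeof(short);
            source += sizeof(short);
            i      += sizeof(short);
        }
        // At most one single byte can remain.
        if ( i < needed ) {
            *dest = *source;
        }
    }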
    I don't know whether this is good or just stupid, but in the SDL case this function would copy a single int. Here, byte is an integer data type, the same as unsigned char.
    Last edited by TriKri; 07-12-2008 at 04:30 AM.

  7. #7
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    do you mean I should use memcpy(dest, source, 3); instead of *(long*)dest = *(long*)source
    Well, apart from the fact that your version copies 8 bytes on my system, in one of the two cases you'd simply be lying to the compiler: one of them copies 3 bytes, the other 4. Yes, the compiler doesn't know it's safe to copy 4 bytes (and thus use a block transfer), but you could trivially give it that information by passing 4 to memcpy.
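The two memcpy calls really are different operations, which is why the compiler cannot silently widen the 3-byte one. A small sketch, assuming a 4-byte pixel whose last byte is unused; copy_channels and copy_pixel are hypothetical names.

```cpp
#include <cstring>

// Copies only the three colour channels; the fourth byte of dest is untouched.
inline void copy_channels(unsigned char* dest, const unsigned char* source) {
    std::memcpy(dest, source, 3);
}

// Copies the whole 4-byte pixel, unused byte included. This tells the compiler
// that writing all four bytes is fine, which permits a single 32-bit move.
inline void copy_pixel(unsigned char* dest, const unsigned char* source) {
    std::memcpy(dest, source, 4);
}
```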
    All the buzzt!
    CornedBee

  8. #8
    Algorithm engineer
    Join Date
    Jun 2006
    Posts
    286
    Quote, originally posted by CornedBee:
    Well, apart from your form copying 8 bytes on my system...
    Oops, I meant a 32-bit integer; I thought long was always that, but maybe not...

    You're right, I'll just pass BPP to memcpy. Since BPP is a compile-time constant, I realize the call will probably be optimized for exactly that reason. Thanks!
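A sketch of the pass-BPP-to-memcpy approach, assuming the color is stored as a BPP-byte array and the pixels are tightly packed. fill_pixels is a hypothetical stand-in for the mp_image code; BPP is deduced from the array size, so every memcpy has a compile-time constant length.

```cpp
#include <cstddef>
#include <cstring>

// Writes `color` into each of `count` consecutive pixels of BPP bytes.
template <std::size_t BPP>
void fill_pixels(unsigned char* pixels, std::size_t count,
                 const unsigned char (&color)[BPP]) {
    for (std::size_t i = 0; i < count; ++i) {
        // Copy the whole pixel, unused padding byte included.
        std::memcpy(pixels + i * BPP, color, BPP);
    }
}
```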

  9. #9
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    Oops, I meant a 32 bit integer, thought long always was that, but maybe not...
    GCC on 64-bit Unix-like systems (the LP64 model) uses 8-byte longs; 64-bit Windows keeps long at 4 bytes.
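Since long changes size between platforms, the exact-width types from &lt;cstdint&gt; (C++11, or &lt;stdint.h&gt; in C99) are a safer way to say "exactly 32 bits". A sketch with a hypothetical copy_u32 helper:

```cpp
#include <climits>
#include <cstdint>

// The standard only promises that long has at least 32 bits.
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");

// The exact-width types have the same width everywhere they exist.
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32, "uint32_t is exactly 32 bits");
static_assert(sizeof(std::uint64_t) * CHAR_BIT == 64, "uint64_t is exactly 64 bits");

// Copies exactly 32 bits, whatever the size of long on this platform.
inline void copy_u32(std::uint32_t* dest, const std::uint32_t* source) {
    *dest = *source;
}
```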
    All the buzzt!
    CornedBee

  10. #10
    Algorithm engineer
    Join Date
    Jun 2006
    Posts
    286
    Good to know. Is there any way to tell the word size at compile time, i.e. whether you're on a 16-bit, 32-bit or 64-bit system (though I guess no one uses 16-bit systems these days)? Maybe a compile-time flag?

  11. #11
    Deathray Engineer MacGyver
    Join Date
    Mar 2007
    Posts
    3,210
    The closest portable method I believe you can use to determine the type of system is sizeof(void *). If you want it in bits, multiply by 8, obviously. This depends on the compiler, though, more than on the system on which you're compiling.

  12. #12
    Algorithm engineer
    Join Date
    Jun 2006
    Posts
    286
    I need to do it in the preprocessor so that I can set typedefs, but the preprocessor doesn't recognize the sizeof operator...

  13. #13
    Deathray Engineer MacGyver
    Join Date
    Mar 2007
    Posts
    3,210
    Compilers do evaluate it at compile time: in C, sizeof is a constant except for variable-length arrays, and in C++ it is always a compile-time constant. Something like sizeof(void*) should work since it is clearly constant.

    Code:
    #include <iostream>
    
    #define TEH_SIZE sizeof(void *)
    
    int main()
    {
    	std::cout << "Architecture is " << ((TEH_SIZE)*8) << " bits." << std::endl;
    	return 0;
    }
    This produces the following output on a 32-bit Windows XP machine using MinGW:

    Code:
    Architecture is 32 bits.

  14. #14
    Algorithm engineer
    Join Date
    Jun 2006
    Posts
    286
    Yes, but sizeof is still not interpreted by the preprocessor. I need typedefs intw and uintw that have the same size as a word. What I had in mind was something like

    Code:
    #include <stdint.h>
    
    // sizeof counts bytes, so 16-, 32- and 64-bit pointers have sizes 2, 4 and 8.
    #if    (sizeof(void*) == 2)  //16-bit system
    typedef   int16_t   intw;
    typedef  uint16_t  uintw;
    #elif  (sizeof(void*) == 4)  //32-bit system
    typedef   int32_t   intw;
    typedef  uint32_t  uintw;
    #elif  (sizeof(void*) == 8)  //64-bit system
    typedef   int64_t   intw;
    typedef  uint64_t  uintw;
    #endif
    But this fails.
    Last edited by TriKri; 07-12-2008 at 06:12 AM.
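One way around the sizeof limitation, assuming a C++11 &lt;cstdint&gt;: UINTPTR_MAX is an ordinary macro, so #if can compare it with the maximum values of 16-, 32- and 64-bit unsigned integers. Like the thread, this treats the pointer size as the word size. If no preprocessor test is needed, std::intptr_t and std::uintptr_t are already pointer-sized and can be used directly.

```cpp
#include <cstdint>

// The preprocessor cannot evaluate sizeof, but it can compare macros.
#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu
typedef std::int64_t  intw;   // 64-bit pointers
typedef std::uint64_t uintw;
#elif UINTPTR_MAX == 0xFFFFFFFFu
typedef std::int32_t  intw;   // 32-bit pointers
typedef std::uint32_t uintw;
#elif UINTPTR_MAX == 0xFFFFu
typedef std::int16_t  intw;   // 16-bit pointers
typedef std::uint16_t uintw;
#else
#error "unsupported pointer width"
#endif

// Ordinary compile-time check that the chosen type matches the pointer size.
static_assert(sizeof(uintw) == sizeof(void*), "uintw matches the pointer size");
```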

  15. #15
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Strictly speaking, a char can have more than 8 bits (the exact number is given by CHAR_BIT). It is therefore better to drop the assumption of 8-bit chars and write:
    Code:
    #include <iostream>
    #include <climits>
    
    #define THE_SIZE (sizeof(void *))
    
    int main()
    {
    	std::cout << "Architecture is " << (THE_SIZE*CHAR_BIT) << " bits." << std::endl;
    	return 0;
    }
