Help: About memcpy()

  1. #1
    Registered User
    Join Date
    Jun 2004
    Posts
    42

    Help: About memcpy()

    In my program there is a loop that runs 1,000,000 times and calls memcpy() on every iteration. The loop itself can't be improved or replaced, so I just want to swap in an optimized memcpy() for the standard version. I read some articles about memcpy optimization years ago, but I can't remember the details. I have searched for sample code of optimized versions, but almost all of it is useless in my program or is tuned only for AMD CPUs. Can anyone give me some advice or a sample, whether in plain C or inline assembly? My CPU is an Intel P4. Thanks!

  2. #2
    Registered User linuxdude's Avatar
    Join Date
    Mar 2003
    Location
    Louisiana
    Posts
    926
    Are you sure the iteration can't be improved, or is this a task you were handed as-is? If you want, I guess that instead of a character array you could use an allocated pointer and set it to NULL at the start.

  3. #3
    Registered User
    Join Date
    Jun 2004
    Posts
    2

    memcpy

    hi,
    Instead of copying byte by byte, you can copy 4 bytes at a time while at least 4 bytes remain, then copy the remaining bytes one at a time. It will be faster.

    regards

  4. #4
    #include<xErath.h> xErath's Avatar
    Join Date
    Jun 2004
    Posts
    722

    What about this?
    Code:
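    /* Simple byte-by-byte copy. */
    void *memcpy(void *dest, const void *src, int size){
    	char *td = (char*)dest;
    	const char *ts = (const char*)src;	/* keep src const */
    	while(size-->0)	/* a negative size copies nothing */
    		*td++=*ts++;
    	return dest;
    }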
    //EDIT
    Or improved

    Code:
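    /* Assumes a 4-byte int and that dest and src are aligned for int access. */
    void *memcpy(void *dest, const void *src, int size){
    	int *td = (int*)dest;
    	const int *ts = (const int*)src;
    	char *cd;
    	const char *cs;
    	
    	for(;size>=4;size-=4)	/* >= keeps an exact multiple of 4 in the word loop */
    		*td++=*ts++;
    
    	cd=(char*)td;
    	cs=(const char*)ts;
    	while(size-->0)	/* remaining 0 to 3 bytes */
    		*cd++=*cs++;
    	return dest;
    }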
    Or even better: with 8-byte integers
    Code:
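    #ifdef _WIN32	/* predefined by Windows compilers */
    	typedef __int64 INT64;
    #else
    	typedef long long int INT64;
    #endif
    
    /* Assumes dest and src are aligned for 8-byte access. */
    void *memcpy(void *dest, const void *src, int size){
    	INT64 *td = (INT64*)dest;
    	const INT64 *ts = (const INT64*)src;
    	char *cd;
    	const char *cs;
    	
    	for(;size>=8;size-=8)	/* >= keeps an exact multiple of 8 in the word loop */
    		*td++=*ts++;
    
    	cd=(char*)td;
    	cs=(const char*)ts;
    	while(size-->0)	/* remaining 0 to 7 bytes */
    		*cd++=*cs++;
    	return dest;
    }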
    Microsoft compilers don't support "long long int"... bah
    Last edited by xErath; 06-21-2004 at 10:52 PM.
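
    The word-at-a-time versions in this post cast arbitrary pointers to int* or INT64*, which quietly assumes both buffers are suitably aligned. x86 tolerates misaligned access, but the C standard does not promise it. A minimal sketch of a more careful variant follows, assuming a hypothetical helper name copy_words (it is not a library function, and it addresses alignment only, not the strict-aliasing rules): copy bytes until the destination is word-aligned, use the word loop only if the source is then aligned too, and finish with bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a drop-in for the library memcpy(). */
void *copy_words(void *dest, const void *src, size_t size)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Head: single bytes until the destination is 4-byte aligned. */
	while (size > 0 && ((uintptr_t)d % sizeof(uint32_t)) != 0) {
		*d++ = *s++;
		size--;
	}

	/* Body: whole words, only if the source is aligned as well. */
	if (((uintptr_t)s % sizeof(uint32_t)) == 0) {
		uint32_t *dw = (uint32_t *)d;
		const uint32_t *sw = (const uint32_t *)s;

		while (size >= sizeof(uint32_t)) {
			*dw++ = *sw++;
			size -= sizeof(uint32_t);
		}
		d = (unsigned char *)dw;
		s = (const unsigned char *)sw;
	}

	/* Tail: the leftover bytes, or everything if the source was misaligned. */
	while (size-- > 0)
		*d++ = *s++;
	return dest;
}
```

    Whether this beats the compiler's own memcpy() is something to measure rather than assume; the extra alignment tests cost time on very short copies.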

  5. #5
    Registered User
    Join Date
    Jun 2004
    Posts
    42
    I just use the MD5 algorithm; the code is from the Internet. wvMD5StoreDigest() and wvMD5Update() contain three memcpy() calls in total. Another function of mine, the one with the large loop, calls these MD5 functions. Here is that function:
    Code:
    int KeySearch(unsigned char salt[],        // needs room for 64 bytes
                  unsigned char hashedsalt[],  // 16
                  unsigned char begin[],       // 5
                  unsigned long step,
                  unsigned char validkey[])    // 5
    {
    	wvMD5_CTX mdContext, mdContext2;
    	U8 pwarray[64];
    	rc4_key key;
    	unsigned long *pKey = (unsigned long *)pwarray;	// leading bytes of pwarray used as a counter
    	unsigned long i;
    
    	// Build one pre-padded 64-byte MD5 block around the candidate key.
    	memset (pwarray, 0, 64);
    	memcpy (pwarray, begin, 5);
    	pwarray[9] = 0x80;
    	pwarray[56] = 0x48;
    	// Pad the salt buffer to a full 64-byte block as well.
    	salt[16] = 0x80;
    	memset (salt + 17, 0, 47);
    	salt[56] = 0x80;
    	for (i = 0; i < step; ++i) {
    		wvMD5Init (&mdContext);
    		wvMD5Update (&mdContext, pwarray, 64);
    		wvMD5StoreDigest (&mdContext);
    		(*pKey)++;	// advance to the next candidate key
    		prepare_key (mdContext.digest, 16, &key);
    		rc4 (salt, 16, &key);
    		rc4 (hashedsalt, 16, &key);
    
    		wvMD5Init (&mdContext2);
    		wvMD5Update (&mdContext2, salt, 64);
    		wvMD5StoreDigest (&mdContext2);
    
    		if (memcmp (mdContext2.digest, hashedsalt, 16) == 0) {
    			memcpy (validkey, mdContext.digest, 5);
    			return 1;
    		}
    	}
    	return 0;
    }
    Last edited by Salem; 06-22-2004 at 12:33 AM. Reason: tagging
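
    Every copy size in KeySearch() is a compile-time constant (5, 16 or 64 bytes), so a general-purpose memcpy() spends part of each call testing a length that never changes. A rough sketch of fixed-size copies is shown here; the names block64, copy_key5 and copy_block64 are hypothetical and do not come from the MD5 source, and many compilers already expand memcpy() with a constant length inline, so the gain may be zero.

```c
/* Hypothetical names; nothing here comes from the MD5 code in the post. */
typedef struct { unsigned char bytes[64]; } block64;

/* Five plain assignments: no length test and no loop. */
void copy_key5(unsigned char *dest, const unsigned char *src)
{
	dest[0] = src[0];
	dest[1] = src[1];
	dest[2] = src[2];
	dest[3] = src[3];
	dest[4] = src[4];
}

/* Struct assignment copies all 64 bytes; the size is known at compile time. */
void copy_block64(block64 *dest, const block64 *src)
{
	*dest = *src;
}
```

    The struct version only applies where the buffers can be declared as block64 in the first place; casting an unrelated unsigned char array to block64* would bring back the alignment and aliasing questions.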

  6. #6
    #include<xErath.h> xErath's Avatar
    Join Date
    Jun 2004
    Posts
    722
    naruto please use code tags.
    Do you know where all those functions come from???
    Bah, I find my code much simpler.

  7. #7
    Registered User
    Join Date
    Jun 2004
    Posts
    42
    Quote Originally Posted by xErath
    naruto please use code tags.
    Do you know where all those functions come from???
    Bah, I find my code much simpler.
    Sorry, I'm new here and I haven't read the forum's readme. Lazy, lazy.

    I've used your code (the 8-byte and 4-byte versions) in my program. Before using your code, the running time of my program was 43 days; after replacing the standard memcpy() with your code, the running time is 38 days. My compiler is Borland C++ 5.5.
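
    The 43-day and 38-day figures put a floor under how much of the run is spent copying. By Amdahl's law, if a fraction f of the runtime is in the copy and the copy becomes s times faster, the total scales by (1 - f) + f/s. A small sketch of that arithmetic, with hypothetical function names (neither f nor s was measured in this thread):

```c
/* Runtime after a fraction f of the work gets s times faster (Amdahl's law). */
double scaled_runtime(double total, double f, double s)
{
	return total * ((1.0 - f) + f / s);
}

/* Smallest f for which total -> target is reachable at all,
   i.e. even if the accelerated part became free (s -> infinity). */
double min_fraction_needed(double total, double target)
{
	return 1.0 - target / total;
}
```

    min_fraction_needed(43.0, 38.0) is about 0.116, so memcpy() accounted for at least roughly 12% of the original run; how much more depends on how much faster the replacement really is.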

    Would you mind giving me your e-mail address so that I can send you the code?

  8. #8
    #include<xErath.h> xErath's Avatar
    Join Date
    Jun 2004
    Posts
    722
    Hmm, it depends on what I can do... :P
    5 days?? That is in fact an improvement.

    Still... 38 days???
    You'll have to merge functions to avoid the stack pushes/pops and function calls, plus you should optimize the function for each different call context. What about assembly??
    I don't know if Borland supports this, but Microsoft compilers do, and this is Microsoft's assembly syntax:
    Code:
    void *memcpy(void *_dest, const void *_src, int _size){
    __asm{
    	push eax
    	push ecx
    	push edi
    	push esi
    	mov edi, _dest	;edi = destination, esi = source
    	mov esi, _src
    	mov ecx, _size
    
    	cmp ecx, 4	;if size<4
    	jl _lbl1
    _cicle1:		;copy 4bytes by 4 bytes
    		mov eax, dword ptr [esi]
    		mov dword ptr [edi], eax
    		add edi, 4
    		add esi, 4
    		sub ecx, 4
    		cmp ecx, 4
    		jge _cicle1
    _lbl1:
    	cmp ecx, 0	;if size<=0
    	jle _lbl2_end
    _cicle2:		;copy each byte at a time
    		mov al, byte ptr [esi]
    		mov byte ptr [edi], al
    		inc edi
    		inc esi
    		loop _cicle2
    _lbl2_end:
    	pop esi
    	pop edi
    	pop ecx
    	pop eax
    	}
    	return _dest;
    }
    This is extremely optimized compared to the C version... Now you'll have to embed this in your source for Borland compilers.
    Last edited by xErath; 06-22-2004 at 12:36 AM.

  9. #9
    Registered User
    Join Date
    Jun 2004
    Posts
    42
    As far as I know, the fastest running time can be 20 days (the code must be different, but the aim is the same). None of the code in this program was written by me. I have no experience with code optimization, although I'm doing this job. My motivation is that I've read Michael Abrash's book (Graphics Programming Black Book), and I remember that he mentioned optimizing memcpy() in it, but I can't find the chapter on this topic in that book.

    You say you found your code much simpler; what code?

  10. #10
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,484
    Mmm, brute force attack on a crypto system.
    Doesn't matter what you do, you'll never turn those days into hours unless you have a room full of PC's all working on the same problem.

  11. #11
    #include<xErath.h> xErath's Avatar
    Join Date
    Jun 2004
    Posts
    722
    Using this main, it took only 9 seconds to copy 50*10^8 bytes of data
    Code:
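    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    int main(void){
    	char s[51] = "", d[51];	/* s initialized so the copy reads defined data */
    	long i;
    	time_t t;

    	t=time(NULL);
    	for(i=0;i<100000000L;i++)
    		memcpy(d, s, 50);
    	
    	printf("TIME: %ld\n", (long)(time(NULL)-t));	/* whole seconds elapsed */
    	return 0;
    }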
    Do you think that is worth something?? Things can't get any better than assembly.
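
    Two things can distort a timing loop like this one: time() only resolves whole seconds, and an optimizing compiler may drop copies whose result is never read. A hedged sketch of a harness that tries to avoid both is given here; time_copies and copy_fn are hypothetical names, and clock() reports processor time, which is an assumption about what should be measured.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical harness: CPU seconds spent on `rounds` copies of 50 bytes. */
double time_copies(copy_fn copy, long rounds, unsigned long *checksum)
{
	unsigned char s[50], d[50];
	unsigned long sum = 0;
	clock_t start;
	long i;

	memset(s, 0xA5, sizeof s);
	start = clock();
	for (i = 0; i < rounds; i++) {
		s[0] = (unsigned char)i;   /* vary the input each round */
		copy(d, s, sizeof s);
		sum += d[0];               /* read the output so the copy is not dead code */
	}
	*checksum = sum;
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

    Passing the library memcpy as the first argument gives a baseline. A replacement can be timed the same way once its size parameter is a size_t rather than an int.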

  12. #12
    #include<xErath.h> xErath's Avatar
    Join Date
    Jun 2004
    Posts
    722
    Quote Originally Posted by naruto
    You say you found your code much simpler; what code?
    The copy stuff with 8-byte integers.

  13. #13
    Registered User
    Join Date
    Jun 2004
    Posts
    42

    Quote Originally Posted by Salem
    Mmm, brute force attack on a crypto system.
    Doesn't matter what you do, you'll never turn those days into hours unless you have a room full of PC's all working on the same problem.
    Bingo! But it is a legal "attack"

    It's a distributed program.
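
    Since KeySearch() takes a 5-byte starting key (begin) and a count (step), distributing the search comes down to handing each machine a different slice of the 2^40 key space. A minimal sketch of that split, under stated assumptions: worker_range is a hypothetical helper, and the start key is written least significant byte first, which is assumed to match how (*pKey)++ counts on a little-endian x86.

```c
#include <stdint.h>

#define KEY_SPACE (UINT64_C(1) << 40)   /* number of 5-byte keys */

/* Hypothetical helper: slice of the key space for worker `worker`
   out of `workers` (workers must be at least 1). */
void worker_range(unsigned worker, unsigned workers,
                  unsigned char begin[5], uint64_t *step)
{
	uint64_t chunk = KEY_SPACE / workers;
	uint64_t start = chunk * worker;
	int i;

	/* The last worker also takes the remainder of the division. */
	*step = (worker == workers - 1) ? KEY_SPACE - start : chunk;

	/* Least significant byte first (assumed byte order, see above). */
	for (i = 0; i < 5; i++)
		begin[i] = (unsigned char)(start >> (8 * i));
}
```

    Where unsigned long is 32 bits, a single KeySearch() call cannot take a step larger than 2^32 - 1, so each slice would have to be fed to it in several smaller pieces.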

  14. #14
    Registered User
    Join Date
    Jun 2004
    Posts
    42

    Quote Originally Posted by xErath
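    Using this main, it took only 9 seconds to copy 50*10^8 bytes of data
    Code:
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    int main(void){
    	char s[51] = "", d[51];	/* s initialized so the copy reads defined data */
    	long i;
    	time_t t;

    	t=time(NULL);
    	for(i=0;i<100000000L;i++)
    		memcpy(d, s, 50);
    	
    	printf("TIME: %ld\n", (long)(time(NULL)-t));	/* whole seconds elapsed */
    	return 0;
    }
    Do you think that is worth something?? Things can't get any better than assembly.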

    Some people have modified it by replacing parts of the code with assembly (nobody rewrote memcpy), but the assembly edition is no more effective than what I have done. I don't understand the zen of code optimization, but the one thing I learned from Michael Abrash's book is: "Do not optimize a program that is not well designed. If you use assembly in a badly designed program, it may be no faster than C. The right time to optimize with assembly is when you really understand the effect of each instruction you have written."

    I can only remember the ideas he described; I can't put them into practice.

  15. #15
    #include<xErath.h> xErath's Avatar
    Join Date
    Jun 2004
    Posts
    722
    Basically, what he's saying with "If you use assembly in a badly designed program" is that if you write lame assembly you get crappy code... Duh... That goes for everything, and it's even worse if you have a lame compiler. Believe me... the assembly code I posted is quite small... you could probably cut 2 or 3 instructions... It does the same thing as the memcpy version that copies with 4-byte integers. If you have any doubts, try both.
    Then again, if something is poorly designed, redesign it well.

    //edit
    I ran that main with both the C and the assembly memcpy, this time copying 50*10^9 bytes of data (50 GB): the C function did it in 134 seconds, the assembly in 81 seconds. Nice!
    Last edited by xErath; 06-22-2004 at 10:46 AM.
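
    To put the 134-second and 81-second results in per-call terms: 50*10^9 bytes at 50 bytes per call is 10^9 calls. A tiny sketch of the arithmetic, with hypothetical helper names:

```c
/* Average nanoseconds per call, given total seconds and number of calls. */
double ns_per_call(double seconds, double calls)
{
	return seconds * 1e9 / calls;
}

/* How many times faster the second measurement is than the first. */
double speedup(double before_seconds, double after_seconds)
{
	return before_seconds / after_seconds;
}
```

    With these figures the C version averages 134 ns per 50-byte copy and the assembly version 81 ns, a speedup of about 1.65 for the copy itself, loop overhead included.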
