Thread: optimizing bubble sort

  1. #1
    Registered User Sargnagel's Avatar
    Join Date
    Aug 2002
    Posts
    166

    Angry optimizing bubble sort

    Hello everyone,

    at first I didn't care about the speed of my bubble sort function - a working program was/is enough ... but today I found out that some newbie was able to write a faster bubble sort function than me. So, I decided to tweak the sourcecode a little bit.
    I tried to reach the performance of the bubble sort function (sourcecode still undisclosed) written by one of our tutors at my university. My function is still about 30% slower than his and I don't know why.
    Here is my current sourcecode - I hope you can help me to make it faster and find possible bugs (I guess there are some but I am too tired at the moment to think clearly):
    Code:
    #define N 10240
    ...
    int main()
    {
            int array[N];
            ...
            return 0;
    }
    ...
    int adv_bubble(int array[])
    {
            // strange but register makes a difference on cygwin gcc 3.2
    	register int i, tmp, tmp1, test, *pointer;
    	
    	counter = N-2; 
    	
    	do
    	{
    		test = 1;
    		i = 0;
    		pointer = array;
    		
    		do
    		{
    			// new code ~25% faster
    			if (*pointer > *(pointer+1))
    			{
    	  			tmp = *pointer;
    	  			*pointer = *(pointer+1);
    	  			*(pointer+1) = tmp;
    	  			test = 0;
    			}
    			// old code -> slow
    			/*if (pointer[i] > pointer[i+1])
    			{
    	  			tmp = pointer[i+1];
    	  			pointer[i+1] = pointer[i];
    	  			pointer[i] = tmp;
    	  			test = 1;
    			}*/
    			++pointer;
    		
    		//	i++ and counter = N-2 is faster than ++i and counter = N-1
    		// 	I guess I am making something wrong here ... though the array is sorted
    		//	correctly ...
    		}while(i++ < counter);
    		
    		if(test) return 1;
    		
    	}while(--counter >= -1);
    		
    	return 1;
    }
    Thank you for your help.

  2. #2
    Blank
    Join Date
    Aug 2001
    Posts
    1,034
    You really shouldn't be wasting your time optimizing bubble
    sort Are you even using -O2 with gcc? I don't think
    you would have the problem with register if you did.

    This is probably fairly fast on almost
    sorted input.

    Code:
    void bubble_sort(int A[], int n)
    {
        int i, j;
        int clean_pass = 0;
        
        for (i = 0; !clean_pass && i < n - 1; ++i) {
            clean_pass = 1;
            for (j = i + 1; j < n; ++j) {
                if (A[i] > A[j]) {
                    int t = A[i];
                    A[i] = A[j];
                    A[j] = t;
                    clean_pass = 0;
                }
    
            }
        }
    }

  3. #3
    Registered User Sargnagel's Avatar
    Join Date
    Aug 2002
    Posts
    166
    Thank you for the sourcecode, Nick. You're right, the code is fast, but mine is about 10% faster. I don't know what my tutor could've done to make the C code fly without using compiler optimizations or a faster computer ... and I've yet to find a tutor at my university who knows more about C and sourcecode optimizations than me ... but maybe I've found my master?
    Of course, compiler optimizations can make the program fly but I wanted to know how much more speed is possible without using them.

  4. #4
    Registered User
    Join Date
    Dec 2001
    Posts
    88
    Originally posted by Sargnagel
    Thank you for the sourcecode, Nick. You're right, the code is fast, but mine is about 10% faster. I don't know what my tutor could've done to make the C code fly without using compiler optimizations or a faster computer ... and I've yet to find a tutor at my university who knows more about C and sourcecode optimizations than me ... but maybe I've found my master?
    Of course, compiler optimizations can make the program fly but I wanted to know how much more speed is possible without using them.
    you are wasting your time.

    I tried once to optimize bubblesort but the speed depends on the compiler.

    You have to turn on optimization otherwise a
    if(a<b)
    {
    t=a;
    a=b;
    b=a;
    }

    is in assembler a
    cmp ; compare both values
    mov ; move both values in register
    mov

    mov
    mov
    mov ; switch both values

    but with optimizations you may get
    cmp ; compare both values
    xchg ; change both variables

    oder xchg is subsituted with
    mov
    mov
    because the values already were in registers because of the cmp

    you have to look at the assembler code taht your compiler produce, otherwise you don't have a chanze.
    Hope you don't mind my bad english, I'm Austrian!

  5. #5
    Registered User Sargnagel's Avatar
    Join Date
    Aug 2002
    Posts
    166
    Thanks alot, Shade! I will have a look at the assembler code ... but first I will have to learn assembler (hello, rafe ).

    I will check out some other compilers, too.

    ... yes, it is a waste of time optimizing bubble sort but on the other hand it is a nice little practice.

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Lessons in premature optimisation
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    // no -DN=nnn on command line
    #ifndef N
    #define N 10240
    #define NO_DUMP
    #endif
    
    // read pentium fast clock with gcc3.2
    #define RDTSC(llptr) { \
            __asm__ __volatile__ ( \
            "rdtsc" \
            : "=A" (llptr) \
            ); }
    
    int qsort_cmp ( const void *a, const void *b ) {
        const int *pa = a;
        const int *pb = b;
        if ( *pa > *pb ) return +1;
        if ( *pa < *pb ) return -1;
        return 0;
    }
    
    // simple bubble sort with no attempt at optimisation
    // to act as reference for improvement comparisions
    void bubble_reference ( int arr[], int len ) {
        int sorted, scan;
        for ( sorted = len-1 ; sorted >= 0 ; sorted-- ) {
            for ( scan = 0 ; scan < sorted ; scan++ ) {
                if ( arr[scan+1] < arr[scan] ) {
                    int temp = arr[scan+1];
                    arr[scan+1] = arr[scan];
                    arr[scan] = temp;
                }
            }
        }
    }
    
    // exchange unsorted elements with the end element
    // not the adjacent element
    // not strictly a bubble sort anymore (I don't think)
    void bubble_salem ( int arr[], int len ) {
        int sorted, scan;
        for ( sorted = len-1 ; sorted >= 0 ; sorted-- ) {
            for ( scan = 0 ; scan < sorted ; scan++ ) {
                if ( arr[sorted] < arr[scan] ) {
                    int temp = arr[sorted];
                    arr[sorted] = arr[scan];
                    arr[scan] = temp;
                }
            }
        }
    }
    
    void bubble_sort_nick (int A[], int n) {
        int i, j;
        int clean_pass = 0;
       
        for (i = 0; !clean_pass && i < n - 1; ++i) {
            clean_pass = 1;
            for (j = i + 1; j < n; ++j) {
                if (A[i] > A[j]) {
                    int t = A[i];
                    A[i] = A[j];
                    A[j] = t;
                    clean_pass = 0;
                }
            }
        }
    }
    
    int adv_bubble(int array[], int len) {
        register int i, tmp, tmp1, test, *pointer;
        int counter = len-2;
        do {
            test = 1;
            i = 0;
            pointer = array;
    
            do {
                // new code ~25% faster
                if(*pointer > *(pointer+1)) {
                    tmp = *pointer;
                    *pointer = *(pointer+1);
                    *(pointer+1) = tmp;
                    test = 0;
                }
                // old code -> slow
                /*if (pointer[i] > pointer[i+1])
                {
                tmp = pointer[i+1];
                pointer[i+1] = pointer[i];
                pointer[i] = tmp;
                test = 1;
                }*/
                ++pointer;
    
                //i++ and counter = N-2 is faster than ++i and counter = N-1
                // I guess I am making something wrong here ... though the array is sorted
                //correctly ...
            } while(i++ < counter);
    
            if ( test ) return 1;
        } while(--counter >= -1);
        return 1;
    }
    
    int main ( ) {
        int array0[N], array1[N],array2[N],array3[N],array4[N],array5[N];
        unsigned long long start, end;
        unsigned long long t1, t2, t3, t4, t5;
        int i;
    
        // prepare 5 identical data sets
        for ( i = 0 ; i < N ; i++ ) array0[i] = rand() % N;
        memcpy( array1, array0, sizeof(array0) );
        memcpy( array2, array0, sizeof(array0) );
        memcpy( array3, array0, sizeof(array0) );
        memcpy( array4, array0, sizeof(array0) );
        memcpy( array5, array0, sizeof(array0) );
    
        RDTSC(start);
        qsort( array1, N, sizeof(array1[0]), qsort_cmp );
        RDTSC(end); t1 = end-start;
    
        RDTSC(start);
        bubble_reference( array2, N );
        RDTSC(end); t2 = end-start;
    
        RDTSC(start);
        bubble_sort_nick( array3, N );
        RDTSC(end); t3 = end-start;
    
        RDTSC(start);
        adv_bubble( array4, N );
        RDTSC(end); t4 = end-start;
    
        RDTSC(start);
        bubble_salem( array5, N );
        RDTSC(end); t5 = end-start;
    
        /* check results */
    #ifndef NO_DUMP
        /* if a value for N was specified on the command line, print everything */
        for ( i = 0 ; i < N ; i++ )
            printf( "%4d %4d %4d %4d %4d %4d\n",
                array0[i], array1[i], array2[i], array3[i], array4[i], array5[i] );
    #endif
        for ( i = 0 ; i < N ; i++ ) {
            if ( array1[i] != array2[i] ) printf( "Array2 diff at %d\n", i );
            if ( array1[i] != array3[i] ) printf( "Array3 diff at %d\n", i );
            if ( array1[i] != array4[i] ) printf( "Array4 diff at %d\n", i );
            if ( array1[i] != array5[i] ) printf( "Array5 diff at %d\n", i );
        }
    
        printf( "Qsort            time=%lld\n", t1 );
        printf( "bubble_reference time=%lld\n", t2 );
        printf( "bubble_sort_nick time=%lld\n", t3 );
        printf( "adv_bubble       time=%lld\n", t4 );
        printf( "bubble_salem     time=%lld\n", t5 );
    
        return 0;
    }
    I get these results
    Code:
    E:\code>gcc prog.c
    E:\code>a.exe
    Qsort            time=8190385
    bubble_reference time=1738486896
    bubble_sort_nick time=1451780046
    adv_bubble       time=1311180746
    bubble_salem     time=1109674714
    
    E:\code>gcc -O2 prog.c
    E:\code>a.exe
    Qsort            time=7998630
    bubble_reference time=614753262
    bubble_sort_nick time=658336716
    adv_bubble       time=646453845
    bubble_salem     time=416393171
    Surprise news!!!!
    When the optimiser is turned on, the reference implementation beats the two attempts to improve things.


    To Sargnagel
    I wonder if your tutor has implemented bubble_salem() and not a true bubble sort?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User Cela's Avatar
    Join Date
    Jan 2003
    Posts
    362
    Why are you guys even bothering with bubblesort? Optimizing bubblesort is an exercise in futility, there's no way to improve the algorithm beyond quadratic performance and still have bubblesort. ;-)
    *Cela*

  8. #8
    Registered User Sargnagel's Avatar
    Join Date
    Aug 2002
    Posts
    166
    @Salem:

    Woohooo!
    *bows before the master*
    Surprise news!!!!
    When the optimiser is turned on, the reference implementation beats the two attempts to improve things.
    That's indeed a surprise ... even more surprising to me is, that both improved functions perform about equally "bad" when 'gcc -O3' is used. And even your function runs a little slower with O3.
    I wonder if your tutor has implemented bubble_salem() and not a true bubble sort?
    Hehe ... would've been quite evil.
    I managed to check my new function a few hours ago at a Sun UltraSparc workstation at my university. The program runs about 20% faster than my tutor's version.
    Finally, everything is alright again ... at least for me.

    Thanks to everyone for helping me out.

    Cela
    Why are you guys even bothering with bubblesort? Optimizing bubblesort is an exercise in futility, there's no way to improve the algorithm beyond quadratic performance and still have bubblesort. ;-)
    It was just ment as a little excercise. It's truly pointless for a real world application (I guess). I just felt I had to write a faster function than my tutor ... and I succeeded.
    Last edited by Sargnagel; 01-22-2003 at 02:05 PM.

  9. #9
    Registered User Sargnagel's Avatar
    Join Date
    Aug 2002
    Posts
    166
    Just another "little" excercise

    Here is the current version of my bubble sort function:
    Code:
    #define N 10240
    
    int advbubblesort(int array[])
    {
      	register int counter = N -2, *start = array;
    	
      	do
            {
        	    register unsigned int test = 1, i = 0;
          	    array = start;
    
          	    do
                {
    	  		if(*array > *(array+1))
    	    	       {
    	    	  	        register int tmp = *array;
    	    	  	        *array = *(array+1);
    	    	  	        *(array+1) = tmp;
    	    	  	        test = 0;
    	    	       }
    	  		++feld;
    	  		
    	    }while(i++ < counter);
    
    	    if(test) return 1;
    
        }while(counter--);
    
      return 1;
    }
    What about the while condition of the inner loop? Can it be optimized? I mean, can it be replaced by a more exotic expression? Something that's faster than an addition? It would surely help alot.
    Last edited by Sargnagel; 01-22-2003 at 03:03 PM.

  10. #10
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well you could unroll the inner loop, so that you do more sorting and less comparing, something akin to duff's device for fast memory copies.

    You'll probably need to write the compare and swap part as a macro, to make the actual loop somewhat readable, because you'll end up with say 8 copies of the same code!
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  11. #11
    Registered User Sargnagel's Avatar
    Join Date
    Aug 2002
    Posts
    166
    And again thanks for your help, Salem. I will try to implement Duff's Device over the next few days.

  12. #12
    Registered User Sargnagel's Avatar
    Join Date
    Aug 2002
    Posts
    166
    hehe ... okay ... I couldn't wait any longer ... here is my first try on Duff's Device:
    Code:
    #define N 10240
    
    #define COMP_SWAP(x,y)	if(*x > *(y+1))\
    				           {\
    				           tmp = *x;\
    					   *x = *(y+1);\
    	    	  		           *(y+1) = tmp;\
    	    	  		           test = 0;\
    	    	  			   }\
    	    	  			   ++x;
    
    int advbubblesort(int array[])
    {
      	int counter = N-1;
      	unsigned int *start = array;
      		
      	do
        {
        	register unsigned int test = 1, tmp;
          	array = start;
    
          	register int n=(counter+7)/8;
    		switch(counter%8)
    		{
    			case 0:	do{	COMP_SWAP(array,array)
    			case 7:		COMP_SWAP(array,array)
    			case 6:		COMP_SWAP(array,array)
    			case 5:		COMP_SWAP(array,array)
    			case 4:		COMP_SWAP(array,array)
    			case 3:		COMP_SWAP(array,array)
    			case 2:		COMP_SWAP(array,array)
    			case 1:		COMP_SWAP(array,array)
    					}while(n--);
    		}
          	
    		if(test) return 1;
    		
        }while(counter--);
    
      return 1;
    }
    It's a little bit slower than my previous version. There is still some overhead that needs to be reduced ... but I don't know how I could merge the incrementation (++)of the pointer array into the swap part to avoid one extra +1 addition. But if the if-clause is false, the pointer array must be incremented, too. With if ... else it becomes too slow. Is there another way?

    I've replaced the adv_bubble function in your program, Salem, with this new version to measure the performance correctly. Here are the results:
    • no gcc optimizations:
      Qsort time =6107803
      bubble_reference time=1362174983
      bubble_sort_nick time =1155297212
      adv_bubble time =1021675087
      duff_bubble time =1061755064
      bubble_salem time =817406709

      with gcc -O2:
      Qsort time =5352142
      bubble_reference time=446172591
      bubble_sort_nick time =493390219
      adv_bubble time =496075135
      duff_bubble time =401223056
      bubble_salem time =318042164

    The Duff's Device source code can obviously be much better optimized by cygwin gcc 3.2.

    Another problem:
    When I use the code provided by my tutor to measure the time of the calculation with the Duff's Device BubbleSort function (yes, the field is sorted correctly), the displayed floating point value is very, very large (i.e. 875246141433.192017 ms) Here is the code:
    Code:
    #include <sys/time.h>
    
    double sub_time( struct timeval start,struct timeval stop)
    {
      	double  t1,t2;
      	
      	t1=(double) start.tv_usec;
      	t1/=1000000;
      	
      	t1+=start.tv_sec;
      	t2=stop.tv_usec;
      	t2/=1000000;
      	t2+=stop.tv_sec;
      	
      	return (t2-t1)*1000;
    }
    
    int main()
    {
            struct timeval start,stop;
            ...
            gettimeofday(&start,NULL);
      
      	advbubblesort(data);
    
      	gettimeofday(&stop,NULL);
            
            printf("time: %f\n", sub_time(start,stop));
            ...
            return 0;
    }
    Last edited by Sargnagel; 01-23-2003 at 05:21 AM.

  13. #13
    Registered User Sargnagel's Avatar
    Join Date
    Aug 2002
    Posts
    166
    Code:
    register int n=(counter+7)>>3;
    		switch(counter%8)
    		{
                      ...
    I've replaced the division by 8 with a right shift operation that does the same but is a little bit faster.
    I'm currently trying to improve the swap part. I will edit this post, when I've been successful.

    Here you are:
    Code:
    #define N 10240
    
    #define COMP_SWAP(x,y)	if(*x > *y)\
                              {\
                                       tmp = *x;\
                                       *x = *y;\
                                       *y = tmp;\
                                       test = 0;\
                               }\
                               x = y++;
    
    int duff_bubble(int array[])
    {
      	int counter = N-1;
      	unsigned int *start = array;
      		
      	do
        {
        	register unsigned int test = 1, tmp, *next = start+1;
          	array = start;
    
          	register int n=(counter+7)>>3;
    		switch(counter%8)
    		{
    			case 0:	do{	COMP_SWAP(array,next)
    			case 7:		COMP_SWAP(array,next)
    			case 6:		COMP_SWAP(array,next)
    			case 5:		COMP_SWAP(array,next)
    			case 4:		COMP_SWAP(array,next)
    			case 3:		COMP_SWAP(array,next)
    			case 2:		COMP_SWAP(array,next)
    			case 1:		COMP_SWAP(array,next)
    					}while(--n);
    		}
          	
    		if(test) return 1;
    		
        }while(counter--);
    
      return 1;
    }
    I've modified the COMP_SWAP macro to reduce the number of pointer increments. I've introduced another pointer called 'next'. This pointer points at the element 'start+1'. So, no more increments are needed in the swap part. And afterwards I only need one address assignment and one increment.
    First tests show that this new function is maybe a little faster than my adv_bubble_sort function (no Duff's device). But I will have to modify Salem's program first to be absolutely sure.
    Although the compiler cannot optimize it (cygwin gcc -O2) as good as my previous Duff's device sort.

    [edit]
    Here are the results:
    • cygwin gcc 3.2
      Qsort time=7252293
      bubble_reference time=1341713751
      bubble_sort_nick time=1149134359
      adv_bubble time=1026198625
      duff_bubble time=1034310584
      bubble_salem time=802085318

      cygwin gcc 3.2 -O2
      Qsort time=5210652
      bubble_reference time=441168372
      bubble_sort_nick time=488427096
      adv_bubble time=486364524
      duff_bubble time=425535524
      bubble_salem time=315937090

    Sadly, duff_bubble is unoptimized still a little bit slower than adv_bubble. ... the performance hunt goes on
    [/edit]
    Last edited by Sargnagel; 01-23-2003 at 06:14 AM.

  14. #14
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > gettimeofday(&start,NULL);
    I'd check the return result, to make sure the function was successful
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  15. #15
    Registered User Sargnagel's Avatar
    Join Date
    Aug 2002
    Posts
    166
    [latest edit]
    A test of adv_bubble, duff_bubble and salem_bubble at the university on a Sun UltraSparc 5 was quite amazing:

    gcc 2.95.3 no optimizations:

    adv_bubble 4346 ms
    duff_bubble 3381 ms
    salem_bubble 5633 ms

    On the Intel Plattform (P3) adv_ and duff_bubble perform about the same and salem_bubble is the fastest function.

    With -O2 turned on (UltraSparc) adv_ and duff_bubble perform about the same again (~1750 ms) and salem_bubble is back on the top with ~1350ms.

    I wonder if it is the architecture or the compiler that's making the difference ?
    [/latest edit]

    > gettimeofday(&start,NULL);
    I'd check the return result, to make sure the function was successful
    The function was not successful. I don't know why. It was working before, but after implementing Duff's device the clock has gone mad. Maybe the Duff-sort now somehow changes data out of the bounds of the array. I will check this.

    Still, your program is working wonderfully as you can see in my posting above.

    [edit]
    Hmm ... I don't think, that the Duff-sort is related to the gettimeofday-problem, because the first time check is done before duff_bubble is called.
    But what could be the error source?
    [/edit]
    [edit]
    The Duff-sort function was the problem. I forgot to adjust the array length and to change from post- to preincrement. The gettimeofday function still returns an error but the time display seems to be okay.
    [/edit]

    I've found a little bug in the outer loop. It must be
    while(--counter)

    and NOT while(counter--)

    I've discovered this bug, after I've removed the variable "test" and "if(test) return 1;".
    The Duff-sort is without this testing even faster than my adv bubble sort with testing removed, too.

    • cygwin gcc 3.2 unoptimized
      Qsort time=6165706
      bubble_reference time=1333714207
      bubble_sort_nick time=1137563691
      adv_bubble time=1022387323
      duff_bubble time=986999012
      bubble_salem time=801146807
    Last edited by Sargnagel; 01-23-2003 at 06:40 PM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 2
    Last Post: 05-13-2009, 03:25 PM
  2. bubble sort not working... wats d prob?
    By Huskar in forum C Programming
    Replies: 8
    Last Post: 03-31-2009, 11:59 PM
  3. How do I bubble sort alphabetically?
    By arih56 in forum C++ Programming
    Replies: 4
    Last Post: 02-27-2008, 02:30 AM
  4. Bubble Sort... which type?
    By gflores in forum C++ Programming
    Replies: 8
    Last Post: 08-15-2004, 04:48 AM
  5. Bubble Sort, Qucik Sort
    By insomniak in forum C Programming
    Replies: 2
    Last Post: 03-15-2003, 04:54 PM