# An exercise in optimization

• 03-11-2004
Prelude
Okay, I'm bored so I'll give whoever is interested something to chew on. Whoever has the fastest practical code is the winner. That means that inline assembly will be a mark against you, as will increasing the size and complexity of the algorithm considerably (to restrict you stackers a bit ;)).

You've been given the task of optimizing a quicksort routine written by one of your lovely colleagues who is currently on vacation. The optimized code needs to be integrated before she returns, so you've been given the job of making her code faster. The sort has been profiled and determined to be too slow with the recent influx of new users giving the software considerably more data to work with. The routine is used extensively in the software, so you can't change the interface.

Because the maintenance programmers are typically junior engineers with little or no experience, you're forced to hold yourself back a little. The resulting code must be as fast as possible without causing the maintenance people to have a heart attack trying to understand it.

At present, the code runs approximately n% slower than the standard C library function qsort. Sadly, the implementation of qsort on your machines interacts poorly with your software and cannot be used in place of the custom sort. Because your lovely colleague was so careful in her original implementation, you find a handy scaffolding program to help you out. Here it is:
Code:

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BUF 50
#define CUTOFF  18

typedef int (*Comparable) ( const void *a, const void *b );

void jw_insertion_sort ( void *base, int size, int n, Comparable cmp )
{
  unsigned char save[MAX_BUF];
  unsigned char *cbase = base;
  int i, j;

  for ( i = 1; i < n; i++ ) {
    memcpy ( save, cbase + ( i * size ), size );
    for ( j = i; j > 0 && cmp ( cbase + ( ( j - 1 ) * size ), save ) > 0; j-- )
      memcpy ( cbase + ( j * size ), cbase + ( ( j - 1 ) * size ), size );
    memcpy ( cbase + ( j * size ), save, size );
  }
}

void swap_block ( void *a, void *b, int size )
{
  unsigned char save_block[MAX_BUF];

  memcpy ( save_block, a, size );
  memcpy ( a, b, size );
  memcpy ( b, save_block, size );
}

void jw_quick_sort ( void *base, int size, int low, int high, Comparable cmp )
{
  unsigned char save[MAX_BUF];
  unsigned char *cbase = base;
  int left = low, right = high + 1;
  int r;

  if ( high - low < CUTOFF )
    return;
  r = (int) ( ( (double)rand() * ( high - low ) ) / (double)RAND_MAX + low );
  swap_block ( cbase + ( low * size ), cbase + ( r * size ), size );
  memcpy ( save, cbase + ( low * size ), size );
  for ( ; ; ) {
    while ( ++left <= high && cmp ( cbase + ( left * size ), save ) < 0 )
      ;
    while ( --right >= 0 && cmp ( cbase + ( right * size ), save ) > 0 )
      ;
    if ( left > right )
      break;
    swap_block ( cbase + ( left * size ), cbase + ( right * size ), size );
  }
  swap_block ( cbase + ( low * size ), cbase + ( right * size ), size );
  jw_quick_sort ( base, size, low, right - 1, cmp );
  jw_quick_sort ( base, size, right + 1, high, cmp );
}

void jw_sort ( void *base, int size, int n, Comparable cmp )
{
  jw_quick_sort ( base, size, 0, n - 1, cmp );
  jw_insertion_sort ( base, size, n, cmp );
}

int compare ( const void *a, const void *b )
{
  int *aa = (int *)a;
  int *bb = (int *)b;

  if ( *aa < *bb )
    return -1;
  else if ( *aa > *bb )
    return +1;
  else
    return 0;
}

#include <time.h>

int main ( void )
{
  int a0[10];
  int *a1 = malloc ( 1000000 * sizeof ( int ) );
  int i, n;
  clock_t start;
  double jw_time, q_time;

  for ( i = 0; i < 10; i++ )
    a0[i] = rand();
  for ( i = 0; i < 10; i++ )
    printf ( "%d ", a0[i] );
  printf ( "\n" );
  jw_sort ( a0, sizeof ( int ), 10, compare );
  for ( i = 0; i < 10; i++ )
    printf ( "%d ", a0[i] );
  printf ( "\n" );
  for ( n = 0; n < 10; n++ ) {
    for ( i = 0; i < 1000000; i++ )
      a1[i] = rand();
    start = clock();
    jw_sort ( a1, sizeof ( int ), 1000000, compare );
    jw_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
    printf ( "jw_sort: %f  ", jw_time );
    for ( i = 0; i < 1000000; i++ )
      a1[i] = rand();
    start = clock();
    qsort ( a1, 1000000, sizeof ( int ), compare );
    q_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
    printf ( "qsort: %f  Difference: %.0f%%\n", q_time, ( 1.0 - ( q_time / jw_time ) ) * 100 );
  }
  free ( a1 );

  return 0;
}
```
The scaffolding reinforces what you learned while profiling the code: jw_sort runs about n% slower than qsort. At the time the code was written this was an acceptable difference, so the author chose not to complicate the functions any further.

Use any means you want to make jw_sort as fast as possible without giving the poor maintenance programmers fits. The data that the software processes can be anything from single characters to large heterogeneous records, so a generalized routine is required. Unfortunately, your colleague trusts that her co-workers are not stupid and is sparse with her commenting. You have to figure out her algorithm on your own. Since there aren't any comments, you believe the solution shouldn't be too unusual.
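
One detail worth noticing before you start: the scaffolding copies elements through fixed buffers of MAX_BUF bytes, so records larger than that would overflow them. The following is only a minimal sketch of one possible direction, not the intended answer. It assumes a hypothetical helper named swap_bytes (not part of the original code) that exchanges two blocks in place without any temporary buffer.

```c
#include <stddef.h>

/* Hypothetical helper, not in the original scaffolding: swaps two blocks
   of `size` bytes one byte at a time, so no MAX_BUF-sized temporary is
   needed. The blocks may be the same block, but must not partially
   overlap. */
void swap_bytes ( void *a, void *b, size_t size )
{
  unsigned char *pa = a;
  unsigned char *pb = b;

  while ( size-- > 0 ) {
    unsigned char tmp = *pa;

    *pa++ = *pb;
    *pb++ = tmp;
  }
}
```

Whether this beats three memcpy calls depends on the element size and the compiler, so it is worth timing with the scaffolding before committing to it.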

Back to reality

This is a reasonably realistic situation, which is a departure from most of our contests. But you'll find yourself learning quite a bit about quicksort by optimizing a typical implementation that you may have to decipher first. This is not a closed entry contest, so as soon as you have a submission, feel free to post it for everyone to see. This promotes multiple submissions and a high level of competitiveness since your opponents will be able to use your ideas in their own solutions. There is no time limit. Your colleague takes long vacations. :D Enjoy! :)
• 03-22-2004
linuxdude
wow, I can't even imagine. Good luck people who attempt this feat.
• 03-31-2004
Xzyx987X
Hmm... I think I might try to do this a bit later to see what I can come up with. Are you allowed to optimise the main() routine as well?

Edit: Err... nm. Now that I looked at it closely, optimising main() wouldn't make any difference.
• 04-01-2004
Prelude
>Are you allowed to optimise the main() routine as well?
As long as the interface to the function isn't changed (because it is already in use), you can optimize whatever you want. And there is one big improvement that affects speed poorly if you choose to implement it. ;)
• 04-07-2004
jverkoey
is this still going on?
• 04-07-2004
Prelude
>is this still going on?
Yes.
• 04-08-2004
whackaxe
Quote:

Originally Posted by Prelude
Because the maintenance programmers are typically junior engineers with little or no experience

okay, that would be me, except i don't understand it at all :p what exactly does this do?
• 04-23-2004
gcn_zelda
I think Prelude was just writing a program and messed up and wants us to clean it up.

The lazy little... :P

Just kidding ;-)
• 04-29-2005
jim mcnamara
Okay - the ball is in your court. I don't claim much except some minor changes to open source code, but it is better than our native qsort by a long shot, so it is part of our run-time.

I get:
Quote:

kcsdev:/home/jmcnama> cc isort.c -o isort +O4
kcsdev:/home/jmcnama> isort
16838 5758 10113 17515 31051 5627 23010 7419 16212 4086
4086 5627 5758 7419 10113 16212 16838 17515 23010 31051
jw_sort: 1.580000 qsort: 5.580000 Difference: -253%
jw_sort: 1.550000 qsort: 5.460000 Difference: -252%
jw_sort: 1.600000 qsort: 5.590000 Difference: -249%
jw_sort: 1.530000 qsort: 5.420000 Difference: -254%
jw_sort: 1.570000 qsort: 5.490000 Difference: -250%
jw_sort: 1.510000 qsort: 5.570000 Difference: -269%
jw_sort: 1.580000 qsort: 5.450000 Difference: -245%
jw_sort: 1.560000 qsort: 5.530000 Difference: -254%
jw_sort: 1.570000 qsort: 5.380000 Difference: -243%
jw_sort: 1.530000 qsort: 5.640000 Difference: -269%
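
For reading these numbers: the scaffolding prints ( 1.0 - ( q_time / jw_time ) ) * 100, so a negative value means jw_sort took less time than qsort. A small sketch of that arithmetic follows; percent_difference is a hypothetical wrapper name, while the expression itself is the one from main().

```c
/* Hypothetical wrapper around the expression the scaffolding prints as
   "Difference". Negative results mean jw_sort was the faster of the two. */
double percent_difference ( double q_time, double jw_time )
{
  return ( 1.0 - ( q_time / jw_time ) ) * 100;
}
```

With the first timed row above, percent_difference ( 5.58, 1.58 ) comes out at roughly -253, which matches the printed value.
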
Code we modified over time, based on source from OSF and
G Schmidt.
Code:

```
Profiler output:
 %Time  Seconds  Cumsecs     #Calls  msec/call  Name
  40.8    86.35    86.35  421060654       0.00  compare
```
The largest share of the time, 40.8%, is spent in compare.
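
Since so much of the run is spent inside the comparator, keeping it cheap matters. Here is a minimal sketch, assuming int keys as in the scaffolding; compare_int is a hypothetical name. A plain subtraction of the two values is shorter still, but it can overflow when the operands have opposite signs and large magnitudes. The values from rand() are never negative, so this benchmark would not expose that problem.

```c
/* Hypothetical comparator for int keys: yields -1, 0 or +1 without
   subtracting the operands, so it cannot overflow. */
int compare_int ( const void *a, const void *b )
{
  int x = *(const int *)a;
  int y = *(const int *)b;

  return ( x > y ) - ( x < y );
}
```

Whether this is measurably faster than the if/else version depends on the compiler and the machine, so treat it as something to time rather than a guaranteed win.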

Code:

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BUF 64
#define CUTOFF  18

#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define SWAP(a, b, size)                        \
  do                                            \
    {                                           \
      register size_t __size = (size);          \
      register char *__a = (a), *__b = (b);     \
      do                                        \
        {                                       \
          char __tmp = *__a;                    \
          *__a++ = *__b;                        \
          *__b++ = __tmp;                       \
        } while (--__size > 0);                 \
    } while (0)

#define INSERT_THRESH 4

typedef struct
  {
    char *lo;
    char *hi;
  } stack_node;

#define min(x, y) ((x) < (y) ? (x) : (y))
#define STACK_SIZE        (CHAR_BIT * sizeof(size_t))
#define PUSH(low, high)   ((void) ((top->lo = (low)), (top->hi = (high)), ++top))
#define POP(low, high)    ((void) (--top, (low = top->lo), (high = top->hi)))
#define STACK_NOT_EMPTY   (stack < top)

/*
   PNMquicksort() libc.sl.2.04.a  jmc

   This incorporates improvements and optimizations (see Sedgewick, 1997)
   Non-recursive, uses stack of pointers

   Quicksorts just TOTAL_ELEMS / INSERT_THRESH partitions, falls into
   insertion sort to order the INSERT_THRESH items within each partition.

   The larger of the two sub-partitions is always pushed onto the
   stack first.

   Based on code from OSF 1990 and GNU (G. Schmidt).

   GNU disclaimer (aka warranty) which is required to be in source derived
   from GNU:

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 1.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.
*/

typedef int (*Comparable) ( const void *a, const void *b );

void PNMquicksort (void *const pbase,
                   size_t total_elems,
                   size_t size,
                   Comparable cmp_fx)
{
    register char *base_ptr = (char *) pbase;
    const size_t max_thresh = INSERT_THRESH * size;

    if (total_elems == 0)
    {
        return;
    }

    if (total_elems > INSERT_THRESH)
    {
        char *lo = base_ptr;
        char *hi = &lo[size * (total_elems - 1)];
        stack_node stack[STACK_SIZE];
        stack_node *top = stack + 1;

        while (STACK_NOT_EMPTY)
        {
            char *left;
            char *right;
            char *mid = lo + size * ((hi - lo) / size >> 1);

            if ((*cmp_fx) ((void *) mid, (void *) lo) < 0)
                SWAP (mid, lo, size);
            if ((*cmp_fx) ((void *) hi, (void *) mid) < 0)
                SWAP (mid, hi, size);
            else
                goto skip;                /* ugh a goto ! */
            if ((*cmp_fx) ((void *) mid, (void *) lo) < 0)
                SWAP (mid, lo, size);
skip:;
            left  = lo + size;
            right = hi - size;

            do
            {
                while ((*cmp_fx) ((void *) left, (void *) mid) < 0)
                {
                    left += size;
                }
                while ((*cmp_fx) ((void *) mid, (void *) right) < 0)
                {
                    right -= size;
                }
                if (left < right)
                {
                    SWAP (left, right, size);
                    if (mid == left)
                        mid = right;
                    else if (mid == right)
                        mid = left;
                    left += size;
                    right -= size;
                }
                else if (left == right)
                {
                    left += size;
                    right -= size;
                    break;
                }
            }  while (left <= right);

            if ((size_t) (right - lo) <= max_thresh)
            {
                if ((size_t) (hi - left) <= max_thresh)
                    POP (lo, hi);
                else
                    lo = left;
            }
            else if ((size_t) (hi - left) <= max_thresh)
                hi = right;
            else if ((right - lo) > (hi - left))
            {
                PUSH (lo, right);
                lo = left;
            }
            else
            {
                PUSH (left, hi);
                hi = right;
            }
        }
    }

    { /* insertion sort */
        char *const end_ptr = &base_ptr[size * (total_elems - 1)];
        char *tmp_ptr = base_ptr;
        char *thresh = min(end_ptr, base_ptr + max_thresh);
        register char *run_ptr;

        for (run_ptr = tmp_ptr + size; run_ptr <= thresh; run_ptr += size)
        {
            if ((*cmp_fx) ((void *) run_ptr, (void *) tmp_ptr) < 0)
            {
                tmp_ptr = run_ptr;
            }
        }

        if (tmp_ptr != base_ptr)
        {
            SWAP (tmp_ptr, base_ptr, size);
        }

        run_ptr = base_ptr + size;
        while ((run_ptr += size) <= end_ptr)
        {
            tmp_ptr = run_ptr - size;
            while ((*cmp_fx) ((void *) run_ptr, (void *) tmp_ptr) < 0)
                tmp_ptr -= size;

            tmp_ptr += size;
            if (tmp_ptr != run_ptr)
            {
                char *trav;

                trav = run_ptr + size;
                while (--trav >= run_ptr)
                {
                    char c = *trav;
                    char *hi, *lo;

                    for (hi = lo = trav; (lo -= size) >= tmp_ptr; hi = lo)
                    {
                        *hi = *lo;
                    }
                    *hi = c;
                }
            }
        }
    }
}

void jw_sort ( void *base, const size_t total_elems, Comparable cmp )
{
    PNMquicksort ( base, total_elems, sizeof(int), cmp );
}

/* improved compare */
int compare ( const void *a, const void *b )
{
    return  *(int *)a - *(int *)b;
}

/* old slower compare */
int slow_compare ( const void *a, const void *b )
{
    int *aa = (int *)a;
    int *bb = (int *)b;

    if ( *aa < *bb )
      return -1;
    else if ( *aa > *bb )
      return +1;
    else
      return 0;
}

#include <time.h>

int main ( void )
{
    int a0[10]={0};
    int *a1 = malloc ( 1000000 * sizeof ( int ) );
    int i=0,
        n=0;
    clock_t start=0;
    double jw_time=0.,
           q_time=0.;

    for ( i = 0; i < 10; i++ )
    {
        a0[i] = rand();
    }
    for ( i = 0; i < 10; i++ )
    {
        printf ( "%d ", a0[i] );
    }
    printf ( "\n" );
    jw_sort ( a0, 10, compare );
    for ( i = 0; i < 10; i++ )
    {
        printf ( "%d ", a0[i] );
    }
    printf ( "\n" );
    for ( n = 0; n < 10; n++ )
    {
        for ( i = 0; i < 1000000; i++ )
        {
            a1[i] = rand();
        }
        start = clock();
        jw_sort ( a1, 1000000, compare );
        jw_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
        printf ( "jw_sort: %f  ", jw_time );
        for ( i = 0; i < 1000000; i++ )
        {
            a1[i] = rand();
        }
        start = clock();
        qsort ( a1, 1000000, sizeof ( int ),compare );
        q_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
        printf ( "qsort: %f  Difference: %.0f%%\n",
                 q_time,
                 ( 1.0 - ( q_time / jw_time ) ) * 100 );
    }
    free ( a1 );

    return 0;
}
```
• 04-29-2005
pianorain
Man...it took a year for someone to crank out an answer? Forget it; I'm not entering any of Prelude's contests. That's some dedication.
• 04-29-2005
Thantos
jim mcnamara you've been here long enough to know you shouldn't bump year old threads.