Okay, I'm bored so I'll give whoever is interested something to chew on. Whoever has the fastestpracticalcode is the winner. That means that inline assembly will be a mark against you, as will increasing the size and complexity of the algorithm considerably (to restrict you stackers a bit ).

The Task

You've been given the task of optimizing a quicksort routine written by one of your lovely colleagues who is currently on vacation. The optimized code needs to be integrated before she returns, so you've been given the job of making her code faster. The sort has been profiled and determined to be too slow with the recent influx of new users giving the software considerably more data to work with. The routine is used extensively in the software, so you can't change the interface.

Because the maintenance programmers are typically junior engineers with little or no experience, you're forced to hold yourself back a little. The resulting code must be fast as possible without causing the maintenance people to have a heart attack trying to understand it.

At current, the code runs approximately n% slower than the standard C library function qsort. Sadly, the implementation of qsort on your machines interacts poorly with your software and cannot be used in place of the custom sort. Because your lovely colleague was so careful in her original implementation, you find a handy scaffolding program to help you out. Here it is:

The scaffolding reinforces what you learned while profiling the code, jw_sort runs about n% slower than qsort. At the time that the code was written this was an acceptable difference so the author chose not to complicate the functions any more.Code:#include <stdio.h> #include <stdlib.h> #include <string.h> #define MAX_BUF 50 #define CUTOFF 18 typedef int (*Comparable) ( const void *a, const void *b ); void jw_insertion_sort ( void *base, int size, int n, Comparable cmp ) { unsigned char save[MAX_BUF]; unsigned char *cbase = base; int i, j; for ( i = 1; i < n; i++ ) { memcpy ( save, cbase + ( i * size ), size ); for ( j = i; j > 0 && cmp ( cbase + ( ( j - 1 ) * size ), save ) > 0; j-- ) memcpy ( cbase + ( j * size ), cbase + ( ( j - 1 ) * size ), size ); memcpy ( cbase + ( j * size ), save, size ); } } void swap_block ( void *a, void *b, int size ) { unsigned char save_block[MAX_BUF]; memcpy ( save_block, a, size ); memcpy ( a, b, size ); memcpy ( b, save_block, size ); } void jw_quick_sort ( void *base, int size, int low, int high, Comparable cmp ) { unsigned char save[MAX_BUF]; unsigned char *cbase = base; int left = low, right = high + 1; int r; if ( high - low < CUTOFF ) return; r = (int) ( ( (double)rand() * ( high - low ) ) / (double)RAND_MAX + low ); swap_block ( cbase + ( low * size ), cbase + ( r * size ), size ); memcpy ( save, cbase + ( low * size ), size ); for ( ; ; ) { while ( ++left <= high && cmp ( cbase + ( left * size ), save ) < 0 ) ; while ( --right >= 0 && cmp ( cbase + ( right * size ), save ) > 0 ) ; if ( left > right ) break; swap_block ( cbase + ( left * size ), cbase + ( right * size ), size ); } swap_block ( cbase + ( low * size ), cbase + ( right * size ), size ); jw_quick_sort ( base, size, low, right - 1, cmp ); jw_quick_sort ( base, size, right + 1, high, cmp ); } void jw_sort ( void *base, int size, int n, Comparable cmp ) { jw_quick_sort ( base, size, 0, n - 1, cmp ); jw_insertion_sort ( base, size, n, cmp ); } int compare ( const void *a, const void *b ) { int *aa = (int *)a; int *bb = (int *)b; if ( *aa < *bb ) return -1; else if ( *aa > *bb ) return +1; else return 0; } #include <time.h> int main ( void ) { int a0[10]; int *a1 = malloc ( 1000000 * sizeof ( int ) ); int i, n; clock_t start; double jw_time, q_time; for ( i = 0; i < 10; i++ ) a0[i] = rand(); for ( i = 0; i < 10; i++ ) printf ( "%d ", a0[i] ); printf ( "\n" ); jw_sort ( a0, sizeof ( int ), 10, compare ); for ( i = 0; i < 10; i++ ) printf ( "%d ", a0[i] ); printf ( "\n" ); for ( n = 0; n < 10; n++ ) { for ( i = 0; i < 1000000; i++ ) a1[i] = rand(); start = clock(); jw_sort ( a1, sizeof ( int ), 1000000, compare ); jw_time = ( (double)clock() - start ) / CLOCKS_PER_SEC; printf ( "jw_sort: %f ", jw_time ); for ( i = 0; i < 1000000; i++ ) a1[i] = rand(); start = clock(); qsort ( a1, 1000000, sizeof ( int ), compare ); q_time = ( (double)clock() - start ) / CLOCKS_PER_SEC; printf ( "qsort: %f Difference: %.0f%%\n", q_time, ( 1.0 - ( q_time / jw_time ) ) * 100 ); } free ( a1 ); return 0; }

Use any means you want to make jw_sort as fast as possible without causing the poor maintenance programmers fits. The data that the software processes can be anything from single characters to large heterogeneous records, so a generalized routine is required. Unfortunately, your colleague trusts that her co-workers are not stupid and is sparse with her commenting. You have to figure out her algorithm on your own. Since there aren't any comments, it shouldn't be too unusual of a solution you believe.

Back to reality

This is a reasonably realistic situation, which is a departure from most of our contests. But you'll find yourself learning quite a bit about quicksort by optimizing a typical implementation that you may have to decipher first. This is not a closed entry contest, so as soon as you have a submission, feel free to post it for everyone to see. This promotes multiple submissions and a high level of competitiveness since your opponents will be able to use your ideas in their own solutions. There is no time limit. Your colleague takes long vacations. Enjoy!