![]() |
| | #1 |
| Code Goddess Join Date: Sep 2001
Posts: 9,661
| An exercise in optimization ).The Task You've been given the task of optimizing a quicksort routine written by one of your lovely colleagues who is currently on vacation. The optimized code needs to be integrated before she returns, so you've been given the job of making her code faster. The sort has been profiled and determined to be too slow with the recent influx of new users giving the software considerably more data to work with. The routine is used extensively in the software, so you can't change the interface. Because the maintenance programmers are typically junior engineers with little or no experience, you're forced to hold yourself back a little. The resulting code must be fast as possible without causing the maintenance people to have a heart attack trying to understand it. At current, the code runs approximately n% slower than the standard C library function qsort. Sadly, the implementation of qsort on your machines interacts poorly with your software and cannot be used in place of the custom sort. Because your lovely colleague was so careful in her original implementation, you find a handy scaffolding program to help you out. Here it is: Code: #include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_BUF 50
#define CUTOFF 18
typedef int (*Comparable) ( const void *a, const void *b );
void jw_insertion_sort ( void *base, int size, int n, Comparable cmp )
{
unsigned char save[MAX_BUF];
unsigned char *cbase = base;
int i, j;
for ( i = 1; i < n; i++ ) {
memcpy ( save, cbase + ( i * size ), size );
for ( j = i; j > 0 && cmp ( cbase + ( ( j - 1 ) * size ), save ) > 0; j-- )
memcpy ( cbase + ( j * size ), cbase + ( ( j - 1 ) * size ), size );
memcpy ( cbase + ( j * size ), save, size );
}
}
void swap_block ( void *a, void *b, int size )
{
unsigned char save_block[MAX_BUF];
memcpy ( save_block, a, size );
memcpy ( a, b, size );
memcpy ( b, save_block, size );
}
void jw_quick_sort ( void *base, int size, int low, int high, Comparable cmp )
{
unsigned char save[MAX_BUF];
unsigned char *cbase = base;
int left = low, right = high + 1;
int r;
if ( high - low < CUTOFF )
return;
r = (int) ( ( (double)rand() * ( high - low ) ) / (double)RAND_MAX + low );
swap_block ( cbase + ( low * size ), cbase + ( r * size ), size );
memcpy ( save, cbase + ( low * size ), size );
for ( ; ; ) {
while ( ++left <= high && cmp ( cbase + ( left * size ), save ) < 0 )
;
while ( --right >= 0 && cmp ( cbase + ( right * size ), save ) > 0 )
;
if ( left > right )
break;
swap_block ( cbase + ( left * size ), cbase + ( right * size ), size );
}
swap_block ( cbase + ( low * size ), cbase + ( right * size ), size );
jw_quick_sort ( base, size, low, right - 1, cmp );
jw_quick_sort ( base, size, right + 1, high, cmp );
}
void jw_sort ( void *base, int size, int n, Comparable cmp )
{
jw_quick_sort ( base, size, 0, n - 1, cmp );
jw_insertion_sort ( base, size, n, cmp );
}
int compare ( const void *a, const void *b )
{
int *aa = (int *)a;
int *bb = (int *)b;
if ( *aa < *bb )
return -1;
else if ( *aa > *bb )
return +1;
else
return 0;
}
#include <time.h>
int main ( void )
{
int a0[10];
int *a1 = malloc ( 1000000 * sizeof ( int ) );
int i, n;
clock_t start;
double jw_time, q_time;
for ( i = 0; i < 10; i++ )
a0[i] = rand();
for ( i = 0; i < 10; i++ )
printf ( "%d ", a0[i] );
printf ( "\n" );
jw_sort ( a0, sizeof ( int ), 10, compare );
for ( i = 0; i < 10; i++ )
printf ( "%d ", a0[i] );
printf ( "\n" );
for ( n = 0; n < 10; n++ ) {
for ( i = 0; i < 1000000; i++ )
a1[i] = rand();
start = clock();
jw_sort ( a1, sizeof ( int ), 1000000, compare );
jw_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
printf ( "jw_sort: %f ", jw_time );
for ( i = 0; i < 1000000; i++ )
a1[i] = rand();
start = clock();
qsort ( a1, 1000000, sizeof ( int ), compare );
q_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
printf ( "qsort: %f Difference: %.0f%%\n", q_time, ( 1.0 - ( q_time / jw_time ) ) * 100 );
}
free ( a1 );
return 0;
}
Use any means you want to make jw_sort as fast as possible without causing the poor maintenance programmers fits. The data that the software processes can be anything from single characters to large heterogeneous records, so a generalized routine is required. Unfortunately, your colleague trusts that her co-workers are not stupid and is sparse with her commenting. You have to figure out her algorithm on your own. Since there aren't any comments, it shouldn't be too unusual of a solution you believe. Back to reality This is a reasonably realistic situation, which is a departure from most of our contests. But you'll find yourself learning quite a bit about quicksort by optimizing a typical implementation that you may have to decipher first. This is not a closed entry contest, so as soon as you have a submission, feel free to post it for everyone to see. This promotes multiple submissions and a high level of competitiveness since your opponents will be able to use your ideas in their own solutions. There is no time limit. Your colleague takes long vacations. Enjoy!
__________________ My best code is written with the delete key. |
| Prelude is offline |
| | #2 |
| Registered User Join Date: Mar 2003 Location: Louisiana
Posts: 926
| wow, I can't even imagine. Good luck people who attempt this feat. |
| linuxdude is offline |
| | #3 |
| Registered User Join Date: Sep 2003
Posts: 107
| Hmm... I think I might try to do this a bit later to see what I can come up with. Are you allowed to optimise the main() routine as well? Edit: Err... nm. Now that I looked at it closely, optimising main() wouldn't make any difference. Last edited by Xzyx987X; 03-31-2004 at 01:12 PM. |
| Xzyx987X is offline |
| | #4 |
| Code Goddess Join Date: Sep 2001
Posts: 9,661
| >Are you allowed to optimise the main() routine as well? As long as the interface to the function isn't changed (because it is already in use), you can optimize whatever you want. And there is one big improvement that effects speed poorly if you choose to implement it.
__________________ My best code is written with the delete key. |
| Prelude is offline |
| | #5 |
| Software Developer Join Date: Feb 2003 Location: University of Waterloo
Posts: 1,916
| is this still going on? |
| jverkoey is offline |
| | #6 |
| Code Goddess Join Date: Sep 2001
Posts: 9,661
| >is this still going on? Yes.
__________________ My best code is written with the delete key. |
| Prelude is offline |
| | #7 | |
| Registered User Join Date: Mar 2004
Posts: 332
| Quote:
what exactly does this do? | |
| whackaxe is offline |
| | #8 |
| Rad Join Date: Mar 2003
Posts: 942
| I think Prelude was just writing a program and messed up and wants us to clean it up. The lazy little... :P Just kidding ;-)
__________________ -Stephen Cope smc42190@gmail.com http://purevolume.com/step http://myspace.com/stephencope |
| gcn_zelda is offline |
| | #9 | |
| . Join Date: Nov 2003
Posts: 293
| Okay - the ball is in your court I don't claim much except some minor changes in open source code, but it is better than our native qsort by along shot, so it is part of our run-time. I get: Quote:
G Schmidt. Code: Profiler output: %Time Seconds Cumsecs #Calls msec/call Name 40.8 86.35 86.35421060654 0.00 compare Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_BUF 64
#define CUTOFF 18
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#define SWAP(a, b, size) \
do \
{ \
register size_t __size = (size); \
register char *__a = (a), *__b = (b); \
do \
{ \
char __tmp = *__a; \
*__a++ = *__b; \
*__b++ = __tmp; \
} while (--__size > 0); \
} while (0)
#define INSERT_THRESH 4
typedef struct
{
char *lo;
char *hi;
} stack_node;
#define min(x, y) ((x) < (y) ? (x) : (y))
#define STACK_SIZE (CHAR_BIT * sizeof(size_t))
#define PUSH(low, high) ((void) ((top->lo = (low)), (top->hi = (high)), ++top))
#define POP(low, high) ((void) (--top, (low = top->lo), (high = top->hi)))
#define STACK_NOT_EMPTY (stack < top)
/*
PNMquicksort() libc.sl.2.04.a jmc
This incorporates improvements and
optimizations (see Sedgwick, 1997)
Non-recursive, uses stack of pointers
Quicksorts just TOTAL_ELEMS / INSERT_THRESH partitions, falls into
insertion sort to order the INSERT_THRESH items within each partition.
The larger of the two sub-partitions is always pushed onto the
stack first.
Based on code from OSF 1990 and GNU (G. Schmidt).
GNU disclaimer (aka warranty) which is required to be in source derived from
GNU:
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 1.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
*/
typedef int (*Comparable) ( const void *a, const void *b );
void
PNMquicksort (void *const pbase,
size_t total_elems,
size_t size,
Comparable cmp_fx)
{
register char *base_ptr = (char *) pbase;
const size_t max_thresh = INSERT_THRESH * size;
if (total_elems == 0)
{
return;
}
if (total_elems > INSERT_THRESH)
{
char *lo = base_ptr;
char *hi = &lo[size * (total_elems - 1)];
stack_node stack[STACK_SIZE];
stack_node *top = stack + 1;
while (STACK_NOT_EMPTY)
{
char *left;
char *right;
char *mid = lo + size * ((hi - lo) / size >> 1);
if ((*cmp_fx) ((void *) mid, (void *) lo) < 0)
SWAP (mid, lo, size);
if ((*cmp_fx) ((void *) hi, (void *) mid) < 0)
SWAP (mid, hi, size);
else
goto skip; /* ugh a goto ! */
if ((*cmp_fx) ((void *) mid, (void *) lo) < 0)
SWAP (mid, lo, size);
skip:;
left = lo + size;
right = hi - size;
do
{
while ((*cmp_fx) ((void *) left, (void *) mid) < 0)
{
left += size;
}
while ((*cmp_fx) ((void *) mid, (void *) right) < 0)
{
right -= size;
}
if (left < right)
{
SWAP (left, right, size);
if (mid == left)
mid = right;
else if (mid == right)
mid = left;
left += size;
right -= size;
}
else if (left == right)
{
left += size;
right -= size;
break;
}
} while (left <= right);
if ((size_t) (right - lo) <= max_thresh)
{
if ((size_t) (hi - left) <= max_thresh)
POP (lo, hi);
else
lo = left;
}
else if ((size_t) (hi - left) <= max_thresh)
hi = right;
else if ((right - lo) > (hi - left))
{
PUSH (lo, right);
lo = left;
}
else
{
PUSH (left, hi);
hi = right;
}
}
}
{ /* insertion sort */
char *const end_ptr = &base_ptr[size * (total_elems - 1)];
char *tmp_ptr = base_ptr;
char *thresh = min(end_ptr, base_ptr + max_thresh);
register char *run_ptr;
for (run_ptr = tmp_ptr + size; run_ptr <= thresh; run_ptr += size)
{
if ((*cmp_fx) ((void *) run_ptr, (void *) tmp_ptr) < 0)
{
tmp_ptr = run_ptr;
}
}
if (tmp_ptr != base_ptr)
{
SWAP (tmp_ptr, base_ptr, size);
}
run_ptr = base_ptr + size;
while ((run_ptr += size) <= end_ptr)
{
tmp_ptr = run_ptr - size;
while ((*cmp_fx) ((void *) run_ptr, (void *) tmp_ptr) < 0)
tmp_ptr -= size;
tmp_ptr += size;
if (tmp_ptr != run_ptr)
{
char *trav;
trav = run_ptr + size;
while (--trav >= run_ptr)
{
char c = *trav;
char *hi, *lo;
for (hi = lo = trav; (lo -= size) >= tmp_ptr; hi = lo)
{
*hi = *lo;
}
*hi = c;
}
}
}
}
}
void jw_sort ( void *base, const size_t total_elems, Comparable cmp )
{
PNMquicksort ( base, total_elems, sizeof(int), cmp );
}
/* improved compare */
int compare ( const void *a, const void *b )
{
return *(int *)a - *(int *)b;
}
/* old slower compare */
int slow_compare ( const void *a, const void *b )
{
int *aa = (int *)a;
int *bb = (int *)b;
if ( *aa < *bb )
return -1;
else if ( *aa > *bb )
return +1;
else
return 0;
}
#include <time.h>
int main ( void )
{
int a0[10]={0};
int *a1 = malloc ( 1000000 * sizeof ( int ) );
int i=0,
n=0;
clock_t start=0;
double jw_time=0.,
q_time=0.;
for ( i = 0; i < 10; i++ )
{
a0[i] = rand();
}
for ( i = 0; i < 10; i++ )
{
printf ( "%d ", a0[i] );
}
printf ( "\n" );
jw_sort ( a0, 10, compare );
for ( i = 0; i < 10; i++ )
{
printf ( "%d ", a0[i] );
}
printf ( "\n" );
for ( n = 0; n < 10; n++ )
{
for ( i = 0; i < 1000000; i++ )
{
a1[i] = rand();
}
start = clock();
jw_sort ( a1, 1000000, compare );
jw_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
printf ( "jw_sort: %f ", jw_time );
for ( i = 0; i < 1000000; i++ )
{
a1[i] = rand();
}
start = clock();
qsort ( a1, 1000000, sizeof ( int ),compare );
q_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
printf ( "qsort: %f Difference: %.0f%%\n",
q_time,
( 1.0 - ( q_time / jw_time ) ) * 100 );
}
free ( a1 );
return 0;
}
Last edited by jim mcnamara; 04-29-2005 at 01:33 PM. | |
| jim mcnamara is offline |
| | #10 |
| Anti-Poster Join Date: Feb 2002
Posts: 1,212
| Man...it took a year for someone to crank out an answer? Forget it; I'm not entering any of Prelude's contests. That's some dedication.
__________________ Rule #1: Every rule has exceptions Traveller's Dilemma Contest Site - Results posted! |
| pianorain is offline |
| | #11 |
| & the hat of GPL slaying Join Date: Sep 2001
Posts: 5,730
| jim mcnamara you've been here long enough to know you shouldn't bump year old threads. |
| Thantos is offline |
![]() |
| Thread Tools | |
| Display Modes | |
|
Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Line of data input method | larry_2k4 | C Programming | 2 | 04-28-2009 11:34 PM |
| Turn Off Optimization? | danlee58 | C Programming | 6 | 12-10-2008 03:52 AM |
| need reading material for c++ database optimization | elninio | C++ Programming | 0 | 07-24-2008 11:32 PM |
| Tutorial review | Prelude | A Brief History of Cprogramming.com | 11 | 03-22-2004 09:40 PM |
| Request for comments | Prelude | A Brief History of Cprogramming.com | 15 | 01-02-2004 10:33 AM |