C Board  

Go Back   C Board > Community Boards > Contests Board

Closed Thread
 
LinkBack Thread Tools Display Modes
Old 03-11-2004, 10:43 PM   #1
Code Goddess
 
Prelude's Avatar
 
Join Date: Sep 2001
Posts: 9,661
An exercise in optimization

Okay, I'm bored so I'll give whoever is interested something to chew on. Whoever has the fastest practical code is the winner. That means that inline assembly will be a mark against you, as will increasing the size and complexity of the algorithm considerably (to restrict you stackers a bit ).

The Task

You've been given the task of optimizing a quicksort routine written by one of your lovely colleagues who is currently on vacation. The optimized code needs to be integrated before she returns, so you've been given the job of making her code faster. The sort has been profiled and determined to be too slow with the recent influx of new users giving the software considerably more data to work with. The routine is used extensively in the software, so you can't change the interface.

Because the maintenance programmers are typically junior engineers with little or no experience, you're forced to hold yourself back a little. The resulting code must be fast as possible without causing the maintenance people to have a heart attack trying to understand it.

At current, the code runs approximately n% slower than the standard C library function qsort. Sadly, the implementation of qsort on your machines interacts poorly with your software and cannot be used in place of the custom sort. Because your lovely colleague was so careful in her original implementation, you find a handy scaffolding program to help you out. Here it is:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BUF 50
#define CUTOFF  18

typedef int (*Comparable) ( const void *a, const void *b );

void jw_insertion_sort ( void *base, int size, int n, Comparable cmp )
{
  unsigned char save[MAX_BUF];
  unsigned char *cbase = base;
  int i, j;

  for ( i = 1; i < n; i++ ) {
    memcpy ( save, cbase + ( i * size ), size );
    for ( j = i; j > 0 && cmp ( cbase + ( ( j - 1 ) * size ), save ) > 0; j-- )
      memcpy ( cbase + ( j * size ), cbase + ( ( j - 1 ) * size ), size );
    memcpy ( cbase + ( j * size ), save, size );
  }
}

void swap_block ( void *a, void *b, int size )
{
  unsigned char save_block[MAX_BUF];

  memcpy ( save_block, a, size );
  memcpy ( a, b, size );
  memcpy ( b, save_block, size );
}

void jw_quick_sort ( void *base, int size, int low, int high, Comparable cmp )
{
  unsigned char save[MAX_BUF];
  unsigned char *cbase = base;
  int left = low, right = high + 1;
  int r;

  if ( high - low < CUTOFF )
    return;
  r = (int) ( ( (double)rand() * ( high - low ) ) / (double)RAND_MAX + low );
  swap_block ( cbase + ( low * size ), cbase + ( r * size ), size );
  memcpy ( save, cbase + ( low * size ), size );
  for ( ; ; ) {
    while ( ++left <= high && cmp ( cbase + ( left * size ), save ) < 0 )
      ;
    while ( --right >= 0 && cmp ( cbase + ( right * size ), save ) > 0 )
      ;
    if ( left > right )
      break;
    swap_block ( cbase + ( left * size ), cbase + ( right * size ), size );
  }
  swap_block ( cbase + ( low * size ), cbase + ( right * size ), size );
  jw_quick_sort ( base, size, low, right - 1, cmp );
  jw_quick_sort ( base, size, right + 1, high, cmp );
}

void jw_sort ( void *base, int size, int n, Comparable cmp )
{
  jw_quick_sort ( base, size, 0, n - 1, cmp );
  jw_insertion_sort ( base, size, n, cmp );
}

int compare ( const void *a, const void *b )
{
  int *aa = (int *)a;
  int *bb = (int *)b;

  if ( *aa < *bb )
    return -1;
  else if ( *aa > *bb )
    return +1;
  else
    return 0;
}

#include <time.h>

int main ( void )
{
  int a0[10];
  int *a1 = malloc ( 1000000 * sizeof ( int ) );
  int i, n;
  clock_t start;
  double jw_time, q_time;

  for ( i = 0; i < 10; i++ )
    a0[i] = rand();
  for ( i = 0; i < 10; i++ )
    printf ( "%d ", a0[i] );
  printf ( "\n" );
  jw_sort ( a0, sizeof ( int ), 10, compare );
  for ( i = 0; i < 10; i++ )
    printf ( "%d ", a0[i] );
  printf ( "\n" );
  for ( n = 0; n < 10; n++ ) {
    for ( i = 0; i < 1000000; i++ )
      a1[i] = rand();
    start = clock();
    jw_sort ( a1, sizeof ( int ), 1000000, compare );
    jw_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
    printf ( "jw_sort: %f  ", jw_time );
    for ( i = 0; i < 1000000; i++ )
      a1[i] = rand();
    start = clock();
    qsort ( a1, 1000000, sizeof ( int ), compare );
    q_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
    printf ( "qsort: %f  Difference: %.0f%%\n", q_time, ( 1.0 - ( q_time / jw_time ) ) * 100 );
  }
  free ( a1 );

  return 0;
}
The scaffolding reinforces what you learned while profiling the code, jw_sort runs about n% slower than qsort. At the time that the code was written this was an acceptable difference so the author chose not to complicate the functions any more.

Use any means you want to make jw_sort as fast as possible without causing the poor maintenance programmers fits. The data that the software processes can be anything from single characters to large heterogeneous records, so a generalized routine is required. Unfortunately, your colleague trusts that her co-workers are not stupid and is sparse with her commenting. You have to figure out her algorithm on your own. Since there aren't any comments, it shouldn't be too unusual of a solution you believe.

Back to reality

This is a reasonably realistic situation, which is a departure from most of our contests. But you'll find yourself learning quite a bit about quicksort by optimizing a typical implementation that you may have to decipher first. This is not a closed entry contest, so as soon as you have a submission, feel free to post it for everyone to see. This promotes multiple submissions and a high level of competitiveness since your opponents will be able to use your ideas in their own solutions. There is no time limit. Your colleague takes long vacations. Enjoy!
__________________
My best code is written with the delete key.
Prelude is offline  
Old 03-22-2004, 09:47 PM   #2
Registered User
 
linuxdude's Avatar
 
Join Date: Mar 2003
Location: Louisiana
Posts: 926
wow, I can't even imagine. Good luck people who attempt this feat.
linuxdude is offline  
Old 03-31-2004, 11:32 AM   #3
Registered User
 
Xzyx987X's Avatar
 
Join Date: Sep 2003
Posts: 107
Hmm... I think I might try to do this a bit later to see what I can come up with. Are you allowed to optimise the main() routine as well?

Edit: Err... nm. Now that I looked at it closely, optimising main() wouldn't make any difference.

Last edited by Xzyx987X; 03-31-2004 at 01:12 PM.
Xzyx987X is offline  
Old 04-01-2004, 07:25 AM   #4
Code Goddess
 
Prelude's Avatar
 
Join Date: Sep 2001
Posts: 9,661
>Are you allowed to optimise the main() routine as well?
As long as the interface to the function isn't changed (because it is already in use), you can optimize whatever you want. And there is one big improvement that effects speed poorly if you choose to implement it.
__________________
My best code is written with the delete key.
Prelude is offline  
Old 04-07-2004, 05:43 PM   #5
Software Developer
 
jverkoey's Avatar
 
Join Date: Feb 2003
Location: University of Waterloo
Posts: 1,916
is this still going on?
jverkoey is offline  
Old 04-07-2004, 05:44 PM   #6
Code Goddess
 
Prelude's Avatar
 
Join Date: Sep 2001
Posts: 9,661
>is this still going on?
Yes.
__________________
My best code is written with the delete key.
Prelude is offline  
Old 04-08-2004, 02:39 PM   #7
Registered User
 
whackaxe's Avatar
 
Join Date: Mar 2004
Posts: 332
Quote:
Originally Posted by Prelude
Because the maintenance programmers are typically junior engineers with little or no experience
okay, that would be me, except i don't understant it at all what exactly does this do?
whackaxe is offline  
Old 04-23-2004, 06:15 PM   #8
Rad
 
gcn_zelda's Avatar
 
Join Date: Mar 2003
Posts: 942
I think Prelude was just writing a program and messed up and wants us to clean it up.

The lazy little... :P

Just kidding ;-)
gcn_zelda is offline  
Old 04-29-2005, 01:31 PM   #9
.
 
Join Date: Nov 2003
Posts: 293
Okay - the ball is in your court I don't claim much except some minor changes in open source code, but it is better than our native qsort by along shot, so it is part of our run-time.

I get:
Quote:
kcsdev:/home/jmcnama> cc isort.c -o isort +O4
kcsdev:/home/jmcnama> isort
16838 5758 10113 17515 31051 5627 23010 7419 16212 4086
4086 5627 5758 7419 10113 16212 16838 17515 23010 31051
jw_sort: 1.580000 qsort: 5.580000 Difference: -253%
jw_sort: 1.550000 qsort: 5.460000 Difference: -252%
jw_sort: 1.600000 qsort: 5.590000 Difference: -249%
jw_sort: 1.530000 qsort: 5.420000 Difference: -254%
jw_sort: 1.570000 qsort: 5.490000 Difference: -250%
jw_sort: 1.510000 qsort: 5.570000 Difference: -269%
jw_sort: 1.580000 qsort: 5.450000 Difference: -245%
jw_sort: 1.560000 qsort: 5.530000 Difference: -254%
jw_sort: 1.570000 qsort: 5.380000 Difference: -243%
jw_sort: 1.530000 qsort: 5.640000 Difference: -269%
Code we modified over time, based on source from OSF and
G Schmidt.
Code:
Profiler output:
 %Time Seconds Cumsecs  #Calls   msec/call  Name

  40.8   86.35   86.35421060654        0.00  compare
almost all of the time is spent in compare - 40.8 %

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BUF 64
#define CUTOFF  18


#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define SWAP(a, b, size)		    \
  do							    \
    {							    \
      register size_t __size = (size);			\
      register char *__a = (a), *__b = (b);		\
      do								        \
	  {								            \
	      char __tmp = *__a;						\
	      *__a++ = *__b;						    \
	      *__b++ = __tmp;						    \
	  } while (--__size > 0);						\
    } while (0)

#define INSERT_THRESH 4

typedef struct
  {
    char *lo;
    char *hi;
  } stack_node;


#define min(x, y) ((x) < (y) ? (x) : (y))
#define STACK_SIZE	(CHAR_BIT * sizeof(size_t))
#define PUSH(low, high)	((void) ((top->lo = (low)), (top->hi = (high)), ++top))
#define	POP(low, high)	((void) (--top, (low = top->lo), (high = top->hi)))
#define	STACK_NOT_EMPTY	(stack < top)


/* 
    
   PNMquicksort() libc.sl.2.04.a   jmc   
   This incorporates improvements and 
   optimizations (see Sedgwick, 1997)

   Non-recursive, uses stack of pointers 
  
   Quicksorts just TOTAL_ELEMS / INSERT_THRESH partitions, falls into
      insertion sort to order the INSERT_THRESH items within each partition.
      
   The larger of the two sub-partitions is always pushed onto the
      stack first.
      
   Based on code from OSF 1990 and GNU (G. Schmidt).
    
   GNU disclaimer (aka warranty) which is required to be in source derived from 
   GNU:
   
      The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 1.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   
*/
      

typedef int (*Comparable) ( const void *a, const void *b );
  
void
PNMquicksort (void *const pbase, 
              size_t total_elems, 
              size_t size,
	          Comparable cmp_fx)
{
    register char *base_ptr = (char *) pbase;
    const size_t max_thresh = INSERT_THRESH * size;

    if (total_elems == 0)
    {  
        return;
    }
    
    if (total_elems > INSERT_THRESH)
      {
        char *lo = base_ptr;
        char *hi = &lo[size * (total_elems - 1)];
        stack_node stack[STACK_SIZE];
        stack_node *top = stack + 1;
    
        while (STACK_NOT_EMPTY)
          {
            char *left;
            char *right;	    
	        char *mid = lo + size * ((hi - lo) / size >> 1);
    
    	    if ((*cmp_fx) ((void *) mid, (void *) lo) < 0)
    	        SWAP (mid, lo, size);
    	    if ((*cmp_fx) ((void *) hi, (void *) mid) < 0)
    	        SWAP (mid, hi, size);
    	    else
    	        goto skip;                            /* ugh a goto ! */
    	    if ((*cmp_fx) ((void *) mid, (void *) lo) < 0)
    	        SWAP (mid, lo, size);
skip:;
	        left  = lo + size;
	        right = hi - size;    
	   
	        do
	        {
	            while ((*cmp_fx) ((void *) left, (void *) mid) < 0)
	            {
	  	            left += size;
                }
	            while ((*cmp_fx) ((void *) mid, (void *) right) < 0)
	            {
	  	            right -= size;
                }
	            if (left < right)
	  	        {
	  	            SWAP (left, right, size);
	  	            if (mid == left)
	  	              mid = right;
	  	            else if (mid == right)
	  	              mid = left;
	  	            left += size;
	  	            right -= size;
	  	        }
	            else if (left == right)
	  	        {
	  	            left += size;
	  	            right -= size;
	  	            break;
	  	        }
	        }  while (left <= right);
    
      
            if ((size_t) (right - lo) <= max_thresh)
            {
                if ((size_t) (hi - left) <= max_thresh)
	  	            POP (lo, hi);
                else
	  	            lo = left;
            }
            else if ((size_t) (hi - left) <= max_thresh)
	            hi = right;
            else if ((right - lo) > (hi - left))
            {
	            PUSH (lo, right);
                lo = left;
            }
            else
            {
                PUSH (left, hi);
                hi = right;
            }
          }
      }

    { /* insertion sort */
      char *const end_ptr = &base_ptr[size * (total_elems - 1)];
      char *tmp_ptr = base_ptr;
      char *thresh = min(end_ptr, base_ptr + max_thresh);
      register char *run_ptr;
        
      for (run_ptr = tmp_ptr + size; run_ptr <= thresh; run_ptr += size)
      {
          if ((*cmp_fx) ((void *) run_ptr, (void *) tmp_ptr) < 0)
          {
              tmp_ptr = run_ptr;
          }
      }
      
      if (tmp_ptr != base_ptr)
      {
          SWAP (tmp_ptr, base_ptr, size);
      }
      
      run_ptr = base_ptr + size;
      while ((run_ptr += size) <= end_ptr)
      {
	      tmp_ptr = run_ptr - size;
	      while ((*cmp_fx) ((void *) run_ptr, (void *) tmp_ptr) < 0)
	          tmp_ptr -= size;
          
	      tmp_ptr += size;
          if (tmp_ptr != run_ptr)
          {
              char *trav;
          
	          trav = run_ptr + size;
	          while (--trav >= run_ptr)
              {
                  char c = *trav;
                  char *hi, *lo;
          
                  for (hi = lo = trav; (lo -= size) >= tmp_ptr; hi = lo)
                  {
                      *hi = *lo;
                  }
                  *hi = c;
              }
          }
      }
    }
}


void jw_sort ( void *base, const size_t total_elems, Comparable cmp )
{
    PNMquicksort ( base, total_elems, sizeof(int), cmp );    
}
/* improved compare */

int compare ( const void *a, const void *b )
{
    return  *(int *)a - *(int *)b;
}     

/* old slower compare */

int slow_compare ( const void *a, const void *b )
{
    int *aa = (int *)a;
    int *bb = (int *)b;
    
    if ( *aa < *bb )
      return -1;
    else if ( *aa > *bb )
      return +1;
    else
      return 0;
}


#include <time.h>

int main ( void )
{
    int a0[10]={0};
    int *a1 = malloc ( 1000000 * sizeof ( int ) );
    int i=0, 
        n=0;
    clock_t start=0;
    double jw_time=0., 
           q_time=0.;
    
    for ( i = 0; i < 10; i++ )
    {
        a0[i] = rand();
    }
    for ( i = 0; i < 10; i++ )
    {
        printf ( "%d ", a0[i] );
    }
    printf ( "\n" );
    jw_sort ( a0, 10, compare );
    for ( i = 0; i < 10; i++ )
    {
        printf ( "%d ", a0[i] );
    }
    printf ( "\n" );
    for ( n = 0; n < 10; n++ ) 
    {
        for ( i = 0; i < 1000000; i++ )
        {
            a1[i] = rand();
        }
        start = clock();
        jw_sort ( a1, 1000000, compare );
        jw_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
        printf ( "jw_sort: %f  ", jw_time );
        for ( i = 0; i < 1000000; i++ )
        {
            a1[i] = rand();
        }
        start = clock();
        qsort ( a1, 1000000, sizeof ( int ),compare );
        q_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
        printf ( "qsort: %f  Difference: %.0f%%\n", 
                 q_time, 
                 ( 1.0 - ( q_time / jw_time ) ) * 100 );                  

    }
    free ( a1 );    
    return 0;
}

Last edited by jim mcnamara; 04-29-2005 at 01:33 PM.
jim mcnamara is offline  
Old 04-29-2005, 01:53 PM   #10
Anti-Poster
 
Join Date: Feb 2002
Posts: 1,212
Man...it took a year for someone to crank out an answer? Forget it; I'm not entering any of Prelude's contests. That's some dedication.
__________________
Rule #1: Every rule has exceptions

Traveller's Dilemma Contest Site - Results posted!
pianorain is offline  
Old 04-29-2005, 03:06 PM   #11
& the hat of GPL slaying
 
Thantos's Avatar
 
Join Date: Sep 2001
Posts: 5,730
jim mcnamara you've been here long enough to know you shouldn't bump year old threads.
Thantos is offline  
Closed Thread

Thread Tools
Display Modes

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Line of data input method larry_2k4 C Programming 2 04-28-2009 11:34 PM
Turn Off Optimization? danlee58 C Programming 6 12-10-2008 03:52 AM
need reading material for c++ database optimization elninio C++ Programming 0 07-24-2008 11:32 PM
Tutorial review Prelude A Brief History of Cprogramming.com 11 03-22-2004 09:40 PM
Request for comments Prelude A Brief History of Cprogramming.com 15 01-02-2004 10:33 AM


All times are GMT -6. The time now is 05:52 AM.


Powered by vBulletin® Version 3.8.1
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.3.0 RC2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22