Thread: An exercise in optimization

  1. #1
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897

    An exercise in optimization

    Okay, I'm bored so I'll give whoever is interested something to chew on. Whoever has the fastest practical code is the winner. That means that inline assembly will be a mark against you, as will increasing the size and complexity of the algorithm considerably (to restrict you stackers a bit ).

    The Task

    You've been given the task of optimizing a quicksort routine written by one of your lovely colleagues who is currently on vacation. The optimized code needs to be integrated before she returns, so you've been given the job of making her code faster. The sort has been profiled and determined to be too slow with the recent influx of new users giving the software considerably more data to work with. The routine is used extensively in the software, so you can't change the interface.

    Because the maintenance programmers are typically junior engineers with little or no experience, you're forced to hold yourself back a little. The resulting code must be fast as possible without causing the maintenance people to have a heart attack trying to understand it.

    At current, the code runs approximately n% slower than the standard C library function qsort. Sadly, the implementation of qsort on your machines interacts poorly with your software and cannot be used in place of the custom sort. Because your lovely colleague was so careful in her original implementation, you find a handy scaffolding program to help you out. Here it is:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define MAX_BUF 50
    #define CUTOFF  18
    
    typedef int (*Comparable) ( const void *a, const void *b );
    
    void jw_insertion_sort ( void *base, int size, int n, Comparable cmp )
    {
      unsigned char save[MAX_BUF];
      unsigned char *cbase = base;
      int i, j;
    
      for ( i = 1; i < n; i++ ) {
        memcpy ( save, cbase + ( i * size ), size );
        for ( j = i; j > 0 && cmp ( cbase + ( ( j - 1 ) * size ), save ) > 0; j-- )
          memcpy ( cbase + ( j * size ), cbase + ( ( j - 1 ) * size ), size );
        memcpy ( cbase + ( j * size ), save, size );
      }
    }
    
    void swap_block ( void *a, void *b, int size )
    {
      unsigned char save_block[MAX_BUF];
    
      memcpy ( save_block, a, size );
      memcpy ( a, b, size );
      memcpy ( b, save_block, size );
    }
    
    void jw_quick_sort ( void *base, int size, int low, int high, Comparable cmp )
    {
      unsigned char save[MAX_BUF];
      unsigned char *cbase = base;
      int left = low, right = high + 1;
      int r;
    
      if ( high - low < CUTOFF )
        return;
      r = (int) ( ( (double)rand() * ( high - low ) ) / (double)RAND_MAX + low );
      swap_block ( cbase + ( low * size ), cbase + ( r * size ), size );
      memcpy ( save, cbase + ( low * size ), size );
      for ( ; ; ) {
        while ( ++left <= high && cmp ( cbase + ( left * size ), save ) < 0 )
          ;
        while ( --right >= 0 && cmp ( cbase + ( right * size ), save ) > 0 )
          ;
        if ( left > right )
          break;
        swap_block ( cbase + ( left * size ), cbase + ( right * size ), size );
      }
      swap_block ( cbase + ( low * size ), cbase + ( right * size ), size );
      jw_quick_sort ( base, size, low, right - 1, cmp );
      jw_quick_sort ( base, size, right + 1, high, cmp );
    }
    
    void jw_sort ( void *base, int size, int n, Comparable cmp )
    {
      jw_quick_sort ( base, size, 0, n - 1, cmp );
      jw_insertion_sort ( base, size, n, cmp );
    }
    
    int compare ( const void *a, const void *b )
    {
      int *aa = (int *)a;
      int *bb = (int *)b;
    
      if ( *aa < *bb )
        return -1;
      else if ( *aa > *bb )
        return +1;
      else
        return 0;
    }
    
    #include <time.h>
    
    int main ( void )
    {
      int a0[10];
      int *a1 = malloc ( 1000000 * sizeof ( int ) );
      int i, n;
      clock_t start;
      double jw_time, q_time;
    
      for ( i = 0; i < 10; i++ )
        a0[i] = rand();
      for ( i = 0; i < 10; i++ )
        printf ( "%d ", a0[i] );
      printf ( "\n" );
      jw_sort ( a0, sizeof ( int ), 10, compare );
      for ( i = 0; i < 10; i++ )
        printf ( "%d ", a0[i] );
      printf ( "\n" );
      for ( n = 0; n < 10; n++ ) {
        for ( i = 0; i < 1000000; i++ )
          a1[i] = rand();
        start = clock();
        jw_sort ( a1, sizeof ( int ), 1000000, compare );
        jw_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
        printf ( "jw_sort: %f  ", jw_time );
        for ( i = 0; i < 1000000; i++ )
          a1[i] = rand();
        start = clock();
        qsort ( a1, 1000000, sizeof ( int ), compare );
        q_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
        printf ( "qsort: %f  Difference: %.0f%%\n", q_time, ( 1.0 - ( q_time / jw_time ) ) * 100 );
      }
      free ( a1 );
    
      return 0;
    }
    The scaffolding reinforces what you learned while profiling the code, jw_sort runs about n% slower than qsort. At the time that the code was written this was an acceptable difference so the author chose not to complicate the functions any more.

    Use any means you want to make jw_sort as fast as possible without causing the poor maintenance programmers fits. The data that the software processes can be anything from single characters to large heterogeneous records, so a generalized routine is required. Unfortunately, your colleague trusts that her co-workers are not stupid and is sparse with her commenting. You have to figure out her algorithm on your own. Since there aren't any comments, it shouldn't be too unusual of a solution you believe.

    Back to reality

    This is a reasonably realistic situation, which is a departure from most of our contests. But you'll find yourself learning quite a bit about quicksort by optimizing a typical implementation that you may have to decipher first. This is not a closed entry contest, so as soon as you have a submission, feel free to post it for everyone to see. This promotes multiple submissions and a high level of competitiveness since your opponents will be able to use your ideas in their own solutions. There is no time limit. Your colleague takes long vacations. Enjoy!
    My best code is written with the delete key.

  2. #2
    Registered User linuxdude's Avatar
    Join Date
    Mar 2003
    Location
    Louisiana
    Posts
    926
    wow, I can't even imagine. Good luck people who attempt this feat.

  3. #3
    Registered User Xzyx987X's Avatar
    Join Date
    Sep 2003
    Posts
    107
    Hmm... I think I might try to do this a bit later to see what I can come up with. Are you allowed to optimise the main() routine as well?

    Edit: Err... nm. Now that I looked at it closely, optimising main() wouldn't make any difference.
    Last edited by Xzyx987X; 03-31-2004 at 01:12 PM.

  4. #4
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >Are you allowed to optimise the main() routine as well?
    As long as the interface to the function isn't changed (because it is already in use), you can optimize whatever you want. And there is one big improvement that effects speed poorly if you choose to implement it.
    My best code is written with the delete key.

  5. #5
    Software Developer jverkoey's Avatar
    Join Date
    Feb 2003
    Location
    New York
    Posts
    1,905
    is this still going on?

  6. #6
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >is this still going on?
    Yes.
    My best code is written with the delete key.

  7. #7
    Registered User whackaxe's Avatar
    Join Date
    Mar 2004
    Posts
    332
    Quote Originally Posted by Prelude
    Because the maintenance programmers are typically junior engineers with little or no experience
    okay, that would be me, except i don't understant it at all what exactly does this do?

  8. #8
    Rad gcn_zelda's Avatar
    Join Date
    Mar 2003
    Posts
    942
    I think Prelude was just writing a program and messed up and wants us to clean it up.

    The lazy little... :P

    Just kidding ;-)

  9. #9
    .
    Join Date
    Nov 2003
    Posts
    307
    Okay - the ball is in your court I don't claim much except some minor changes in open source code, but it is better than our native qsort by along shot, so it is part of our run-time.

    I get:
    kcsdev:/home/jmcnama> cc isort.c -o isort +O4
    kcsdev:/home/jmcnama> isort
    16838 5758 10113 17515 31051 5627 23010 7419 16212 4086
    4086 5627 5758 7419 10113 16212 16838 17515 23010 31051
    jw_sort: 1.580000 qsort: 5.580000 Difference: -253%
    jw_sort: 1.550000 qsort: 5.460000 Difference: -252%
    jw_sort: 1.600000 qsort: 5.590000 Difference: -249%
    jw_sort: 1.530000 qsort: 5.420000 Difference: -254%
    jw_sort: 1.570000 qsort: 5.490000 Difference: -250%
    jw_sort: 1.510000 qsort: 5.570000 Difference: -269%
    jw_sort: 1.580000 qsort: 5.450000 Difference: -245%
    jw_sort: 1.560000 qsort: 5.530000 Difference: -254%
    jw_sort: 1.570000 qsort: 5.380000 Difference: -243%
    jw_sort: 1.530000 qsort: 5.640000 Difference: -269%
    Code we modified over time, based on source from OSF and
    G Schmidt.
    Code:
    Profiler output:
     %Time Seconds Cumsecs  #Calls   msec/call  Name
    
      40.8   86.35   86.35421060654        0.00  compare
    almost all of the time is spent in compare - 40.8 %

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define MAX_BUF 64
    #define CUTOFF  18
    
    
    #include <limits.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define SWAP(a, b, size)		    \
      do							    \
        {							    \
          register size_t __size = (size);			\
          register char *__a = (a), *__b = (b);		\
          do								        \
    	  {								            \
    	      char __tmp = *__a;						\
    	      *__a++ = *__b;						    \
    	      *__b++ = __tmp;						    \
    	  } while (--__size > 0);						\
        } while (0)
    
    #define INSERT_THRESH 4
    
    typedef struct
      {
        char *lo;
        char *hi;
      } stack_node;
    
    
    #define min(x, y) ((x) < (y) ? (x) : (y))
    #define STACK_SIZE	(CHAR_BIT * sizeof(size_t))
    #define PUSH(low, high)	((void) ((top->lo = (low)), (top->hi = (high)), ++top))
    #define	POP(low, high)	((void) (--top, (low = top->lo), (high = top->hi)))
    #define	STACK_NOT_EMPTY	(stack < top)
    
    
    /* 
        
       PNMquicksort() libc.sl.2.04.a   jmc   
       This incorporates improvements and 
       optimizations (see Sedgwick, 1997)
    
       Non-recursive, uses stack of pointers 
      
       Quicksorts just TOTAL_ELEMS / INSERT_THRESH partitions, falls into
          insertion sort to order the INSERT_THRESH items within each partition.
          
       The larger of the two sub-partitions is always pushed onto the
          stack first.
          
       Based on code from OSF 1990 and GNU (G. Schmidt).
        
       GNU disclaimer (aka warranty) which is required to be in source derived from 
       GNU:
       
          The GNU C Library is free software; you can redistribute it and/or
       modify it under the terms of the GNU Lesser General Public
       License as published by the Free Software Foundation; either
       version 1.1 of the License, or (at your option) any later version.
    
       The GNU C Library is distributed in the hope that it will be useful,
       but WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       Lesser General Public License for more details.
    
       
    */
          
    
    typedef int (*Comparable) ( const void *a, const void *b );
      
    void
    PNMquicksort (void *const pbase, 
                  size_t total_elems, 
                  size_t size,
    	          Comparable cmp_fx)
    {
        register char *base_ptr = (char *) pbase;
        const size_t max_thresh = INSERT_THRESH * size;
    
        if (total_elems == 0)
        {  
            return;
        }
        
        if (total_elems > INSERT_THRESH)
          {
            char *lo = base_ptr;
            char *hi = &lo[size * (total_elems - 1)];
            stack_node stack[STACK_SIZE];
            stack_node *top = stack + 1;
        
            while (STACK_NOT_EMPTY)
              {
                char *left;
                char *right;	    
    	        char *mid = lo + size * ((hi - lo) / size >> 1);
        
        	    if ((*cmp_fx) ((void *) mid, (void *) lo) < 0)
        	        SWAP (mid, lo, size);
        	    if ((*cmp_fx) ((void *) hi, (void *) mid) < 0)
        	        SWAP (mid, hi, size);
        	    else
        	        goto skip;                            /* ugh a goto ! */
        	    if ((*cmp_fx) ((void *) mid, (void *) lo) < 0)
        	        SWAP (mid, lo, size);
    skip:;
    	        left  = lo + size;
    	        right = hi - size;    
    	   
    	        do
    	        {
    	            while ((*cmp_fx) ((void *) left, (void *) mid) < 0)
    	            {
    	  	            left += size;
                    }
    	            while ((*cmp_fx) ((void *) mid, (void *) right) < 0)
    	            {
    	  	            right -= size;
                    }
    	            if (left < right)
    	  	        {
    	  	            SWAP (left, right, size);
    	  	            if (mid == left)
    	  	              mid = right;
    	  	            else if (mid == right)
    	  	              mid = left;
    	  	            left += size;
    	  	            right -= size;
    	  	        }
    	            else if (left == right)
    	  	        {
    	  	            left += size;
    	  	            right -= size;
    	  	            break;
    	  	        }
    	        }  while (left <= right);
        
          
                if ((size_t) (right - lo) <= max_thresh)
                {
                    if ((size_t) (hi - left) <= max_thresh)
    	  	            POP (lo, hi);
                    else
    	  	            lo = left;
                }
                else if ((size_t) (hi - left) <= max_thresh)
    	            hi = right;
                else if ((right - lo) > (hi - left))
                {
    	            PUSH (lo, right);
                    lo = left;
                }
                else
                {
                    PUSH (left, hi);
                    hi = right;
                }
              }
          }
    
        { /* insertion sort */
          char *const end_ptr = &base_ptr[size * (total_elems - 1)];
          char *tmp_ptr = base_ptr;
          char *thresh = min(end_ptr, base_ptr + max_thresh);
          register char *run_ptr;
            
          for (run_ptr = tmp_ptr + size; run_ptr <= thresh; run_ptr += size)
          {
              if ((*cmp_fx) ((void *) run_ptr, (void *) tmp_ptr) < 0)
              {
                  tmp_ptr = run_ptr;
              }
          }
          
          if (tmp_ptr != base_ptr)
          {
              SWAP (tmp_ptr, base_ptr, size);
          }
          
          run_ptr = base_ptr + size;
          while ((run_ptr += size) <= end_ptr)
          {
    	      tmp_ptr = run_ptr - size;
    	      while ((*cmp_fx) ((void *) run_ptr, (void *) tmp_ptr) < 0)
    	          tmp_ptr -= size;
              
    	      tmp_ptr += size;
              if (tmp_ptr != run_ptr)
              {
                  char *trav;
              
    	          trav = run_ptr + size;
    	          while (--trav >= run_ptr)
                  {
                      char c = *trav;
                      char *hi, *lo;
              
                      for (hi = lo = trav; (lo -= size) >= tmp_ptr; hi = lo)
                      {
                          *hi = *lo;
                      }
                      *hi = c;
                  }
              }
          }
        }
    }
    
    
    void jw_sort ( void *base, const size_t total_elems, Comparable cmp )
    {
        PNMquicksort ( base, total_elems, sizeof(int), cmp );    
    }
    /* improved compare */
    
    int compare ( const void *a, const void *b )
    {
        return  *(int *)a - *(int *)b;
    }     
    
    /* old slower compare */
    
    int slow_compare ( const void *a, const void *b )
    {
        int *aa = (int *)a;
        int *bb = (int *)b;
        
        if ( *aa < *bb )
          return -1;
        else if ( *aa > *bb )
          return +1;
        else
          return 0;
    }
    
    
    #include <time.h>
    
    int main ( void )
    {
        int a0[10]={0};
        int *a1 = malloc ( 1000000 * sizeof ( int ) );
        int i=0, 
            n=0;
        clock_t start=0;
        double jw_time=0., 
               q_time=0.;
        
        for ( i = 0; i < 10; i++ )
        {
            a0[i] = rand();
        }
        for ( i = 0; i < 10; i++ )
        {
            printf ( "%d ", a0[i] );
        }
        printf ( "\n" );
        jw_sort ( a0, 10, compare );
        for ( i = 0; i < 10; i++ )
        {
            printf ( "%d ", a0[i] );
        }
        printf ( "\n" );
        for ( n = 0; n < 10; n++ ) 
        {
            for ( i = 0; i < 1000000; i++ )
            {
                a1[i] = rand();
            }
            start = clock();
            jw_sort ( a1, 1000000, compare );
            jw_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
            printf ( "jw_sort: %f  ", jw_time );
            for ( i = 0; i < 1000000; i++ )
            {
                a1[i] = rand();
            }
            start = clock();
            qsort ( a1, 1000000, sizeof ( int ),compare );
            q_time = ( (double)clock() - start ) / CLOCKS_PER_SEC;
            printf ( "qsort: %f  Difference: %.0f%%\n", 
                     q_time, 
                     ( 1.0 - ( q_time / jw_time ) ) * 100 );                  
    
        }
        free ( a1 );    
        return 0;
    }
    Last edited by jim mcnamara; 04-29-2005 at 01:33 PM.

  10. #10
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    Man...it took a year for someone to crank out an answer? Forget it; I'm not entering any of Prelude's contests. That's some dedication.
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  11. #11
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    jim mcnamara you've been here long enough to know you shouldn't bump year old threads.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Line of data input method
    By larry_2k4 in forum C Programming
    Replies: 2
    Last Post: 04-28-2009, 11:34 PM
  2. Turn Off Optimization?
    By danlee58 in forum C Programming
    Replies: 6
    Last Post: 12-10-2008, 03:52 AM
  3. need reading material for c++ database optimization
    By elninio in forum C++ Programming
    Replies: 0
    Last Post: 07-24-2008, 11:32 PM
  4. Tutorial review
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 11
    Last Post: 03-22-2004, 09:40 PM
  5. Request for comments
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 15
    Last Post: 01-02-2004, 10:33 AM