Thread: quick sort

  1. #1
    Registered User
    Join Date
    Apr 2019
    Posts
    808

    quick sort

    Following on from my previous posts, I have been trying to follow the logic of quicksort. Here is the example:
    Code:
    void quicksort(int data[], int first, int last, int *count)
    {
        int i, j, pivot, temp;
    
        *count += 1;
        if(first<last)
        {
            pivot=first;
            i=first;
            j=last;
    
            while(i<j)
            {
                while(data[i] <= data[pivot] && i < last)
                    i++;
                while(data[j] > data[pivot])
                    j--;
                if(i<j)
                {
                    temp=data[i];
                    data[i]=data[j];
                    data[j]=temp;
                }
            }
    
            temp=data[pivot];
            data[pivot]=data[j];
            data[j]=temp;
            quicksort(data,first,j-1, count);
            quicksort(data,j+1,last, count);
    
        }
    }
    Looking at the code, I am wondering if this is a good example, as there seems to be an unneeded if statement (if (i < j)) inside the while (i < j) loop. That aside, I'm a little confused as to how it decides which recursive call to quicksort it uses. Is it a case of trying the first one, and if the if statement fails, doing nothing and going on to the second one?

    Sorry for what is probably a silly question.
    coop

  2. #2
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    This is a case where both recursive calls are normally taken.
    But I am not a quicksort algorithm expert, or even an intermediate-level person.
    The problem is broken into two parts as a way of making the problem smaller.
    I believe several recursive algorithms use this tactic.
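
    To see that concretely, here is a minimal instrumented toy (my own sketch, not the code above; it uses a simpler Lomuto-style partition) that prints one line per call. Both recursive calls run at every level, each on its own half:
    Code:
    #include <stdio.h>
    
    /* Simplified quicksort, instrumented so the call sequence is visible. */
    static void quicksort(int data[], int first, int last, int depth)
    {
        if (first >= last)
            return;
    
        printf("%*ssorting [%d..%d]\n", depth * 2, "", first, last);
    
        /* Lomuto partition: pivot is the last element. */
        int pivot = data[last], i = first, t;
        for (int k = first; k < last; k++)
            if (data[k] < pivot) {
                t = data[k]; data[k] = data[i]; data[i] = t;
                i++;
            }
        t = data[last]; data[last] = data[i]; data[i] = t;
    
        quicksort(data, first, i - 1, depth + 1);  /* always called...      */
        quicksort(data, i + 1, last, depth + 1);   /* ...and then this one. */
    }
    
    int main(void)
    {
        int a[] = { 5, 2, 8, 1, 9, 3 };
        quicksort(a, 0, 5, 0);
        for (int k = 0; k < 6; k++)
            printf("%d ", a[k]);   /* prints: 1 2 3 5 8 9 */
        putchar('\n');
    }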

    Tim S.
    "...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are,in short, a perfect match.." Bill Bryson

  3. #3
    Registered User
    Join Date
    Apr 2019
    Posts
    808
    Ok, you do need the if statement in the while loop; otherwise it takes approx 600 more steps and it isn't sorted. On closer inspection, I guess i could become greater than j before the condition of the while loop is tested.

  4. #4
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    You do know there is a qsort() function in stdlib, don't ya? And it is pretty fast.

    Code:
    #include <stdlib.h>
    #include <stdio.h>
    #include <inttypes.h>
    #include <time.h>
    
    // My clock cycles counting routines!
    #include "cycle_counting.h"
    
    #define ARRAY_ELEMENTS 32
    
    static int compare( const void *, const void * );
    static void fill_array( int *, size_t );
    static void show_array( int *, size_t );
    
    int main( void )
    {
      counter_T c1, c2;
      static int array[ARRAY_ELEMENTS];
    
      // to be fair, we need to prefetch the array before measuring fill_array
      __builtin_prefetch( array, 1 ); // prefetch for writing...
    
      c1 = BEGIN_TSC();
      fill_array( array, ARRAY_ELEMENTS );
      END_TSC(&c1);
    
      show_array( array, ARRAY_ELEMENTS );
    
      // array already on cache L1d, no need to prefetch.
    
      c2 = BEGIN_TSC();
      qsort( array, ARRAY_ELEMENTS, sizeof array[0], compare );
      END_TSC(&c2);
    
      show_array( array, ARRAY_ELEMENTS );
    
      // To compare I'm showing the cycles taken from fill_array and qsort.
      // You'll see qsort is almost 3 times FASTER!
      printf( "fill took %" PRIu64 " clock cycles.\n"
              "qsort took %" PRIu64 " clock cycles.\n", 
              c1, c2 );
    }
    
    int compare( const void *ap, const void *bp )
    {
      const int *aip, *bip;
    
      aip = ap; bip = bp;
      return ( *aip > *bip ) - ( *aip < *bip );
    }
    
    void fill_array( int *p, size_t elements )
    {
      srand(time(NULL));
      
      while ( elements-- )
        *p++ = rand() % 100;  // limit from 0 to 99.
    }
    
    void show_array( int *p, size_t elements )
    {
      fputs( "{ ", stdout );
      while ( elements-- )
        printf( "%d ", *p++ );
      fputs( "}\n", stdout );
    }
    In one of my machines (i5-3570 @ 3.40GHz):
    Code:
    $ cc -O2 -o qs qs.c
    $ ./qs
    { 24 31 94 97 93 82 66 74 37 8 89 72 89 53 11 17 56 38 10 16 96 82 9 12 4 22 16 98 22 46 96 98 }
    { 4 8 9 10 11 12 16 16 17 22 22 24 31 37 38 46 53 56 66 72 74 82 82 89 89 93 94 96 96 97 98 98 }
    fill took 44250 clock cycles.
    qsort took 15394 clock cycles.
    Last edited by flp1969; 05-29-2019 at 07:33 AM.

  5. #5
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    Quote Originally Posted by flp1969 View Post
    You do know there is a qsort() function in stdlib, don't ya? And it is pretty fast.
    That's my go-to when sorting.

    But I think that this is more of a learning exercise - and sorting is a great subject in computer science.
    Fact - Beethoven wrote his first symphony in C

  6. #6
    Registered User
    Join Date
    May 2019
    Posts
    214
    Here's one of my favorites (and I'm certain, @flp1969, that this is a study of the algorithm).

    Ok, @cooper1200, buckle in.

    First, about (i < j) - it's probably ok. It is a "loop termination" check. Set that aside for a moment (like you did); it should be ok. There are several versions which do this differently, but the overall design is the same.

    In many years of study (we're perpetual students, right?), I've never seen a book or professor point out to students the moment at which sorting has actually happened. It is subtle and wonderful.

    There were a few PhDs earned on optimizations and details of this algorithm. Choosing the initial pivot value is its own "thing", but IT IS NOT FIRST as your code suggests. It is supposed to be a value we might assume is near the center of the data (so it divides the data into two roughly equal parts), and is nominally the "median" - not the lowest or highest, but a value near the middle. You can choose any value, but if that HAPPENS to be the least or greatest, or close to it, the efficiency degrades badly. The "median of 3" is typically chosen - that is, 3 samples are taken, the median of the 3 is used, and most often it is assumed to belong near the middle of the array being sorted (and is swapped into that position if required).
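
    A concrete median-of-3 sketch (mine, not from the code above; one common choice samples the first, middle, and last elements rather than 3 random ones):
    Code:
    /* Returns the index of the median of data[lo], data[mid], data[hi],
       where mid is the middle of the range. A typical quicksort then
       swaps that element into the pivot slot before partitioning. */
    static int median_of_3(const int data[], int lo, int hi)
    {
        int mid = lo + (hi - lo) / 2;
        int a = data[lo], b = data[mid], c = data[hi];
    
        if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
        if ((b <= a && a <= c) || (c <= a && a <= b)) return lo;
        return hi;
    }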

    There's a "tail" of sorts in professional builds which resort to a different type of sort (usually insert sort) when the size of the data involved is small (somewhere below 15 items).

    ...but we'll assume the algorithm is about to get going, and a suitable Pivot will be selected.

    Now, imagine you have dealt cards from a deck on the table to be sorted, set left to right as an array in memory, and in random order. Let's say there's 20 cards.

    Look at entry 10 and ask: is this obviously very low (A, 2) or very high (K, Q)? Maybe that's not a good pivot choice. Find a 5 or a 6 nearby and swap it in. Pull this center card upward to note that it is the pivot card.

    Now, point your left finger on the leftmost card, and your right on the rightmost card.

    Rest.

    Now, compare the card on your left (the one you're pointing to) with the pivot card. Which is less? If the pivot is greater, move your finger to the next card toward the right. Repeat the comparison: which is less, the pivot or your fingered card? Keep doing this until the pivot is lower than the card under your finger. When that happens, swap those cards, but leave your finger pointing to the next card.

    Now, check your right finger and compare to the pivot. Which is greater? If the fingered card is greater, move toward the left and repeat. When the fingered card is lesser, swap with the pivot (then return to your left-hand sequence). Keep this up, bouncing between left and right processing, until your hands cross.

    What just happened?

    You've moved through all of the cards on the left side. Every swap you made ensured that those on the left are less than the pivot. When you moved through the cards on your right side, you made sure all of those on the right are greater than the pivot card. You switch between left and right just so a card on the wrong side of the pivot has a chance to jump into the right part of the collection (a K on the left would have made it into the right section, and an A or 2 on the right would have made it into the left side).

    At that moment, before you get to the two recursive calls, the pivot card is in its final, sorted position. This is the moment the first item has been sorted into position. It is the moment I never see texts or teachers pause to make clear to students.

    At this point, though, that's the only card that is placed. It is so because, thanks to the process, you know all of the cards on the left are less than the pivot. You know the card is in its final position, and won't be moved again, because ALL OF THE CARDS BELOW IT ARE ON THE LEFT - its index is a count of all cards lower than the pivot. That establishes the position of the pivot card.

    That's really the point of the algorithm. You're tossing cards around to find which one belongs in that pivot position.

    With that step completed, the left and right sides are treated as two smaller arrays. You choose those arrays so they ignore the pivot (it should not be considered again; it's done).

    The left and right sides are both processed the same way, only smaller in size. The entire operation repeats on each partition, and each step places one pivot card, until a partition is down to size 2 (which only requires a quick look and possibly a swap if the pair is reversed)... and it's done.
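
    To see "the moment" in code, here is the partition step from post #1 with one added printf (my own instrumentation): after the final swap, index j never changes again.
    Code:
    #include <stdio.h>
    
    /* The partition from post #1, instrumented: after the final swap,
       data[j] holds the pivot value and j is its final, sorted index. */
    static int partition_and_report(int data[], int first, int last)
    {
        int pivot = first, i = first, j = last, temp;
    
        while (i < j) {
            while (data[i] <= data[pivot] && i < last) i++;
            while (data[j] > data[pivot]) j--;
            if (i < j) { temp = data[i]; data[i] = data[j]; data[j] = temp; }
        }
        temp = data[pivot]; data[pivot] = data[j]; data[j] = temp;
    
        printf("pivot value %d placed at final index %d\n", data[j], j);
        return j;
    }
    
    int main(void)
    {
        int a[] = { 7, 2, 9, 4, 1, 8 };
        partition_and_report(a, 0, 5);  /* pivot value 7 placed at final index 3 */
    }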

    I have slightly oversimplified the explanation. Edge cases exist; sometimes the pivot position must drift (when badly chosen at the start of a recursion).

    It is known that this (with a naive pivot choice like the first element) performs poorly when fed data already in order (or largely in order). Some loosely ordered data plays havoc with the algorithm, and professional implementations may take effort to notice the condition and recover.

    Questions?

    A few other related points:

    There are experimental versions of this which use two pivot points (making 3 sections) or 3 pivot points (making 4 sections). It turns out these usually aren't all that helpful. The dual-pivot version SEEMS like it should be faster, but most of the time it really isn't. Oddly (well, researchers know why, but we're not full-fledged scientists), the 3-pivot version (creating 4 sections) can be faster, but only when the comparison function is heavy. If the sort is working with, say, localized strings (multiple languages), performance may place a high priority on reducing the number of comparisons performed. This is where the 3-pivot version excels. However, the algorithm to do it is more complex, so when the comparison is cheap (like integers), it is actually slower.

    On the other hand, it is possible to schedule the recursive calls on the left and right partitions as threads. This can lead to an explosion of threads (there will be many, many sections to sort in large sets), so a task scheduler should be used. This does not give "linear" speedup, but it does improve performance - roughly 2.5 to 3 times faster with 4 threads, for example. There is an issue with CPU cache and RAM which limits the benefit, so the scheduling of the sections may well need to keep adjacent groups together so they're cache friendly when running in parallel.
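
    A bare-bones sketch of the threaded idea (mine; sort_range() and partition() are hypothetical names for any sequential quicksort and partition step, and the depth cutoff stands in for a real task scheduler):
    Code:
    #include <pthread.h>
    
    /* Hypothetical helpers, e.g. the routines discussed above. */
    void sort_range(int data[], int first, int last);
    int  partition(int data[], int first, int last);
    
    struct job { int *data; int first, last, depth; };
    
    static void psort(int data[], int first, int last, int depth);
    
    static void *worker(void *p)
    {
        struct job *jb = p;
        psort(jb->data, jb->first, jb->last, jb->depth);
        return NULL;
    }
    
    static void psort(int data[], int first, int last, int depth)
    {
        if (first >= last)
            return;
        if (depth == 0) {                /* cutoff: stop spawning threads */
            sort_range(data, first, last);
            return;
        }
        int mid = partition(data, first, last);
    
        struct job jb = { data, first, mid - 1, depth - 1 };
        pthread_t t;
        pthread_create(&t, NULL, worker, &jb);  /* left side: new thread   */
        psort(data, mid + 1, last, depth - 1);  /* right side: this thread */
        pthread_join(&t, NULL);                 /* join before returning   */
    }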
    Last edited by Niccolo; 05-29-2019 at 06:58 PM.

  7. #7
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    Quote Originally Posted by Niccolo View Post
    Here's one of my favorites (and I'm certain, @flp1969, that this is a study of the algorithm).
    *Card falls out of my sleeve*...

    *Runs away*
    Fact - Beethoven wrote his first symphony in C

  8. #8
    Registered User
    Join Date
    May 2019
    Posts
    214
    Quote Originally Posted by Click_here View Post
    *Card falls out of my sleeve*...

    *Runs away*
    There's one in every casino.

  9. #9
    Registered User
    Join Date
    Apr 2019
    Posts
    808
    Thanks for the explanation. What happens if the pivot point is one of a group of equal numbers?

    I take it that sorting around the pivot is a basic bubble sort?

  10. #10
    Registered User
    Join Date
    May 2019
    Posts
    214
    The actual tests in the partition can be altered slightly (and the partition passes themselves have various versions), but the comparisons are usually < or !<, meaning runs of equal values don't matter.

    The bubble sort isn't, itself, involved (quicksort's partition doesn't perform the multiple-pass adjacent swapping typical of bubble sort). Sorting on either side of the pivot happens because of the recursion, as the same process repeats on each of the two left/right sections just as described for the first one. When the tails are small, the typical choice is insertion sort, sometimes merge sort.
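
    To answer the equal-values question with a quick test: running the code from post #1 on data that is mostly one repeated value still terminates and sorts correctly (though many equal keys push this variant toward its worst case, since the partitions become lopsided):
    Code:
    #include <stdio.h>
    
    /* The quicksort from post #1, unchanged. */
    void quicksort(int data[], int first, int last, int *count)
    {
        int i, j, pivot, temp;
    
        *count += 1;
        if (first < last) {
            pivot = first; i = first; j = last;
            while (i < j) {
                while (data[i] <= data[pivot] && i < last) i++;
                while (data[j] > data[pivot]) j--;
                if (i < j) { temp = data[i]; data[i] = data[j]; data[j] = temp; }
            }
            temp = data[pivot]; data[pivot] = data[j]; data[j] = temp;
            quicksort(data, first, j - 1, count);
            quicksort(data, j + 1, last, count);
        }
    }
    
    int main(void)
    {
        int a[] = { 5, 5, 5, 1, 5, 9, 5, 5 };
        int count = 0;
    
        quicksort(a, 0, 7, &count);
        for (int k = 0; k < 8; k++)
            printf("%d ", a[k]);        /* prints: 1 5 5 5 5 5 5 9 */
        printf("(%d calls)\n", count);
    }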

  11. #11
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    You could change the code to the example shown below, but a compiler may already be optimizing the original code to essentially the same thing.

    Code:
            while(1)
            {
                while(data[i] <= data[pivot] && i < last)
                    i++;
                while(data[j] > data[pivot])
                    j--;
                if(i >= j)
                    break;
                temp=data[i];
                data[i]=data[j];
                data[j]=temp;
            }
    One more conditional can be removed from one of the inner loops by using the middle value for pivot.

    Code:
    #include <stddef.h>   /* size_t */
    
    void quicksort(int arr[], size_t lo, size_t hi)
    {
        if (lo < hi)
        {
            int tmp;
            int pivot = arr[lo + (hi - lo) / 2];
            size_t i = lo - 1;   /* may wrap when lo == 0; unsigned wraparound
                                    is well defined, and ++i brings it back to lo */
            size_t j = hi + 1;
            while (1)
            {
                while (arr[++i] < pivot);
                while (arr[--j] > pivot);
                if (i >= j)
                    break;
                tmp = arr[i];
                arr[i] = arr[j];
                arr[j] = tmp;
            }
            quicksort(arr, lo, j);    /* note it is j and not j-1 for this variation */
            quicksort(arr, j + 1, hi);
        }
    }
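    For anyone trying this variant: a small hypothetical driver (my naming; assumes the quicksort above is in the same file) shows the call convention - hi is the last index, not the element count:
    Code:
    #include <stdio.h>
    
    int main(void)
    {
        int a[] = { 9, 4, 7, 1, 8, 2 };
        size_t n = sizeof a / sizeof a[0];
    
        quicksort(a, 0, n - 1);       /* inclusive indices */
    
        for (size_t k = 0; k < n; k++)
            printf("%d ", a[k]);      /* prints: 1 2 4 7 8 9 */
        putchar('\n');
    }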
    Last edited by rcgldr; 05-30-2019 at 07:17 PM.
