Quicksort's recursion

**Adak** · 11-26-2012

Originally Posted by thames

I was studying the function checkIt and I understood it's evaluating if the array was already sorted. However what are the situations in which the array wouldn't be sorted?

edit:

is selection sort too slow for small arrays? and bubble sort ?

It is used to check whether the sort program sorted correctly or not. It wouldn't sort correctly because when I wrote the code, I made zee leetle airrare, eh? <smile>

I have a Combsort that is quite "brittle", also an iterative Quicksort that is just like glass I swear - used to run fine, but on the newer cpu's and chipsets, it's WAY too touchy. Also, a FAST sort program that was published with an error in it, in Byte Magazine - things like that.

I have no use whatsoever for Selection sort. Never liked it, and on my PC's, it was always the slowest sorter. I view it's algorithm as the worst of any (somewhat popular) sorting algorithm. Pathetic about covers it. (sorry Nominal!).

Bubble sort is OK for small number of things, but mostly, it just has a catchy name. IMO Insertion sort should be the "low end" sorting algorithm, because:

1) It's stable, by design
2) It's very fast with a small number of things to sort, and with values that are already sorted (it happens).
3) It's shorter than Bubble sort after Bubble sort is optimized.
4) It's about 3.5 times faster on random integers, with small arrays, than Bubble sort. (With optimizations, the sorting speeds even out as the size of the array grows larger). By that time, you'll definitely want something better!

**Nominal Animal** · 11-26-2012

Originally Posted by Adak

I have no use whatsoever for Selection sort. Never liked it, and on my PC's, it was always the slowest sorter.

Really? My measurements disagree.

Using Athlon II X4 640 (with a stable time stamp counter), median cycle counts for test sorts for uniform random input were

Code:

#  N ints, no padding, 100000 tests, +66 cycles overhead
#  N    Bubble Selection Insertion
   4        59        41       116
   6       149       167       284
   8       321       335       459
  12       804       750       882
  16      1391      1224      1373
  24      2901      2436      2522
  32      4850      3969      3858
  40      7253      5804      5358
  48     10127      7923      7012
  56     13493     10321      8799
  64     17363     12996     10750

#  N ints, 12 chars padding, 100000 tests, +66 cycles overhead
#  N    Bubble Selection Insertion
   4        75        60       119
   6       171       189       269
   8       321       375       422
  12       845       809       778
  16      1514      1313      1203
  24      3120      2588      2154
  32      5092      4208      3282
  40      7430      6182      4562
  48     10145      8499      5959
  56     13230     11160      7500
  64     16847     14145      9160

#  N ints, 124 chars padding, 100000 tests, +66 cycles overhead
#  N    Bubble Selection Insertion
   4       240       167       224
   6       573       352       488
   8      1074       600       794
  12      2624      1134      1593
  16      4748      1748      2408
  24     10862      3228      4733
  32     18602      4925      7566
  40     28238      7031     11066
  48     40846      9410     15161
  56     55593     12045     19851
  64     71924     14864     25133

Here is the array element the above functions sort:

Code:

typedef int  key_t;

struct data {
    key_t key;
#ifdef PADDING
#if PADDING > 0
    char  padding[PADDING];
#endif
#endif
};

with PADDING defined at compile time as per comments in the above results. The actual sort functions were compiled optimized (gcc -W -Wall -O3 -fomit-frame-pointer) separately from other parts. The sort functions themselves, should you care to test, are:

Code:

/* Swap two data entries.
*/
static inline void data_swap(struct data *const data, const size_t dst, const size_t src)
{
    if (dst != src) {
        struct data  temp;
        temp      = data[src];
        data[src] = data[dst];
        data[dst] = temp;
    }
}

/* Insert src into dst, if src > dst.
*/
static inline void data_insert(struct data *const data, const size_t dst, const size_t src)
{
    if (dst < src) {
        struct data  temp;
        temp = data[src];
        memmove(data + dst + 1, data + dst, (src - dst) * sizeof *data);
        data[dst] = temp;
    }
}


void no_sort(struct data *const data __attribute__((unused)), size_t const elements __attribute__((unused)))
{
    return;
}


void bubble_sort(struct data *const data, size_t const elements)
{
    size_t  n = elements + 1;

    while (n-- > 0) {
        size_t  i = 1;

        while (i < n && data[i-1].key <= data[i].key)
            i++;

        if (i >= n)
            return;

        while (i < n) {
            if (data[i-1].key > data[i].key)
                data_swap(data, i-1, i);
            i++;
        }
    }
}


void selection_sort(struct data *const data, size_t const elements)
{
    size_t  i = elements;

    while (i-- > 1) {
        size_t  max_ref = i;
        key_t   max_key = data[i].key;

        size_t  k = i;
        while (k-- > 0)
            if (data[k].key > max_key) {
                max_key = data[k].key;
                max_ref = k;
            }

        data_swap(data, max_ref, i);
    }
}


void insertion_sort(struct data *const data, size_t const elements)
{
    size_t  i;

    for (i = 1; i < elements; i++) {
        const key_t  key_i = data[i].key;

        if (data[0].key > key_i)
            data_insert(data, 0, i);
        else {
            size_t  k = i;

            while (data[k - 1].key > key_i)
                k--;

            data_insert(data, k, i);
        }
    }
}

I did try to avoid cache effects -- each function sorted their own arrays, with identical new data generated for each before each timing iteration. As mentioned above, the median cycle counts were collected from hundred thousand iterations.

You could say that sorting small arrays of large structures (124 chars or 31 ints of payload per sort key) is silly, but I won't rule it out. That's where I said selection sort works well, and the data above seems to back me up on that.

My insertion sort routine could do with a bit more finesse, too. If the cutoff for quicksort is around the 32 element mark, as I've used, then selection sort does not look too bad. Remember that the idea is to use the least number of cycles overall, not pick the one that gives the minimum in some corner case. So you need to consider the relative differences, and the typical array size.

These results are not conclusive in any way, as only uniformly random source data was even tested: bubble sort and insertion sort would have much more of an advantage with nearly-sorted or even partially-sorted inputs. However, these are the basis for my earlier statements, and I definitely refuse to discard any of the sorting algorithms outright.

**iMalc** · 11-27-2012

Your bubble sort has a separate pass to determine the sortedness, that's probably what's slowing it down. Normally one would optimise over the naieve bubblesort by either using a flag to detect if the pass made any swaps, or even better you'd remember the position of the highest swap in that pass, allowing you to not only detect when no more passes are required, but also when you can skip a whole lot of longer passes and go pick up from one of the shorter ones.
This makes bubble sort also able to perform fewer compares than selection sort, and beat the performance of selection sort in many cases.
It's why the two most commonly used algorithms for small N are those two, with selection sort probably coming in third.

**Adak** · 11-27-2012

Really?

There are two optimizations that I know of, commonly used with Bubble sort. Your Bubble sort, has neither of them.

In Insertion sort - although I haven't seen the assembly code it generates, it appears you've stretched out the tight inner loop that is it's best feature.

Why don't you just add your favorite version of Selection sort, into the program I posted previously?

I've tested Selection sort hundreds of times, on several PC's, because it's part of my larger program that tests sorting algorithms on random int's.

It sucks.

But I have not tested it on miniscule quantities of 5 or 12, or anything less than 50 integers.

**iMalc** · 11-27-2012

Adapting the best bubble sort version off my website:

Code:

void bubble_sort(struct data *const a, size_t const n) {
    size_t i = n-1;
    do {
        size_t k = 0;
        for (size_t j = 0; j < i; ++j) {
            if (a[j + 1].key < a[j].key) {
                data_swap(j, j + 1);
                k = j;
            }
        }
        i = k;
    } while (i > 0);
}

I write every sorting algroithm on my website that I can, in terms of the less-than operator for comparing items.

I would Remove the unecessary test for swapping with self inside data_swap. It only saves time in very very few cases, and adds extra effort in every case.

I don't particularly like using the same identifier as the type of the struct and the name of the struct variable, since that is not legal in C++.

Oh and just to be clear, I had found that of the O(n*n) sorting algorithms used for values of small N, within other algorithms wuch as merge sort, that insertion sort was most common, followed closely by bubble sort, and finally by insertion sort.

**Nominal Animal** · 11-27-2012

Originally Posted by iMalc

Normally one would optimise over the naieve bubblesort by either using a flag to detect if the pass made any swaps

Obviously.

My code is a two-state state machine, where the first inner while loop locates the initial pair to swap. If none, it is done. Otherwise it continues into the actual swapping loop. The second inner while loop, the swapping loop, is only executed when swaps are done, thus it also doubles as the swapped flag.

Originally Posted by Adak

Why don't you just add your favorite version of Selection sort, into the program I posted previously?

clock() does not have enough precision on my machine. It reports zero execution times.

Also, I'd rather have the sort functions compiled separately, and work on argument arrays, not on global arrays.

Originally Posted by Adak

I have not tested it on miniscule quantities of 5 or 12, or anything less than 50 integers.

It seems that quicksort is faster for uniform random data at much smaller sizes, roughly 30 on my machine. I don't use bubblesort/selection sort/insertion sort on large arrays, only on these minuscule ones.

(My data tends to be clustered, nearly quantized, but rarely sorted.)

Attached is a zip file containing a simple C/C++ harness for testing in-place sort algorithms. Any improvements, fixes, et cetera are warmly welcome. You can even run with it as your own, since I consider the contents of the zip file either public domain, CC-0, or non-copyrightable (due to obviousness and so on).

I haven't tested it on Windows, but I included code that I think should use the performance counters Windows provides. On other systems it uses the realtime POSIX.1-2008 clock_gettime(CLOCK_PROCESS_CPUTIME_ID) clocks. You can add others by modifying the cputime.h file.

The idea of the harness is that you only need to add your sort function to the sort.c file, add its name and function to the array of structures at the end of the file, and recompile and run the program. I use GCC-4.6.3, and it compiles also on GNU G++, but again, I haven't tested other compilers.

I expect results to differ on different compilers, but it would be useful if the harness produced reproducible results, so the sort functions could be compared against each other on different compilers and architectures. It might provide interesting insights on the implementation specifics wrt. different C/C++ compilers, too.

Running make will compile and run an example set of sort tests. It will output verbose data, and save tabular data in result files.

(The harness program itself takes one or more command-line parameters, defining the number of iterations, the size of the array, and optionally the amount of sorted data in the array -- actually I was just lazy, and shuffle the data instead; that should be fixed to using insertion shuffle to get short sorted runs too. The Makefile contains a default set of commands. Run ./main without parameters to see the usage details. Tabular data is to standard output, verbose to standard error.)

If you have Bash and Gnuplot, run ./plots.sh to display the plots of the run times. It will also save them as SVG files in the working directory. (I thought of attaching them to this post, but I decided against that: I'd prefer to collaborate, not compete.)

I would quite much enjoy discussing these algorithms and implementations constructively. Perhaps in another thread, as we're pretty far from the original topic here.

Unless we have something hands-on we can test, we're just expressing our opinions -- which is not that useful; I'm more interested in hard facts and results and theories that can be tested. (I don't care if we disagree, but I care why we disagree. If that makes sense.)

In particular, my result sets do not match iMalc's and Adak's comments. I'd be interested to discuss why, but we'd need to be able to run the sorts in a comparative environment, so we can compare our results, and fine-tune the harness, so the results can be repeated and demonstrated by others. Opinions and results from other tests are not comparable, as there are too many variables that affect the results.

At minimum, if you could extract and compile the attached files, and report any errors (preferably with fixes, please!) when compiled on Windows, I'd be very grateful.

**Adak** · 11-28-2012

Originally Posted by iMalc

Oh and just to be clear, I had found that of the O(n*n) sorting algorithms used for values of small N, within other algorithms wuch as merge sort, that insertion sort was most common, followed closely by bubble sort, and finally by insertion sort. (you meant Selection sort).

I couldn't get your Selection sort to work Nominal Animal, but I grabbed one from Wiki, and added it onto my new comparative sort program I posted above.

And I can see that Selection sort has benefited greatly from the old DOS days on a (286).

It handily beats Bubble sort on every sorting run except where the data is nearly sorted/already sorted. It even beats Insertion sort by a length, on two out of the three tests - which is a big surprise!

Any sorting program that can beat Insertion sort on randomized data, is OK by me - Selection sort is back from the ashes!

The specialty of Insertion sort still give it the better marks (inherently a stable sort, and absolutely blinding speed with near sorted or sorted data), but there is no doubt that Selection sort is capable, and still has it's unique gift of being THE most intuitive sorting algorithm ever discovered imo, AND makes far fewer swaps than most algorithms, on random data.

I don't know if it's the larger cache sizes in the chips today, or better compilers (probably both), but two sorting algorithms have really benefited:

Shell sort. Used to be an ugly step-sister to the big three (Merge, Heap, and Quicksort), but now it's really stepped up. In my tests, it sorts 100,000 random int's, neck and neck with Quicksort.

Only with a larger number of integers, does Shell sort begin to fall behind.

and of course,

Selection sort. I grew to really dislike Selection sort because it was such a weiner dog in the tests that I made. In fact, I had deleted it from my test programs.

Thanks for the tip on this, Nominal Animal!

I'll check out your "harness" for sorting.

**thames** · 11-28-2012

is the classic Data Structures Using C (the blue book) a good Data Structures book ?

**Nominal Animal** · 11-28-2012

Originally Posted by Adak

I can see that Selection sort has benefited greatly from the old DOS days on a (286).

The CPU architectures have undergone some serious changes. We now have branch prediction, efficient caching algorithms, even speculative execution, on typical processors. I suspect the relative cost of comparison has significantly decreased compared to the other operations, making Selection sort useful at least for some cases.

Which is exactly my point with regards to sorting algorithms: investigating them, and looking at their features, strengths, and weaknesses, you can find scenarios where any one will beat the others. Some of the scenarios are not realistic -- or are not realistic right now --, but as a programmer, it is very good to have a varied tool box to pick your tools from.

I cannot find a reference to it right now, but I remember an article about algorithms that were first discovered in the early 1900s, and are now being re-discovered, as they become very useful again -- only this time implemented in hardware, and not by humans.

Originally Posted by Adak

Shell sort. Used to be an ugly step-sister to the big three (Merge, Heap, and Quicksort), but now it's really stepped up.

Yes. As the Wikipedia article on Shellsort describes, there has been a lot of theoretical (and practical) research into the different gap sequences. While many of them have either the same or nearly the same theoretical complexity or bounds, each sequence tends to have different real-world behaviour wrt. different types of input.

In particular, different Shellsorts implemented into my harness will very likely exhibit different relative strengths/weaknesses for different types of data -- as if they were different algorithms altogether.

Current CPUs tend to be able to do rather complicated computations in the same time they can access a random memory location; given sufficient amounts of data, the memory bandwidth is surprisingly often the bottleneck[*]. This has moved more complex algorithms, and algorithms like Shellsort that need to e.g. compute an integer expression to determine what to do next, from theoretical curiosities to very practical tools.
[*] The converse is also true. If you have an array with just a few items, that fits in a cache line (32 bytes is very common currently), a linear search over sorted data may be faster than a binary search. This is due to the CPU architecture; long pipelines, speculative execution, fast local caches -- basically, the CPU has features that allow it to execute the "slow dumb" code faster than it would the "fast smart" code. The only way to keep your code at peak efficiency, really, is to not assume too much, and to test and verify any assumptions you do make.

Originally Posted by Adak

I'll check out your "harness" for sorting.

If you have any fixes or enhancements or anything, I'd very much like to have your input. And as I said before, I'd like to have the code splayed out here somewhere, so we could try out different solutions, and compare them against each other. Collaboration, not competition.

Theoretical discussions about the number of comparisons or swaps or insertions or the complexity or bounds is fine, but it does not a program make. Real-world results depend on much more details, some of which are out of our control -- which means testing on different compilers and hardware is really the only way to find out which real-world effects are due to the algorithm, and infer the real-world causes for the rest of the effects.

The way I try to do my measurements is such that the lower 80% to 95% of measurements are reliable, and errors are likely clustered to the upper end of the spectrum. (In some cases I can see CPU usage spikes in the measurements, for example.) It also means that if the curves don't look smooth, better retry the tests a few times (with different concurrent loads on the machine), to see if the results are true, or just an artefact of extraneous events.

Since minimum really maps to "optimum" conditions, I like to use median, as you know that you'll achieve the median (or faster) at least 50% of the time. (In real world usage, you'll have other overheads, of course, but those overheads don't usually depend on the algorithm at all.)

If we could tell what fraction of the measurements were erroneous -- they're almost certainly at the end, so we could just forget about those by truncating the sorted elapsed CPU times -- we could compute the expected times for each sort function. Since I don't have any real basis (other than inspecting the timing histograms) for any specific figure, I don't do that.

Finally, the "harness" does not generate the partially sorted data as it should. Now, it just does half the unsorted number of swaps. It really should do insertions instead of swaps, or perhaps swap two consecutive nonoverlapping sequences of data. So, where the results say "50% sorted" or some such, it is just for some weird definition on "50% sorted"

It should be easy to fix for any interested person; just modify the data.c file, shuffle_data()function.

**thames** · 11-28-2012

0 1 2 3 4 5 6 7 8 9

Bubble sort: 0 swaps in 0.0000 seconds
Insertion sort: 0 swaps in 0.0000 seconds
Quicksort: 65535 swaps in 0.0100 seconds
Optimized Quicksort: 2047 swaps in 0.0000 seconds
Shell sort: 0 swaps in 0.0100 seconds

why is quicksort sorting if the numbers are already sorted ?

**Nominal Animal** · 11-28-2012

Originally Posted by thames

why is quicksort sorting if the numbers are already sorted ?

Is that the version of quicksort that moves (swaps) the pivot to the start/end first?

**Adak** · 11-28-2012

Quicksort is a partitioning sort. Unlike Bubble sort or Insertion sort, it has no initial test to see if the data is already sorted or not. In the course of it's looping, the pivot it picks will usually not be the ideal one, and whenever it is NOT the ideal one, it will begin making swaps. Only when it's done will stop.

You can see how the optimized version (which uses insertion sort for the smaller sub-arrays), knocks down the number of swaps needed, and saves just a bit of time, as well.

That is a problem with Quicksort, btw - if the data is already sorted, then the time needed to finish re-sorting, is far longer than if the data is NOT sorted! Just the way partitioning works, in Quicksort. You don't want to send sorted data to Quicksort - it's a real problem, especially if it's an dumb version of Quicksort, in the way it chooses it's pivot. (Say, by taking the first data value as the pivot - that would be a disaster).

No, neither version moves the pivot. I was taught that moving the pivot was "old fashioned", and a hindrance to Quicksort. In my tests of several versions, there was no version which moved the pivot, and was as fast as the unoptimized version, I posted, and as clear.

**Nominal Animal** · 11-28-2012

Originally Posted by Adak

No, neither version moves the pivot.

That's not why I asked.

The reason I asked was because there are many variants of quicksort that do not cause any swaps to happen if the data is linear and ascending: ith data element being A*i+B, where A > 0, and A and B are constants over the entire dataset.

In this thread, I've seen iMalc's median pivot, and the median-of-three pivot scheme, which both cause a quicksort to behave that way.

Those quicksort variants happen to pick the actual median for the pivot for every range (because the median is in the middle of every range; it's a direct result of the above data rule). Therefore, those variants should have zero swaps for such consecutive sorted input.

Therefore, in my mind, there were two possible causes for the swaps: either the pivot selection was random/non-middle/non-median-of-three (probably random), or the swap count counted the pivot swaps and either the swaps were summed over a number or calls, or there was something seriously wrong about the recursion part in the quicksort function. 2047 = 2·2¹⁰-1 swaps for a ten-entry array seemed suspicious to me.

**Adak** · 11-28-2012

10 entry array??

My array sizes were either 100,000 or 50,000 numbers. I have it only show the first 10 digits of each sort.

**Nominal Animal** · 11-28-2012

Originally Posted by Adak

I have it only show the first 10 digits of each sort.

Ah, so you have. I only now realized the output thames showed was from your program. Okay, the swap count makes sense then, because your code uses random pivot selection. This thread has meandered so much I've lost track, sorry

Thread: Quicksort's recursion

Thread Tools

Search Thread

Display

Similar Threads

Quicksort help!

Help with quicksort

quicksort

Quicksort

quicksort

Tags for this Thread