Comparison of Sorting algorithm doubt

This is a discussion on Comparison of Sorting algorithm doubt within the C Programming forums, part of the General Programming Boards category.

1. Comparison of Sorting algorithm doubt

I went through the tutorial uploaded on this website about sorting algorithms, but I have one doubt

Which of the following sort procedures takes the lowest average time?
a) Merge sort b) Heap sort c) Quick sort d) Selection sort

From the tutorial I learned that the average time of options a, b, and c is the same, O(n log n), so what is the correct answer to this MCQ?

2. O_o

How many elements do you have?
How expensive is copying?
How expensive is swapping?
How expensive is comparison?
How fast is your node allocator?

Is the data completely randomized?
Is the data mostly sorted?
Is the data composed of runs of already sorted data?

Do you need a stable sort?

Is the "Merge Sort" optimized for batched allocations?
Is the "Merge Sort" optimized for copying the optimum number of elements as spans for dumb data?

Is the "Heap Sort" optimized for batched allocations?
Is the "Heap Sort" optimized with a tree structure or a span array?
Is the "Heap Sort" optimized to move runs of sorted data?

Is the "Quicksort" pivot optimized for any particular balance?
Is the "Quicksort" optimized to "decay" into a different algorithm when the number of elements remaining is sufficiently small?
Is the "Quicksort" actually "Introsort"?

Soma

3. Originally Posted by shyjuu
I went through the tutorial uploaded on this website about sorting algorithms, but I have one doubt

Which of the following sort procedures takes the lowest average time?
a) Merge sort b) Heap sort c) Quick sort d) Selection sort

From the tutorial I learned that the average time of options a, b, and c is the same, O(n log n), so what is the correct answer to this MCQ?

The answer is Quicksort. It has the same complexity but better "data locality", which is to say that on the common PC architecture it's able to use the cache a little better.

Using the cache well is key, since cache access is WAY faster than regular RAM access.

In any serious sorting effort, insertion sort should be used to speed up Quicksort by handling the small sub-arrays that Quicksort leaves almost sorted. No program handles small or already-sorted data faster than insertion sort (and already-sorted data gets re-sorted more often than you'd think).
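That combination can be sketched like this. This is only a minimal illustration of the standard technique, not code from the tutorial; the names `quicksort`, `insertion_sort`, and the cutoff of 16 are my own choices:

```cpp
#include <algorithm>
#include <cstddef>

// Sub-arrays at or below this size are handed to insertion sort.
static const std::size_t CUTOFF = 16;

// Insertion sort: very fast on small or nearly sorted ranges.
static void insertion_sort(int *a, std::size_t lo, std::size_t hi) // sorts a[lo..hi)
{
    for (std::size_t i = lo + 1; i < hi; ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > lo && a[j - 1] > key) { a[j] = a[j - 1]; --j; }
        a[j] = key;
    }
}

static void quicksort(int *a, std::size_t lo, std::size_t hi) // sorts a[lo..hi)
{
    while (hi - lo > CUTOFF) {
        // Median-of-three pivot: one cheap guard against sorted input.
        std::size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo])      std::swap(a[mid], a[lo]);
        if (a[hi - 1] < a[lo])   std::swap(a[hi - 1], a[lo]);
        if (a[hi - 1] < a[mid])  std::swap(a[hi - 1], a[mid]);
        int pivot = a[mid];

        // Hoare-style partition.
        std::size_t i = lo, j = hi - 1;
        for (;;) {
            while (a[i] < pivot) ++i;
            while (a[j] > pivot) --j;
            if (i >= j) break;
            std::swap(a[i], a[j]);
            ++i; --j;
        }
        // Recurse into the smaller side, loop on the larger: bounds stack depth.
        if (j + 1 - lo < hi - (j + 1)) {
            quicksort(a, lo, j + 1);
            lo = j + 1;
        } else {
            quicksort(a, j + 1, hi);
            hi = j + 1;
        }
    }
    insertion_sort(a, lo, hi); // the small, almost-sorted remainder
}
```

Each recursive call finishes its own small remainder with insertion sort, so the whole range ends up sorted.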

There are lots of ways to "guard" Quicksort against degrading on an already sorted array of data that add very little run time.

Selection sort should NOT be included in a list of fast sorters.

For Quicksort to work at top speed, the data needs to be in an array-like data structure that allows random access, not a linked list (sequential access only).

The above is meant for general sorting. The more you know about your data, the better you can tailor the sorting code to work with it.

4.
Note that the merge sort algorithm shown at cprogramming.com:

Merge Sort - Cprogramming.com

is a top-down merge sort, which involves unneeded overhead in recursively splitting a list into sub-lists until each sub-list is of size 1. A bottom-up merge sort avoids this overhead by simply considering a list of size n to be n sub-lists of size 1 as the initial state. Here is a link to a "minimalist" (uses C++ STL vectors) bottom-up merge sort in a thread I posted:

minimalist bottom up merge sort
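The idea, as a rough sketch (this is not the linked program, just a minimal illustration using std::merge and one auxiliary buffer):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bottom-up merge sort: treat a vector of n elements as n runs of size 1,
// then merge runs pairwise in passes of width 1, 2, 4, ... with no recursion.
static void bottom_up_merge_sort(std::vector<int> &a)
{
    std::size_t n = a.size();
    std::vector<int> buf(n); // one auxiliary buffer, allocated once
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            std::size_t mid = std::min(lo + width, n);
            std::size_t hi = std::min(lo + 2 * width, n);
            // Merge a[lo,mid) and a[mid,hi) into buf[lo,hi).
            std::merge(a.begin() + lo, a.begin() + mid,
                       a.begin() + mid, a.begin() + hi,
                       buf.begin() + lo);
        }
        a.swap(buf); // alternate buffers instead of copying back
    }
}
```

Note there is no splitting phase at all: the pass widths enumerate the sub-list sizes that top-down recursion would have produced.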

Originally Posted by shyjuu
Which of the following sort procedures takes the lowest average time

a) Merge sort b) Heap sort c) Quick sort d) Selection sort
As mentioned by phantomotap, it depends on a variety of factors. In general, quick sort involves more comparisons but fewer moves than merge sort. If sorting an array of integers, then quick sort is faster. If sorting a list of pointers to records or structures and if the average time to compare records is longer than the average time it takes to copy pointers, then merge sort is faster.

If the ordering of "equal" elements is to be preserved, then a "stable" (preserves the order of "equal" elements) sort like merge sort is needed. Quick sort is not stable.
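A tiny example of what stability buys you (the record layout and names here are invented for the example):

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Record { int key; std::string payload; };

// Compare by key only; records with equal keys differ in payload.
static bool by_key(const Record &x, const Record &y) { return x.key < y.key; }

static std::vector<Record> stable_demo()
{
    std::vector<Record> v = {
        {2, "first 2"}, {1, "first 1"}, {2, "second 2"}, {1, "second 1"}
    };
    // stable_sort guarantees "first 1" stays ahead of "second 1", and
    // "first 2" ahead of "second 2"; plain sort() makes no such promise.
    std::stable_sort(v.begin(), v.end(), by_key);
    return v;
}
```

With std::sort the two "1" records (or the two "2" records) could come out in either order; std::stable_sort pins them to their input order.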

5. Thanks much for your post - I have to say that I've never heard of "bottom up" merge sort before.

Now I'd like to test it out on both integer and record sorting, against an optimized Quicksort.

Can you recommend a bottom-up merge sort that is optimized?

I would optimize a version of it myself, but I seriously doubt I could do a good job of it. I'm afraid my Quicksort version would bury it.

The tests include sorted data, random data, reverse-sorted data, and data with many duplicates in a narrow range.

6. Originally Posted by phantomotap
O_o

How many elements do you have?
How expensive is copying?
How expensive is swapping?
How expensive is comparison?
How fast is your node allocator?

Is the data completely randomized?
Is the data mostly sorted?
Is the data composed of runs of already sorted data?

Do you need a stable sort?

Is the "Merge Sort" optimized for batched allocations?
Is the "Merge Sort" optimized for copying the optimum number of elements as spans for dumb data?

Is the "Heap Sort" optimized for batched allocations?
Is the "Heap Sort" optimized with a tree structure or a span array?
Is the "Heap Sort" optimized to move runs of sorted data?

Is the "Quicksort" pivot optimized for any particular balance?
Is the "Quicksort" optimized to "decay" into a different algorithm when the number of elements remaining is sufficiently small?
Is the "Quicksort" actually "Introsort"?

Soma
Thanks a lot for such a descriptive answer. Let us just consider normal values, or a normal case; now which is the quickest according to you?

7. Originally Posted by shyjuu
Thanks a lot for such a descriptive answer. Let us just consider normal values, or a normal case; now which is the quickest according to you?
Normally, the quickest is the one that sorts the fastest and meets the requirements given the various factors that phantomotap listed (and maybe more).

As you can see from the comments in this thread, sorting is not a simple, one-size-fits-all job. Far from it.

Given your original question, Quicksort is the overall general answer - but if it's not optimized and has no code to prevent it, Quicksort can give very poor performance when tasked with sorting already-sorted data.

9. Thanks a lot for such a descriptive answer. Let us just consider normal values, or a normal case; now which is the quickest according to you?
O_o

Unless the algorithms in question have certain optimizations, I would not use any of them as a general purpose sort.

The truth is, simple implementations of all of the algorithms in question have performance and memory characteristics that I would find unacceptable without knowing a lot of details about the data.

If you don't have significant statistical information about your data and performance characteristics about operations on that data, you should be reaching for something with more stable performance in the face of diverse data sets.

You'll find that virtually all libraries and languages implement, as the default unordered sort algorithm, a modified "Quicksort" with a balanced median pivot "decaying" into a different form of sort according to some criteria. (The algorithm known as "Introsort" is a specific flavor of exactly these modifications and optimizations, though a "Radix Sort" flavor is also very popular.)

You'll find that virtually all libraries and languages implement, as the default stable sort algorithm, a modified "Insertion Sort" with a speculative "single shot" allocation of pointers "decaying" into a flavor of "Merge Sort" according to some criteria. (The algorithm commonly known as "Timsort" is a specific flavor of exactly these modifications and optimizations.)

In both cases, the modifications and optimizations push the boundaries for comparison-based sorts by tweaking behavior during execution, giving extremely stable average-case performance over extremely diverse data sets. The "trade-off" for using these algorithms is that you don't need to know anything about your data; you throw the data at the algorithms and they do the best they can for the particulars of that data. In a very real way, you've traded a small performance hit, the cost of adaptive behavior, for not needing to care too much about the particulars of your data.
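To make the "decaying" Quicksort concrete, here is a minimal Introsort-style skeleton. This is my own sketch for illustration, not the code from any library: a median-of-three Quicksort that falls back to heap sort once the recursion depth exceeds roughly 2*log2(n), and leaves small blocks for one final insertion-sort pass:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<int>::iterator Iter;

static void introsort_rec(Iter lo, Iter hi, int depth)
{
    while (hi - lo > 16) { // blocks of 16 or fewer are left for the final pass
        if (depth-- == 0) {
            // Too many poor partitions: "decay" to heap sort, O(n log n) worst case.
            std::make_heap(lo, hi);
            std::sort_heap(lo, hi);
            return;
        }
        // Median-of-three pivot value.
        int a = *lo, b = *(lo + (hi - lo) / 2), c = *(hi - 1);
        int pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));
        Iter split = std::partition(lo, hi, [pivot](int x) { return x < pivot; });
        introsort_rec(split, hi, depth); // recurse on the right, loop on the left
        hi = split;
    }
}

static void introsort(std::vector<int> &v)
{
    if (v.size() > 1)
        introsort_rec(v.begin(), v.end(), 2 * (int)std::log2((double)v.size()));
    // The blocks are already in the right relative order; one insertion-sort
    // pass finishes cheaply because every element is close to its final slot.
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) { v[j] = v[j - 1]; --j; }
        v[j] = key;
    }
}
```

The depth limit is the whole point: it also rescues degenerate partitions (for example, many duplicates), so the worst case stays O(n log n).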

The point is, without knowing a lot about the situation, I would not speculate on the performance of simple implementations of those algorithms. I'd simply run some tests through my test suite and profiler with some reasonable data sets. So my answer would be, as most everyone else will have assumed, dependent on what answers I might give in speculation to those questions I've asked.

In other words, I cannot personally answer the real question you've asked, because I would not speculate on the performance of those algorithms. I would speculate on the nature of the data and use that information, with what I know about the algorithms from benchmarks, to choose; they all have their place, and I would do my best to choose according to my data.

Building on what Adak says, choosing one sorting algorithm over another should be a carefully considered and preferably measured process.

Soma

10. Can you recommend a bottom-up merge sort that is optimized?
I have example programs, but they're set up for C++ vectors (except the pointer sort, which switches to using pointers) and Visual Studio. I don't know how difficult it would be to convert these for use with other compilers.

- - -

Link to a zip of source files for sorts of 4,194,304 64-bit elements of pseudo-random data via vectors.

http://rcgldr.net/misc/sortv.zip

sortv.cpp is the Microsoft vector sort; you can edit it to use sort() (quicksort) or stable_sort() (merge sort after using insertion sort on groups of 32 elements).

msortv.cpp is my merge sort.

hsortv.cpp is a hybrid radix sort that uses 1 GB of RAM to sort 32 MB of data. It separates the data into 32 bins based on the upper 5 bits of each value, then does a merge sort on each bin, then concatenates the bins.

results.txt compares the time it took to run the sort programs.
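The binning step of that hybrid sort looks roughly like this. A simplified sketch, not the code from the zip; std::sort stands in for the per-bin merge sort, and `radix_bin_sort` is a made-up name:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Distribute values into 32 bins keyed on the top 5 bits, sort each bin,
// then concatenate. The bins are already in the correct relative order,
// because a larger top-5-bits field means a larger (unsigned) value.
static std::vector<std::uint64_t> radix_bin_sort(const std::vector<std::uint64_t> &in)
{
    std::vector<std::uint64_t> bins[32];
    for (std::size_t i = 0; i < in.size(); ++i)
        bins[in[i] >> 59].push_back(in[i]); // upper 5 bits select the bin

    std::vector<std::uint64_t> out;
    out.reserve(in.size());
    for (int b = 0; b < 32; ++b) {
        std::sort(bins[b].begin(), bins[b].end()); // stand-in for per-bin merge sort
        out.insert(out.end(), bins[b].begin(), bins[b].end());
    }
    return out;
}
```

The extra memory cost is visible here: every element is copied into a bin and then into the output, which is why the real program needs far more RAM than the data itself.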

- - -

Link to a zip of source files for sorts that sort a text file by sorting pointers to records. A 10 MB test file is included, with 1048576 records of 10 bytes each of random data (8 bytes of data, cr, lf). The random data is binary from a large pi calculation, restricted to the ASCII character range hex 20 to hex 9f (128 possible values).

http://rcgldr.net/misc/sortp.zip

sortp.cpp - the Microsoft vector sort; edit it to use sort() or stable_sort().

msort.cpp - merge sort set up to be a "natural" sort, so there's additional overhead (a list of group counts) not required for a basic merge sort, but it's not that much overhead. Change #define VARSZGRP 0 to #define VARSZGRP 1 for the "natural" sort (this takes advantage of any pre-ordering of the data). Note that msort.cpp "cheats": instead of directly using the vector of pointers to records, it uses standard C pointers rather than vector iterators.

hsort.cpp - a hybrid radix sort that separates the pointers to be sorted based on the first byte of each record. It merge sorts the bins of pointers, then concatenates them. It creates space for 161 bins, but with the test source file only 128 of those bins are used (the test file's "text" bytes range from hex 20 to hex 9f).

results.txt compares the time it took to run the sort programs.

11. O_o

What sort (heh) of optimizations have you employed?

Alternating buffers? (If I recall correctly, you have used this variant.)

"Decaying" into "Insertion Sort" for smaller runs?

Striping runs in the hopes of better cache effectiveness?

Soma

12. Originally Posted by rcgldr
example programs
pointer sorts:
http://rcgldr.net/misc/sortp.zip

The actual names are sortp.cpp, msortp.cpp, and hsortp.cpp.

hsortp.cpp - to make this "generic", change #define NUMBIN 161 to #define NUMBIN 256 to use an entire byte as a bin index. Otherwise, all elements with bin indexes > NUMBIN-1 are placed into bin[NUMBIN-1].

Originally Posted by phantomotap
What sort of optimizations have you employed?
The sorts I wrote are fairly generic, with no cache-oriented optimization. As mentioned, I use actual pointers instead of vector iterators in msortv.cpp, hsortv.cpp, msortp.cpp and hsortp.cpp, so there's no iterator checking, which is probably why they are faster than sortv.cpp and sortp.cpp, which use the Standard Template Library sort() or stable_sort().

Also as mentioned, msortp.cpp is set up to be a "natural" bottom-up merge sort.

I haven't looked at the Microsoft quicksort. I'm using Visual Studio 2005, and it appears the Standard Template Library is based on code written by HP back in 1994. The Microsoft merge sort is a partial top-down sort: it allocates a second vector half the size of the original, sorts the two halves of the original vector, then merges the second vector with the second half of the original vector back into the original. There are remnants of code to allow further splitting, but since the initial allocation is only half the size of the original vector, that part of the remnant code isn't used. I haven't tried changing the stable_sort() template code in <algorithm> to allocate a vector the same size as the original to see how much the performance would be affected.

The Microsoft stable_sort() also splits a list into n/32 groups of 32 elements, uses insertion sort on each group (this should take advantage of the cache), then uses a standard bottom-up merge sort after that.

A merge sort normally uses double the memory space, so that each merge pass goes from one buffer to the other and back. If it's important to end up with the sorted data back in the first buffer, a merge sort can calculate the number of passes required; if that number is odd, then on the first pass it can swap pairs in place in the first buffer instead of merging to the second buffer.
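That pass-parity trick reads like this in code. This is just a sketch I put together, not code from the zips; `merge_pass_count` is a made-up name and std::merge stands in for a hand-written merge:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Number of bottom-up merge passes needed for n elements: ceil(log2(n)).
static unsigned merge_pass_count(std::size_t n)
{
    unsigned passes = 0;
    for (std::size_t width = 1; width < n; width *= 2)
        ++passes;
    return passes;
}

// Bottom-up merge sort that always leaves the result in the original buffer.
static void merge_sort_back_to_a(std::vector<int> &a)
{
    std::size_t n = a.size();
    if (n < 2) return;
    std::vector<int> buf(n);
    std::size_t width = 1;
    if (merge_pass_count(n) % 2 != 0) {
        // Odd pass count: do the first width-1 pass as in-place pair swaps,
        // leaving an even number of buffer-to-buffer passes.
        for (std::size_t i = 0; i + 1 < n; i += 2)
            if (a[i] > a[i + 1]) std::swap(a[i], a[i + 1]);
        width = 2;
    }
    std::vector<int> *src = &a, *dst = &buf;
    for (; width < n; width *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            std::size_t mid = std::min(lo + width, n);
            std::size_t hi = std::min(lo + 2 * width, n);
            std::merge(src->begin() + lo, src->begin() + mid,
                       src->begin() + mid, src->begin() + hi,
                       dst->begin() + lo);
        }
        std::swap(src, dst); // alternate buffers each pass
    }
    // An even number of buffer passes means src == &a again: the sorted
    // data has landed back in the original buffer with no final copy.
}
```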

A radix sort uses n times the memory space, where n is the number of bins. There's one additional bin buffer used to merge sort each bin. A multi-threaded radix sort could use additional bins, merging in parallel, but since all the threads share the same memory space, it doesn't seem that trying to merge in parallel would help (it could end up being slower).

There's a point of diminishing returns, as all of these sorts take less than 1/2 second to run on my system. The amount of data would have to be quite large before there would be a practical difference between the sort algorithms.

13. Originally Posted by shyjuu
Thanks a lot for such a descriptive answer. Let us just consider normal values, or a normal case; now which is the quickest according to you?
Let me show you the problem with your question:

It's like you are asking: Which is faster, a formula one car, a dirt bike, a boat, or a snowmobile?

Some obvious questions come to mind:
Does your trip take you across a lake?
Does your trip take you across snow?
Does your trip take you through a narrow alleyway?
Will there be a lot of slow traffic?

You can see that it would be pretty stupid to outright pick any one of these answers without knowing the answers to such questions.
E.g. if you said a formula one car was fastest, but the trip was on a road with lots of other slow traffic, then you'd burn out your engine, because those engines have to run fast all the time to stay cool. Or if the trip was across water, then your car would be toast.

So, stop asking us which is faster without answering such questions. If you do not have such answers, then we can only say "it depends", and you will have to just accept that.

14. Originally Posted by rcgldr
A radix sort uses n times the memory space, where n is the number of bins. There's one additional bin buffer used to merge sort each bin. A multi-threaded radix sort could use additional bins, merging in parallel, but since all the threads share the same memory space, it doesn't seem that trying to merge in parallel would help (it could end up being slower).
I agree.

Multithreading is useful if you split the input into chunks and sort each chunk separately in its own thread. On NUMA architectures (where memory is faster from some cores) and when you have dozens of threads or more, you can merge nearby chunks locally before the final merge, but on currently typical workstations (up to 16 cores) it is usually faster to use a single thread to merge all chunks. (This is because the merge is limited by memory bandwidth, and minimizing the bandwidth used will yield the best results. Using one thread uses the minimum bandwidth, of course, at least if it caches one (the next) value from each chunk locally.)

I have (somewhere) a radix sort for binary64 (double-precision IEEE floating-point) values in the same byte order as for 64-bit integers (so for most platforms), with various bin sizes. (The trick is to treat it as 64-bit unsigned integer data, which works if you handle the sign bit with an extra pass over all data once before and once after sorting.)
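The same sign-bit trick can also be expressed as a per-key mapping instead of extra passes over the data. A minimal sketch, assuming IEEE-754 binary64 and no NaNs in the data; `sortable_key` is a name invented for this example:

```cpp
#include <cstdint>
#include <cstring>

// Map a double to a uint64_t whose unsigned order matches the double's
// numeric order. Assumes IEEE-754 binary64 representation and no NaNs.
static std::uint64_t sortable_key(double d)
{
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);          // same bytes, reinterpreted as an integer
    if (u & 0x8000000000000000ull)
        u = ~u;                             // negative: flip all bits (reverses their order)
    else
        u |= 0x8000000000000000ull;         // non-negative: set the sign bit (moves above negatives)
    return u;
}
```

After this mapping, an ordinary unsigned radix sort on the keys produces the correct numeric order, including across the positive/negative boundary.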

Optimal bin sizes are very sensitive to cache sizes, so a sensible implementation would have three or more different bin sizes, and select the best one at runtime.

It really does yield O(N) behaviour (for large enough data; otherwise the normal cache effects apply), but you need a lot of data to beat Quicksort. (On my machine, I think it was ten million doubles or so? And it of course depends on the Quicksort implementation it's compared to. I doubt the one I tested is the fastest one known.)

If the data is too large to fit in memory, we need external sorting algorithms. That means sorting each chunk in memory first, then saving the chunk in external storage, and finally merging the chunks. Usually the external storage input/output bandwidth is the limiting factor, so the merging policy (and minimizing the amount of data needed when merging -- i.e. splitting payload and sort-key data) is more important than the sort algorithm used on each chunk. (In the cases I've had, the sort times are minuscule fractions of the input/output time used.)

If the sort keys are strings or other variable-length identifiers, that changes the game once again. For those, the comparison operator may be relatively expensive -- especially so for external sorts.

Originally Posted by rcgldr
There's a point of diminishing returns
Again, agreed.

Even on my workstation, which is capable of ~200 MB/s reads and writes (for chunks >= 32k), it's the I/O that is usually the bottleneck. (Reading numerical data from text files is much, *much* slower, mainly because standard text-to-floating-point conversion functions are very slow; they are made for accuracy and robustness, not for speed.)

My solution is to use sorting algorithms that minimize the wall clock time used, not the CPU time used.

In other words, I prefer algorithms that insert new data keeping it always sorted, so I can interleave the insertion work and the I/O. Balanced trees are very good for this, and easy to implement. Instead of sorting, you concentrate on tree insert and traversal.
Otherwise you do all I/O first, and only then work on the data, which is bound to use more wall clock time, no matter how fast your sorting algorithm -- as long as the insertions take less time than I/O does, which like I said, is usually the case. My time is worth more than what the amortized computer use costs, so it makes sense for me to prioritize wall clock time over CPU time.

For data centers, most companies prefer wall clock speed too, but not always. Consider low-priority jobs like offline backups, or volunteer processing like the various @home projects. There is much more to consider when selecting an algorithm; resource usage is just one detail.
