Fisher-Yates huh? I had to lok that one up. It turns out that I'm using Durstenfeld's version of the algorithm, which is the fast, modern way of implementing that algorithm. But as I've tried to show previously with other stuff I've posted about, I wasn't told about this algorithm, or been asked to implement it. I simply needed to shuffle an array and I wrote code that would do it in such a way that to me would obviously produce no bias.
Originally Posted by MK27
As others say, Big Oh notation isn't exact operation counts. The implementation could do anything from 1000*n*log(n) operations to 0.001*n*log(n) operations or even outside that range. For example, using an SIMD instruction set. you could perform multiple compares or swaps in one operation.
2^21 is 2097152, so we would expect 43830192 here as the the big O (n*log(n)) best/average score here. Initially, I could not write a quick sort faster than a merge sort and am not sure why anyone would think it should be faster, except that it does hold out some possibilities for optimization with the pivot value.
See my comments below.
Quicksort suffers this way with ties for the same reason it suffers sorting an already ordered list, which is it's worst case scenario. Because mergesort does not spend time putting a list in order after it's been split off, and instead does it when merging them together, it will benefit if the original list was already in some kind of order, whereas this will only hurt quicksort depending on how the pivot value is choosen. To establish a mean or median value, we must iterate through the list before splitting it -- although in discussions of quicksort I have seen, this process apparently "does not count" as part of it's big O notation. Additional comparisons at each of these iterations would be required as well. Taking the pivot from an already ordered tied list will usually end up on the low side, creating an imbalanced list. Putting pivot ties on the right (opposite the pivot) may improve performance slightly.
You may notice that the number of comparisons is bounded by a factor of two. The case where all items in one list are less than the other gives say k cmparisons where both lists are of length k. Incidentally the concatenation of the two lists can be done in constant time. Then the worse case is that 2k comparisons are required. This can be taken form the fact that each comparison reduces the number of items by 1, there are two lists of length k, items never move between the two lists being merged, and only the head items are compared each time. So Merge Sort performs between n*log(n) and 2*n*log(n) comparisons.
Now, apparently merge sort will perform the same in either case. This is true if we just measure iterations, but if we take more reality into account, the number of comparisons can, will, and must change. This is because when two split lists come back to be merged, if one list already contains all smaller values, all of the values on the left will be compared one at a time to the first element of the right, and then (since the returned lists have an internal order) no further comparisons need be done and the right is simply iterated onto the end. That is the normative and perhaps only way to implement a merge sort:
Taking the mean isn't an option for every data type, but something like that has been done before, for integers. Someone has previously called this Artificial Sort. My prior work to this only requires that two items be subtractable, not averagable.
To give quicksort one last chance, I decided to add a function which could select a better pivot, despite my misgivings about the additional iterations and comparisons required. So now, the quicksort compares the head with an element about half way thru, and if these are identical, goes thru the whole list until it finds one that isn't. Then we take a mean of the two different values. This helps to ensure that that we will actually have two lists.
My own findings with sorting on linked lists also agree that quicksort for linked lists doesn't win out until the list length gets really really long.
In the second instance the list is already sorted. Contrast this with the cruder quicksort, above, which made the n*n on an already sorted list. While it still hasn't quite caught up to the mergesort, it very nearly has -- the time for the much tied sort of one million records went from over 4 minutes to 7 seconds.
Let me introduce you to the smart splitter choosing techinque of my SuperQuickSort for linked-lists. The already sorted and reverse sorted cases become O(n). The trick is that the work done in hunting for a splitter isn't simply additional work that reduces the avarage amount of work required later by picking a beter splitter. Instead it actually replaces a portion of the splitting process (potentially the entire splitting process, if the list was already sorted). Have a look at it on my website. It's one of the things that I've actually gone into much more detail with. (Link in sig)
There is still obviously room to optimize the selection of the pivot value further, particularly if we were only dealing with unique numbers, since then taking a true mean of a sublist would be more feasible. But if quicksort and mergesort still both have a best case of n*log(n) (88 million here) anyway, I think I will just stick with merge sort.
I agree that the standard quicksort probably isn't worthwhile for lists.
A final interesting point: both the quicksort and the insertion sort demonstrated the variation inherent in their design; with insertion sort this variation is toward a "best case" (improving on n^2), whereas with quicksort this is toward a "worst case" (away from n*log(n)). A quick sort cannot really do better than a merge sort although the implementation of a quicksort may be easier to tweak closer to the best case, but in any case would be better known as the "split sort" or "quack sort".
Incidentally I've recently become of the opinion that Splay-Tree-Sort is in many ways the ultimate doubly-linked-list sorting algorithm.
It's just so amazingly adaptive, and the list links can be reused as tree child pointer links during the sorting. Any initial sort order you can throw at it, other than random order, gives astoundingly low comparison counts. If this interests you, I highly recommend trying it out.
I put up a nice syntax-colored version of the code which produced this nonsense, in case someone wants to argue some silly point: MK27's linked list comparative sort demo program!
I believe this is fairly portable. It could almost be used as a crude API, since the sort functions use external comparison functions with offsets to criteria (meaning you could plug any kind of struct into it without changes).
Mine is implemented in C++ rather than C and this means I don't have to make any modifications whatsoever to the algorithms to animate then and count comparisons etc. The same code you would use in a real program is the exact code I use in my demo app. I recommend you download it and check it out as I'm sure you'll enjoy it. It's linked at the bottom of many of my pages.