I'm leaning towards Timsort since it kicks ash in Python
What are your thoughts? Pros & Cons?
I'm leaning towards Timsort since it kicks ash in Python
What are your thoughts? Pros & Cons?
The constraints make it sound like counting sort is the way to go.
Definitely counting sort, runs in O(n + m), where n is a million and m is 99 in your case. The con is that it also uses up O(n + m) space, which is quite alot when n = 1.000.000, but it will run in linear time.
If you use Quicksort you can get away with O(log n) space usage, but then you would have O(n log n) running time on average.
There is no perfect algorithm for the job. Which is most constrained in your case? Space or speed?
I'm including the size of the output because counting sort cannot be performed in-place, so the OP would need an extra array of size 1.000.000 for the output, whereas with various other sorting algorithms you wouldn't need an output array at all.
Wikipedia seems to have a different opinion than me though :-) It seems there is a version of counting sort that can be performed in-place, however it won't be stable then. In this case the space requirements would be just O(m). I didn't know of this special version of counting sort when i first replied.
O_o
The space requirement for the generic "Counting Sort" is 'O(n + m)'. Period. The output array is memory that is required by the algorithm. Pretending that memory doesn't count because "it is an output" is wrong. Pure and simple. If I tell you that an algorithm requires 'O(m)' extra space when it requires 'O(n + m)' I'm lying. You have to account for that space because choosing between two algorithms may come down to space requirements and some algorithms do only use 'O(m)' space.
A dozen or so special purpose variations of "Counting Sort" exist. They each have different "trade-offs". You can't properly choose between these variations if you start excluding the details.
Soma
I would do it using a 100 element array of integers.
Note: The solution is NOT really a sorting algorithm; but, the output is based on the input.
It is more like an data compression and sort algorithm combined.
Tim S.
If you are curious, this is an instance of a "distribution sort".Quote:
It is more like an data compression and sort algorithm combined.
It is conceptually the "first half" of "Counting Sort" optimized for integers values as keys.
Soma
Choosing between an in-place and an out-of-place sort does not require an ivory tower analysis.
Precisely.
And just to be clear, the array of 100 counts allows you to reconstruct the sorted data over top of the original data, should you not need to retain the original data.
If you really really needed maximum speed, then a counting sort with parameters such as those that are proposed here would lead itself to parallelism fairly well. Say four threads each processing a quarter of the input, and four arrays of 100 items that can be added together prior to data reconstruction, would kick ass.
Of course having such perfect parameters such as the stated problem practically never happens in practice.