I'm leaning towards Timsort since it kicks ash in Python
What are your thoughts? Pros & Cons?
The constraints make it sound like counting sort is the way to go.
C + C++ Compiler: MinGW port of GCC
Version Control System: Bazaar
Look up a C++ Reference and learn How To Ask Questions The Smart Way
Definitely counting sort; it runs in O(n + m) time, where n is a million and m is 100 (the number of distinct values in 0 to 99) in your case. The con is that it also uses up O(n + m) space, which is quite a lot when n = 1,000,000, but it will run in linear time.
If you use Quicksort you can get away with O(log n) space usage, but then you would have O(n log n) running time on average.
There is no perfect algorithm for the job. Which is most constrained in your case? Space or speed?
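For concreteness, here is a minimal sketch of a counting sort along those lines. The function name and the choice of std::vector for the output are my own; the thread only discusses the algorithm in the abstract.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Counting sort for values known to lie in [0, 99].
// O(n + m) time; the O(n) output array plus an O(m) count array.
std::vector<int> counting_sort(const std::vector<int>& input)
{
    std::array<std::size_t, 100> counts{};  // one slot per possible value, zero-initialized
    for (int v : input)
        ++counts[v];

    std::vector<int> output;
    output.reserve(input.size());
    for (int value = 0; value < 100; ++value)
        output.insert(output.end(), counts[value], value);  // emit each value counts[value] times
    return output;
}
```

Since the elements here are plain integers, the keys carry no satellite data, so the usual stability concern with counting sort does not even apply.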
How I need a drink, alcoholic in nature, after the heavy lectures involving quantum mechanics.
Code://try //{ if (a) do { f( b); } while(1); else do { f(!b); } while(1); //}
I'm including the size of the output because counting sort cannot be performed in-place, so the OP would need an extra array of size 1,000,000 for the output, whereas with various other sorting algorithms you wouldn't need an output array at all.
Wikipedia seems to have a different opinion than me, though :-) It seems there is a version of counting sort that can be performed in-place, but it won't be stable then. In that case the space requirement would be just O(m). I didn't know of this special version of counting sort when I first replied.
O_o
The space requirement for the generic "Counting Sort" is 'O(n + m)'. Period. The output array is memory that is required by the algorithm. Pretending that memory doesn't count because "it is an output" is wrong. Pure and simple. If I tell you that an algorithm requires 'O(m)' extra space when it requires 'O(n + m)' I'm lying. You have to account for that space because choosing between two algorithms may come down to space requirements and some algorithms do only use 'O(m)' space.
A dozen or so special purpose variations of "Counting Sort" exist. They each have different "trade-offs". You can't properly choose between these variations if you start excluding the details.
Soma
I would do it using a 100 element array of integers.
Note: The solution is NOT really a sorting algorithm; but, the output is based on the input.
It is more like a data compression and sort algorithm combined.
Tim S.
"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to produce bigger and better idiots. So far, the Universe is winning." Rick Cook
Manasij Mukherjee | gcc-4.9.2 @Arch Linux
Slow and Steady wins the race... if and only if :
1.None of the other participants are fast and steady.
2.The fast and unsteady suddenly falls asleep while running !
If you are curious, this is an instance of a "distribution sort".
It is conceptually the "first half" of "Counting Sort", optimized for integer values as keys.
Soma
Choosing between an in-place and an out-of-place sort does not require an ivory tower analysis.
Precisely.
And just to be clear, the array of 100 counts allows you to reconstruct the sorted data over top of the original data, should you not need to retain the original data.
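A sketch of that in-place reconstruction, assuming the original data does not need to be retained (the function name is my own):

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Overwrite the input with its sorted order, using only the
// 100-element count array as extra space: O(m) extra, not O(n + m).
void sort_in_place(std::vector<int>& data)
{
    std::array<std::size_t, 100> counts{};
    for (int v : data)
        ++counts[v];

    // Reconstruct the sorted sequence over top of the original data.
    std::size_t pos = 0;
    for (int value = 0; value < 100; ++value)
        for (std::size_t k = 0; k < counts[value]; ++k)
            data[pos++] = value;
}
```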
If you really, really needed maximum speed, then a counting sort with parameters such as those proposed here would lend itself to parallelism fairly well. Say four threads each processing a quarter of the input, and four arrays of 100 counts that can be added together prior to data reconstruction, would kick ass.
Of course, parameters as perfect as those in the stated problem almost never occur in practice.
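A rough sketch of the four-thread idea described above, using std::thread; the chunking scheme and the way the per-thread counts are merged are my own assumptions, not something spelled out in the thread:

```cpp
#include <array>
#include <cstddef>
#include <thread>
#include <vector>

// Each of four threads counts a quarter of the input into its own
// 100-slot array; the per-thread counts are then summed before the
// sorted data is written out.
std::vector<int> parallel_counting_sort(const std::vector<int>& input)
{
    constexpr int kThreads = 4;
    std::array<std::array<std::size_t, 100>, kThreads> partial{};  // zeroed

    std::vector<std::thread> workers;
    const std::size_t chunk = input.size() / kThreads;
    for (int t = 0; t < kThreads; ++t) {
        std::size_t begin = t * chunk;
        // Last thread also takes any remainder.
        std::size_t end = (t == kThreads - 1) ? input.size() : begin + chunk;
        workers.emplace_back([&, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                ++partial[t][input[i]];
        });
    }
    for (auto& w : workers)
        w.join();

    // Add the four count arrays together.
    std::array<std::size_t, 100> counts{};
    for (const auto& p : partial)
        for (int v = 0; v < 100; ++v)
            counts[v] += p[v];

    // Reconstruct the sorted output from the merged counts.
    std::vector<int> output;
    output.reserve(input.size());
    for (int v = 0; v < 100; ++v)
        output.insert(output.end(), counts[v], v);
    return output;
}
```

Each thread writes only to its own count array, so no locking is needed during the counting phase; the only synchronization is the join before the merge.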
My homepage
Advice: Take only as directed - If symptoms persist, please see your debugger
Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"