Patience Sorting is described as an O(nlogn) sorting algorithm in Wikipedia.
Patience sorting - Wikipedia, the free encyclopedia
But my Patience Sorting is running O(nsqrtn). I don't know why. Would someone have a look on my implementation?
This is a discussion on Confused about "Patience Sorting" within the C++ Programming forums, part of the General Programming Boards category; Patience Sorting is described as an O(nlogn) sorting algorithm in Wikipedia. Patience sorting - Wikipedia, the free encyclopedia But my ...
Patience Sorting is described as an O(nlogn) sorting algorithm in Wikipedia.
Patience sorting - Wikipedia, the free encyclopedia
But my Patience Sorting is running O(nsqrtn). I don't know why. Would someone have a look on my implementation?
This is the original implementation, causing O(nsqrtn) time complexity.
Code:template<typename DataType> class PatienceSorter { public: static void sort(vector<DataType>& elements) { int insertPos, currentPile, pileCount = 0; vector<vector<DataType> > pileList; pileList.resize(elements.size()); for(insertPos = 0; insertPos < elements.size(); insertPos++) { currentPile = 0; while(!(pileList.at(currentPile).empty()) && pileList.at(currentPile).back() < elements.at(insertPos)) currentPile++; pileList.at(currentPile).push_back(elements.at(insertPos)); pileCount += (currentPile >= pileCount); } for(insertPos = 0; insertPos < elements.size(); insertPos++) { currentPile = -1; for(int i = 0; i < pileCount; i++) if(!pileList.at(i).empty()) if(currentPile == -1 || pileList.at(currentPile).back() > pileList.at(i).back()) currentPile = i; elements.at(insertPos) = pileList.at(currentPile).back(); pileList.at(currentPile).pop_back(); } return; } };
Then I use a binary search according to the wiki page. The first part of the algorithm is working O(nlogn), but the second part of the algorithm is still O(nsqrtn).
Code:template<typename DataType> class PatienceSorter { public: static void sort(vector<DataType>& elements) { int insertPos, currentPile, pileCount, emptyPileCount; vector<vector<DataType> > pileList; pileList.resize(elements.size()); pileList.front().push_back(elements.front()); pileCount = 1; for(insertPos = 1; insertPos < elements.size(); insertPos++) { int middle, left = 0, right = pileCount - 1; while(left <= right) { middle = (left + right) / 2; elements.at(insertPos) > pileList.at(middle).back() ? left = middle + 1 : right = middle - 1; } pileCount += (left >= pileCount); pileList.at(left).push_back(elements.at(insertPos)); } emptyPileCount = 0; for(insertPos = 0; insertPos < elements.size(); insertPos++) { currentPile = emptyPileCount; elements.at(insertPos) = pileList.at(currentPile).back(); pileList.at(currentPile).pop_back(); if(pileList.at(currentPile).size() == 0) { emptyPileCount++; continue; } if(currentPile + 1 < pileCount && pileList.at(currentPile).back() > pileList.at(currentPile + 1).back()) { int rightBound = pileCount - 1; int middle, left = currentPile + 1, right = pileCount - 1; while(left <= right) { middle = (left + right) / 2; pileList.at(currentPile).back() > pileList.at(middle).back() ? left = middle + 1 : right = middle - 1; } if(left >= pileCount) { left = pileCount - 1; } mergePiles(pileList.at(currentPile), pileList.at(left)); emptyPileCount++; } } return; } private: static void mergePiles(vector<DataType>& sourcePile, vector<DataType>& destinationPile) { vector<DataType> newPile; int sourceScanPos = 0, destinationScanPos = 0; while(sourceScanPos < sourcePile.size() && destinationScanPos < destinationPile.size()) { if(sourcePile.at(sourceScanPos) > destinationPile.at(destinationScanPos)) newPile.push_back(sourcePile.at(sourceScanPos++)); else newPile.push_back(destinationPile.at(destinationScanPos++)); } newPile.insert(newPile.end(), sourcePile.begin() + sourceScanPos, sourcePile.end()); newPile.insert(newPile.end(), destinationPile.begin() + destinationScanPos, destinationPile.end()); destinationPile = newPile; return; } };
The same thing happens to "Strand Sort". It seems Strand Sort is an O(nsqrtn) sorting algorithm but wikipedia says it is an O(nlogn) algorithm.
Strand sort - Wikipedia, the free encyclopedia
How are you determine the algorithmic complexity of your implementation?
You do realize that it is the largest overall growth potential that is measured? An example might be a linear, O(n), function to preprocess input before the main processing phase. The preprocessing phase might be expensive, but it quickly becomes a negligible portion of the overall performance characteristics if the main processing phase is O(n^2). If you count every aspect of the implementation, or fail to consider the phase with the largest growth potential, you get skewed results.
(I don't have the time or the inclination just now to read your source.)
Soma
I determined the time complexity mainly by their performance. This is my test result:
Code:Sorting 100000 numbers ranged from 0 to 2147483647 ================ Test Result ================ Size Filename Time Space 2509 A-Quick.cpp 0.176s 728k 2556 A-Median.cpp 0.192s 728k 2379 A-Weakheap.cpp 0.204s 739k 2450 A-Merge.cpp 0.208s 809k 1665 A-Comb.cpp 0.212s 728k 1930 A-Shell.cpp 0.212s 728k 2259 A-Heap.cpp 0.216s 728k 3232 A-Tree2.cpp 0.228s 729k 3324 A-Bookmark2.cpp 0.240s 736k 2560 A-Tree.cpp 0.244s 2258k 5344 A-Library.cpp 0.264s 748k 2964 A-Bookmark.cpp 0.292s 2415k 3997 A-Tournament.cpp 0.328s 2789k 4214 A-Skiplist.cpp 0.396s 3390k 3936 B-Patience2.cpp 0.568s 736k 2464 B-Pool.cpp 0.612s 1310k 2255 B-Patience.cpp 0.800s 768k 2555 B-Beap.cpp 0.864s 728k 2619 B-Strand.cpp 1.480s 729kI also counted the repeated times of the inner loop and I'm sure patience and strand sorting(my implementation) is O(nsqrtn).Code:Sorting 250000 numbers ranged from 0 to 2147483647 ================ Test Result ================ Size Filename Time Space 2509 A-Quick.cpp 0.444s 728k 2556 A-Median.cpp 0.464s 728k 2379 A-Weakheap.cpp 0.528s 738k 1665 A-Comb.cpp 0.552s 728k 2259 A-Heap.cpp 0.560s 730k 2450 A-Merge.cpp 0.564s 819k 1930 A-Shell.cpp 0.584s 728k 2560 A-Tree.cpp 0.700s 2229k 3232 A-Tree2.cpp 0.768s 729k 3324 A-Bookmark2.cpp 0.772s 734k 5344 A-Library.cpp 0.840s 834k 2964 A-Bookmark.cpp 0.888s 2291k 3997 A-Tournament.cpp 0.928s 2751k 4214 A-Skiplist.cpp 1.148s 3336k 4065 B-Patience2.cpp 1.928s 30390k 2464 B-Pool.cpp 2.488s 1769k 2255 B-Patience.cpp 3.464s 864k 2619 B-Strand.cpp 5.576s 729k 2555 B-Beap.cpp 7.072s 728k
No one can help me?
I've looked over your source.
I can not, at a glance, determine what you are seeing. A cursory look at those times does not relate to O(n * log(n)) or O(n * sqrt(n)). It looks to me as if you are paying a O(c) cost, a O(n * log(n)) cost, a potential O(n + (n - 1) + (n - 2) + [...] + 0) cost, and a O(log n) cost. This "totals" to O(n * log(n)) regardless of the final result.
I've never worked specifically with "Patience Sort". If I have time I may build a "purpose built" version for you to compare your work with. I imagine that the problem is as simple as your use of a general purpose expandable array structure when you need, at least, a general purpose expandable array and simple stack. The `std::vector<std::vector<???>>' setup you are using would be costly, depending on the implementation, if you ever need to "move" a stack--as would be needed for O(n * log(n)) performance. A "purpose built" version would perform trivially better, but the O(n * log(n)) growth is still more relevant.
My advice, in the intervening time, would be to use a `std::vector<???>' (where ??? is the element type to be sorted) of size `$$$' (where $$$ is the number of elements to be sorted), a `std::vector<std::vector<??? *> *>' of size `$$$', and a `std::vector<std::vector<??? *>>' of size `$$$'. The cost would still be O(n). (It is just a larger factor than normal.) This would allow you to allocate memory only once, trivially reorder stacks, and trivially "grow" the stacks. The problem with this sort of implementation is indirection. The cost "per n" is higher, but this cost is constant and never scales. With your solution the cost is lower, variable, and potentially scales.
The reason people may not be helping you may be a simple thing indeed. It doesn't look like you have a bad implementation of the core algorithm. It seems like you've only chosen "bad" primitives for the implementation of the core algorithm. You could probably get closer to O(n * log(n)) by only replace these primitives with "purpose built" primitives.
Soma
I may not have posted help with this one yet, but I've been watching this thread with interest!
I tried implementing patience sorting about a month ago. I didn't do the binary search bit to find the best pile though, so the runtime of mine would be even worse than yours.
I was under the impression that a van Emde Boas tree was necessary to reach the theoretical O(n*logn) upper bound, but I could be wrong about that.
Anyway, besides timing the runs, have you considered simply measuring the number of comparisons and assignments? To do this, simply provide a templated type whose overloaded operators increment global counters. If the number of assignments and comparisons is or isn't n*logn, then that might point to where the problem lies.
My homepage
Advice: Take only as directed - If symptoms persist, please see your debugger
Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"
Building piles can be O(nlogn) now. The problem is how to take numbers from the piles in O(nlogn) time.
The "stack" placement bit is obviously O(e * log(e)) if you have a O(log(p)) search available. (Where `e' is the number of elements and `p' is the number of piles and `p' is never larger than, but may be equal to, `e'.) For each `e' you have a `log(p)'.I was under the impression that a van Emde Boas tree was necessary to reach the theoretical O(n*logn) upper bound, but I could be wrong about that.
With the noted modification, as explained in the "Wikipedia", the "recovery" bit is, not as obviously, O(e * log(e)) if you have a O(log(p)) search available. First you "recover" the `n' lowest value elements of the first "pile". (This is obviously O(n) since the "piles" are already "sorted".) After recovery of the top `n' elements from the first pile, you remove, conceptually or actually, the first "pile" from the set of "piles". (This is either O(c) or uninterested because you don't actually need to do it.) You then search, using your O(log(p)) algorithm, for where to put the "pile" you've just removed. (You are effectively using your O(log(p)) search as an O(log(p)) insert algorithm for an already sorted set.) After moving the removed "pile" to its proper place, the "piles" are, again, sorted from lowest to highest value. You have `p' "piles" that must remain sorted and a O(p) algorithm for keeping them sorted. For each `p' you have a `log(p)'.
[Edit]
This was an obvious mistake.For each `p' you have a `log(p)'.
It should be this. (And `p' is never larger than `e'.)For each `e' you have a `log(p)'.
[/Edit]
I'm not going to post the "lump sum" "big O" for the fiddly bits because they are irrelevant in the face of the largest growth potential which is O(n * log(n)).
My understanding is that the van Emde Boas tree is only useful for implementations with keys that have a "prefixed" state. (A key state that can be divided into smaller keys.) With such an implementation, using the van Emde Boas tree, pulling the data off the tree in order, with an overall O(n * log(n)) time, becomes a trivial part of the second stage. You can still "hit" the overall O(n * log(n)) time without the tree, but the second stage is no longer trivial.
*shrug*If the number of assignments and comparisons is or isn't n*logn, then that might point to where the problem lies.
The "top" values in the stack, assuming the first part of the algorithm is successful, is guaranteed to be in order. A modified binary search, with an extra comparison, would give a false reading with such a simple metric but would perform better with data sets with generally random values having many sorted subsets. (I'm not positive. It is just a strong "feeling".)
The question is, how are you not getting O(n * log(n)) time?The problem is how to take numbers from the piles in O(nlogn) time.
I've looked over your source again and I see nothing wrong with the implementation. You are almost certainly not getting the true O(n * log(n)) because of the primitives you are using. Even the traditional, and most common, iterative implementation of "Binary sort" is not truly O(n * log(n)) because of exactly two operations overhead per invocation. It is small overhead, but in any simple metric, accumulated over millions of times, the algorithm will appear skewed--not really O(n * log(n)).
I have no interest in dissecting your source. You are obviously capable of doing it yourself. Insert some additional checks, such as what iMalc suggested, and post a complete testbed. If you do that and still have no idea where to begin I'll take another look. But I'm not promising anything because those times are reasonably O(n * log(n)) even if not truly O(n * log(n)).
Soma
Last edited by phantomotap; 03-22-2010 at 03:09 AM.
You mean testing the code like this?
This is the result when sorting 10000 numbers:Code:template<typename DataType> class PatienceSorter { public: static void sort(vector<DataType>& elements) { int insertPos, pileCount; vector<vector<DataType> > pileList; int testPart1 = 0, testPart2 = 0; pileList.resize(elements.size()); pileList.front().push_back(elements.front()); pileCount = 1; for(insertPos = 1; insertPos < elements.size(); insertPos++) { int middle, left = 0, right = pileCount - 1; while(left <= right) { testPart1++; // test the inner loop middle = (left + right) / 2; elements.at(insertPos) > pileList.at(middle).back() ? left = middle + 1 : right = middle - 1; } pileCount += (left >= pileCount); pileList.at(left).push_back(elements.at(insertPos)); } int firstPile = 0; for(insertPos = 0; insertPos < elements.size(); insertPos++) { elements.at(insertPos) = pileList.at(firstPile).back(); pileList.at(firstPile).pop_back(); if(pileList.at(firstPile).empty()) { firstPile++; continue; } int currentPile = firstPile; while(currentPile + 1 < pileCount && pileList.at(currentPile).back() > pileList.at(currentPile + 1).back()) { swap(pileList.at(currentPile), pileList.at(currentPile + 1)); currentPile++; testPart2++; // test the inner loop } } fprintf(stderr, "Part1: %d Part2: %d\n", testPart1, testPart2); return; } };
Part1: 69311 Part2: 540780
It's obviously that the first part is O(nlogn) and the second part is O(nsqrtn).
I've seen that you are of more than willing diligence, but this is no longer valuable. You aren't listening. I'll tell you the problem again, but this time I'll be bloody specific. I've actually hinted several times.It's obviously that the first part is O(nlogn) and the second part is O(nsqrtn).
The problem is your use of `std::vector<std::vector<???>>'. This is not an opinion. This is fact. You can not insert a value at an arbitrary location into an array in O(log(n)) time. For the second phase of the algorithm to be O(n * log(n)) time the insertion must be O(log(n)) or better. You must use primitives that support a O(log(n)) insertion. This is not a suggestion. This is a statement of fact.
The primitives you use in the implementation of an algorithm determines its performance just as much as the natural performance of the algorithm. If you implement an O(n^2) algorithm using a primitive with poor performance you may wind up with a O(2^n) algorithm. I've seen it.
For this reason, that the performance of the primitives count, you can't just any random tree, priority queue, or stack. You can use a heap. You can use a decent priority queue. You can use a list if you understand how to implement binary search with nothing but forward iterators. (Even then you'd pay the cost of advancing iterators by a known value.) You can even use a van Emde Boas tree if you like. It doesn't matter so long as it has a O(log(n)) or better insert and a O(log(N)) or better search. And one doesn't imply the other.
You are lucky I was interested in the algorithm; I wouldn't have bothered repeating myself otherwise. If you want the best performance out of this algorithm, you are going to have to use the right tools.
Soma
I'm so sorry I speak poor English and sometimes I can't understand what you mean.
But I really wonder if a vector<vector<DataType> > will really slow down the algorithm. Of course I can use a tree or a heap. But why I not directly use heap sort or tree sort if I can use them? I just think patience sort has its own way to do that.
Any way, I will use a list<> instead of vector<vector<> > to have a try. Thank for your patience reply.
Oh, I can hardly bring up an O(nlogn) method to merge those stupid piles. In fact, using a heap or something else can do that effectively. But what I want is the "pure" patience sort.
Or maybe patience sort is really an O(nsqrtn) sorting algorithm, just as the following site shows.
http://www.cs.princeton.edu/courses/...s/patience.pdf