This is a discussion on Confused about "Patience Sorting" within the C++ Programming forums, part of the General Programming Boards category; Patience Sorting is described as an O(nlogn) sorting algorithm in Wikipedia. Patience sorting - Wikipedia, the free encyclopedia But my ...

1. ## Confused about "Patience Sorting"

Patience Sorting is described as an O(nlogn) sorting algorithm in Wikipedia.
Patience sorting - Wikipedia, the free encyclopedia
But my Patience Sorting is running O(nsqrtn). I don't know why. Would someone have a look on my implementation?

2. This is the original implementation, causing O(nsqrtn) time complexity.
Code:
```template<typename DataType> class PatienceSorter
{
public:
static void sort(vector<DataType>& elements)
{
int insertPos, currentPile, pileCount = 0;
vector<vector<DataType> > pileList;

pileList.resize(elements.size());
for(insertPos = 0; insertPos < elements.size(); insertPos++)
{
currentPile = 0;
while(!(pileList.at(currentPile).empty()) && pileList.at(currentPile).back() < elements.at(insertPos))
currentPile++;
pileList.at(currentPile).push_back(elements.at(insertPos));

pileCount += (currentPile >= pileCount);
}

for(insertPos = 0; insertPos < elements.size(); insertPos++)
{
currentPile = -1;
for(int i = 0; i < pileCount; i++)
if(!pileList.at(i).empty())
if(currentPile == -1 || pileList.at(currentPile).back() > pileList.at(i).back())
currentPile = i;
elements.at(insertPos) = pileList.at(currentPile).back();
pileList.at(currentPile).pop_back();
}
return;
}
};```

3. Then I use a binary search according to the wiki page. The first part of the algorithm is working O(nlogn), but the second part of the algorithm is still O(nsqrtn).
Code:
```template<typename DataType> class PatienceSorter
{
public:
static void sort(vector<DataType>& elements)
{
int insertPos, currentPile, pileCount, emptyPileCount;
vector<vector<DataType> > pileList;

pileList.resize(elements.size());
pileList.front().push_back(elements.front());
pileCount = 1;

for(insertPos = 1; insertPos < elements.size(); insertPos++)
{
int middle, left = 0, right = pileCount - 1;
while(left <= right)
{
middle = (left + right) / 2;
elements.at(insertPos) > pileList.at(middle).back() ? left = middle + 1 : right = middle - 1;
}
pileCount += (left >= pileCount);
pileList.at(left).push_back(elements.at(insertPos));
}

emptyPileCount = 0;
for(insertPos = 0; insertPos < elements.size(); insertPos++)
{
currentPile = emptyPileCount;
elements.at(insertPos) = pileList.at(currentPile).back();
pileList.at(currentPile).pop_back();

if(pileList.at(currentPile).size() == 0)
{
emptyPileCount++;
continue;
}

if(currentPile + 1 < pileCount && pileList.at(currentPile).back() > pileList.at(currentPile + 1).back())
{
int rightBound = pileCount - 1;

int middle, left = currentPile + 1, right = pileCount - 1;
while(left <= right)
{
middle = (left + right) / 2;
pileList.at(currentPile).back() > pileList.at(middle).back() ? left = middle + 1 : right = middle - 1;
}
if(left >= pileCount)
{
left = pileCount - 1;
}
mergePiles(pileList.at(currentPile), pileList.at(left));

emptyPileCount++;
}
}
return;
}
private:
static void mergePiles(vector<DataType>& sourcePile, vector<DataType>& destinationPile)
{
vector<DataType> newPile;
int sourceScanPos = 0, destinationScanPos = 0;

while(sourceScanPos < sourcePile.size() && destinationScanPos < destinationPile.size())
{
if(sourcePile.at(sourceScanPos) > destinationPile.at(destinationScanPos))
newPile.push_back(sourcePile.at(sourceScanPos++));
else
newPile.push_back(destinationPile.at(destinationScanPos++));
}
newPile.insert(newPile.end(), sourcePile.begin() + sourceScanPos, sourcePile.end());
newPile.insert(newPile.end(), destinationPile.begin() + destinationScanPos, destinationPile.end());

destinationPile = newPile;
return;
}
};```

4. The same thing happens to "Strand Sort". It seems Strand Sort is an O(nsqrtn) sorting algorithm but wikipedia says it is an O(nlogn) algorithm.
Strand sort - Wikipedia, the free encyclopedia

5. How are you determine the algorithmic complexity of your implementation?

You do realize that it is the largest overall growth potential that is measured? An example might be a linear, O(n), function to preprocess input before the main processing phase. The preprocessing phase might be expensive, but it quickly becomes a negligible portion of the overall performance characteristics if the main processing phase is O(n^2). If you count every aspect of the implementation, or fail to consider the phase with the largest growth potential, you get skewed results.

(I don't have the time or the inclination just now to read your source.)

Soma

6. I determined the time complexity mainly by their performance. This is my test result:
Code:
```Sorting 100000 numbers ranged from 0 to 2147483647
================ Test Result ================
Size Filename          Time		Space
2509 A-Quick.cpp       0.176s		728k
2556 A-Median.cpp      0.192s		728k
2379 A-Weakheap.cpp    0.204s		739k
2450 A-Merge.cpp       0.208s		809k
1665 A-Comb.cpp        0.212s		728k
1930 A-Shell.cpp       0.212s		728k
2259 A-Heap.cpp        0.216s		728k
3232 A-Tree2.cpp       0.228s		729k
3324 A-Bookmark2.cpp   0.240s		736k
2560 A-Tree.cpp        0.244s		2258k
5344 A-Library.cpp     0.264s		748k
2964 A-Bookmark.cpp    0.292s		2415k
3997 A-Tournament.cpp  0.328s		2789k
4214 A-Skiplist.cpp    0.396s		3390k
3936 B-Patience2.cpp   0.568s		736k
2464 B-Pool.cpp        0.612s		1310k
2255 B-Patience.cpp    0.800s		768k
2555 B-Beap.cpp        0.864s		728k
2619 B-Strand.cpp      1.480s		729k```
Code:
```Sorting 250000 numbers ranged from 0 to 2147483647
================ Test Result ================
Size Filename          Time		Space
2509 A-Quick.cpp       0.444s		728k
2556 A-Median.cpp      0.464s		728k
2379 A-Weakheap.cpp    0.528s		738k
1665 A-Comb.cpp        0.552s		728k
2259 A-Heap.cpp        0.560s		730k
2450 A-Merge.cpp       0.564s		819k
1930 A-Shell.cpp       0.584s		728k
2560 A-Tree.cpp        0.700s		2229k
3232 A-Tree2.cpp       0.768s		729k
3324 A-Bookmark2.cpp   0.772s		734k
5344 A-Library.cpp     0.840s		834k
2964 A-Bookmark.cpp    0.888s		2291k
3997 A-Tournament.cpp  0.928s		2751k
4214 A-Skiplist.cpp    1.148s		3336k
4065 B-Patience2.cpp   1.928s		30390k
2464 B-Pool.cpp        2.488s		1769k
2255 B-Patience.cpp    3.464s		864k
2619 B-Strand.cpp      5.576s		729k
2555 B-Beap.cpp        7.072s		728k```
I also counted the repeated times of the inner loop and I'm sure patience and strand sorting(my implementation) is O(nsqrtn).

7. No one can help me?

8. I've looked over your source.

I can not, at a glance, determine what you are seeing. A cursory look at those times does not relate to O(n * log(n)) or O(n * sqrt(n)). It looks to me as if you are paying a O(c) cost, a O(n * log(n)) cost, a potential O(n + (n - 1) + (n - 2) + [...] + 0) cost, and a O(log n) cost. This "totals" to O(n * log(n)) regardless of the final result.

I've never worked specifically with "Patience Sort". If I have time I may build a "purpose built" version for you to compare your work with. I imagine that the problem is as simple as your use of a general purpose expandable array structure when you need, at least, a general purpose expandable array and simple stack. The `std::vector<std::vector<???>>' setup you are using would be costly, depending on the implementation, if you ever need to "move" a stack--as would be needed for O(n * log(n)) performance. A "purpose built" version would perform trivially better, but the O(n * log(n)) growth is still more relevant.

My advice, in the intervening time, would be to use a `std::vector<???>' (where ??? is the element type to be sorted) of size `\$\$\$' (where \$\$\$ is the number of elements to be sorted), a `std::vector<std::vector<??? *> *>' of size `\$\$\$', and a `std::vector<std::vector<??? *>>' of size `\$\$\$'. The cost would still be O(n). (It is just a larger factor than normal.) This would allow you to allocate memory only once, trivially reorder stacks, and trivially "grow" the stacks. The problem with this sort of implementation is indirection. The cost "per n" is higher, but this cost is constant and never scales. With your solution the cost is lower, variable, and potentially scales.

The reason people may not be helping you may be a simple thing indeed. It doesn't look like you have a bad implementation of the core algorithm. It seems like you've only chosen "bad" primitives for the implementation of the core algorithm. You could probably get closer to O(n * log(n)) by only replace these primitives with "purpose built" primitives.

Soma

9. I may not have posted help with this one yet, but I've been watching this thread with interest!

I tried implementing patience sorting about a month ago. I didn't do the binary search bit to find the best pile though, so the runtime of mine would be even worse than yours.
I was under the impression that a van Emde Boas tree was necessary to reach the theoretical O(n*logn) upper bound, but I could be wrong about that.

Anyway, besides timing the runs, have you considered simply measuring the number of comparisons and assignments? To do this, simply provide a templated type whose overloaded operators increment global counters. If the number of assignments and comparisons is or isn't n*logn, then that might point to where the problem lies.

10. Building piles can be O(nlogn) now. The problem is how to take numbers from the piles in O(nlogn) time.

11. I was under the impression that a van Emde Boas tree was necessary to reach the theoretical O(n*logn) upper bound, but I could be wrong about that.
The "stack" placement bit is obviously O(e * log(e)) if you have a O(log(p)) search available. (Where `e' is the number of elements and `p' is the number of piles and `p' is never larger than, but may be equal to, `e'.) For each `e' you have a `log(p)'.

With the noted modification, as explained in the "Wikipedia", the "recovery" bit is, not as obviously, O(e * log(e)) if you have a O(log(p)) search available. First you "recover" the `n' lowest value elements of the first "pile". (This is obviously O(n) since the "piles" are already "sorted".) After recovery of the top `n' elements from the first pile, you remove, conceptually or actually, the first "pile" from the set of "piles". (This is either O(c) or uninterested because you don't actually need to do it.) You then search, using your O(log(p)) algorithm, for where to put the "pile" you've just removed. (You are effectively using your O(log(p)) search as an O(log(p)) insert algorithm for an already sorted set.) After moving the removed "pile" to its proper place, the "piles" are, again, sorted from lowest to highest value. You have `p' "piles" that must remain sorted and a O(p) algorithm for keeping them sorted. For each `p' you have a `log(p)'.

For each `p' you have a `log(p)'.
This was an obvious mistake.
For each `e' you have a `log(p)'.
It should be this. (And `p' is never larger than `e'.)
[/Edit]

I'm not going to post the "lump sum" "big O" for the fiddly bits because they are irrelevant in the face of the largest growth potential which is O(n * log(n)).

My understanding is that the van Emde Boas tree is only useful for implementations with keys that have a "prefixed" state. (A key state that can be divided into smaller keys.) With such an implementation, using the van Emde Boas tree, pulling the data off the tree in order, with an overall O(n * log(n)) time, becomes a trivial part of the second stage. You can still "hit" the overall O(n * log(n)) time without the tree, but the second stage is no longer trivial.

If the number of assignments and comparisons is or isn't n*logn, then that might point to where the problem lies.
*shrug*

The "top" values in the stack, assuming the first part of the algorithm is successful, is guaranteed to be in order. A modified binary search, with an extra comparison, would give a false reading with such a simple metric but would perform better with data sets with generally random values having many sorted subsets. (I'm not positive. It is just a strong "feeling".)

The problem is how to take numbers from the piles in O(nlogn) time.
The question is, how are you not getting O(n * log(n)) time?

I've looked over your source again and I see nothing wrong with the implementation. You are almost certainly not getting the true O(n * log(n)) because of the primitives you are using. Even the traditional, and most common, iterative implementation of "Binary sort" is not truly O(n * log(n)) because of exactly two operations overhead per invocation. It is small overhead, but in any simple metric, accumulated over millions of times, the algorithm will appear skewed--not really O(n * log(n)).

I have no interest in dissecting your source. You are obviously capable of doing it yourself. Insert some additional checks, such as what iMalc suggested, and post a complete testbed. If you do that and still have no idea where to begin I'll take another look. But I'm not promising anything because those times are reasonably O(n * log(n)) even if not truly O(n * log(n)).

Soma

12. You mean testing the code like this?
Code:
```template<typename DataType> class PatienceSorter
{
public:
static void sort(vector<DataType>& elements)
{
int insertPos, pileCount;
vector<vector<DataType> > pileList;

int testPart1 = 0, testPart2 = 0;

pileList.resize(elements.size());
pileList.front().push_back(elements.front());
pileCount = 1;

for(insertPos = 1; insertPos < elements.size(); insertPos++)
{
int middle, left = 0, right = pileCount - 1;
while(left <= right)
{
testPart1++; // test the inner loop
middle = (left + right) / 2;
elements.at(insertPos) > pileList.at(middle).back() ? left = middle + 1 : right = middle - 1;
}
pileCount += (left >= pileCount);
pileList.at(left).push_back(elements.at(insertPos));
}

int firstPile = 0;

for(insertPos = 0; insertPos < elements.size(); insertPos++)
{
elements.at(insertPos) = pileList.at(firstPile).back();
pileList.at(firstPile).pop_back();

if(pileList.at(firstPile).empty())
{
firstPile++;
continue;
}

int currentPile = firstPile;
while(currentPile + 1 < pileCount && pileList.at(currentPile).back() > pileList.at(currentPile + 1).back())
{
swap(pileList.at(currentPile), pileList.at(currentPile + 1));
currentPile++;
testPart2++; // test the inner loop
}
}
fprintf(stderr, "Part1: %d Part2: %d\n", testPart1, testPart2);
return;
}
};```
This is the result when sorting 10000 numbers:
Part1: 69311 Part2: 540780

It's obviously that the first part is O(nlogn) and the second part is O(nsqrtn).

13. It's obviously that the first part is O(nlogn) and the second part is O(nsqrtn).
I've seen that you are of more than willing diligence, but this is no longer valuable. You aren't listening. I'll tell you the problem again, but this time I'll be bloody specific. I've actually hinted several times.

The problem is your use of `std::vector<std::vector<???>>'. This is not an opinion. This is fact. You can not insert a value at an arbitrary location into an array in O(log(n)) time. For the second phase of the algorithm to be O(n * log(n)) time the insertion must be O(log(n)) or better. You must use primitives that support a O(log(n)) insertion. This is not a suggestion. This is a statement of fact.

The primitives you use in the implementation of an algorithm determines its performance just as much as the natural performance of the algorithm. If you implement an O(n^2) algorithm using a primitive with poor performance you may wind up with a O(2^n) algorithm. I've seen it.

For this reason, that the performance of the primitives count, you can't just any random tree, priority queue, or stack. You can use a heap. You can use a decent priority queue. You can use a list if you understand how to implement binary search with nothing but forward iterators. (Even then you'd pay the cost of advancing iterators by a known value.) You can even use a van Emde Boas tree if you like. It doesn't matter so long as it has a O(log(n)) or better insert and a O(log(N)) or better search. And one doesn't imply the other.

You are lucky I was interested in the algorithm; I wouldn't have bothered repeating myself otherwise. If you want the best performance out of this algorithm, you are going to have to use the right tools.

Soma

14. Originally Posted by phantomotap
I've seen that you are of more than willing diligence, but this is no longer valuable. You aren't listening. I'll tell you the problem again, but this time I'll be bloody specific. I've actually hinted several times.
I'm so sorry I speak poor English and sometimes I can't understand what you mean.

But I really wonder if a vector<vector<DataType> > will really slow down the algorithm. Of course I can use a tree or a heap. But why I not directly use heap sort or tree sort if I can use them? I just think patience sort has its own way to do that.

Any way, I will use a list<> instead of vector<vector<> > to have a try. Thank for your patience reply.

15. Oh, I can hardly bring up an O(nlogn) method to merge those stupid piles. In fact, using a heap or something else can do that effectively. But what I want is the "pure" patience sort.

Or maybe patience sort is really an O(nsqrtn) sorting algorithm, just as the following site shows.
http://www.cs.princeton.edu/courses/...s/patience.pdf

Page 1 of 2 12 Last