# Quicksort review

• 05-01-2010
Elysia
Quicksort review
I currently have a custom-written Quicksort algorithm, but I thought I'd ask someone to look over it, because there is at least one thing I am unsure of.
The code passes all the tests I have for it, but I cannot say whether it is actually correct.
The question is: which of these checks are required and which are not? I have commented out a great deal and, as mentioned, it still passes all my tests, but that doesn't mean it will always pass, so...

Another thing is that I tried to parallelize the algorithm, but it turned out to be slower. Not quite sure why. Perhaps someone has some suggestions or ideas...

Code:

```
template<template<typename> class PivotType, typename RandomIt>
void _Quicksort(RandomIt begin, RandomIt end, int Threads)
{
    if (end - begin <= 1)
        return;
    RandomIt pivot = PivotType<RandomIt>::GetPivot(begin, end); // Gets the pivot
    RandomIt left = begin, right = end - 2;
    std::iter_swap(pivot, end - 1); // Swaps the two values pointed by the two iterators
    pivot = end - 1;
    //while (/*left <= right && *//*right >= begin && *//*left < end*/1)
    for (;;)
    {
        while (*left < *pivot /*&& *//*left <= right && *//*left < end*/) ++left;
        while (*right > *pivot /*&& right >= left && */&& right > begin) --right;
        if (left > right) break;
        std::iter_swap(left, right);
        if (right == begin || left == end - 1) break;
        /*if (left < end) */++left;
        /*if (right > begin) */--right;
    }
    std::iter_swap(left, pivot);
    //boost::thread* pThreads[2] = { NULL, NULL };
    //if (Threads-- > 0)
    //    pThreads[0] = new boost::thread(boost::bind(&_Quicksort<PivotType, RandomIt>, begin, left, 0));
    //else
        Quicksort<PivotType>(begin, left);
    //if (Threads-- > 0)
    //    pThreads[1] = new boost::thread(boost::bind(&_Quicksort<PivotType, RandomIt>, left + 1, end, 0));
    //else
        Quicksort<PivotType>(left + 1, end);
    //if (pThreads[0]) pThreads[0]->join();
    //if (pThreads[1]) pThreads[1]->join();
    //Quicksort<PivotType>(begin, left);
    //Quicksort<PivotType>(left + 1, end);
}
```
• 05-01-2010
anon
One thing you can do is to recurse only on one half of the range, which would halve the number of threads you are going to need. You can also sort small ranges with insertion sort, so as not to create a huge number of threads for very short ranges.
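A minimal single-threaded sketch of these two ideas (the function name, middle-element pivot choice, and cutoff of 16 are mine, not anon's):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of anon's two suggestions: recurse only on the smaller half (looping
// on the larger one), and finish small ranges with insertion sort.
// The cutoff value and pivot choice are illustrative, not from the thread.
template<typename RandomIt>
void quicksort_hybrid(RandomIt begin, RandomIt end)
{
    const std::ptrdiff_t cutoff = 16; // arbitrary small-range threshold

    while (end - begin > cutoff)
    {
        RandomIt pivot = begin + (end - begin) / 2; // middle pivot
        std::iter_swap(pivot, end - 1);             // move pivot to the back
        pivot = end - 1;

        RandomIt left = begin;
        for (RandomIt it = begin; it != pivot; ++it) // Lomuto-style partition
            if (*it < *pivot)
                std::iter_swap(left++, it);
        std::iter_swap(left, pivot);                 // pivot into its final place

        // Recurse on the smaller portion, iterate on the larger one,
        // so the recursion depth stays O(log n).
        if (left - begin < end - (left + 1))
        {
            quicksort_hybrid(begin, left);
            begin = left + 1;
        }
        else
        {
            quicksort_hybrid(left + 1, end);
            end = left;
        }
    }

    // Insertion sort mops up the remaining small range.
    for (RandomIt i = begin; i < end; ++i)
        for (RandomIt j = i; j != begin && *j < *(j - 1); --j)
            std::iter_swap(j, j - 1);
}
```

The same shape carries over to the threaded version: the "recurse" branch is where a thread would be spawned.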
• 05-01-2010
Elysia
The plan is to spawn two threads only for the first two main recursions (by setting Threads = 2), but this is also slower, for some reason.
Also tried adding insertion sort (or a call to std::sort) for relatively low number of elements to sort, but with no noticeable speed gain.
• 05-01-2010
iMalc
It seems to work correctly when adapted for use in my sorting demo program.
I'm not surprised that you don't need all of the commented-out checks. My Quicksort has even fewer, and I've thoroughly tested it. I'm only able to get away with that, though, because I make a copy of the pivot rather than merely swapping it around.

Performing a final insertion sort pass instead of quicksorting right down to the smallest portions reduces the comparison count, but increases the number of assignments done. It's usually a few percent faster, but it depends on how long the data type takes to compare vs to assign.
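A sketch of the pivot-copy idea (the function name and first-element pivot choice are mine; iMalc's actual code is not shown in the thread): with a local copy of the pivot value, both scan loops compare against the value itself, and the pivot-valued element acts as a sentinel, so most of the extra bounds checks can go.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hoare-style partition using a *copy* of the pivot value. Taking the pivot
// from *begin guarantees both scans stop in-bounds and both subranges are
// strictly smaller, at the cost of worst-case behaviour on sorted input.
// This is a sketch of the idea, not iMalc's implementation.
template<typename RandomIt>
void quicksort_pivot_copy(RandomIt begin, RandomIt end)
{
    if (end - begin <= 1)
        return;

    typedef typename std::iterator_traits<RandomIt>::value_type value_type;
    value_type pivot = *begin; // copy, not an iterator

    RandomIt left = begin, right = end - 1;
    for (;;)
    {
        while (*left < pivot) ++left;   // pivot copy is its own sentinel:
        while (pivot < *right) --right; // each scan stops at a pivot-valued element
        if (left >= right) break;
        std::iter_swap(left, right);
        ++left;
        --right;
    }

    // [begin, right] <= pivot and [right + 1, end) >= pivot.
    quicksort_pivot_copy(begin, right + 1);
    quicksort_pivot_copy(right + 1, end);
}
```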

What you'd be doing right now is creating two new threads and then leaving the original thread just sitting there waiting, or else performing two recursive calls. As anon noted, I would create at most one thread after each partition: reuse the current thread for one of the portions, and create a thread (or recurse) for the other. Then, after iterating, you do the joins. By iterating on the larger portion you actually end up with multiple threads to join, but some of them come from the later partitioning steps that occur during the iterative processing.

I'd also have a cutoff below which I just perform a recursive call directly, for whenever the thread-creation overhead makes it not worth it. You don't have to limit yourself to creating threads only on the first split.
Something like this (untested):
Code:

```
template<template<typename> class PivotType, typename RandomIt>
void _Quicksort(RandomIt begin, RandomIt end)
{
    std::vector<boost::thread*> threadVec;
    while (end - begin > 1)
    {
        RandomIt pivot = PivotType<RandomIt>::GetPivot(begin, end); // Gets the pivot
        RandomIt left = begin, right = end - 2;
        std::iter_swap(pivot, end - 1); // Swaps the two values pointed by the two iterators
        pivot = end - 1;
        for (;;)
        {
            while (*left < *pivot) ++left;
            while (*pivot < *right && begin < right) --right;
            if (right < left) break;
            std::iter_swap(left, right);
            if (right == begin || left == end - 1) break;
            ++left;
            --right;
        }
        std::iter_swap(left, pivot);
        // Find the smaller portion to create a new thread for, or recurse on.
        // The larger portion will be done via iteration.
        if (end - (left + 1) < left - begin)
        {
            if (end - (left + 1) > 50) // arbitrary cutoff before deciding to use an extra thread
                threadVec.push_back(new boost::thread(boost::bind(&_Quicksort<PivotType, RandomIt>, left + 1, end)));
            else
                _Quicksort<PivotType, RandomIt>(left + 1, end);
            end = left;        // iterate on the larger (left) portion
        }
        else
        {
            if (left - begin > 50) // arbitrary cutoff before deciding to use an extra thread
                threadVec.push_back(new boost::thread(boost::bind(&_Quicksort<PivotType, RandomIt>, begin, left)));
            else
                _Quicksort<PivotType, RandomIt>(begin, left);
            begin = left + 1;  // iterate on the larger (right) portion
        }
    }
    for (std::vector<boost::thread*>::const_iterator it = threadVec.begin(); it != threadVec.end(); ++it)
    {
        (*it)->join();
        delete *it;
    }
}
```
One could additionally keep a count of how many additional threads have been created in total, and use recursion whenever that count is reached, but that would require locking (perhaps InterlockedIncrement) since any of the threads might wish to create another thread.

You'll also notice my habit of converting greater-than item comparisons into less-than item comparisons, such that a less-than operator is all that is required for the sort.

I didn't increase the constant of the outermost while loop (to then perform a final insertion sort pass) in this instance, because that final insertion sort would be done with just one thread, which is bound to be slower. One could perform mini insertion sorts as small partitions are encountered, though.

How many cores do you have?
• 05-02-2010
Elysia
Quote:

Originally Posted by iMalc
It seems to work correctly when adapted for use in my sorting demo program.
I'm not surprised that you don't need all of the commented-out checks. My Quicksort has even fewer, and I've thoroughly tested it. I'm only able to get away with that, though, because I make a copy of the pivot rather than merely swapping it around.

Yeah, likewise, this has also been thoroughly tested.
What difference would making a copy of the pivot make?

Quote:

Performing a final insertion sort pass instead of quicksorting right down to the smallest portions reduces the comparison count, but increases the number of assignments done. It's usually a few percent faster, but it depends on how long the data type takes to compare vs to assign.
The tests were done on simple ints.

Quote:

You'll also notice my habit of converting greater-than item comparisons into less-than item comparisons, such that a less-than operator is all that is required for the sort.
That is a good idea, actually. I didn't think of that.

Quote:

How many cores do you have?
2.

This is an interesting concept. But I still think it tries to spawn way too many threads. Something needs to be done about that.
In some scenarios, Quicksort could call itself a ridiculous amount of times.
Ah, I don't know. This needs to be tested, I suppose.
• 05-02-2010
phantomotap
>_<

$#%&!

I'm sorry I didn't notice this yesterday. I may have been able to run it through my suite.

*shrug*

Anyway! Elysia, I strongly suggest you use an instantiated template, a type, with a nested accessor, instead of a template template parameter. (This is like the nested `rebind' of the standard allocators.) The approach gives you a lot for very little effort.

Soma
• 05-02-2010
Elysia
Quote:

Originally Posted by phantomotap
Anyway! Elysia, I strongly suggest you use an instantiated template, a type, with a nested accessor, instead of a template template parameter. (This is like the nested `rebind' of the standard allocators.) The approach gives you a lot for very little effort.

Would you show an example of such an approach?
• 05-02-2010
phantomotap
O_o

I already did. You literally need to look at the allocator parameter of the STL container classes, how that parameter is used, and how the technique manifests itself.

Soma
• 05-02-2010
Elysia
I'll take another look. But at a first glance, I have no idea how it's accomplished. Vector takes a type, yet there is no default type for allocator parameter, and the allocator parameter is not required.
• 05-02-2010
phantomotap
*shrug*

I'll be around for about another hour. If you don't have it by then, I'll post an explanation.

Soma
• 05-02-2010
Elysia
Hehehe. Digging through the source of M$'s STL is a difficult thing, but let me see if I can find anything.

So let's see... by looking at vector, this is what I gather so far:
- It takes a type for the allocator (let's call it _Alloc).
- It has a member of type _Alloc<_Ty> (where _Ty is the vector's element type), despite the allocator being passed in as a complete type, i.e. std::allocator<some_weird_type>.
- It calls the allocator object's member function allocate to allocate some memory.
What I fail to see is how this is better (if this is right).
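For what it's worth, the pattern being puzzled over here can be condensed into a toy container. This is a sketch, not MSVC's actual vector source; it uses the modern std::allocator_traits spelling (in C++03 the same thing was written directly as `_Alloc::rebind<node>::other`), and the names are illustrative:

```cpp
#include <memory>

// A container receives the allocator as a plain *type* (e.g. std::allocator<T>),
// yet internally needs an allocator for a different type (its node type).
// The "rebind" mechanism is what bridges that gap. Sketch only: destruction
// and deallocation are omitted to keep it short.
template<typename T, typename Alloc = std::allocator<T> >
class toy_list
{
    struct node { T value; node* next; };

    // Rebind the element allocator into a node allocator. In C++03 this was
    // spelled: typename Alloc::template rebind<node>::other
    typedef typename std::allocator_traits<Alloc>::template rebind_alloc<node> node_alloc;

    node_alloc alloc_;
    node*      head_;

public:
    toy_list() : head_(0) {}

    void push_front(const T& v)
    {
        node* n = alloc_.allocate(1); // raw storage for one node
        std::allocator_traits<node_alloc>::construct(alloc_, n); // construct the node
        n->value = v;
        n->next  = head_;
        head_ = n;
    }

    const T& front() const { return head_->value; }
};
```

The benefit over a template template parameter is that the user hands in a complete type (which can itself have any number of template parameters, defaults, etc.), and the container re-derives whatever instantiations it needs.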
• 05-02-2010
phantomotap
*shrug*

There exists an almost one-to-one translation for nearly any use case of template template parameters.

I can think of far too many limits of template template parameters to even list them. Using a nested accessor gives you a lot of extra utility for almost no extra work.

Soma

Code:

```
#include <iostream>

namespace example1
{
    template
    <
        typename T
    >
    struct policy
    {
        static void execute
        (
            const T & t_f
        )
        {
            std::cout << t_f << '\n';
        }
    };

    template
    <
        typename type_T,
        template <typename> class policy_C = policy
    >
    struct use_policy
    {
        static void execute
        (
            const type_T & t_f
        )
        {
            typedef policy_C<type_T> policy_type;
            policy_type::execute(t_f);
        }
    };
}

namespace example2
{
    template
    <
        typename T
    >
    struct policy
    {
        template
        <
            typename TR
        >
        struct rebind
        {
            typedef policy<TR> other;
        };
        static void execute
        (
            const T & t_f
        )
        {
            std::cout << t_f << '\n';
        }
    };

    struct void_type{};

    template
    <
        typename type_T,
        typename policy_T = policy<void_type>
    >
    struct use_policy
    {
        static void execute
        (
            const type_T & t_f
        )
        {
            typedef typename policy_T::template rebind<type_T>::other policy_type;
            policy_type::execute(t_f);
        }
    };
}

namespace example3
{
    template
    <
        typename T
    >
    struct policy
    {
        template
        <
            typename TR
        >
        struct rebind
        {
            typedef policy<TR> other;
        };
        static void execute
        (
            const T & t_f
        )
        {
            std::cout << t_f << '\n';
        }
    };

    template <> struct policy<void>
    {
        template
        <
            typename TR
        >
        struct rebind
        {
            typedef policy<TR> other;
        };
    };

    template
    <
        typename type_T,
        typename policy_T = policy<void>
    >
    struct use_policy
    {
        static void execute
        (
            const type_T & t_f
        )
        {
            typedef typename policy_T::template rebind<type_T>::other policy_type;
            policy_type::execute(t_f);
        }
    };
}

namespace example4
{
    template
    <
        typename T0,
        typename T1 = void,
        typename T2 = void,
        typename T3 = void,
        typename T4 = void,
        typename T5 = void,
        typename T6 = void,
        typename T7 = void
    >
    struct policy
    {
        template
        <
            typename TR
        >
        struct rebind
        {
            typedef policy<TR> other;
        };
        static void execute
        (
            const T0 & t_f
        )
        {
            std::cout << t_f << '\n';
        }
    };

    template <> struct policy<void>
    {
        template
        <
            typename TR
        >
        struct rebind
        {
            typedef policy<TR> other;
        };
    };

    template
    <
        typename type_T,
        typename policy_T = policy<void>
    >
    struct use_policy
    {
        static void execute
        (
            const type_T & t_f
        )
        {
            typedef typename policy_T::template rebind<type_T>::other policy_type;
            policy_type::execute(t_f);
        }
    };
}

int main()
{
    example1::use_policy<int>::execute(5);
    example2::use_policy<int>::execute(5);
    example3::use_policy<int>::execute(5);
    example4::use_policy<int>::execute(5);
    return(0);
}
```
• 05-02-2010
Elysia
Whoa, you showed four examples of doing almost the same thing.
So what does the nested accessor accomplish in this example that template template parameters do not, apart from trickier syntax?
• 05-02-2010
iMalc
Quote:

Originally Posted by Elysia
This is an interesting concept. But I still think it tries to spawn way too many threads. Something needs to be done about that.
In some scenarios, Quicksort could call itself a ridiculous amount of times.
Ah, I don't know. This needs to be tested, I suppose.

The number of recursive calls (or thread spawns) is reduced by two things:
1. It only recurses, or spawns a thread, on the smaller portion.
2. There is a cutoff of 50 elements before another thread is spawned.

Number 1 is also the typical solution to preventing specially crafted worst-case input orderings from blowing the stack. The case that uses the most stack becomes the same as the case where the partitioning splits equally, which is also the fastest case.
For number 2, I didn't actually run this code (I haven't installed boost), so I don't know what a good cutoff for the minimum size before creating an additional thread is. This is why I would also do something like:
Code:

```
localThreadCount = InterlockedIncrement(&threadCount);
if (end - (left + 1) > 50 && localThreadCount < 8) // Assuming we want a limit of 8 threads.
    threadVec.push_back(new boost::thread(boost::bind(&_Quicksort<PivotType, RandomIt>, left + 1, end)));
else
{
    InterlockedDecrement(&threadCount);
    _Quicksort<PivotType, RandomIt>(left + 1, end);
}
end = left;        // iterate on the larger portion
```
And similarly, use InterlockedDecrement in the loop performing the joins.
• 05-02-2010
Elysia
I did some research, and no matter what I did, multi-threading was always slower, sometimes by as much as 50%.
I made the tests take about 10 s; the multi-threaded approach clearly used 100% CPU and took 15 s vs the 10 s without multi-threading.
I simply copied the first partition step, spawned two threads, and let each half call itself recursively.
So... I don't know. Might be a hardware issue.

However, using iteration for one part of the next partition step was a nice idea. It saved about a second, or roughly 8%. Nice speed boost.
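For reference, the two-thread first-split experiment described above can be sketched with std::thread instead of the thread's boost::thread (std::sort stands in for the custom Quicksort; the function name is mine):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <thread>
#include <vector>

// Partition once, then sort the two halves on two threads while the parent
// waits. On a dual-core machine this only pays off when the range is large
// enough to amortize thread-creation cost, which matches the results
// reported in the thread.
template<typename RandomIt>
void two_thread_sort(RandomIt begin, RandomIt end)
{
    if (end - begin <= 1)
        return;

    typedef typename std::iterator_traits<RandomIt>::value_type value_type;

    // One partition step around a middle-valued pivot:
    // [begin, mid) < pivot <= [mid, end).
    value_type pivot = *(begin + (end - begin) / 2);
    RandomIt mid = std::partition(begin, end,
                                  [&](const value_type& x) { return x < pivot; });

    // Sort each side on its own thread; the parent thread just joins.
    std::thread t1([=] { std::sort(begin, mid); });
    std::thread t2([=] { std::sort(mid, end); });
    t1.join();
    t2.join();
}
```

Since the two subranges are disjoint after the partition, the threads never touch the same elements, so no locking is needed.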