Originally Posted by **Niccolo**

Here's one of my favorites (and I'm certain, @flp1969, that this is a study of the algorithm).

Ok, @cooper1200, buckle in.

First, about ( i < j ) - it's probably OK. It is a "loop termination" check. Set it aside for a moment (as you did); it should be fine. There are several versions which do this differently, but the overall design is the same.

In many years of study (we're perpetual students, right?), I've never seen a book or professor actually show students the moment sorting has actually happened. It is subtle and wonderful.

A few PhDs were earned on optimizations and details of this algorithm. Choosing the initial pivot value is its own "thing", but IT IS NOT FIRST as your code suggests. It should be a value we expect to be near the middle of the data (so the data is divided into two roughly equal parts), nominally a "median" - not the lowest or highest, but a value near the middle. You can choose any value, but if that HAPPENS to be the least or greatest, or close to either, efficiency degrades badly. The "median of 3" is typically chosen: three samples are taken, the middle value of the three is picked, and it is assumed to belong near the middle of the array being sorted (and is swapped into that position if required).
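That median-of-3 sampling can be sketched in C like this. This is a minimal illustration with my own names (`median_of_3` is hypothetical, not from your code), assuming plain `int` data:

```c
#include <stddef.h>

/* Return the index (one of lo, mid, hi) whose value is the median of
   the three sampled values. Using this index as the pivot makes a
   degenerate (smallest/largest) pivot much less likely. */
static size_t median_of_3(const int a[], size_t lo, size_t mid, size_t hi)
{
    if (a[lo] < a[mid]) {
        if (a[mid] < a[hi])
            return mid;                       /* lo < mid < hi  */
        return (a[lo] < a[hi]) ? hi : lo;     /* mid is largest */
    }
    /* here a[mid] <= a[lo] */
    if (a[lo] < a[hi])
        return lo;                            /* mid <= lo < hi */
    return (a[mid] < a[hi]) ? hi : mid;       /* lo is largest  */
}
```

A typical caller passes the first, middle, and last indices of the partition, then swaps the chosen element into the pivot slot before partitioning.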

There's a "tail" of sorts in professional builds, which switch to a different type of sort (usually insertion sort) when the partition is small (somewhere below about 15 items).
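That small-partition fallback is just a plain insertion sort over the partition's range. A minimal sketch (the cutoff of 15 is the ballpark figure mentioned above, not a tuned constant):

```c
#include <stddef.h>

/* Partitions at or below this size get handed to insertion sort
   instead of recursing further (assumed cutoff, per the discussion). */
#define SMALL_CUTOFF 15

/* Sort a[lo..hi], bounds inclusive, by simple insertion. Fast on
   tiny or nearly sorted ranges because it does almost no work
   when elements are already close to their final positions. */
static void insertion_sort(int a[], size_t lo, size_t hi)
{
    for (size_t i = lo + 1; i <= hi; ++i) {
        int key = a[i];
        size_t j = i;
        while (j > lo && a[j - 1] > key) {
            a[j] = a[j - 1];   /* shift larger elements right */
            --j;
        }
        a[j] = key;            /* drop key into the gap */
    }
}
```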

...but we'll assume the algorithm is about to get going, and a suitable Pivot will be selected.

Now, imagine you have dealt cards from a deck onto the table to be sorted, laid left to right as an array in memory, in random order. Let's say there are 20 cards.

Look at entry 10 and ask: is this obviously very low (A, 2) or very high (K, Q)? If so, it's not a good pivot choice. Find a 5 or a 6 nearby and swap it in. Slide this center card upward to mark it as the pivot card.

Now, point your left finger on the leftmost card, and your right on the rightmost card.

Rest.

Now, compare the card on your left (the one you're pointing to) with the pivot card. Which is less? If the pivot is greater, move your finger one card to the right. Repeat the comparison: which is less, the pivot or your fingered card? Keep doing this until the pivot is lower than the card under your finger. When that happens, swap those cards, but leave your finger pointing at the next card.

Now, check your right finger and compare to the pivot. Which is greater? If the fingered card is greater, move one card to the left and repeat. When the fingered card is lesser, swap with the pivot (then return to your left-hand sequence). Keep this up, bouncing between left and right processing, until your hands cross.
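The two-finger walk above is often written as a Hoare-style partition - one of several variants (this sketch swaps the two fingered cards directly rather than swapping through the pivot, but the scan-until-the-fingers-cross idea is the same). Assuming `int` data:

```c
#include <stddef.h>

/* Partition a[lo..hi] (inclusive) around the middle element.
   i is the "left finger", j the "right finger". Returns the crossing
   point: afterwards everything in a[lo..ret] is <= everything in
   a[ret+1..hi]. */
static size_t partition(int a[], size_t lo, size_t hi)
{
    int pivot = a[lo + (hi - lo) / 2];   /* middle element as pivot */
    size_t i = lo, j = hi;
    for (;;) {
        while (a[i] < pivot) ++i;        /* left finger walks right  */
        while (a[j] > pivot) --j;        /* right finger walks left  */
        if (i >= j)
            return j;                    /* fingers crossed: done    */
        int t = a[i]; a[i] = a[j]; a[j] = t;   /* swap the two cards */
        ++i; --j;
    }
}
```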

What just happened?

You've moved through all of the cards on the left side. Every swap you made ensures that those on the left are less than the pivot. When you moved through the cards on the right side, you made sure all of those on the right are greater than the pivot card. You switch between left and right so that a card swapped with the pivot has a chance to jump to the other side of the collection (a K on the left would have made it into the right section, and an A or 2 on the right would have made it into the left side).

At that moment, before you get to the two recursive calls, the pivot card is in its final, sorted position. This is the moment the first item has been sorted into place. It is the moment I never see texts or teachers pause to make clear to students.

At this point, though, that's the only card that is placed. You know, because of the process, that all of the cards on the left are less than the pivot. You know the card is in its final position, and won't be moved again, because ALL OF THE CARDS BELOW IT ARE ON THE LEFT: its index is just a count of all cards lower than the pivot. That establishes the final position for the pivot card.

That's really the point of the algorithm. You're tossing cards around to find which one belongs in that pivot position.

With that step completed, the left and right sides are treated as two smaller arrays. Those sub-arrays exclude the pivot (it should not be considered again; it's done).

The left and right sides are both processed the same way, only smaller in size. The entire operation repeats on each partition, and each step places one pivot card, until a partition reaches size 2 (which requires only a quick look and possibly a swap if the pair is reversed)... and it's done.
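Tying it together, the recursive driver can be sketched as below. For simplicity this sketch uses a Lomuto-style partition (last element as pivot), which literally places the pivot into its final slot each pass as described above; a production version would add the median-of-3 pivot choice and the small-partition insertion-sort cutoff:

```c
#include <stddef.h>

/* Lomuto-style partition of a[lo..hi] (inclusive): everything below
   the returned index is < pivot, and the pivot itself lands at the
   returned index - its final, sorted position. */
static size_t lomuto_partition(int a[], size_t lo, size_t hi)
{
    int pivot = a[hi];                    /* simple choice, not optimal */
    size_t store = lo;                    /* next slot for a "low" card */
    for (size_t k = lo; k < hi; ++k) {
        if (a[k] < pivot) {
            int t = a[k]; a[k] = a[store]; a[store] = t;
            ++store;
        }
    }
    int t = a[store]; a[store] = a[hi]; a[hi] = t;  /* pivot placed */
    return store;
}

static void quicksort(int a[], size_t lo, size_t hi)
{
    if (lo >= hi)
        return;                           /* 0 or 1 cards: nothing to do */
    size_t p = lomuto_partition(a, lo, hi);
    if (p > lo)
        quicksort(a, lo, p - 1);          /* guard size_t underflow at p == 0 */
    quicksort(a, p + 1, hi);              /* pivot at p is never revisited */
}
```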

I have slightly oversimplified the explanation. Edge cases exist; sometimes the pivot position must drift (when badly chosen at the start of a recursion).

It is known that this performs poorly when fed data already in order (or largely in order). Some loosely ordered data plays havoc with the algorithm, and professional implementations may take effort to detect the condition and recover.

Questions?

A few other related points:

There are experimental versions of this which use two pivot points (making 3 sections) or three pivot points (making 4 sections). It turns out these usually aren't all that helpful. The dual-pivot version SEEMS like it should be faster, but most of the time it isn't. Oddly (well, researchers know why, but we're not full-fledged scientists), the 3-pivot version (creating 4 sections) can be faster, but only when the comparison function is heavy. If the sort is working with, say, localized strings (multiple languages), performance may hinge on reducing the number of comparisons performed. This is where the 3-pivot version excels. However, the algorithm to do it is more complex, so when the comparison is cheap (like integers), it is actually slower.

On the other hand, it is possible to schedule the recursive calls on the left and right partitions as threads. This can lead to an explosion of threads (there will be many, many sections to sort in large sets), so a task scheduler should be used. This does not give "linear" scaling, but it does improve performance: roughly 2.5 to 3 times faster with 4 threads, for example. CPU cache and RAM effects limit the benefit, so the scheduler may need to keep adjacent sections together so they're cache-friendly when running in parallel.