Hi, I'm writing various sorting algorithms in C++ and comparing their performance to each other (as well as trying to optimize them where I can). I'm running into a problem, but I'm not sure if this is a mistake on my part or if it is due to some physical limitation of the computer that I am unaware of. I'm hoping that someone here can shed some light on it.

I'm comparing Heapsort, Mergesort, and Quicksort, which operate on an array of randomly generated 16-bit shorts between -32768 to 32767 using srand() from <ctime> with seed time(0) (I chose short instead of int so that I could test the algorithms on large arrays while consuming half the memory that would be consumed by an array of ints). I've compiled these with g++.

For array sizes up to 10 million, the times that I get are pretty much what one would expect, i.e. quicksort outperforms mergesort, which outperforms heapsort. But for more than 20 million elements, quicksort performs worse than mergesort, and for more than 50 million elements, quicksort performs worse than heapsort. This is contrary to my understanding that on average, quicksort is the best performing algorithm of the three. And since these numbers are randomly generated, we expect that quicksort's worst case is not encountered.

Here is my Quicksort:

Note that here I have chosen the last element as the pivot for simplicity. Since the input is randomly generated, it should not have an impact on performance (for what it's worth I get the same problem if I randomly choose the pivot).Code:int partition(short * nums, int lbound, int rbound) { int pivotIndex = rbound - 1; while (lbound != pivotIndex) { if (nums[lbound] > nums[rbound]) { short tmp = nums[lbound]; nums[lbound] = nums[pivotIndex]; nums[pivotIndex] = tmp; pivotIndex--; } else lbound++; } if (nums[pivotIndex] < nums[rbound]) pivotIndex++; short tmp = nums[pivotIndex]; nums[pivotIndex] = nums[rbound]; nums[rbound] = tmp; return pivotIndex; } void quickSort(short * nums, int lbound, int rbound) { int size = rbound - lbound + 1; if (size <= 1) return; int pivotIndex = partition(nums, lbound, rbound); quickSort(nums, lbound, pivotIndex - 1); quickSort(nums, pivotIndex + 1, rbound); } void quickSort(short * nums, int size) { quickSort(nums, 0, size - 1); }

For reference, here is what I have for Mergesort:

In order to speed up this sort, I try to avoid using heap memory as much as possible and use stack allocations instead. STACKLIMIT_MERGE is an upper bound (determined experimentally) on the number of shorts for which I can use dynamic stack allocation (via alloca) and complete the mergesort without incurring a stack overflow exception.Code:void mergeSort(short * nums, int size) { if (size == 0 || size == 1) return; short * lnums; short * rnums; if (size <= STACKLIMIT_MERGE) { lnums = static_cast<short *>(alloca((size/2) * sizeof(short))); rnums = static_cast<short *>(alloca((size - size/2) * sizeof(short))); } else { lnums = new short[size/2]; rnums = new short[size - size/2]; } for (int i = 0; i < size/2; i++) { lnums[i] = nums[i]; rnums[i] = nums[size/2 + i]; } if (size % 2 != 0) rnums[size - size/2 - 1] = nums[size - 1]; mergeSort(lnums, size/2); mergeSort(rnums, size - size/2); int lindex = 0, rindex = 0; for (int i = 0; i < size; i++) { if ((rindex != size - size/2) && (lindex == size/2 || rnums[rindex] < lnums[lindex])) { nums[i] = rnums[rindex]; rindex++; } else { nums[i] = lnums[lindex]; lindex++; } } if (size > STACKLIMIT_MERGE) { delete [] lnums; delete [] rnums; } }

Finally, my heapsort is as follows:

My overall concern is to find out why the Quicksort is scaling worse than Mergesort and Heapsort.Code:void heapify(short * nums, int size) //max heap { int last = (int)(ceil((log((double)(size + 1)) / log((double)2)))); last = (int)(pow((float)2, last - 1) - 2); for (int i = last; i >= 0; i--) { int j = i; int k = ((2 * j + 2) < size) ? ((nums[2 * j + 1] > nums[2 * j + 2]) ? (2 * j + 1) : (2 * j + 2)) : (((2 * j + 1) < size) ? (2 * j + 1) : -1); if (k == -1) continue; while (nums[k] > nums[j]) { short swp = nums[k]; nums[k] = nums[j]; nums[j] = swp; j = k; k = ((2 * k + 2) < size) ? ((nums[2 * k + 1] > nums[2 * k + 2]) ? (2 * k + 1) : (2 * k + 2)) : (((2 * k + 1) < size) ? (2 * k + 1) : -1); if (k == -1) break; } } } void heapSort(short * nums, int size) { heapify(nums, size); int index = size - 1; while (index >= 0) { short root = nums[0]; nums[0] = nums[index--]; nums[index + 1] = root; //Critical step if (index <= 0) continue; int j = 0; int k = (index == 1) || (nums[1] > nums[2]) ? 1 : 2; while (nums[k] > nums[j]) //rebuild the heap { short swp = nums[k]; nums[k] = nums[j]; nums[j] = swp; j = k; k = ((2 * k + 2) <= index) ? ((nums[2 * k + 1] > nums[2 * k + 2]) ? (2 * k + 1) : (2 * k + 2)) : (((2 * k + 1) <= index) ? (2 * k + 1) : -1); if (k == -1) break; } } }

Here are some numbers for reference:

1 million elements - H: 0.641s, M: 0.625s, Q: 0.360s

10 million elements - H: 10.610s, M: 7.172s, Q: 5.922s

20 million elements - H: 24.188s, M: 14.765s, Q: 16.875s

50 million elements - H: 72.812s, M: 38.250s, Q: 78.313s

I'm running on Windows XP SP2, 2GB RAM, Intel x86 Centrino Duo (2 x 1.60 GHz), 120GB disk space (20GB free). I realize much of this info might be irrelevant.

Any insight into my issue would be much appreciated.