# Thread: Sorting algorithms, worst-case input

1. ## Sorting algorithms, worst-case input

NOTE: Not particularly a C++ programming question, rather than a general programming question. Code was done in C++ though, so I chose this sub-forum.

As part of a data structures project, we're asked to implement three sorting algorithms, Mergesort, Heapsort and Counting Sort. We're asked to give increasingly large inputs (I've used dynamically allocated integer arrays with up to 100 million items) and fill the input tables with:

a) Random-number generator (rand() didn't cut the mustard in such big tables as it only allows for a range of 32768, and the ways to get a proper random seed deemed frustrating, so I used an implementation of Mersenne Twister.)
b) Worst-case inputs

The first part is easy to implement, but the second part requires some good guessing, as there's no proper bibliography for sorting algorithms' worst-case inputs. The only thing you'll find is things such as "Heapsort [...] has the advantage of a worst-case Θ(n log n) runtime."

Worst-case inputs for algorithms like Bubble-sort could be easy to guess, a descendingly ordered input, for example. Counting sort requires more time as the range of numbers increases. However, Mergesort and Heapsort use more advanced methods of sorting, and I'm quite stumbled as to how to arrange the input to get a worst-case one.

I'm not asking you to solve my homework or give me the full answer, I'm just asking if anyone has an idea and can point me to the right direction.

2. If you use brute-force (try all permutations) to find the worst case for (say) 8 elements, do you notice a pattern?

That's how I would do it. No idea if it will work, though.

3. Hah! Brute-force! Would never have imagined it! There's no timing function however that has enough precision to calculate the time required to sort 8 numbers. I had to go up to millions of items in order to get times of milliseconds!

I'll have to maybe underclock my CPU to 10MHz or something...

4. Even at 10MHz it will probably sort 8 elements before you notice it. You'll need to underclock it to 10Hz or something. And make sure you get a lightweight OS, or you will be sitting there for a few years waiting for Windows or Linux to boot up.

Or, you could add a global counter to count the number of swaps/comparisons done.

Generally, though, if you want to time something this small, the usual way is to loop it a few million times (taking some measures to make sure compiler optimizations won't screw it up).

5. Yeah, guess I'll have to use brute-force as a last resort. Any other ideas, anyone?

It's going to be hard to implement a million runs of this the way I've structured the code, as I'm using a double-nested for loop.

In the current state the code has 3 kinds of table initialisation (sorted, inverted, random) and 3 sorting algorithms (mergesort, heapsort, counting sort), so the double-nested loop makes sure the to-be sorted table is reinitialized as needed for each run.

Parts of code:

Code:
```const int sorts = 3;
const int inits = 3;
const int testruns = 20;
const int number_range = 20000000;
const int my_size = 100 * 1000000;```
Code:
```int main()
{
int* myarray = new int[my_size];

int size_n = my_size;

// !!! Only uncomment the following line if you want all printed messages to go
// !!! to "mybench.txt" instead of the console! System messages will not be shown
// !!! either, such as the output of (system("PAUSE"))!
//freopen("mybench.txt", "w", stdout);

for (int k=0;k<testruns;k++) {
cout << "-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-" << endl;
cout << "Run " << k+1 << " out of " << testruns << endl;
cout << "-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-" << endl;
cout << endl;
for (int i=0;i<inits;i++){
switch (i){
case 0:
cout << "***Inverse Fill Init***" << endl << endl;
break;
case 1:
cout << "***Sorted Fill Init***" << endl << endl;
break;
case 2:
cout << "***Random Fill Init***" << endl << endl;
break;
}
for (int j=0;j<sorts;j++){
switch(j){
case 0:
cout << ("MergeSort\n===============================\n");
break;
case 1:
cout << ("HeapSort\n===============================\n");
break;
case 2:
cout << ("Counting Sort\n===============================\n");
break;
}

cout << "Required CPU time in seconds: " << endl;
//for(int k=0;k<10;k++){
initArray(myarray, size_n, i);
doSort(myarray, size_n, j, k, i);
//}
cout << endl;
}

}
}

// Εκτύπωση αποτελεσμάτων σε αρχείο
mybench.setf(ios::fixed, ios::floatfield);
mybench.precision(3);
for(int k=0;k<(inits * testruns);k++){
if (k%testruns==0) mybench << endl;
mybench << c_req_merge[k] << "\t" << c_req_heap[k] << "\t" << c_req_count[k] << endl;
}

/*
system("PAUSE");
printMyArray(myarray, size_n);
*/

delete[] myarray;
//system("PAUSE");
return 0;
}```
Code:
```void doSort(int myarray[], int size_n, int typeOfSort, int run, int init)
{
clock_t c_bef,c_aft,c_req;

float c_req_f;

c_bef = clock();

switch(typeOfSort){
case 0:
// 0: MergeSort
mergeSort(myarray, myarray, size_n);
break;

case 1:
// 1: heapSort
heapSort(myarray, size_n);
break;

case 2:
// 2: Counting Sort
counting_Sort(myarray, size_n);
break;
}

c_aft = clock();

c_req = c_aft - c_bef;

c_req_f = (float)c_req / (float)CLOCKS_PER_SEC;

cout.setf(ios::fixed, ios::floatfield);
cout.precision(3);
cout << c_req_f << endl;

switch(typeOfSort){
case 0:
c_req_merge[(init * testruns) + run] = c_req_f;
break;
case 1:
c_req_heap[(init * testruns) + run] = c_req_f;
break;
case 2:
c_req_count[(init * testruns) + run] = c_req_f;
break;
}

}```

6. I recommend the global counter approach. Increment it by one everytime you compare/swap in the sort.

I just did it with my 6 sorting functions I wrote some time ago for practice. The results are quite amazing, but I won't spoil it for you . Many were unexpected.

7. Did you use the Brute Force method? Mind sharing your code?

8. Yes, I brute-forced it.

The code is quite messy, though.

Hint: use std::next_permutation() in <algorithm>

9. Here's the code I'm using to test MergeSort:

Code:
```int main(){

int* myarray = new int[my_size];
int* myarray2 = new int[my_size];
int* temp = new int[my_size];

for (int i=0;i<my_size;i++)
myarray[i]=i+1;

do {
for (int i=0;i<my_size;i++){
cout << myarray[i] << " ";
myarray2[i] = myarray[i];
}
//cout << endl;

mergecomps=0;

mergeSort(myarray2, temp, my_size);

/*for (int i=0;i<my_size;i++)
cout << myarray2[i] << " ";
cout << endl;*/

cout << " - This permutation required " << mergecomps << " comparisons." << endl;
//system("PAUSE");

} while ( next_permutation (myarray,myarray+my_size) );

system("PAUSE");

}```
However, I have no idea how to mine the information I need from all that data. You talked about patterns. So I should grab all the cases where the algorithm hit its top comparisons/swaps and see if I can notice a pattern?

Even with an array of size say 4 or 5 has too much data to actually notice paterns.

10. I would add another variable
Code:
`int max_comps = -1;`
And print out "myarray" whenever mergecomps > max_comps, and update max_comps.

(Or just remember the "myarray" associated with the max_comps.

11. Oookay... So an array with a size of 6, using MergeSort, gives the attached results. You can of course ignore every permutation below the actual maximum comparsions count. It doesn't take many permutations to actually reach it.

12. That's interesting... for my implementation, it's "0 1 2 3 4 5 6 7 8 9" for mergesort. Further investigation reveals that all permutations take exactly the same number of comparisons, suggesting that worst case = best case.

It's probably just slight differences in implementation.

13. MergeSort demo

As you can see, there are cases in which the algorithm requires more comparisons. Try using the Ascending and Descending orders first, then the one called "Strange". You should see the difference.

Or... I'm just counting comparisons wrong. Check it out and tell me what you think.