Thread: Sorting algorithms, worst-case input

  1. #1
    Registered User
    Join Date
    Dec 2007
    Posts
    53

    Sorting algorithms, worst-case input

    NOTE: Not particularly a C++ programming question, rather than a general programming question. Code was done in C++ though, so I chose this sub-forum.

    As part of a data structures project, we're asked to implement three sorting algorithms, Mergesort, Heapsort and Counting Sort. We're asked to give increasingly large inputs (I've used dynamically allocated integer arrays with up to 100 million items) and fill the input tables with:

    a) Random-number generator (rand() didn't cut the mustard in such big tables as it only allows for a range of 32768, and the ways to get a proper random seed deemed frustrating, so I used an implementation of Mersenne Twister.)
    b) Worst-case inputs

    The first part is easy to implement, but the second part requires some good guessing, as there's no proper bibliography for sorting algorithms' worst-case inputs. The only thing you'll find is things such as "Heapsort [...] has the advantage of a worst-case Θ(n log n) runtime."

    Worst-case inputs for algorithms like Bubble-sort could be easy to guess, a descendingly ordered input, for example. Counting sort requires more time as the range of numbers increases. However, Mergesort and Heapsort use more advanced methods of sorting, and I'm quite stumbled as to how to arrange the input to get a worst-case one.

    I'm not asking you to solve my homework or give me the full answer, I'm just asking if anyone has an idea and can point me to the right direction.

    I'd be glad to get an answer anytime, even after the deadline is due.

  2. #2
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    If you use brute-force (try all permutations) to find the worst case for (say) 8 elements, do you notice a pattern?

    That's how I would do it. No idea if it will work, though.

  3. #3
    Registered User
    Join Date
    Dec 2007
    Posts
    53
    Hah! Brute-force! Would never have imagined it! There's no timing function however that has enough precision to calculate the time required to sort 8 numbers. I had to go up to millions of items in order to get times of milliseconds!

    I'll have to maybe underclock my CPU to 10MHz or something...

  4. #4
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Even at 10MHz it will probably sort 8 elements before you notice it. You'll need to underclock it to 10Hz or something. And make sure you get a lightweight OS, or you will be sitting there for a few years waiting for Windows or Linux to boot up.

    Or, you could add a global counter to count the number of swaps/comparisons done.

    Generally, though, if you want to time something this small, the usual way is to loop it a few million times (taking some measures to make sure compiler optimizations won't screw it up).

  5. #5
    Registered User
    Join Date
    Dec 2007
    Posts
    53
    Yeah, guess I'll have to use brute-force as a last resort. Any other ideas, anyone?

    It's going to be hard to implement a million runs of this the way I've structured the code, as I'm using a double-nested for loop.

    In the current state the code has 3 kinds of table initialisation (sorted, inverted, random) and 3 sorting algorithms (mergesort, heapsort, counting sort), so the double-nested loop makes sure the to-be sorted table is reinitialized as needed for each run.

    Parts of code:

    Code:
    const int sorts = 3;
    const int inits = 3;
    const int testruns = 20;
    const int number_range = 20000000;
    const int my_size = 100 * 1000000;
    Code:
    int main()
    {
    	int* myarray = new int[my_size];
    
    	int size_n = my_size;
    
    	// !!! Only uncomment the following line if you want all printed messages to go
    	// !!! to "mybench.txt" instead of the console! System messages will not be shown
    	// !!! either, such as the output of (system("PAUSE"))!
    	//freopen("mybench.txt", "w", stdout);
    
    	for (int k=0;k<testruns;k++) {
    		cout << "-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-" << endl;
    		cout << "Run " << k+1 << " out of " << testruns << endl;
    		cout << "-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-" << endl;
    		cout << endl;
    		for (int i=0;i<inits;i++){
    			switch (i){
    			case 0:
    				cout << "***Inverse Fill Init***" << endl << endl;
    				break;
    			case 1:
    				cout << "***Sorted Fill Init***" << endl << endl;
    				break;
    			case 2:
    				cout << "***Random Fill Init***" << endl << endl;
    				break;
    			}
    			for (int j=0;j<sorts;j++){
    				switch(j){
    				case 0:
    					cout << ("MergeSort\n===============================\n");
    					break;
    				case 1:
    					cout << ("HeapSort\n===============================\n");
    					break;
    				case 2:
    					cout << ("Counting Sort\n===============================\n");
    					break;
    				}
    
    				cout << "Required CPU time in seconds: " << endl;
    				//for(int k=0;k<10;k++){
    				initArray(myarray, size_n, i);
    				doSort(myarray, size_n, j, k, i);				
    				//}
    				cout << endl;
    			}
    
    
    		}
    	}
    
    	// Εκτύπωση αποτελεσμάτων σε αρχείο
    	mybench.setf(ios::fixed, ios::floatfield);
    	mybench.precision(3);
    	for(int k=0;k<(inits * testruns);k++){
    		if (k%testruns==0) mybench << endl;
    		mybench << c_req_merge[k] << "\t" << c_req_heap[k] << "\t" << c_req_count[k] << endl;		
    	}
    
    	/*
    	system("PAUSE");
    	printMyArray(myarray, size_n);
    	*/
    
    	delete[] myarray;
    	//system("PAUSE");
    	return 0;
    }
    Code:
    void doSort(int myarray[], int size_n, int typeOfSort, int run, int init)
    {
    	clock_t c_bef,c_aft,c_req;
    
    	float c_req_f;
    
    	c_bef = clock();
    
    	switch(typeOfSort){
    		case 0:
    			// 0: MergeSort
    			mergeSort(myarray, myarray, size_n);
    			break;
    
    		case 1:
    			// 1: heapSort
    			heapSort(myarray, size_n);
    			break;
    
    		case 2:
    			// 2: Counting Sort
    			counting_Sort(myarray, size_n);
    			break;
    	}
    
    	c_aft = clock();
    
    	c_req = c_aft - c_bef;
    
    	c_req_f = (float)c_req / (float)CLOCKS_PER_SEC;
    
    	cout.setf(ios::fixed, ios::floatfield);
    	cout.precision(3);
    	cout << c_req_f << endl;
    
    	switch(typeOfSort){
    		case 0:
    			c_req_merge[(init * testruns) + run] = c_req_f;
    			break;
    		case 1:
    			c_req_heap[(init * testruns) + run] = c_req_f;
    			break;
    		case 2:
    			c_req_count[(init * testruns) + run] = c_req_f;
    			break;
    	}
    
    }
    Last edited by Leftos; 06-13-2009 at 06:19 AM.

  6. #6
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I recommend the global counter approach. Increment it by one everytime you compare/swap in the sort.

    I just did it with my 6 sorting functions I wrote some time ago for practice. The results are quite amazing, but I won't spoil it for you . Many were unexpected.

  7. #7
    Registered User
    Join Date
    Dec 2007
    Posts
    53
    Did you use the Brute Force method? Mind sharing your code?

  8. #8
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Yes, I brute-forced it.

    The code is quite messy, though.

    Hint: use std::next_permutation() in <algorithm>

  9. #9
    Registered User
    Join Date
    Dec 2007
    Posts
    53
    Here's the code I'm using to test MergeSort:

    Code:
    int main(){
    
    	int* myarray = new int[my_size];
    	int* myarray2 = new int[my_size];
    	int* temp = new int[my_size];
    
    	for (int i=0;i<my_size;i++)
    		myarray[i]=i+1;
    
    	do {
    		for (int i=0;i<my_size;i++){
    			cout << myarray[i] << " ";
    			myarray2[i] = myarray[i];
    		}
    		//cout << endl;
    
    		mergecomps=0;
    
    		mergeSort(myarray2, temp, my_size);
    
    		/*for (int i=0;i<my_size;i++)
    			cout << myarray2[i] << " ";
    		cout << endl;*/
    
    		cout << " - This permutation required " << mergecomps << " comparisons." << endl;
    		//system("PAUSE");
    
    	} while ( next_permutation (myarray,myarray+my_size) );
    
    	system("PAUSE");
    
    }
    However, I have no idea how to mine the information I need from all that data. You talked about patterns. So I should grab all the cases where the algorithm hit its top comparisons/swaps and see if I can notice a pattern?

    Even with an array of size say 4 or 5 has too much data to actually notice paterns.

  10. #10
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I would add another variable
    Code:
    int max_comps = -1;
    And print out "myarray" whenever mergecomps > max_comps, and update max_comps.

    (Or just remember the "myarray" associated with the max_comps.

  11. #11
    Registered User
    Join Date
    Dec 2007
    Posts
    53
    Oookay... So an array with a size of 6, using MergeSort, gives the attached results. You can of course ignore every permutation below the actual maximum comparsions count. It doesn't take many permutations to actually reach it.

    Any ideas on how to start with patterns?

  12. #12
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    That's interesting... for my implementation, it's "0 1 2 3 4 5 6 7 8 9" for mergesort. Further investigation reveals that all permutations take exactly the same number of comparisons, suggesting that worst case = best case.

    It's probably just slight differences in implementation.

  13. #13
    Registered User
    Join Date
    Dec 2007
    Posts
    53
    MergeSort demo

    As you can see, there are cases in which the algorithm requires more comparisons. Try using the Ascending and Descending orders first, then the one called "Strange". You should see the difference.

    Or... I'm just counting comparisons wrong. Check it out and tell me what you think.

  14. #14
    Registered User
    Join Date
    Dec 2007
    Posts
    53

    Answer found!

    So, brute-forcing did work, many thanks cyberfish!

    I was able to mine the conclusions for HeapSort, and here they are directly translated from my Uni's forum as I posted them.
    Possible answer to HeapSort Worst-Case Input
    I think that if you draw two or three examples from the above file you will easily see the pattern. To achieve the worst-case scenario you have to understand how the algorithm works and do the opposite job. Most, if not all, implementations of HeapSort use maxHeap so that the first item is the largest of the array, and to easily put it in the end. So to "burden" the algorithm, you have to have any input that qualifies as minHeap!
    Doc mentioned containing worst-case inputs for n=15 (took over 3 trillion permetuations!): comps_heap_worst_n15

  15. #15
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I have no idea why we got different things for mergesort, but it's probably where we count the comparisons and swaps... guess this is not really a scientific way.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. get keyboard and mouse events
    By ratte in forum Linux Programming
    Replies: 10
    Last Post: 11-17-2007, 05:42 PM
  2. Intel syntax on MinGW ?
    By TmX in forum Tech Board
    Replies: 2
    Last Post: 01-06-2007, 09:44 AM
  3. Keypress reading
    By geek@02 in forum Windows Programming
    Replies: 1
    Last Post: 06-16-2004, 12:16 PM
  4. error with code
    By duffy in forum C Programming
    Replies: 8
    Last Post: 10-22-2002, 09:45 PM
  5. Extra printed stmts...why?
    By mangoz in forum C Programming
    Replies: 4
    Last Post: 12-19-2001, 07:56 AM

Tags for this Thread