1. ## Sorting methods

Hey,

Since I never learned programming in school, I figured I'd work through some of the common problem sets that most programs put you through. One of these was learning all the different sorting methods.

On the Wikipedia quicksort page it says that insertion sort is faster on small arrays, and a lot of quicksort implementations switch to insertion sort once the recursion gets down to small subarrays. I was thinking about adding insertion sort to my quicksort algorithm to see if I can increase its speed.

So far I have clocked it sorting 10 million randomly generated integers in 10-11 seconds. I wanted to see if I could improve on that time, but every test I have run shows the quicksort outperforming the insertion sort, or at least matching it, even on small arrays of 10 elements or fewer. Since both finish in under a second it's hard to notice a difference, and when I added an integer counter to show how many loops are done, the quicksort's count is still smaller.

Any suggestions?

Code:
```
void ReadAndSort::QuickSort(int left, int right)
{
    int pivot = Numbers[(left + right) / 2];
    int x = left;
    int y = right;
    while (x <= y)
    {
        while (Numbers[x] < pivot)
            x++;
        while (Numbers[y] > pivot)
            y--;
        if (x <= y)
        {
            Swap(Numbers[x], Numbers[y]);
            x++;
            y--;
        }
        timesran++;
    }
    if (left < y)
        QuickSort(left, y);
    if (x < right)
        QuickSort(x, right);
}

void ReadAndSort::InsertionSort()    // signature added for completeness
{
    double Timeslooped = 0;
    int j = 0;
    for (int x = 1; x < (int)Numbers.size(); x++)    // was size()-1, which skipped the last element
    {
        j = x;
        while (j > 0 && Numbers[j-1] > Numbers[j])
        {
            int Temp = Numbers[j];
            Numbers[j] = Numbers[j-1];
            Numbers[j-1] = Temp;
            j--;
            Timeslooped++;
        }
    }
    cout << "Insertion sort took " << Timeslooped << " iterations." << endl;
}
```
Just to be clear, the ReadAndSort class reads all the numbers from a file, sorts them, then saves the newly sorted result back to the file. Numbers is a vector that it uses to hold all the numbers while sorting.

Also if you notice any glaring weaknesses in the code please let me know. Thanks!

2. If your small arrays get sorted too quickly, time how long it takes to sort a whole bunch of them.

3. Your implementation of QuickSort doesn't currently call InsertionSort for small subarrays, so how are you testing this?
See my version of QuickSort here and try varying the CUTOFF macro. Make it small enough and InsertionSort is no longer used; see how much slower that is.

4. I am sorry, I think you misunderstood me.

I was testing QuickSort and insertion sort side by side to check whether the insertion sort was actually faster than the quicksort on small arrays, and whether it was worth adjusting the code.

Very useful link though! Looks like you spent a lot of time on this.

5. Hmm, I added it to the QuickSort and it ended up being 2-3 seconds slower on 10,000,000 numbers; maybe there is something wrong with my insertion sort.

6. I am curious to ask iMalc why he used for( ; ; ) to implement one of the loops in the algorithm. Is there a difference between this and while(1)? Is it somehow faster? Given iMalc's expertise with sorting algorithms I think there must be a good reason, or is it just personal preference?

7. Originally Posted by rogster001
I am curious to ask iMalc why he used for( ; ; ) to implement one of the loops in the algorithm. Is there a difference between this and while(1)? Is it somehow faster? Given iMalc's expertise with sorting algorithms I think there must be a good reason, or is it just personal preference?
They are equivalent in meaning, and are likely to result in exactly the same code. That said, a book on optimisation in C -- old when I read it years ago -- mentioned that some compilers -- old when the book was written -- may generate code that performs an unnecessary test given the while(1) version, whereas the for ( ;; ) version would always be recognised as a deliberate infinite loop, and hence should be preferred.
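To make the comparison concrete, here are the two spellings side by side in a trivial sketch (countWithFor and countWithWhile are just illustration names):

```cpp
// Counts up to `limit` using for(;;).
int countWithFor(int limit)
{
    int i = 0;
    for (;;)           // no condition at all: unmistakably a deliberate infinite loop
    {
        if (++i == limit) break;
    }
    return i;
}

// Identical logic using while(1).
int countWithWhile(int limit)
{
    int i = 0;
    while (1)          // constant "true" condition: same meaning, but some old
    {                  // compilers emitted (or warned about) the redundant test
        if (++i == limit) break;
    }
    return i;
}
```

Any modern compiler should generate identical code for both.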

8. OK, thanks. I suppose if there is the chance that no evaluation as such is performed on the 'for' version then that would be preferred.

Thinking about it, it may be logical to expect for( ;; ) not to evoke an evaluation - there is no true/false implied or automagically assigned, is there - whereas while(1) straight away says 'I am true', so one might expect this to be compiled as a test. Gee whiz, just give me that computer science professorship already :->

9. I've seen a compiler complain about "conditional expression is constant" before (I probably had the warning level up high), which is one reason I started using for(;;). The other reason, as already noted, is that in a debug build, say, the compiler might not optimise out the test, and the speed of debug builds is of at least some importance.

You might find that the code for Numbers.size()-1 is not turning out as fast as storing that value in a variable would be. Then of course there's that cout statement in there, which is sure to affect things. Best remove that when timing them.
I'd also remove timeslooped and timesran when timing them, since you wouldn't have those in a final implementation.
At least, those are some ways it might be slowing the insertion sort down to the speed of a quicksort on small arrays.

10. Originally Posted by iMalc
You might find that the code for Numbers.size()-1 is not turning out as fast as storing that in a variable might be.
Are you saying that keeping an int updated by hand to reflect the size would be quicker for the computer to compare against than calling vector.size()?
To use the insertion sort inside the quicksort I had to add a (left, right) parameter to it as well, so this isn't what slowed them down while working together, but nonetheless a good point.

Originally Posted by iMalc
Then of course there's that cout statement in there which is sure to affect things. Best remove that when timing them.
I'd also remove timeslooped and timesran for timing them since you wouldn't have those in a final implementation.
At least that's some ways it might be slowing down the insertion sort to the speed of a quicksort on small arrays.
Yeah, I commented most of the useless stuff out, other than the timesran variable, as I used it to see how many total iterations each sort style performed.
I am also timing with an array like this:
Code:
```
#include <time.h>
....
...
time_t clock[2];              // time() returns a time_t, not an int
clock[0] = time(NULL);
ParticularSort(left, right);
clock[1] = time(NULL);
int TimeToRun = (int)(clock[1] - clock[0]);
.....
cout << "It took " << TimeToRun << " seconds to perform blah sort on " << Numbers.size() << " numbers." << endl;
.....
```
Since it is a .h header I'm assuming it's an old C-style file, and maybe one of you C++ gurus has a suggestion for tracking the time that would be more accurate? Milliseconds would help me break down performance on the sorts that finish in under a second.

Thanks a ton for the information already and if you feel like checking it out I made a few sorts in JavaScript at
www.lesshardtofind.com/bubble.html

I'm still trying to figure out how to make my quicksort visual, as JavaScript can't update the display while it's inside a loop, which is going to make recursion tricky! I got an idea though.

11. Originally Posted by Lesshardtofind
Are you saying that keeping an int updated by hand to reflect the size would be quicker for the computer to compare against than calling vector.size()?
What iMalc is suggesting is that saving the value to an int and evaluating that int multiple times will not be slower than calling Numbers.size() multiple times. Apart from potential function call overhead, there are very few implementation choices for std::vector<>::size() that will be faster than retrieving the value.

Although - if using that approach - bear in mind that vector's size() method returns a size_t, which can hold values that an int cannot.

Originally Posted by Lesshardtofind
Since it is a .h header I'm assuming it's an old C-style file, and maybe one of you C++ gurus has a suggestion for tracking the time that would be more accurate? Milliseconds would help me break down performance on the sorts that finish in under a second.
Firstly, you need to be clear in your requirement. What you are seeking is a high resolution timer. Accuracy and resolution are different things.

Either way, you would be better off increasing the number of test cases than looking for a more accurate or higher resolution timer. High-resolution timers also have significant jitter, so the accuracy of the timer tends to go down as resolution increases, unless your host system has hard realtime characteristics (and if you were using such a system, you wouldn't be asking the question in the first place).

But, if you want to characterise timings for an algorithm, it is usually easier to increase the number of runs (or test cases) than it is to find and control a timer with high resolution and high accuracy. All you need to do to get back to individual timings is divide by the number of runs. One other benefit is that variation of timer accuracy or precision will tend to be smoothed out. The only cost is the time needed to generate the additional test cases or runs, the time waiting for them to complete, and the processor cycles.
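On the timer question specifically: if your compiler supports C++11, the <chrono> header provides a portable steady clock, and it combines naturally with the repeat-and-divide approach above. A sketch, where SortUnderTest and AverageSortMillis are illustration names rather than part of your ReadAndSort class:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <random>
#include <vector>

// Stand-in for whichever sort you are measuring.
static void SortUnderTest(std::vector<int>& v)
{
    std::sort(v.begin(), v.end());
}

// Times `runs` sorts of `n` random ints and returns the average in milliseconds.
double AverageSortMillis(int runs, int n)
{
    std::mt19937 gen(12345);                              // fixed seed: repeatable inputs
    std::uniform_int_distribution<int> dist(0, 1000000);

    std::chrono::nanoseconds total(0);
    for (int r = 0; r < runs; r++)
    {
        std::vector<int> v(n);
        for (int& x : v) x = dist(gen);                   // fresh input each run

        auto start = std::chrono::steady_clock::now();
        SortUnderTest(v);
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start);

        assert(std::is_sorted(v.begin(), v.end()));       // sanity check the result
    }
    return std::chrono::duration<double, std::milli>(total).count() / runs;
}
```

Note the use of steady_clock rather than system_clock: a wall-clock timer can jump if the system time is adjusted mid-run, whereas a steady clock only moves forward.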