C Board  

Go Back   C Board > General Programming Boards > C Programming

Reply
 
LinkBack Thread Tools Display Modes
Old 01-19-2009, 11:17 PM   #16
C++ Witch
 
laserlight's Avatar
 
Join Date: Oct 2003
Location: Singapore
Posts: 11,358
Quote:
Originally Posted by MK27
I'm actually writing (another) program that performs four or five different styles of searches, counts comparisons and list iterations, shuffles, generates random lists, switches criteria, breaks ties with secondary criteria, etc. Real fun, for the whole family! So I expected to see those O(n^2) tables confirmed
You should see them confirmed. If you don't, then your input sizes are not large enough to matter (since "big O" notation refers to asymptotic growth, not actual runtime performance in all cases) and/or you used pathological worst case input for the faster algorithms (e.g., certain canonical variants of quick sort degrade to O(n^2) complexity when sorting an array already sorted in reverse), and/or best case input for the slower algorithms.
__________________
C + C++ Compiler: MinGW port of GCC
Build + Version Control System: SCons + Bazaar

Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
laserlight is online now   Reply With Quote
Old 01-20-2009, 03:13 AM   #17
Algorithm Dissector
 
iMalc's Avatar
 
Join Date: Dec 2005
Location: New Zealand
Posts: 2,742
Quote:
Originally Posted by MK27 View Post
No, actually I've been trying to take you back from the dark side. Of course, I never really believed what I was saying, I just thought I did.
Well I've certainly learned that you are a rather interesting character.

Quote:
Originally Posted by MK27 View Post
I've also formed a sort of emotional defensiveness about bubblesort since the first time I had to sort anything, that's what I intuitively wrote. laserlight later identified it as such, scornfully I think -- and now tabstop says I've been using some kind of cheat flag!
My first sorting algorithm wasn't bubblesort. It was pretty similiar except it didn't swap only adjacent items. Coming up with bubblesort independently is nothing to be sneezed at. If one were to immediately think it were the best method there is, that might be well, less-productive thinking.

Quote:
Originally Posted by MK27 View Post
So when I heard somebody else in history had come up with a better method(s), I felt determined to master them. Not to be outdone as a masochist either, I'm actually writing (another) program that performs four or five different styles of searches, counts comparisons and list iterations, shuffles, generates random lists, switches criteria, breaks ties with secondary criteria, etc. Real fun, for the whole family! So I expected to see those O(n^2) tables confirmed, and thankfully tabstop cleared this flag issue up or my suspicion that many computer science gurus are more determined to obfuscate than explain would have reached critical proportions.
Code as much as you can, I have no problem with that. You've simply shown me that it's just harder than I realised to explain something to someone who you don't realise doesn't know common terms etc.

Quote:
Originally Posted by MK27 View Post
No doubt. I've never taken any math beyond grade ten but I imagine I will have to if I ever want to work as a programmer. It just seems to me that most computer math is really not complicated, but it ends up couched in "confusing to the layman" style because the assumption (probably correct in most cases) is that the student of computers was already a student of math. What's further implied is that you have to be because it's so useful, and that's part of what I doubt. So anyway, instead of one esoteric, obscurantistic swamp to wade in, there is two! I should probably be happy. But seriously (I hope someone is listening!), much of the information could be explained much better, rather than just sticking to that old story. A lot of web tutorials read like photocopies of one another, which might be okay if the original had more merit. The more such "methods of explanation" get repeated, the more they acquire a weight potentially divorced from their real value. That is probably the heart of my criticism, iMalc. Honestly. About guarding the establishment. It would be nice if we had more "look, I did this/had to do this, and I'm not saying I regret it myself, but for the future maybe we really don't need to use so much latin where english will do".
I never took classes because I felt I had to, to be a programmer. As far as I was concerned, I already was a programmer. I did so because they were interesting to me. I know a number of programmers with quite varying levels of capability or qualification in various areas, but I believe the main thing you need to be successful is determination and a passion for knowledge, which probably mostly comes down to being one of a certain group of personality types.
You'll find that it isn't really possible to learn such a huge amount about a various subject without ending up using the same terms commonly used, even though you're at the same time putting it in your own words.

Quote:
Originally Posted by MK27 View Post
Is that a gauntlet on the floor?!
I'm sorry, I'm not so good with metaphors or quotes sometimes.

Quote:
Originally Posted by MK27 View Post
Just kidding. By "insert/remove in constant time" I presume you mean the difference between a linked list and an array, where the entire array must be re-numbered. "Random access" I'll have to google...ugh.
Yeah that's it. Sorry, I'm not used to someone not knowing the term. You'll know what it is, you just aren't familiar with the term for it.

Quote:
Originally Posted by MK27 View Post
Well, one possible relevance is that in circumstances where a list is mostly sorted but slightly changed, the bubblesort may be the best choice.
Yeah bubblesort definitely has it's uses where counts are small etc.

Quote:
Originally Posted by MK27 View Post
This one is definitely an armoured glove.
Again, I'm not familiar with the metaphor. Nevermind, this time it's my turn to look something up!

Quote:
Originally Posted by MK27 View Post
Okay, but if you are worried that you potentially sound like someone who is (are you worried, iMalc?) but really you are not, why not test yourself further by seeing if you can come up with a new/different/better method of explanation rather than resorting to the one everyone's heard already? At least then there is the chance of some original thought occurring.
You know I'm not really worried. I'm guilty of blabbering today though. Hey, I'm on holiday and it was raining most of the day outside. I had to do something . It's kind of hard to prove you came up with something independently of someone else when the facts lead you both down the same path though. I guess I should be flattered that any explanations I've given sound like something someone else once wrote. I didn't think I was that good at explaining stuff.

Quote:
Originally Posted by MK27 View Post
...said the spider to the fly. What if The Emperor Wears No Clothes?
If I cared enough you'd certainly be keeping me busy here. Seriously, I don't read fiction - ever.

But I digress. Can we be of any more assistance with the original topic?
__________________
My homepage
Advice: Take only as directed - If symptoms persist, please see your debugger

Last edited by iMalc; 01-20-2009 at 03:16 AM.
iMalc is offline   Reply With Quote
Old 01-26-2009, 05:55 PM   #18
critical genius
 
MK27's Avatar
 
Join Date: Jul 2008
Location: SE Queens
Posts: 5,212
Quote:
Originally Posted by laserlight
You should see them confirmed. If you don't, then your input sizes are not large enough to matter (since "big O" notation refers to asymptotic growth, not actual runtime performance in all cases) and/or you used pathological worst case input for the faster algorithms (e.g., certain canonical variants of quick sort degrade to O(n^2) complexity when sorting an array already sorted in reverse), and/or best case input for the slower algorithms.
For the most part, yes. I think I've written the coolest linked-list comparative sort demo program ever! It does everything I mentioned earlier plus you can use two text files (lists of a single word on each line) to create random lists for the node struct (a first name, a last name, and an assigned "number").
Code:
 typedef struct _node {          /* singly linked list */
         char first[64];
         char second[64];
         int ID;
         struct _node *next; 
 }node;
You can also choose which criteria to use in the sort and a secondary criteria used to break ties. So using a short dictionary*, you can create lists of hundreds of thousands of nodes with very few possible ties, or using secondary criteria millions. However, you probably don't want to use the slower sorts there -- on my dual core pentium it took about about half an hour to bubblesort 65500 records (over 5 billion iterations). The mergesort will do 1,000,000 records in 6 seconds (a little less than 20 million iterations).
*there are lots of these kinds of (usually alphabetized) text lists of names and/or words around.

But the real purpose of my post now is to shed some further light on the significance of the big O notation and the issues it conceals.

Of course, deciding how to count the iterations was not as clear cut for all the sort styles. But here's some sample output:

Code:
[root~/C] ./a.out -h
Usage: ./a.out [-h] [-f -l] [-1] [-2] [-d]
	-d either 1 or 2: copious output to stderr
	-h print this message
	-f filename
	-l number of records in list, needed with -f
	-1 primary sort criteria (1=first name, 2=last name[default], 3=number)
	-2 secondary sort criteria (0=none[default], 1=first name, etc.)

Eg. "./a.out -f data.txt -l 27 -1 2 -2 1"
Records should have the structure "Some Name:0101".

[root~/C] ./a.out
Using "", 0 records
Primary Criteria offset=64

Command (h for help, Q to quit): h

d display list
h show this message
b (modified) bubblesort
i insertion sort
m mergesort
q quicksort
s selection sort
S shuffle
N new random list
R reload from file
W write to file

Command (h for help, Q to quit): N
How many records? 10 
New list is 1.41 kb

Command (h for help, Q to quit): d
0x1a90010 nonglare,James (1)
0x1a900b0 stroys,James (420)
0x1a90150 enslave,Jim (1024)
0x1a901f0 mirex,Tammy (666)
0x1a90290 peartly,Lorenzo (420)
0x1a90330 soymilks,Bob (1024)
0x1a903d0 weeks,Robert (0)
0x1a90470 addrest,Tammy (69)
0x1a90510 starnose,Leslie (27)
0x1a905b0 acerous,Paul (1024)

Command (h for help, Q to quit): N
How many records? 10000
New list is 1406.25 kb

Command (h for help, Q to quit): b

Bubblesorting.......
97760000 Iterations	 Comparison calls: 97750224
Elapsed time: 0 min 31 sec

Command (h for help, Q to quit): S
Shuffled.

Command (h for help, Q to quit): s

Selectionsorting....
99980002 Iterations	 Comparison calls: 49995000
Elapsed time: 0 min 19 sec

Command (h for help, Q to quit): S
Shuffled.

Command (h for help, Q to quit): i

Insertionsorting.......
24883640 Iterations	 Comparison calls: 24883640
Elapsed time: 0 min 6 sec
(The "shuffle" is a Fisher-Yates algorithm.)
So the bubble sort and the selection sort do about what is expected vis. "iterations", (10,000 squared), the insertion sort comes out quite a bit better. Iterations/comparisons remain at a constant ratio for the selection sort, whereas because the bubblesort is "modified" (using a flag as per my discussion with tabstop above), the bubblesort and the insertion sort have a best case of n or n-1 for both iterations and comparisons.

The default is a string comparison, which is much more processor intensive than a numerical comparison (eg, selection sort will do 10,000 records by ID number in 8 seconds). The difference between the number of comparisons is what accounts for the speed difference above; while both bubble and selection sort did almost n^2 iterations, selection sort did half as many comparisons. It should be noted that while the iteration count may be subject to criticism, the comparison count is absolute as it is incremented every time a discrete comparison function is called. This discrepancy (that an iteration need not be a comparison) is not referred to in many discussions of sorting efficiency and is not accounted for with big O.

A third measure is available with the number of recursive function calls in merge sort and quick sort. Because the sorts are similarly recursive, these counts are always the same:
Code:
Command (h for help, Q to quit): N
How many records? 2097152
New list is 294912.00 kb
Command (h for help, Q to quit): m

Mergesorting..............2097151 function calls
61865985 Iterations	 Comparison calls: 41388898
Elapsed time: 0 min 15 sec


Command (h for help, Q to quit): S
Shuffled.

Command (h for help, Q to quit): q

Quicksorting..............2097151 function calls
148857828 Iterations	 Comparison calls: 75517673
Elapsed time: 0 min 32 sec
2^21 is 2097152, so we would expect 43830192 here as the the big O (n*log(n)) best/average score here. Initially, I could not write a quick sort faster than a merge sort and am not sure why anyone would think it should be faster, except that it does hold out some possibilities for optimization with the pivot value. I couldn't be bothered to go further than selecting the lowest of the two first values as the pivot. It does much, much worse using just the head of each list as it's pivot, probably since with very long alphabetic matches lists will end up composed of smaller subsections of the alphabet, easily creating a lopsided division with a totally random pivot value. Its open ended "worst case" means it's prone to certain weaknesses, such as when a secondary criteria is added to resolve ties. This becomes increasingly obvious as the list size increases, as all these differences do (at ten items, a bubblesort only does about twice as many iterations as a mergesort, at a hundred items it does ten times as many):

Code:
/* set primary criteria to "3" (ID number, of which there are only ten possible) and 
secondary to first name (of which there are 59 possible).  So there will be lots and 
lots of ties to "resolve".  Other cases in this demo use a single criteria (the last name)
of which there are 80638 unique possibilities: */

[root~/C] ./a.out -1 3 -2 1 
Using "", 0 records
Primary Criteria offset=128
Secondary Criteria offset=0

Command (h for help, Q to quit): N
How many records? 100000
New list is 14062.50 kb

Command (h for help, Q to quit): m

Mergesorting..............99999 function calls
2315025 Iterations	 Comparison calls: 2748000
Elapsed time: 0 min 0 sec

Command (h for help, Q to quit): S
Shuffled.

Command (h for help, Q to quit): q

Quicksorting..............99999 function calls
20542647 Iterations	 Comparison calls: 20123133
Elapsed time: 0 min 2 sec

Command (h for help, Q to quit): N
How many records? 1000000
New list is 140625.00 kb

Command (h for help, Q to quit): m

Mergesorting..............999999 function calls
27884993 Iterations	 Comparison calls: 34081612
Elapsed time: 0 min 6 sec

Command (h for help, Q to quit): S
Shuffled.

Command (h for help, Q to quit): q

Quicksorting..............999999 function calls
1836821111 Iterations	 Comparison calls: 1832407830
Elapsed time: 4 min 4 sec 
Quicksort suffers this way with ties for the same reason it suffers sorting an already ordered list, which is it's worst case scenario. Because mergesort does not spend time putting a list in order after it's been split off, and instead does it when merging them together, it will benefit if the original list was already in some kind of order, whereas this will only hurt quicksort depending on how the pivot value is choosen. To establish a mean or median value, we must iterate through the list before splitting it -- although in discussions of quicksort I have seen, this process apparently "does not count" as part of it's big O notation. Additional comparisons at each of these iterations would be required as well. Taking the pivot from an already ordered tied list will usually end up on the low side, creating an imbalanced list. Putting pivot ties on the right (opposite the pivot) may improve performance slightly.

Notice the number of recursions is still the same -- an imbalanced list does not increase the number of recursions, since the short half simply goes through fewer. But the number of iterations and comparisons with quicksort can change radically:
Code:
Command (h for help, Q to quit): N
How many records? 100
New list is 14.06 kb

Command (h for help, Q to quit): q

Quicksorting..............99 function calls
1256 Iterations	 Comparison calls: 694
Elapsed time: 0 min 0 sec

Command (h for help, Q to quit): q

Quicksorting..............99 function calls
9800 Iterations	 Comparison calls: 4950
Elapsed time: 0 min 0 sec
This is the "worst case" (n^2) on an ordered list.

Now, apparently merge sort will perform the same in either case. This is true if we just measure iterations, but if we take more reality into account, the number of comparisons can, will, and must change. This is because when two split lists come back to be merged, if one list already contains all smaller values, all of the values on the left will be compared one at a time to the first element of the right, and then (since the returned lists have an internal order) no further comparisons need be done and the right is simply iterated onto the end. That is the normative and perhaps only way to implement a merge sort:
Code:
             /* the end (merge) of the sort function */    
        while ((left) && (right) && (lenL>LC) && (lenR>RC)) {
                Iterations++;
                result=compfunc(offset,left,right);
                if ((result==0) && (altfunc)) result=altfunc(altset,left,right);  
                if (result==2) { ptr->next=right; ptr=right; right=right->next; RC++; }
                else { ptr->next=left; ptr=left; left=left->next; LC++; }
        }       /* if one list is longer or contains more high values, leaving a final remainder */
        while ((left) && (lenL>LC)) { ptr->next=left; ptr=left; left=left->next; LC++; Iterations++; }
        while ((right) && (lenR>RC)) { ptr->next=right; ptr=right; right=right->next; RC++; Iterations++; }
        ptr->next=NULL; /* the new tail */
The second last two lines represent these presorted/tied (hence already unequal) cases -- which in quicksort would be a handicapping imbalance. If the values are well mixed, these two lines are irrelevant and a comparison is made for every iteration. Observation bears this out:
Code:
Command (h for help, Q to quit): N
How many records? 100
New list is 14.06 kb

Command (h for help, Q to quit): m

Mergesorting..............99 function calls
817 Iterations	 Comparison calls: 545
Elapsed time: 0 min 0 sec


Command (h for help, Q to quit): m

Mergesorting..............99 function calls
817 Iterations	 Comparison calls: 316
Elapsed time: 0 min 0 sec
As predicted, the number of function calls and iterations (roughly n*log(n)) remains constant, but in the case of the ordered list, the number of comparisons is almost cut in half. This factor is completely hidden when considering big O notation.

To give quicksort one last chance, I decided to add a function which could select a better pivot, despite my misgivings about the additional iterations and comparisons required. So now, the quicksort compares the head with an element about half way thru, and if these are identical, goes thru the whole list until it finds one that isn't. Then we take a mean of the two different values. This helps to ensure that that we will actually have two lists.

This clearly helped to divert quicksort (which it should really be contrasted with mergesort as "splitsort", but anyway...) away from it's worst case, the already sorted list:
Code:
 
Command (h for help, Q to quit): N
How many records? 100
New list is 14.06 kb

Command (h for help, Q to quit): q

Quicksorting..............99 function calls
1443 Iterations	 Comparison calls: 882
Elapsed time: 0 min 0 sec


Command (h for help, Q to quit): q

Quicksorting..............99 function calls
1127 Iterations	 Comparison calls: 754
Elapsed time: 0 min 0 sec
In the second instance the list is already sorted. Contrast this with the cruder quicksort, above, which made the n*n on an already sorted list. While it still hasn't quite caught up to the mergesort, it very nearly has -- the time for the much tied sort of one million records went from over 4 minutes to 7 seconds.
Then I modified the newlist() function to assign random ID's just from rand(), rather than a fixed set of 10. That means billions of possibilities on a 64-bit so I could now compare the sorts on scrambled sets of millions of unique numbers:
Code:
[root~/C] ./a.out -1 3 
Using "", 0 records
Primary Criteria offset=128

Command (h for help, Q to quit): N
How many records? 4000000
New list is 562500.00 kb

Command (h for help, Q to quit): m

Mergesorting..............3999999 function calls
123539969 Iterations	 Comparison calls: 82696100
Elapsed time: 0 min 9 sec


Command (h for help, Q to quit): S
Shuffled.

Command (h for help, Q to quit): q

Quicksorting..............4000000 function calls
190179315 Iterations	 Comparison calls: 100817020
Elapsed time: 0 min 23 sec
There is still obviously room to optimize the selection of the pivot value further, particularly if we were only dealing with unique numbers, since then taking a true mean of a sublist would be more feasible. But if quicksort and mergesort still both have a best case of n*log(n) (88 million here) anyway, I think I will just stick with merge sort.

A final interesting point: both the quicksort and the insertion sort demonstrated the variation inherent in their design; with insertion sort this variation is toward a "best case" (improving on n^2), whereas with quicksort this is toward a "worst case" (away from n*log(n)). A quick sort cannot really do better than a merge sort although the implementation of a quicksort may be easier to tweak closer to the best case, but in any case would be better known as the "split sort" or "quack sort".

I put up a nice syntax-colored version of the code which produced this nonsense, in case someone wants to argue some silly point:
MK27's linked list comparative sort demo program!
I believe this is fairly portable. It could almost be used as a crude API, since the sort functions use external comparison functions with offsets to criteria (meaning you could plug any kind of struct into it without changes).
__________________

"A man can't just sit around." -- Larry Walters

Last edited by MK27; 01-26-2009 at 06:30 PM.
MK27 is online now   Reply With Quote
Old 01-26-2009, 07:27 PM   #19
and the Hat of Guessing
 
tabstop's Avatar
 
Join Date: Nov 2007
Posts: 8,854
Bubblesort is n-1 iterations, each of which makes 1, 2, 3, ..., n comparisons. So the comparison count is n^2, not the iteration count. (Again, big O does not concern itself with constants or non-important terms. 4n^2 is O(n^2), n^2+n is O(n^2), etc.)
tabstop is online now   Reply With Quote
Old 01-26-2009, 08:12 PM   #20
critical genius
 
MK27's Avatar
 
Join Date: Jul 2008
Location: SE Queens
Posts: 5,212
Quote:
Originally Posted by tabstop View Post
Bubblesort is n-1 iterations, each of which makes 1, 2, 3, ..., n comparisons. So the comparison count is n^2, not the iteration count. (Again, big O does not concern itself with constants or non-important terms. 4n^2 is O(n^2), n^2+n is O(n^2), etc.)
With the bubblesort the ratio of iterations to comparisons is exact (in the demo, comparisons is invariably within 0.5% of iterations). Big O count cannot be based on comparison count and must be based on iteration count, as I pointed out with reference to merge sort, or else merge sort cannot have a constant best/average/worst rating.
__________________

"A man can't just sit around." -- Larry Walters
MK27 is online now   Reply With Quote
Old 01-26-2009, 09:00 PM   #21
Registered User
 
Join Date: Sep 2006
Posts: 3,152
I'm not sure why you belabored yourself with testing the Quicksort algorithm, for linked lists, since it is not designed for any data that must be retrieved (relatively), slowly.

Whether it's a linked list, a tape drive, or data that must be accessed over a (relatively slow) network, Quicksort is simply not the algorithm of choice.

Mergesort is designed for that purpose.

Put your data into an array or static disk drive, give Quicksort some smart tweaks (and watch out because it is "fragile"), and *then* you can watch Quicksort *fly*.
Adak is offline   Reply With Quote
Old 01-26-2009, 09:25 PM   #22
critical genius
 
MK27's Avatar
 
Join Date: Jul 2008
Location: SE Queens
Posts: 5,212
Quote:
Originally Posted by Adak View Post
I'm not sure why you belabored yourself with testing the Quicksort algorithm, for linked lists, since it is not designed for any data that must be retrieved (relatively), slowly.

Whether it's a linked list, a tape drive, or data that must be accessed over a (relatively slow) network, Quicksort is simply not the algorithm of choice.
In this case that abhorrent snail, the RAM.

Quote:
Originally Posted by Adak View Post
Mergesort is designed for that purpose.

Put your data into an array or static disk drive, give Quicksort some smart tweaks (and watch out because it is "fragile"), and *then* you can watch Quicksort *fly*.
Don't be silly Adak. The weaknesses I concerned myself with vis. "quicksort" are inherit to the algorithm and independent of the implementation.
__________________

"A man can't just sit around." -- Larry Walters
MK27 is online now   Reply With Quote
Old 01-26-2009, 11:29 PM   #23
C++ Witch
 
laserlight's Avatar
 
Join Date: Oct 2003
Location: Singapore
Posts: 11,358
Quote:
Originally Posted by MK27
To establish a mean or median value, we must iterate through the list before splitting it -- although in discussions of quicksort I have seen, this process apparently "does not count" as part of it's big O notation.
The reason why it "does not count" is that it does not change the complexity in terms of big O notation. In order to split the list into two partitions an iteration must be made over the entire list. Making another iteration over part of the list, or the entire list, just increases the constant factor, so the complexity remains the same.

Quote:
Originally Posted by MK27
To give quicksort one last chance, I decided to add a function which could select a better pivot, despite my misgivings about the additional iterations and comparisons required. So now, the quicksort compares the head with an element about half way thru, and if these are identical, goes thru the whole list until it finds one that isn't. Then we take a mean of the two different values. This helps to ensure that that we will actually have two lists.
I would echo Adak's complaint here: you have selected a data structure on which quick sort is not commonly used. For example, are you aware of the "median of three" method for selecting a pivot for quicksort? Typically, this method involves selecting as the pivot the median of the first, middle and last elements. It should still be possible to construct pathological worst case input, but harder and rarer in practice. Another possibility is random pivot selection, but that means using a RNG or PRNG. However, both these methods require random access to be efficient, but linked lists do not provide random access.

Quote:
Originally Posted by MK27
In this case that abhorrent snail, the RAM.
What do you mean? If you have forgotten, RAM stands for Random Access Memory, and that describes an array, not a linked list. Relative to disk, RAM is fast.

Quote:
Originally Posted by MK27
The weaknesses I concerned myself with vis. "quicksort" are inherit to the algorithm and independent of the implementation.
The choice of data structure can affect the complexity and performance of the algorithm. A commonly cited case is where selection sort is used on a heap: the result is otherwise known as heap sort, which has an average and worst case complexity of O(n log n).
__________________
C + C++ Compiler: MinGW port of GCC
Build + Version Control System: SCons + Bazaar

Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
laserlight is online now   Reply With Quote
Old 01-27-2009, 12:59 AM   #24
Algorithm Dissector
 
iMalc's Avatar
 
Join Date: Dec 2005
Location: New Zealand
Posts: 2,742
Quote:
Originally Posted by MK27 View Post
(The "shuffle" is a Fisher-Yates algorithm.)
Fisher-Yates huh? I had to lok that one up. It turns out that I'm using Durstenfeld's version of the algorithm, which is the fast, modern way of implementing that algorithm. But as I've tried to show previously with other stuff I've posted about, I wasn't told about this algorithm, or been asked to implement it. I simply needed to shuffle an array and I wrote code that would do it in such a way that to me would obviously produce no bias.

Quote:
2^21 is 2097152, so we would expect 43830192 here as the the big O (n*log(n)) best/average score here. Initially, I could not write a quick sort faster than a merge sort and am not sure why anyone would think it should be faster, except that it does hold out some possibilities for optimization with the pivot value.
As others say, Big Oh notation isn't exact operation counts. The implementation could do anything from 1000*n*log(n) operations to 0.001*n*log(n) operations or even outside that range. For example, using an SIMD instruction set. you could perform multiple compares or swaps in one operation.

Quote:
Quicksort suffers this way with ties for the same reason it suffers sorting an already ordered list, which is it's worst case scenario. Because mergesort does not spend time putting a list in order after it's been split off, and instead does it when merging them together, it will benefit if the original list was already in some kind of order, whereas this will only hurt quicksort depending on how the pivot value is choosen. To establish a mean or median value, we must iterate through the list before splitting it -- although in discussions of quicksort I have seen, this process apparently "does not count" as part of it's big O notation. Additional comparisons at each of these iterations would be required as well. Taking the pivot from an already ordered tied list will usually end up on the low side, creating an imbalanced list. Putting pivot ties on the right (opposite the pivot) may improve performance slightly.
See my comments below.

Quote:
Now, apparently merge sort will perform the same in either case. This is true if we just measure iterations, but if we take more reality into account, the number of comparisons can, will, and must change. This is because when two split lists come back to be merged, if one list already contains all smaller values, all of the values on the left will be compared one at a time to the first element of the right, and then (since the returned lists have an internal order) no further comparisons need be done and the right is simply iterated onto the end. That is the normative and perhaps only way to implement a merge sort:
You may notice that the number of comparisons is bounded by a factor of two. The case where all items in one list are less than the other gives say k cmparisons where both lists are of length k. Incidentally the concatenation of the two lists can be done in constant time. Then the worse case is that 2k comparisons are required. This can be taken form the fact that each comparison reduces the number of items by 1, there are two lists of length k, items never move between the two lists being merged, and only the head items are compared each time. So Merge Sort performs between n*log(n) and 2*n*log(n) comparisons.

Quote:
To give quicksort one last chance, I decided to add a function which could select a better pivot, despite my misgivings about the additional iterations and comparisons required. So now, the quicksort compares the head with an element about half way thru, and if these are identical, goes thru the whole list until it finds one that isn't. Then we take a mean of the two different values. This helps to ensure that that we will actually have two lists.
Taking the mean isn't an option for every data type, but something like that has been done before, for integers. Someone has previously called this Artificial Sort. My prior work to this only requires that two items be subtractable, not averagable.

Quote:
In the second instance the list is already sorted. Contrast this with the cruder quicksort, above, which made the n*n on an already sorted list. While it still hasn't quite caught up to the mergesort, it very nearly has -- the time for the much tied sort of one million records went from over 4 minutes to 7 seconds.
My own findings with sorting on linked lists also agree that quicksort for linked lists doesn't win out until the list length gets really really long.

Quote:
There is still obviously room to optimize the selection of the pivot value further, particularly if we were only dealing with unique numbers, since then taking a true mean of a sublist would be more feasible. But if quicksort and mergesort still both have a best case of n*log(n) (88 million here) anyway, I think I will just stick with merge sort.
Let me introduce you to the smart splitter choosing techinque of my SuperQuickSort for linked-lists. The already sorted and reverse sorted cases become O(n). The trick is that the work done in hunting for a splitter isn't simply additional work that reduces the avarage amount of work required later by picking a beter splitter. Instead it actually replaces a portion of the splitting process (potentially the entire splitting process, if the list was already sorted). Have a look at it on my website. It's one of the things that I've actually gone into much more detail with. (Link in sig)

Quote:
A final interesting point: both the quicksort and the insertion sort demonstrated the variation inherent in their design; with insertion sort this variation is toward a "best case" (improving on n^2), whereas with quicksort this is toward a "worst case" (away from n*log(n)). A quick sort cannot really do better than a merge sort although the implementation of a quicksort may be easier to tweak closer to the best case, but in any case would be better known as the "split sort" or "quack sort".
I agree that the standard quicksort probably isn't worthwhile for lists.

Incidentally I've recently become of the opinion that Splay-Tree-Sort is in many ways the ultimate doubly-linked-list sorting algorithm.
It's just so amazingly adaptive, and the list links can be reused as tree child pointer links during the sorting. Any initial sort order you can throw at it, other than random order, gives astoundingly low comparison counts. If this interests you, I highly recommend trying it out.

Quote:
I put up a nice syntax-colored version of the code which produced this nonsense, in case someone wants to argue some silly point:
MK27's linked list comparative sort demo program!
I believe this is fairly portable. It could almost be used as a crude API, since the sort functions use external comparison functions with offsets to criteria (meaning you could plug any kind of struct into it without changes).
Nice!
Mine is implemented in C++ rather than C and this means I don't have to make any modifications whatsoever to the algorithms to animate then and count comparisons etc. The same code you would use in a real program is the exact code I use in my demo app. I recommend you download it and check it out as I'm sure you'll enjoy it. It's linked at the bottom of many of my pages.

Last edited by iMalc; 01-27-2009 at 01:08 AM.
iMalc is offline   Reply With Quote
Old 08-07-2009, 11:48 AM   #25
Banned
 
Join Date: Aug 2009
Posts: 43
you just pick on pointer store its value
and search in the rest of the list for the minimal
value
and swap the values
then go next to the next node
kakaroto is offline   Reply With Quote
Reply

Thread Tools
Display Modes

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
seg fault for linked list in another linked list--help!! idshiy C Programming 5 11-04-2006 06:39 PM
Reverse function for linked list Brigs76 C++ Programming 1 10-25-2006 10:01 AM
sort linked list using BST Micko C Programming 8 10-04-2004 02:04 PM
Sort Linked List Algorithm. xddxogm3 C++ Programming 7 10-13-2003 11:05 PM
Template Class for Linked List pecymanski C++ Programming 2 12-04-2001 09:07 PM


All times are GMT -6. The time now is 10:56 AM.


Powered by vBulletin® Version 3.8.1
Copyright ©2000 - 2010, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.3.2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22