Thread: How to sort a simple linked list using swap-sort method

  1. #16
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by MK27
    I'm actually writing (another) program that performs four or five different styles of searches, counts comparisons and list iterations, shuffles, generates random lists, switches criteria, breaks ties with secondary criteria, etc. Real fun, for the whole family! So I expected to see those O(n^2) tables confirmed
    You should see them confirmed. If you don't, then your input sizes are not large enough to matter (since "big O" notation refers to asymptotic growth, not actual runtime performance in all cases) and/or you used pathological worst case input for the faster algorithms (e.g., certain canonical variants of quick sort degrade to O(n^2) complexity when sorting an array already sorted in reverse), and/or best case input for the slower algorithms.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  2. #17
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by MK27 View Post
    No, actually I've been trying to take you back from the dark side. Of course, I never really believed what I was saying, I just thought I did.
    Well I've certainly learned that you are a rather interesting character.

    Quote Originally Posted by MK27 View Post
    I've also formed a sort of emotional defensiveness about bubblesort since the first time I had to sort anything, that's what I intuitively wrote. laserlight later identified it as such, scornfully I think -- and now tabstop says I've been using some kind of cheat flag!
    My first sorting algorithm wasn't bubblesort. It was pretty similar, except it didn't swap only adjacent items. Coming up with bubblesort independently is nothing to be sneezed at. If one were to immediately assume it's the best method there is, that would be, well, less productive thinking.

    Quote Originally Posted by MK27 View Post
    So when I heard somebody else in history had come up with a better method(s), I felt determined to master them. Not to be outdone as a masochist either, I'm actually writing (another) program that performs four or five different styles of searches, counts comparisons and list iterations, shuffles, generates random lists, switches criteria, breaks ties with secondary criteria, etc. Real fun, for the whole family! So I expected to see those O(n^2) tables confirmed, and thankfully tabstop cleared this flag issue up or my suspicion that many computer science gurus are more determined to obfuscate than explain would have reached critical proportions.
    Code as much as you can; I have no problem with that. You've simply shown me that it's harder than I realised to explain something to someone when you don't realise they don't know the common terms.

    Quote Originally Posted by MK27 View Post
    No doubt. I've never taken any math beyond grade ten but I imagine I will have to if I ever want to work as a programmer. It just seems to me that most computer math is really not complicated, but it ends up couched in "confusing to the layman" style because the assumption (probably correct in most cases) is that the student of computers was already a student of math. What's further implied is that you have to be because it's so useful, and that's part of what I doubt. So anyway, instead of one esoteric, obscurantistic swamp to wade in, there is two! I should probably be happy. But seriously (I hope someone is listening!), much of the information could be explained much better, rather than just sticking to that old story. A lot of web tutorials read like photocopies of one another, which might be okay if the original had more merit. The more such "methods of explanation" get repeated, the more they acquire a weight potentially divorced from their real value. That is probably the heart of my criticism, iMalc. Honestly. About guarding the establishment. It would be nice if we had more "look, I did this/had to do this, and I'm not saying I regret it myself, but for the future maybe we really don't need to use so much latin where english will do".
    I never took classes because I felt I had to, to be a programmer. As far as I was concerned, I already was a programmer. I did so because they were interesting to me. I know a number of programmers with quite varying levels of capability or qualification in various areas, but I believe the main thing you need to be successful is determination and a passion for knowledge, which probably mostly comes down to being one of a certain group of personality types.
    You'll find that it isn't really possible to learn such a huge amount about a subject without ending up using the commonly used terms, even if you're also putting it in your own words.

    Quote Originally Posted by MK27 View Post
    Is that a gauntlet on the floor?!
    I'm sorry, I'm not so good with metaphors or quotes sometimes.

    Quote Originally Posted by MK27 View Post
    Just kidding. By "insert/remove in constant time" I presume you mean the difference between a linked list and an array, where the entire array must be re-numbered. "Random access" I'll have to google...ugh.
    Yeah that's it. Sorry, I'm not used to someone not knowing the term. You'll know what it is, you just aren't familiar with the term for it.

    Quote Originally Posted by MK27 View Post
    Well, one possible relevance is that in circumstances where a list is mostly sorted but slightly changed, the bubblesort may be the best choice.
    Yeah, bubblesort definitely has its uses where counts are small, etc.

    Quote Originally Posted by MK27 View Post
    This one is definitely an armoured glove.
    Again, I'm not familiar with the metaphor. Nevermind, this time it's my turn to look something up!

    Quote Originally Posted by MK27 View Post
    Okay, but if you are worried that you potentially sound like someone who is (are you worried, iMalc?) but really you are not, why not test yourself further by seeing if you can come up with a new/different/better method of explanation rather than resorting to the one everyone's heard already? At least then there is the chance of some original thought occurring.
    You know I'm not really worried. I'm guilty of blabbering today, though. Hey, I'm on holiday and it was raining most of the day, so I had to do something. It's kind of hard to prove you came up with something independently when the facts lead you both down the same path. I guess I should be flattered that any explanations I've given sound like something someone else once wrote. I didn't think I was that good at explaining stuff.

    Quote Originally Posted by MK27 View Post
    ...said the spider to the fly. What if The Emperor Wears No Clothes?
    If I cared enough you'd certainly be keeping me busy here. Seriously, I don't read fiction - ever.

    But I digress. Can we be of any more assistance with the original topic?
    Last edited by iMalc; 01-20-2009 at 03:16 AM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  3. #18
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by laserlight
    You should see them confirmed. If you don't, then your input sizes are not large enough to matter (since "big O" notation refers to asymptotic growth, not actual runtime performance in all cases) and/or you used pathological worst case input for the faster algorithms (e.g., certain canonical variants of quick sort degrade to O(n^2) complexity when sorting an array already sorted in reverse), and/or best case input for the slower algorithms.
    For the most part, yes. I think I've written the coolest linked-list comparative sort demo program ever! It does everything I mentioned earlier, plus you can use two text files (lists with a single word on each line) to create random lists for the node struct (a first name, a last name, and an assigned "number").
    Code:
     typedef struct _node {          /* singly linked list */
             char first[64];
             char second[64];
             int ID;
             struct _node *next; 
     }node;
    You can also choose which criterion to use in the sort, plus a secondary criterion used to break ties. So using a short dictionary*, you can create lists of hundreds of thousands of nodes with very few possible ties, or, with a secondary criterion, millions. However, you probably don't want to use the slower sorts there -- on my dual core Pentium it took about half an hour to bubblesort 65500 records (over 5 billion iterations). The mergesort will do 1,000,000 records in 6 seconds (a little less than 20 million iterations).
    *there are lots of these kinds of (usually alphabetized) text lists of names and/or words around.

    But the real purpose of my post now is to shed some further light on the significance of the big O notation and the issues it conceals.

    Of course, deciding how to count the iterations was not as clear cut for all the sort styles. But here's some sample output:

    Code:
    [root~/C] ./a.out -h
    Usage: ./a.out [-h] [-f -l] [-1] [-2] [-d]
    	-d either 1 or 2: copious output to stderr
    	-h print this message
    	-f filename
    	-l number of records in list, needed with -f
    	-1 primary sort criteria (1=first name, 2=last name[default], 3=number)
    	-2 secondary sort criteria (0=none[default], 1=first name, etc.)
    
    Eg. "./a.out -f data.txt -l 27 -1 2 -2 1"
    Records should have the structure "Some Name:0101".
    
    [root~/C] ./a.out
    Using "", 0 records
    Primary Criteria offset=64
    
    Command (h for help, Q to quit): h
    
    d display list
    h show this message
    b (modified) bubblesort
    i insertion sort
    m mergesort
    q quicksort
    s selection sort
    S shuffle
    N new random list
    R reload from file
    W write to file
    
    Command (h for help, Q to quit): N
    How many records? 10 
    New list is 1.41 kb
    
    Command (h for help, Q to quit): d
    0x1a90010 nonglare,James (1)
    0x1a900b0 stroys,James (420)
    0x1a90150 enslave,Jim (1024)
    0x1a901f0 mirex,Tammy (666)
    0x1a90290 peartly,Lorenzo (420)
    0x1a90330 soymilks,Bob (1024)
    0x1a903d0 weeks,Robert (0)
    0x1a90470 addrest,Tammy (69)
    0x1a90510 starnose,Leslie (27)
    0x1a905b0 acerous,Paul (1024)
    
    Command (h for help, Q to quit): N
    How many records? 10000
    New list is 1406.25 kb
    
    Command (h for help, Q to quit): b
    
    Bubblesorting.......
    97760000 Iterations	 Comparison calls: 97750224
    Elapsed time: 0 min 31 sec
    
    Command (h for help, Q to quit): S
    Shuffled.
    
    Command (h for help, Q to quit): s
    
    Selectionsorting....
    99980002 Iterations	 Comparison calls: 49995000
    Elapsed time: 0 min 19 sec
    
    Command (h for help, Q to quit): S
    Shuffled.
    
    Command (h for help, Q to quit): i
    
    Insertionsorting.......
    24883640 Iterations	 Comparison calls: 24883640
    Elapsed time: 0 min 6 sec
    (The "shuffle" is a Fisher-Yates algorithm.)
    So the bubble sort and the selection sort do about what is expected vis-à-vis "iterations" (10,000 squared), while the insertion sort comes out quite a bit better. Iterations and comparisons remain at a constant ratio for the selection sort, whereas because the bubblesort is "modified" (using a flag, as per my discussion with tabstop above -- sketched below), the bubblesort and the insertion sort have a best case of n or n-1 for both iterations and comparisons.

    The default is a string comparison, which is much more processor intensive than a numerical comparison (e.g., selection sort will do 10,000 records by ID number in 8 seconds). The difference in the number of comparisons is what accounts for the speed difference above; while both bubble and selection sort did almost n^2 iterations, selection sort did half as many comparisons. It should be noted that while the iteration count may be subject to criticism, the comparison count is absolute, as it is incremented every time a discrete comparison function is called. This discrepancy (that an iteration need not be a comparison) is not referred to in many discussions of sorting efficiency and is not accounted for by big O.
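    For the record, the "modified" bubblesort amounts to something like this sketch -- on a bare int-keyed node rather than the demo's struct, and with the comparison counter passed in, so treat it as an illustration rather than the exact demo code:
    Code:
    struct lnode {
            int key;
            struct lnode *next;
    };
    
    /* Walk the list repeatedly, swapping the stored keys of out-of-order
       neighbours; the 'swapped' flag ends the sort after the first pass
       that changes nothing, so an already sorted list costs one pass. */
    void bubble_sort(struct lnode *head, long *comparisons)
    {
            struct lnode *p;
            int swapped = 1, t;
            while (swapped) {
                    swapped = 0;
                    for (p = head; p && p->next; p = p->next) {
                            (*comparisons)++;
                            if (p->key > p->next->key) {
                                    t = p->key;
                                    p->key = p->next->key;
                                    p->next->key = t;
                                    swapped = 1;
                            }
                    }
            }
    }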

    A third measure is available with the number of recursive function calls in merge sort and quick sort. Because the sorts are similarly recursive, these counts are always the same:
    Code:
    Command (h for help, Q to quit): N
    How many records? 2097152
    New list is 294912.00 kb
    Command (h for help, Q to quit): m
    
    Mergesorting..............2097151 function calls
    61865985 Iterations	 Comparison calls: 41388898
    Elapsed time: 0 min 15 sec
    
    
    Command (h for help, Q to quit): S
    Shuffled.
    
    Command (h for help, Q to quit): q
    
    Quicksorting..............2097151 function calls
    148857828 Iterations	 Comparison calls: 75517673
    Elapsed time: 0 min 32 sec
    2^21 is 2097152, so we would expect 43830192 here as the big O (n*log(n)) best/average score. Initially, I could not write a quick sort faster than a merge sort, and I'm not sure why anyone would think it should be faster, except that it does hold out some possibilities for optimization with the pivot value. I couldn't be bothered to go further than selecting the lower of the first two values as the pivot. It does much, much worse using just the head of each list as its pivot, probably since with very long alphabetic matches the lists end up composed of smaller subsections of the alphabet, easily creating a lopsided division with a totally random pivot value. Its open-ended "worst case" means it's prone to certain weaknesses, such as when a secondary criterion is added to resolve ties. This becomes increasingly obvious as the list size increases, as all these differences do (at ten items, a bubblesort only does about twice as many iterations as a mergesort; at a hundred items it does ten times as many):

    Code:
    /* set primary criteria to "3" (ID number, of which there are only ten possible) and 
    secondary to first name (of which there are 59 possible).  So there will be lots and 
    lots of ties to "resolve".  Other cases in this demo use a single criteria (the last name)
    of which there are 80638 unique possibilities: */
    
    [root~/C] ./a.out -1 3 -2 1 
    Using "", 0 records
    Primary Criteria offset=128
    Secondary Criteria offset=0
    
    Command (h for help, Q to quit): N
    How many records? 100000
    New list is 14062.50 kb
    
    Command (h for help, Q to quit): m
    
    Mergesorting..............99999 function calls
    2315025 Iterations	 Comparison calls: 2748000
    Elapsed time: 0 min 0 sec
    
    Command (h for help, Q to quit): S
    Shuffled.
    
    Command (h for help, Q to quit): q
    
    Quicksorting..............99999 function calls
    20542647 Iterations	 Comparison calls: 20123133
    Elapsed time: 0 min 2 sec
    
    Command (h for help, Q to quit): N
    How many records? 1000000
    New list is 140625.00 kb
    
    Command (h for help, Q to quit): m
    
    Mergesorting..............999999 function calls
    27884993 Iterations	 Comparison calls: 34081612
    Elapsed time: 0 min 6 sec
    
    Command (h for help, Q to quit): S
    Shuffled.
    
    Command (h for help, Q to quit): q
    
    Quicksorting..............999999 function calls
    1836821111 Iterations	 Comparison calls: 1832407830
    Elapsed time: 4 min 4 sec
    Quicksort suffers this way with ties for the same reason it suffers sorting an already ordered list, which is its worst-case scenario. Because mergesort does not spend time putting a list in order after it's been split off, and instead does it when merging the halves back together, it will benefit if the original list was already in some kind of order, whereas this will only hurt quicksort, depending on how the pivot value is chosen. To establish a mean or median value, we must iterate through the list before splitting it -- although in the discussions of quicksort I have seen, this process apparently "does not count" as part of its big O notation. Additional comparisons at each of these iterations would be required as well. A pivot taken from an already ordered, heavily tied list will usually end up on the low side, creating an imbalanced split. Putting pivot ties on the right (opposite the pivot) may improve performance slightly.
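    To illustrate what "pivot ties on the right" means, a linked-list partition step can send everything strictly below the pivot left and everything else (ties included) right. A rough sketch with a bare int key, not the demo code:
    Code:
    struct lnode {
            int key;
            struct lnode *next;
    };
    
    /* Split the list around 'pivot': keys strictly below it go to *low,
       ties and larger keys go to *high (i.e. ties land opposite the pivot). */
    void partition(struct lnode *head, int pivot,
                   struct lnode **low, struct lnode **high)
    {
            struct lnode *next;
            *low = NULL;
            *high = NULL;
            while (head) {
                    next = head->next;
                    if (head->key < pivot) {
                            head->next = *low;
                            *low = head;
                    } else {
                            head->next = *high;
                            *high = head;
                    }
                    head = next;
            }
    }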

    Notice the number of recursions is still the same -- an imbalanced list does not increase the number of recursions, since the short half simply goes through fewer. But the number of iterations and comparisons with quicksort can change radically:
    Code:
    Command (h for help, Q to quit): N
    How many records? 100
    New list is 14.06 kb
    
    Command (h for help, Q to quit): q
    
    Quicksorting..............99 function calls
    1256 Iterations	 Comparison calls: 694
    Elapsed time: 0 min 0 sec
    
    Command (h for help, Q to quit): q
    
    Quicksorting..............99 function calls
    9800 Iterations	 Comparison calls: 4950
    Elapsed time: 0 min 0 sec
    This is the "worst case" (n^2) on an ordered list.

    Now, apparently merge sort will perform the same in either case. This is true if we just measure iterations, but if we take a bit more reality into account, the number of comparisons can, will, and must change. This is because when two split lists come back to be merged, if one list already contains all the smaller values, all of the values on the left will be compared one at a time to the first element of the right, and then (since the returned lists have an internal order) no further comparisons need be done and the right is simply iterated onto the end. That is the normative and perhaps only sensible way to implement a merge sort:
    Code:
                 /* the end (merge) of the sort function */    
            while ((left) && (right) && (lenL>LC) && (lenR>RC)) {
                    Iterations++;
                    result=compfunc(offset,left,right);
                    if ((result==0) && (altfunc)) result=altfunc(altset,left,right);  
                    if (result==2) { ptr->next=right; ptr=right; right=right->next; RC++; }
                    else { ptr->next=left; ptr=left; left=left->next; LC++; }
            }       /* if one list is longer or contains more high values, leaving a final remainder */
            while ((left) && (lenL>LC)) { ptr->next=left; ptr=left; left=left->next; LC++; Iterations++; }
            while ((right) && (lenR>RC)) { ptr->next=right; ptr=right; right=right->next; RC++; Iterations++; }
            ptr->next=NULL; /* the new tail */
    The two while loops near the end handle these presorted/tied (hence already unequal) cases -- which in quicksort would be a handicapping imbalance. If the values are well mixed, these two loops are irrelevant and a comparison is made for every iteration. Observation bears this out:
    Code:
    Command (h for help, Q to quit): N
    How many records? 100
    New list is 14.06 kb
    
    Command (h for help, Q to quit): m
    
    Mergesorting..............99 function calls
    817 Iterations	 Comparison calls: 545
    Elapsed time: 0 min 0 sec
    
    
    Command (h for help, Q to quit): m
    
    Mergesorting..............99 function calls
    817 Iterations	 Comparison calls: 316
    Elapsed time: 0 min 0 sec
    As predicted, the number of function calls and iterations (roughly n*log(n)) remains constant, but in the case of the ordered list, the number of comparisons is almost cut in half. This factor is completely hidden when considering big O notation.

    To give quicksort one last chance, I decided to add a function which could select a better pivot, despite my misgivings about the additional iterations and comparisons required. So now, the quicksort compares the head with an element about halfway through, and if these are identical, it goes through the list until it finds one that isn't. Then we take a mean of the two different values. This helps to ensure that we will actually have two lists.
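    Roughly, that pivot chooser looks like the following sketch (simplified to an int key and a known list length; overflow and the all-identical case are handled crudely, so it's an illustration rather than the exact demo code):
    Code:
    struct lnode {
            int key;
            struct lnode *next;
    };
    
    /* Compare the head's key with the key roughly halfway along; if they are
       equal, keep walking until a different key turns up, then return the
       mean of the two.  If every key is identical there is nothing useful to
       split on, so just return the head's key. */
    int choose_pivot(struct lnode *head, int len)
    {
            struct lnode *mid = head;
            int i;
            for (i = 0; i < len / 2 && mid->next; i++)
                    mid = mid->next;
            while (mid && mid->key == head->key)
                    mid = mid->next;
            if (!mid)
                    return head->key;
            return (head->key + mid->key) / 2;
    }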

    This clearly helped to divert quicksort (which really should be called "splitsort" to contrast it with mergesort, but anyway...) away from its worst case, the already sorted list:
    Code:
     
    Command (h for help, Q to quit): N
    How many records? 100
    New list is 14.06 kb
    
    Command (h for help, Q to quit): q
    
    Quicksorting..............99 function calls
    1443 Iterations	 Comparison calls: 882
    Elapsed time: 0 min 0 sec
    
    
    Command (h for help, Q to quit): q
    
    Quicksorting..............99 function calls
    1127 Iterations	 Comparison calls: 754
    Elapsed time: 0 min 0 sec
    In the second instance the list is already sorted. Contrast this with the cruder quicksort, above, which hit the full n*n on an already sorted list. While it still hasn't quite caught up to the mergesort, it very nearly has -- the time for the heavily tied sort of one million records went from over 4 minutes to 7 seconds.
    Then I modified the newlist() function to assign random IDs straight from rand(), rather than from a fixed set of 10. That means billions of possibilities, so I could now compare the sorts on scrambled sets of millions of unique numbers:
    Code:
    [root~/C] ./a.out -1 3 
    Using "", 0 records
    Primary Criteria offset=128
    
    Command (h for help, Q to quit): N
    How many records? 4000000
    New list is 562500.00 kb
    
    Command (h for help, Q to quit): m
    
    Mergesorting..............3999999 function calls
    123539969 Iterations	 Comparison calls: 82696100
    Elapsed time: 0 min 9 sec
    
    
    Command (h for help, Q to quit): S
    Shuffled.
    
    Command (h for help, Q to quit): q
    
    Quicksorting..............4000000 function calls
    190179315 Iterations	 Comparison calls: 100817020
    Elapsed time: 0 min 23 sec
    There is still obviously room to optimize the selection of the pivot value further, particularly if we were only dealing with unique numbers, since then taking a true mean of a sublist would be more feasible. But if quicksort and mergesort still both have a best case of n*log(n) (88 million here) anyway, I think I will just stick with merge sort.

    A final interesting point: both the quicksort and the insertion sort demonstrated the variation inherent in their design; with insertion sort this variation is toward a "best case" (improving on n^2), whereas with quicksort it is toward a "worst case" (away from n*log(n)). A quicksort cannot really do better than a mergesort, although a quicksort implementation may be easier to tweak toward its best case; in any case it would be better known as the "split sort" or "quack sort".

    I put up a nice syntax-colored version of the code which produced this nonsense, in case someone wants to argue some silly point:
    MK27's linked list comparative sort demo program!
    I believe this is fairly portable. It could almost be used as a crude API, since the sort functions use external comparison functions with offsets to criteria (meaning you could plug any kind of struct into it without changes).
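    For illustration, such an offset-driven comparison function boils down to something like the sketch below. The return convention here is simply what the merge loop quoted earlier implies (0 = tie, 2 = the right node sorts first, otherwise the left node comes first), so treat it as a simplified sketch rather than the exact code from the demo:
    Code:
    #include <string.h>
    #include <stddef.h>     /* offsetof */
    
    typedef struct _node {
            char first[64];
            char second[64];
            int ID;
            struct _node *next;
    } node;
    
    /* Compare the strings found 'offset' bytes into each node, e.g.
       comp_string(offsetof(node, second), left, right) compares last names. */
    int comp_string(size_t offset, node *left, node *right)
    {
            int r = strcmp((char *)left + offset, (char *)right + offset);
            if (r == 0)
                    return 0;       /* tie: caller may fall back to altfunc */
            return (r > 0) ? 2 : 1; /* 2: right sorts first, 1: left first */
    }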
    Last edited by MK27; 01-26-2009 at 06:30 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #19
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Bubblesort is n-1 passes, each of which makes up to n-1 comparisons (1, 2, 3, ..., n-1 across the passes). So it is the comparison count that is n^2, not the iteration count. (Again, big O does not concern itself with constants or non-important terms: 4n^2 is O(n^2), n^2+n is O(n^2), etc.)
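    To put a number on that: 1 + 2 + ... + (n-1) = n(n-1)/2. For n = 10,000 that is 49,995,000, which is exactly the comparison count the selection sort reported above; dropping the constant 1/2 and the lower-order term leaves O(n^2).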

  5. #20
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by tabstop View Post
    Bubblesort is n-1 iterations, each of which makes 1, 2, 3, ..., n comparisons. So the comparison count is n^2, not the iteration count. (Again, big O does not concern itself with constants or non-important terms. 4n^2 is O(n^2), n^2+n is O(n^2), etc.)
    With the bubblesort the ratio of iterations to comparisons is essentially one to one (in the demo, the comparison count is invariably within 0.5% of the iteration count). The big O figure cannot be based on the comparison count and must be based on the iteration count, as I pointed out with reference to merge sort, or else merge sort could not have a constant best/average/worst rating.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  6. #21
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I'm not sure why you belabored yourself with testing the Quicksort algorithm on linked lists, since it is not designed for data that must be retrieved (relatively) slowly.

    Whether it's a linked list, a tape drive, or data that must be accessed over a (relatively slow) network, Quicksort is simply not the algorithm of choice.

    Mergesort is designed for that purpose.

    Put your data into an array or static disk drive, give Quicksort some smart tweaks (and watch out because it is "fragile"), and *then* you can watch Quicksort *fly*.

  7. #22
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Adak View Post
    I'm not sure why you belabored yourself with testing the Quicksort algorithm, for linked lists, since it is not designed for any data that must be retrieved (relatively), slowly.

    Whether it's a linked list, a tape drive, or data that must be accessed over a (relatively slow) network, Quicksort is simply not the algorithm of choice.
    In this case that abhorrent snail, the RAM.

    Quote Originally Posted by Adak View Post
    Mergesort is designed for that purpose.

    Put your data into an array or static disk drive, give Quicksort some smart tweaks (and watch out because it is "fragile"), and *then* you can watch Quicksort *fly*.
    Don't be silly, Adak. The weaknesses I concerned myself with vis-à-vis "quicksort" are inherent to the algorithm and independent of the implementation.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  8. #23
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by MK27
    To establish a mean or median value, we must iterate through the list before splitting it -- although in discussions of quicksort I have seen, this process apparently "does not count" as part of it's big O notation.
    The reason why it "does not count" is that it does not change the complexity in terms of big O notation. In order to split the list into two partitions an iteration must be made over the entire list. Making another iteration over part of the list, or the entire list, just increases the constant factor, so the complexity remains the same.
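    For example, if partitioning touches all n items and a separate pivot-hunting pass touches them again, each level of recursion costs 2n node visits instead of n; over the log(n) levels that is 2*n*log(n) rather than n*log(n), and big O discards the factor of 2, so both are O(n log n).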

    Quote Originally Posted by MK27
    To give quicksort one last chance, I decided to add a function which could select a better pivot, despite my misgivings about the additional iterations and comparisons required. So now, the quicksort compares the head with an element about half way thru, and if these are identical, goes thru the whole list until it finds one that isn't. Then we take a mean of the two different values. This helps to ensure that that we will actually have two lists.
    I would echo Adak's complaint here: you have selected a data structure on which quick sort is not commonly used. For example, are you aware of the "median of three" method for selecting a pivot for quicksort? Typically, this method involves selecting as the pivot the median of the first, middle and last elements. It should still be possible to construct pathological worst case input, but harder and rarer in practice. Another possibility is random pivot selection, but that means using a RNG or PRNG. However, both these methods require random access to be efficient, but linked lists do not provide random access.
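    For instance, on an array of ints, median-of-three boils down to something like this sketch (illustrative only, not from the code in this thread):
    Code:
    /* Return the median of the first, middle and last elements of a[lo..hi]. */
    int median_of_three(const int a[], int lo, int hi)
    {
            int mid = lo + (hi - lo) / 2;
            int x = a[lo], y = a[mid], z = a[hi];
            if ((x <= y && y <= z) || (z <= y && y <= x))
                    return y;
            if ((y <= x && x <= z) || (z <= x && x <= y))
                    return x;
            return z;
    }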

    Quote Originally Posted by MK27
    In this case that abhorrent snail, the RAM.
    What do you mean? If you have forgotten, RAM stands for Random Access Memory, and that describes an array, not a linked list. Relative to disk, RAM is fast.

    Quote Originally Posted by MK27
    The weaknesses I concerned myself with vis. "quicksort" are inherit to the algorithm and independent of the implementation.
    The choice of data structure can affect the complexity and performance of the algorithm. A commonly cited case is where selection sort is used on a heap: the result is otherwise known as heap sort, which has an average and worst case complexity of O(n log n).
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  9. #24
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by MK27 View Post
    (The "shuffle" is a Fisher-Yates algorithm.)
    Fisher-Yates huh? I had to look that one up. It turns out that I'm using Durstenfeld's version of the algorithm, which is the fast, modern way of implementing it. But as I've tried to show previously with other stuff I've posted about, I wasn't told about this algorithm, nor asked to implement it. I simply needed to shuffle an array, and I wrote code that would do it in a way that, to me, would obviously produce no bias.
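    For reference, the Durstenfeld form of the shuffle on an array comes down to something like this sketch (not my actual code):
    Code:
    #include <stdlib.h>
    
    /* Walk backwards through the array, swapping each slot with a uniformly
       chosen slot at or below it.  (rand() % (i + 1) is slightly biased for
       large ranges, but it keeps the sketch short.) */
    void shuffle(int a[], int n)
    {
            int i, j, t;
            for (i = n - 1; i > 0; i--) {
                    j = rand() % (i + 1);
                    t = a[i];
                    a[i] = a[j];
                    a[j] = t;
            }
    }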

    Quote Originally Posted by MK27 View Post
    2^21 is 2097152, so we would expect 43830192 here as the big O (n*log(n)) best/average score. Initially, I could not write a quick sort faster than a merge sort, and I'm not sure why anyone would think it should be faster, except that it does hold out some possibilities for optimization with the pivot value.
    As others say, Big O notation doesn't give exact operation counts. The implementation could do anything from 1000*n*log(n) operations to 0.001*n*log(n) operations, or even outside that range. For example, using a SIMD instruction set, you could perform multiple compares or swaps in one operation.

    Quote Originally Posted by MK27 View Post
    Quicksort suffers this way with ties for the same reason it suffers sorting an already ordered list, which is its worst-case scenario. Because mergesort does not spend time putting a list in order after it's been split off, and instead does it when merging the halves back together, it will benefit if the original list was already in some kind of order, whereas this will only hurt quicksort, depending on how the pivot value is chosen. To establish a mean or median value, we must iterate through the list before splitting it -- although in the discussions of quicksort I have seen, this process apparently "does not count" as part of its big O notation. Additional comparisons at each of these iterations would be required as well. A pivot taken from an already ordered, heavily tied list will usually end up on the low side, creating an imbalanced split. Putting pivot ties on the right (opposite the pivot) may improve performance slightly.
    See my comments below.

    Quote Originally Posted by MK27 View Post
    Now, apparently merge sort will perform the same in either case. This is true if we just measure iterations, but if we take a bit more reality into account, the number of comparisons can, will, and must change. This is because when two split lists come back to be merged, if one list already contains all the smaller values, all of the values on the left will be compared one at a time to the first element of the right, and then (since the returned lists have an internal order) no further comparisons need be done and the right is simply iterated onto the end. That is the normative and perhaps only sensible way to implement a merge sort.
    You may notice that the number of comparisons is bounded by a factor of two. The case where all items in one list are less than those in the other gives, say, k comparisons where both lists are of length k. Incidentally, the concatenation of the two lists can be done in constant time. The worst case is that 2k comparisons are required. This can be taken from the fact that each comparison reduces the number of unmerged items by one, there are two lists of length k, items never move between the two lists being merged, and only the head items are compared each time. So Merge Sort performs between n*log(n) and 2*n*log(n) comparisons.
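    A small worked case: merging the runs {1,2,3,4} and {5,6,7,8} costs only 4 comparisons, because once the left run is exhausted the right run is appended without further compares, while merging {1,3,5,7} and {2,4,6,8} costs 7, since every element except the last has to win a comparison to be output. So each merge of two k-item runs costs between k and 2k-1 comparisons.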

    Quote Originally Posted by MK27 View Post
    To give quicksort one last chance, I decided to add a function which could select a better pivot, despite my misgivings about the additional iterations and comparisons required. So now, the quicksort compares the head with an element about halfway through, and if these are identical, it goes through the list until it finds one that isn't. Then we take a mean of the two different values. This helps to ensure that we will actually have two lists.
    Taking the mean isn't an option for every data type, but something like that has been done before, for integers. Someone has previously called this Artificial Sort. My own prior work on this only requires that two items be subtractable, not averageable.

    Quote Originally Posted by MK27 View Post
    In the second instance the list is already sorted. Contrast this with the cruder quicksort, above, which hit the full n*n on an already sorted list. While it still hasn't quite caught up to the mergesort, it very nearly has -- the time for the heavily tied sort of one million records went from over 4 minutes to 7 seconds.
    My own findings with sorting on linked lists also agree that quicksort for linked lists doesn't win out until the list length gets really really long.

    Quote Originally Posted by MK27 View Post
    There is still obviously room to optimize the selection of the pivot value further, particularly if we were only dealing with unique numbers, since then taking a true mean of a sublist would be more feasible. But if quicksort and mergesort still both have a best case of n*log(n) (88 million here) anyway, I think I will just stick with merge sort.
    Let me introduce you to the smart splitter-choosing technique of my SuperQuickSort for linked lists. The already sorted and reverse sorted cases become O(n). The trick is that the work done in hunting for a splitter isn't simply additional work that reduces the average amount of work required later by picking a better splitter. Instead it actually replaces a portion of the splitting process (potentially the entire splitting process, if the list was already sorted). Have a look at it on my website; it's one of the things that I've gone into much more detail on. (Link in sig)

    Quote Originally Posted by MK27 View Post
    A final interesting point: both the quicksort and the insertion sort demonstrated the variation inherent in their design; with insertion sort this variation is toward a "best case" (improving on n^2), whereas with quicksort it is toward a "worst case" (away from n*log(n)). A quicksort cannot really do better than a mergesort, although a quicksort implementation may be easier to tweak toward its best case; in any case it would be better known as the "split sort" or "quack sort".
    I agree that the standard quicksort probably isn't worthwhile for lists.

    Incidentally I've recently become of the opinion that Splay-Tree-Sort is in many ways the ultimate doubly-linked-list sorting algorithm.
    It's just so amazingly adaptive, and the list links can be reused as tree child pointer links during the sorting. Any initial sort order you can throw at it, other than random order, gives astoundingly low comparison counts. If this interests you, I highly recommend trying it out.

    Quote Originally Posted by MK27 View Post
    I put up a nice syntax-colored version of the code which produced this nonsense, in case someone wants to argue some silly point:
    MK27's linked list comparative sort demo program!
    I believe this is fairly portable. It could almost be used as a crude API, since the sort functions use external comparison functions with offsets to criteria (meaning you could plug any kind of struct into it without changes).
    Nice!
    Mine is implemented in C++ rather than C, which means I don't have to make any modifications whatsoever to the algorithms to animate them and count comparisons etc. The same code you would use in a real program is the exact code I use in my demo app. I recommend you download it and check it out, as I'm sure you'll enjoy it. It's linked at the bottom of many of my pages.
    Last edited by iMalc; 01-27-2009 at 01:08 AM.

  10. #25
    Banned
    Join Date
    Aug 2009
    Posts
    43
    You just pick one node, remember its value, search the rest of the list for the minimal value, swap the values, and then move on to the next node.
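    In code, that amounts to something like this sketch (assuming a node with a single int key):
    Code:
    struct lnode {
            int key;
            struct lnode *next;
    };
    
    /* For each node, find the smallest key in the remainder of the list and
       swap the two values, then move on to the next node (selection sort by
       value swapping). */
    void swap_sort(struct lnode *head)
    {
            struct lnode *i, *j, *min;
            int t;
            for (i = head; i != NULL; i = i->next) {
                    min = i;
                    for (j = i->next; j != NULL; j = j->next)
                            if (j->key < min->key)
                                    min = j;
                    if (min != i) {
                            t = i->key;
                            i->key = min->key;
                            min->key = t;
                    }
            }
    }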

  11. #26
    Registered User
    Join Date
    Jul 2010
    Posts
    3
    Hello,
    I want a bubblesort program for strings using a 2D array and pointers. Please help me.

  12. #27
    Registered User
    Join Date
    Jul 2010
    Posts
    3
    Hello,
    I want a bubblesort program for strings using a 2D array and pointers. Please help me; I will be highly thankful to whoever does.
