Thread: iMalc Super Quick Sort

  1. #1
    Registered User
    Join Date
    Aug 2007
    Posts
    42

    iMalc Super Quick Sort

    Any one implement iMalc super quick sort and actually get it to be functionall?

    I've got some questions for Malcom, If he would be able to spend a moment to assist?

    first let me say, I'm trying to implement the sort using standard C.
    Code:
    #include <stdio.h>
    #include <unistd.h>
    
    typedef struct node {
         struct  node* prev;
         struct  node* next;
          void  *data;
    } node_t;
    
    void* node_get(node_t* n) {
           return (n->data);
    }
    
    static void
    list_partition( node_t *start
                  , node_t *split
                  , node_t *less_n
                  , node_t *grtr_n
                  , int  comp(const void *, const void *) ) {
    
        do {
            node_t *next  = start->next;
            if ( comp(node_get(start), node_get(split))  < 0 ) {
                start->next = less_n;
                less_n = start;
            } else {
                start->next = grtr_n;
                grtr_n =  start ;
            }
            start = next;
        } while (start != NULL );
    
        return ;
    }
    
    
    static  node_t *
    list_SQsortAux( node_t *head
                  , int  comp(const void *, const void *) ) {
       node_t *first = head;
       node_t *second = head->next;
       node_t *less = NULL, *center = NULL, *greatereq = NULL ;
    
     if(second != NULL ) {
          do {     /* find a splitter node - the first whose next item is smaller */
             if (comp(node_get(second), node_get(first)) < 0 ) {  /* Splitter found */
                greatereq = center = first ;
                first->next = NULL;
                less = head;
    
                /* Partition the list   */
                list_partition(second, center, less, greatereq, comp );
                node_t *const join = list_SQsortAux(less, comp );
                node_t *const last = list_SQsortAux(greatereq, comp );
    
                /* join the two sorted lists */
                head = less ;
                join->next = greatereq ;
    
                return last;
             }
             /* Still searching for splitter  */
             first = second;
             second = second->next;
          }while( second != NULL )  ;
       }
    
       /* Already sorted, No split node found */
       return first ;
    }
    
    void
    list_SQsort(node_t *ll, int  comp(const void *, const void *) ) {
    
        if (ll != NULL ) {
           list_SQsortAux(ll, comp );
        }
    
    }
    Within the aux function: what is the point of the *prev ? its not used ?

    I think there is a flaw in the design.. Consider the following added to a list,
    50, 20, 10, 15, 55

    The Spitt node is found for 50, this is also becomes the head, first, center and greater.
    And the partitioning when it encounters 55 will be incorrect as it will attempt to add it to the same stack as less, except less ptr has now moved.

    Any ideas on how to resolve this? Any Improvements for a double linked list, where a sentinel node is used ?

    Thanks
    Last edited by bean66; 08-22-2007 at 04:58 PM.

  2. #2
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Hi bean66

    First, here's my implementation for reference:
    Code:
    //Gives a fake node pointer where only the 'next' member is valid (non-standard)
    #define fakePrev(T, H, N) (T*)(((size_t)&H) + ((size_t)H) - ((size_t)&H->N))
    
    template <class TItem>
    inline void SuperQuickSort(TItem *&head) {
        if (head != NULL)
            SuperQuickSortAux(head);
    }
    
    template <class TItem>
    TItem *SuperQuickSortAux(TItem *&head) {
        TItem *prev, *first = head, *second = head->next, *less, *center, *greatereq;
        if (second != NULL) {
            prev = fakePrev(TItem, head, next);
            do { //find a spliter - choose the first item whose next item is smaller
                if (*second < *first) { //Splitter found
                    greatereq = center = first;
                    prev->next = first->next = NULL;
                    less = head;
                    //partition the list
                    Partition(second, *center, less, greatereq);
                    //recursive calls
                    TItem *const join = SuperQuickSortAux(less);
                    TItem *const last = SuperQuickSortAux(greatereq);
                    //join the sorted lists
                    head = less;
                    join->next = greatereq;
                    return last;
                }
                //still looking for splitter
                prev = first;
                first = second;
                second = second->next;
            } while (second != NULL);
        }
        //already sorted (we never found a splitter)
        return first;
    }
    
    template <class TItem, class TKey>
    void Partition(TItem *headIn1, const TKey &splitter, TItem *&headOut1, TItem *&headOut2) {
    	assert(headIn1 != NULL);
    	do {
    		TItem *next = headIn1->next;
    		if (*headIn1 < splitter) {
    			headIn1->next = headOut1;
    			headOut1 = headIn1;
    		} else {
    			headIn1->next = headOut2;
    			headOut2 = headIn1;
    		}
    		headIn1 = next;
    	} while (headIn1 != NULL);
    }
    Direct link:
    http://homepages.ihug.co.nz/~aurora7...t%20Techniques

    The implementation I have of this on my website was actually for a singly-linked list, though I've also got an implementation of it for a doubly-linked list. The above implementation therefore does not require a structure with a prev pointer.
    The prev local variable I declare above is used for cutting the list once the splitter has been found.
    fakePrev is just a hackish way of avoiding the common code fore dealing with the special case of removing the first node from a list. There becomes no need for special case code for removing from the start of the list e.g.
    Code:
    if (prev == NULL) head = curr->next; else prev = curr->next;
    Oh, I should also mention that the Aux function always returns the tail of the list it sorted. Missing that comment in the code above.

    One of the things I've done with the implementation I posted is that the code that performs the search for the splitter actually contains the recursive calls directly within the if-statement in the loop, where it finds the splitter, after which it always returns. I possibly should have made the recursive calls later from outside the loop, but the effect is the same.

    In the example you give, 50 will be found as the splitter, so the algorithm's strengths do not become obvious here. The less-than list will start empty and the greater-than-or-equal list will contain just 50.
    Ah drat I just noticed I appear to have missed out the partition function from the web page. I've added it above. Will update my site with it shortly
    Then the partition continues as per normal ater which the two lists will be as follows:
    15, 10, 20
    55, 50
    A recursive call sorts out each of these lists, and they are then concatenated giving:
    10, 15, 20, 50, 55
    In this example the list is small enough to not bennefit from the modifications over the quicksort algorithm.

    The algorithm does work correctly, as I maintain a number of unit test within the SortingDemo program in the zip file on that page. It is fairly trivial to show how it takes O(n) time for a sorted or reverse sorted list. However, make no mistake, like ordinary quicksort, it has its O(n*n) case. Best case is O(n), average case is O(nlogn), worst case is O(n*n).
    It's just that it is about as unlinkely to occur in practice as it is for an array, which is the main advantage. To others: It's best to read the link above for a description of how the algorithm varies from normal quicksort.

    The above algoirthm should work as-is in C++ on a list with a 'next' pointer member in the node structure, and a less-than operator defined. For C you need to remove the template (obviously) and replace the call to the less-than operator, with a call to a compare function.
    Actually I've been tempted to implement this particular algorithm in other languages such as prolog
    Last edited by iMalc; 08-23-2007 at 01:48 AM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I've updated my linked-list sorting page with an example of sorting using this algorithm. Hopefully it should clarify things immensely.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    Registered User
    Join Date
    Aug 2007
    Posts
    42
    iMalc:
    Thanks for the response.
    Would you post your double link list implementation ?

    I agree that the algorithm does work and works as you describe, I'm just having a bit of a struggle getting my implementation in C functional. I guess the fakePrev is a little odd to me. Where is the prev used ? I dont see where it's accessed ?

    One other thing I'm missing. Is the partition designed to build to seperate lists or to re-locate into 2 new lists (less, greatereq) ??? I'm confused since the less/greatereq are assigned to existing list elelments.

    In the example i gave: 50, 20, 10, 15, 55 (and in my code version)
    After the split is picked at 50. The less list is also set at node 50 greatereq at 20 resulting in an interlinked set of pointers after partition returned
    Less: 15->10->20->50
    great: 55->20

    Should the routine set the less than to NULL? and greater to 50?
    Code:
                    prev->next = first->next = NULL;
                    less = head;
    I think that is what I missed.. the prev->next is really the head when the split is the first node?

    I can't use that technique in my version since the fakePrev will not work due to my node structure.

    For my purposes, I'll often have a mostly sorted input or a reversed input. Usuall of short amounts of data. For those cases I can hand the sorting off to an insertion sort. But there are occasional inputs of a large amount of data.

    Thanks,
    Ken

  5. #5
    Registered User
    Join Date
    Aug 2007
    Posts
    42
    I just saw your updated we page. That clarifies things...

    This is a very unique sorting routine to say the least. I've not seen that techniqe used elsewhere either.

    Thanks again.
    Ken

  6. #6
    Registered User
    Join Date
    Aug 2007
    Posts
    42
    I've managed to get this working, a few things, different in C.

    the declarations *&.. needed to be changed to **,
    Code:
    static void
    list_partition( node_t *start
                  , node_t *split
                  , node_t **less_n
                  , node_t **grtr_n
                  , int32_t  comp(const void *, const void *) ) {
    
        do {
            node_t *next  = start->next;
    
            if ( comp(lnode_get(start), lnode_get(split))  < 0 ) {
                  start->next = *less_n;
                *less_n = start;
            } else {
                 start->next = *grtr_n;
                *grtr_n =  start ;
            }
            start = next;
        } while (start != NULL );
    
        return ;
    }
    
    static  node_t *
    list_SQsortAux( list_t  *list
                  , node_t **head
                  , int32_t  comp(const void *, const void *) ) {
       assert (head != NULL );
       node_t *prev  = NULL;
       node_t *first = *head;
       node_t *second = (*head)->next;
       node_t *center = NULL;
       node_t *less=NULL, *greatereq =NULL;
    
    
       if(second != NULL ) {
          prev =  fakePrev (lnode_t, (*head), next );
          /* find a splitter node - the first whose next item is smaller   */
          do {
             if (comp(lnode_get(second), lnode_get(first)) < 0 ) {
                /* Splitter found, first  */
                /*
                ** 1. Set center and greatereq to splitter
                ** 2. disconnect the spliter from list
                ** 3. disconnect the prior list from splitter
                ** 4. less than is head, can be null if splitter == head
                ** 5. remaining list is second
                */
                greatereq = center = first ;  /* 1 */
                first->next = NULL;              /* 2 */
                prev->next = NULL;            /* 3 */
                less = *head ;
    
                /* Partition the list   */
                list_partition(second, center, &less, &greatereq, comp );
                lnode_t *const join = list_SQsortAux(list, &less, comp );
                lnode_t *const last = list_SQsortAux(list, &greatereq, comp );
    
                /* join the two sorted lists */
                *head = less;
                join->next = greatereq;
    
                return last;  /* Return the tail end of the list */
             }
    
             /* Still searching for splitter  */
             prev = first;
             first = second;
             second = second->next;
          }while( second != NULL )  ;
       }
    
       /* Already sorted, No split node found */
       return first ;
    }
    Thanks for your help and detailed explenation. Once I got my head around the fact that partitioning and SortAux list pointers were actually ** that made life a lot easier!

    Now to implement using doubly linked lists.

  7. #7
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Hmm, the doubly-linked list implementation I have is actually for a ring-list (last node points back to the first). I'll post it anyway:
    Code:
    template <class TItem, class TKey>
    void Partition(TItem *headIn1, const TKey &splitter, TItem *&headOut1, TItem *&headOut2) {
    	while (headIn1 != NULL) {
    		TItem *temp = Pop(headIn1);
    		if (*temp < splitter)
    			Push(headOut1, temp);
    		else
    			Push(headOut2, temp);					
    	}
    }
    template <class TItem>
    void SuperQuickSort(TItem *&head) {
    	if ((head != NULL) && (head->next != head)) { //Must be at least 2 items
    		TItem *first = head, *second = head->next, *less, *center, *greatereq, *curr;
    		do { //find a spliter - choose the first item whose next item is smaller
    			if (*second < *first) { //Splitter found
    				less = greatereq = NULL;
    				if (first != head) {
    					//use passed items as less-than list.
    					//this actually splits the list in two!
    					less = SplitJoinNonEmptyLists(head, first);
    					head = first;
    				}
    				curr = head;
    				center->prev = center->next = center = Pop(curr);
    				//partition the list
    				Partition(curr, *center, less, greatereq);
    				//recursive calls
    				SuperQuickSort(less);
    				SuperQuickSort(greatereq);
    				//join the sorted lists
    				head = SplitJoinNonEmptyLists(less, ConcatLists(center, greatereq));
    				break;
    			}
    			//still looking for splitter
    			first = second;
    			second = second->next;
    		} while (second != head);
    	}
    }
    
    template <class TItem>
    TItem *SplitJoinNonEmptyLists(TItem *head1, TItem *head2) {
    	assert(head1 != NULL && head2 != NULL);
    	TItem *prev1 = head1->prev;
    	TItem *prev2 = head2->prev;
    	prev1->next = head2;
    	head1->prev = prev2;
    	prev2->next = head1;
    	head2->prev = prev1;
    	return head1;
    }
    
    template <class TItem>
    TItem *ConcatLists(TItem *head1, TItem *head2) {
    	//if either list is NULL return the other list
    	if (head1 == NULL) return head2;
    	if (head2 == NULL) return head1;
    	return SplitJoinNonEmptyLists(head1, head2);
    }
    As you can see I did things slightly diferently with this version when I wrote it long ago. I still need to go back and tidy it up to put center in the greatereq list like the singly-linked list version. Will do this shortly, I just don't have time to test it right now.

    Other things you need to take care of to use this:
    Doubly-linked list Push and Pop functions - should be easy.

    Yeah I always forget about C not having references. Gets me every time. Sorry bout that.

    Yup, prev->next is really the head when we want to cut the list off before the first node. To not use that sneaky trick, you'd just need to check for prev still being NULL, and use head instead of prev->next. However you can't easily use the trick for doubly-linked lists, because only the 'next' member of prev is valid to use, unless you are rather careful and clever in your doubly-linked list class definition.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  8. #8
    Registered User
    Join Date
    Aug 2007
    Posts
    42
    iMalc,

    Thanks posting the double linked code... Yes, reference pointer logic does get a bit tricky, usually freaks out new coders.

    I hacked my implementation to set the last nodes next to null and the first nodes prev to null. Then simply passed in the first node.
    When finished, traversed the list setting the curr->prev = prev. And a few other pointer manipulations and voila...

    I think that maybe that method (treating a doubly linked as a simple single ) is quicker than maintaining doubly linked lists for the sort. It does have the disadvantage of being O(n * log n) + c*n.

    My list implementation uses a sentinel node that is never NULL. If the list is empty its prev/next point back onto itself. This makes the code super easy to insert/delete a new node entry. Traversal is also equally easy, just stop at the sentinel node. Push/pop are trivial as are enque/deque.

    Here are some timing numbers the for various sorts against a doubly linked dist.

    Sort inner loop outer loop data Optimized
    InsertSort 5 10000000 Random 2.73
    InsertSort 5 10000000 Sorted 2.24
    InsertSort 5 10000000 reversed 1.81

    Sqsort 5 10000000 Random 3.14
    Sqsort 5 10000000 Sorted 1.28
    Sqsort 5 10000000 reversed 1.62

    InsertSort 10 5000000 Random 2.6
    InsertSort 10 5000000 Sorted 2.67
    InsertSort 10 5000000 reversed 1.76

    Sqsort 10 5000000 Random 2.71
    Sqsort 10 5000000 Sorted 1.3
    Sqsort 10 5000000 reversed 1.89

    MergeSort 10 5000000 Random 3.6
    MergeSort 10 5000000 Sorted 3.6
    MergeSort 10 5000000 reversed 5.96

    InsertSort 100 500000 Random 6.97
    InsertSort 100 500000 Sorted 20.66
    InsertSort 100 500000 reversed 1.76

    Sqsort 100 500000 Random 4.94
    Sqsort 100 500000 Sorted 1.3
    Sqsort 100 500000 reversed 1.57

    MergeSort 100 500000 Random 6.8
    MergeSort 100 500000 Sorted 4.39
    MergeSort 100 500000 reversed 4.8


    Interesting to note that it beats insertion sort for short ordered lists, just what I was looking to do! In general my lists will be only 1 or 2 nodes. But occasionaly more and ordering will typically be ordered(70&#37, reversed(20%) or mixed(10%).

    Thanks again for your assistance in understanding the algorithm.

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Whether it treats a doubly-linked list as a singly-linked one (fixing afterwards), or it treats it as an actual doubly-linked list only really differs in terms of what Push and Pop do. The Push and Pop I have maintain the prev pointers as well.

    Nice to see some timings. I usually just check the comparison counts etc.
    I presume that the smaller numbers are the list lengths, and the larger ones are the number of iterations?
    The case you describe is pretty much the ideal scenario for using this algorithm, so it should work well.
    If the lists were a lot longer, then bucketsort would be better for the random case, but the best-case of the algorithm discussed here is really as fast as determining if a list is sorted (no nodes get modified at all), for which there isn't really anything that could be any faster.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Registered User
    Join Date
    Aug 2007
    Posts
    42
    iMalc,

    Yes the smaller numbers are for an inner loop, ie the number of nodes inserted into the list. The outer loop is an iteration over the inner.
    OuterLoop
    Inner loop
    add nodes to list.
    end loop

    Sort
    Free Nodes
    End Loop

    I'll probably build the In place version as well instead of a post processing one just to compare the timings.

    I'm surprised that the merge sort was so slow. I think it was due to the List Transfer and merge processes that were written.

    Thanks again.
    Ken

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 26
    Last Post: 07-05-2010, 10:43 AM
  2. Quick Sort
    By e66n06 in forum C++ Programming
    Replies: 13
    Last Post: 08-21-2007, 08:02 AM
  3. Quick Sort
    By ramayana in forum C Programming
    Replies: 9
    Last Post: 06-17-2007, 05:54 PM
  4. radix sort and radix exchange sort.
    By whatman in forum C Programming
    Replies: 1
    Last Post: 07-31-2003, 12:24 PM
  5. Quick sort
    By Saiyanman in forum C Programming
    Replies: 4
    Last Post: 07-30-2002, 10:16 PM