Thread: Sorting list performances

  1. #1
    Registered User
    Join Date
    Feb 2010
    Posts
    72

    Sorting list performances

    Hi all,

    I have a list that contain about (200000 elements) and i want to keep them in a sorting order, so i do some testing with insertion sort and merge sort but it seems that insertion sort is better!! is it normal? how can i improve time and CPU performances? Thanks for your help.


    My node (for simplicity, i omitted the prev node in the node struct):

    Code:
    struct node {
        int number;
        struct node *next;
    };
    Insertion sort :

    Code:
    void insert_node(int value) {
        struct node *new_node = NULL;
        struct node *cur_node = NULL;
        struct node *last_node = NULL;
        int found; /* 1 means found a place to insert the new node in, 0 means not*/
    
    
        new_node = (struct node *)malloc(sizeof(struct node *));
        if(new_node == NULL) {
            printf("memory problem\n");
        }
        new_node->number = value;
        /* If the first element */
        if (head == NULL) {
            new_node->next = NULL;
            head = new_node;
        } 
    
        else if (new_node->number < head->number) {
            new_node->next = head;
            head = new_node;    
        } 
    
        else {
            cur_node = head;
            found = 0;
            while (( cur_node != NULL ) && ( found == 0 )) {
                if( new_node->number < cur_node->number )
                {
                    found = 1;
                }
                else
                {
                    last_node = cur_node;
                    cur_node = cur_node->next;
                }
            }
        /* We got the right place to insert our node */
        if( found == 1 )
        {
            new_node->next = cur_node; 
        }
        /* Insert at the tail of the list */
        else
        {
            last_node->next = new_node;
            new_node->next = NULL;
        }           
    }
    Merge Sort :

    Code:
    /* add a node to the linked list */
    struct node *addnode(int number, struct node *next) {
        struct node *tnode;
    
        tnode = (struct node*)malloc(sizeof(*tnode));
    
        if(tnode != NULL) {
            tnode->number = number;
            tnode->next = next;
        }
    
        return tnode;
    }
    
    /* preform merge sort on the linked list */
    struct node *merge_sort(struct node *head) {
        struct node *head_one;
        struct node *head_two;
    
        if((head == NULL) || (head->next == NULL))
            return head;
    
        head_one = head;
        head_two = head->next;
        while((head_two != NULL) && (head_two->next != NULL)) {
            head = head->next;
            head_two = head->next->next;
        }
        head_two = head->next;
        head->next = NULL;
    
        return merge(merge_sort(head_one), merge_sort(head_two));
    }
    
    /* merge the lists.. */
    struct node *merge(struct node *head_one, struct node *head_two) {
        struct node *head_three;
    
        if(head_one == NULL)
            return head_two;
    
        if(head_two == NULL)
            return head_one;
    
        if(head_one->number < head_two->number) {
            head_three = head_one;
            head_three->next = merge(head_one->next, head_two);
        } else {
            head_three = head_two;
            head_three->next = merge(head_one, head_two->next);
        }
    
        return head_three;
    }

  2. #2
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    A proper merge sort does not go "all the way down". It's usual to use a sort with less overhead to sort the lists when they become "small" (small is a relative term, but maybe 10 or maybe even 100 ... other people here know more about that kind of thing).

    Also, it would be nice if you could post some runnable code (complete with timing code).

  3. #3
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Well, both has a complexity of O(n log n).
    And (afaik), you can't do sorting much better than that, without knowing what is going to be in the list.
    But I read that merge sort is supposed to be a better choice for sorting lists.
    Maybe some other factors are coming into play here ? (Don't know what..)

    > how can i improve time and CPU performances?
    Threading merge sort can be a good thing to do. (And since you have a list, you may not worry about some pitfalls like in arrays).

  4. #4
    Registered User
    Join Date
    Feb 2010
    Posts
    72
    Quote Originally Posted by oogabooga View Post
    A proper merge sort does not go "all the way down". It's usual to use a sort with less overhead to sort the lists when they become "small" (small is a relative term, but maybe 10 or maybe even 100 ... other people here know more about that kind of thing).

    Also, it would be nice if you could post some runnable code (complete with timing code).
    Well you can test with this (for the merge sort) :

    Code:
    int main(void) {
    	struct node *head;
    	struct node *current;
    	struct node *next;
    	int i;
    	struct timeval tbegin,tend; 
            double texec=0.; 
     
            // Start timer 
            gettimeofday(&tbegin,NULL); 
     
    	head = NULL;
    	/* insert some numbers into the list */
    	for (i = 100000; i > 0; i--)
    		head = addnode(i, head);
     
    	/* sort the list */
    	head = merge_sort(head);
     
    	/* print the list */
    	printf(" before after\n"), i = 0;
    	for(current = head; current != NULL; current = current->next)
    		printf("%4d\t%4d\n", i++, current->number);
     
    	/* free the list */
    	for(current = head; current != NULL; current = next)
    		next = current->next, free(current);
    
    	// End timer 
            gettimeofday(&tend,NULL); 
     
           // Compute execution time 
           texec=((double)(1000*(tend.tv_sec-tbegin.tv_sec)+((tend.tv_usec-tbegin.tv_usec)/1000)))/1000.; 
     
    	printf("Elapsed time = %f\n", texec);
     
    	/* done... */
    	return 0;
    }
    And for the insert sort something like this for example :

    struct node *head = NULL;

    Code:
    int main(void) {
     	struct node *current = NULL;
     	struct node *next = NULL;
    	int i = 0;
    
    	struct timeval tbegin,tend; 
            double texec=0.; 
     
            // Start timer 
            gettimeofday(&tbegin,NULL); 
    
     	/* insert some numbers into the list */
     	for(i = 10000; i > 0; i--)
    		insert_node(i);
    
     	/* print the list */
     	printf(" before  after\n"), i = 10000;
     	while(head->next != NULL) {
      		printf("%4d\t%4d\n", i--, head->number);
      		head = head->next;
     	}
    
     	/* free the list */
     	for(current = head; current != NULL; current = next)
      		next = current->next, free(current);
    	// End timer 
            gettimeofday(&tend,NULL); 
     
            // Compute execution time 
            texec=((double)(1000*(tend.tv_sec-tbegin.tv_sec)+((tend.tv_usec-tbegin.tv_usec)/1000)))/1000.; 
     
    	printf("Elapsed time = %f\n", texec);
    
     	return 0;
    }

  5. #5
    Registered User
    Join Date
    Feb 2010
    Posts
    72
    Quote Originally Posted by manasij7479 View Post
    Well, both has a complexity of O(n log n).
    And (afaik), you can't do sorting much better than that, without knowing what is going to be in the list.
    But I read that merge sort is supposed to be a better choice for sorting lists.
    Maybe some other factors are coming into play here ? (Don't know what..)

    > how can i improve time and CPU performances?
    Threading merge sort can be a good thing to do. (And since you have a list, you may not worry about some pitfalls like in arrays).
    Thanks for the reply but i don't get the same performances with the two and i was expecting that merge sort is about o(n * log(n)) and is much better than insert sort which is o(n2).

    How can i use threads in my case, a sample code will be great thanks .

    Regards.

  6. #6
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by evariste View Post
    Thanks for the reply but i don't get the same performances with the two and i was expecting that merge sort is about o(n * log(n)) and is much better than insert sort which is o(n2).
    Insertion sort is O(n^2) only when each insertion is linear.
    That is generally not the case in lists. (I'm not very sure about this point, especially in this context)
    How can i use threads in my case, a sample code will be great thanks .
    Well, just like in any divide and conquer algorithm, divide and you get different works that may be run at the same time.
    I tried it a few days ago. (Though in C++, but you can easily convert the threading code into pthreads or similar)
    Threaded Merge Sort: No performance gain. :(
    Especially look at iMalc's code in #9 .
    (I've still got to much work to do on that front )

  7. #7
    Registered User
    Join Date
    Feb 2010
    Posts
    72
    Thanks manasij7479, i will try to do somme threading on merge sort and see if it's better.

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Insertion sort is very good for data that is almost in sorted order - or has a very small number of objects to be sorted. I don't know if it's the very fastest for a small number of objects, but it certainly is faster than Merge sort or Quicksort, in my tests.

    I use it with Quicksort when the sub-arrays are small (70 or less has proven most effective on an i7 system), and partially sorted already (on integers). It gives Quicksort a nice boost. Since it is a stable sort as well, Insertion sort is more effective than it might appear, at first.

    Malcolm will certainly want to reply to your post! XD

  9. #9
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    The initial problem I see with the insertion sort test is that it is always inserting a number that is smaller than the head of the list, so your test (if new_node->number < head->number) always passes, and you never iterate over the list. Remember, insertion sort is O(n**2) on average (and worst-case), but O(n) in the best case, which you have.

  10. #10
    Registered User
    Join Date
    Feb 2010
    Posts
    72
    Thanks Adak, so you think that quicksort maybe a better choice in my case?

  11. #11
    Registered User
    Join Date
    Feb 2010
    Posts
    72
    Thanks cas, you're right i will try a better example for testing the worst case in insertion sort.

  12. #12
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    After a quick perusal, I see that your merge sort is improperly implemented. Check both head_one and head_two in merge_sort(), after you split the list. Each should contain half of the data. Instead, head_one is almost the entire list.

  13. #13
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by evariste View Post
    Thanks Adak, so you think that quicksort maybe a better choice in my case?
    No! Quicksort is lousy for lists. Just mentioned Quicksort, because it was the algorithm that I had tested with Insertion sort, and found it to significantly improve it. I believe that Insertion sort would also be useful to your list sorting, depending on the data and merge sort algorithm you use.

    The first thing though, is to get your Merge sort code, up to snuff. (see above post).

  14. #14
    Registered User
    Join Date
    Feb 2010
    Posts
    72
    Thanks Adak, cas, i fixed the problem with my merge sort algorithm, the new one is below :

    Merge sort algorithm:

    Code:
    /* insert a node */
    struct node *insert_node(int number, struct node *next) {
    	struct node *new_node;
     
    	new_node = (struct node*)malloc(sizeof(*new_node));
     
    	if(new_node != NULL) {
    		new_node->number = number;
    		new_node->next = next;
    	}
     
    	return new_node;
    }
     
    /* sorting the list */
    void merge_sort(struct node** list_head)
    {
      struct node* head = *list_head;
      struct node* first_half;
      struct node* second_half;
     
      if ((head == NULL) || (head->next == NULL))
      {
        return;
      }
     
      /* divide list into two sublists */
      list_divide(head, &first_half, &second_half); 
     
      /* Recursively sort */
      merge_sort(&first_half);
      merge_sort(&second_half);
     
      *list_head = merge_sorted_lists(first_half, second_half);
    }
    
    /* Merge our lists to get one sorted list */
    struct node* merge_sorted_lists(struct node* first_half, struct node* second_half)
    {
      struct node* result = NULL;
     
      if (first_half == NULL)
         return(second_half);
      else if (second_half==NULL)
         return(first_half);
     
      if (first_half->number <= second_half->number)
      {
         result = first_half;
         result->next = merge_sorted_lists(first_half->next, second_half);
      }
      else
      {
         result = second_half;
         result->next = merge_sorted_lists(first_half, second_half->next);
      }
      return(result);
    }
     
    /* Divide the list into two lists */
    void list_divide(struct node* original_list,
              struct node** first_half, struct node** second_half)
    {
      struct node* temp_one; 
      struct node* temp_two; 
      
      if (original_list ==NULL || original_list->next==NULL)
      {
        *first_half = original_list;
        *second_half = NULL;
      }
      else
      {
        temp_two = original_list;
        temp_one = original_list->next;
     
        while (temp_one != NULL)
        {
          temp_one = temp_one->next;
          if (temp_one != NULL)
          {
            temp_two = temp_two->next;
            temp_one = temp_one->next;
          }
        }
     
        *first_half = original_list;
        *second_half = temp_two->next;
        temp_two->next = NULL;
      }
    }
    And now i can see that it's very fast and in my tests i got the same time of execution as with insertion sort. Any optimizations are welcome .

    Regards.

  15. #15
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    MergeSort is certainly very good for lists, better than quicksort really since that cant tend towards O(n^2).
    However the killer is BucketSort. Merge Sort is a long way ahead of most other algorithms for linked-list sorting and then BucketSort is light-years ahead of that.
    The catch of course is that your key needs to support being split into buckets, as well as just being directly comparable. In your case you are using ints for which it is fine.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Need help with list sorting
    By dnguyen1022 in forum C++ Programming
    Replies: 1
    Last Post: 04-23-2009, 10:46 PM
  2. Replies: 16
    Last Post: 08-25-2008, 11:08 PM
  3. help with sorting a list :(
    By Axel in forum C Programming
    Replies: 54
    Last Post: 10-16-2006, 07:10 AM
  4. Compiler performances...
    By lyx in forum C++ Programming
    Replies: 2
    Last Post: 09-21-2003, 12:38 PM
  5. std::list<*> sorting
    By ziruz in forum C++ Programming
    Replies: 18
    Last Post: 06-25-2003, 03:23 PM