Thread: Algorithm for sorting array of integers w/respect to another array c code would be g8

  1. #1
    Registered User
    Join Date
    Apr 2010
    Posts
    3

    Algorithm for sorting array of integers w/respect to another array c code would be g8

    I'm looking for a fast algorithm for sorting an array of integers with respect to a master list.

    For example:

    Master List: 7 3 9 1 32 6 5 4 99 100 201 13
    Array that needs to be sorted array1: 1 5 4 3 6

    The "sorted" array would be: 3 1 6 5 4

    The order of the array that needs to sorted is originally random. Neither the master list nor the array that needs to be sorted will have repeating elements.

    The fastest algorithm I have right now is:
    Sort array1 with quicksort (n lg n)
    foreach element in master list{
    search for it in array1 //(lg n)
    if found add to array2
    }

    array2 has the "sorted" list I am looking for

    Unfortunately this gives worst case time n*lg(n) + m*lg(n) where n=number of elements in array 1 and m = number of elements in master list

    and m is really big

  2. #2
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    That's because you're doing a linear search in the second stage.
    Instead do a binary search which has a performance of O(log N).

  3. #3
    Registered User
    Join Date
    Apr 2010
    Posts
    3
    I am doing a binary search in the second stage, but I'm looking looking for each instance of the master list in the second array.

    So second stage still ends up being m*lg(n) with binary search

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by tc18a View Post
    I'm looking for a fast algorithm for sorting an array of integers with respect to a master list.

    For example:

    Master List: 7 3 9 1 32 6 5 4 99 100 201 13
    Array that needs to be sorted array1: 1 5 4 3 6

    The "sorted" array would be: 3 1 6 5 4

    The order of the array that needs to sorted is originally random. Neither the master list nor the array that needs to be sorted will have repeating elements.

    The fastest algorithm I have right now is:
    Sort array1 with quicksort (n lg n)
    foreach element in master list{
    search for it in array1 //(lg n)
    if found add to array2
    }

    array2 has the "sorted" list I am looking for

    Unfortunately this gives worst case time n*lg(n) + m*lg(n) where n=number of elements in array 1 and m = number of elements in master list

    and m is really big
    Oh what a fun post!

    If the range of numbers in the arrays can fit into one array's index, then you can use "counting sort". Counting sort is quite special, and beats any other sorting algorithm by a huge amount IF the range of the numbers is not too large.

    I'm making up a little program for you, to show what I mean.


    If Counting sort can't be used, then:

    #1) Let's make it an optimized Quicksort - here's one

    Code:
    //Optimzed Quicksort
    void quicksortOpt(int A[], int lo, int hi) {
      int i, j, pivot, temp;
      
      if(lo == hi) return; 
    
      if((hi - lo) < InsertNum) {
        insertion(A, lo, hi+1); 
        return;
      }
    
      i=lo; 
      j=hi;
      pivot= A[(lo+hi)/2]; 
    
      /* Split the array into two parts */
      do {    
        while (A[i] < pivot) i++; 
        while (A[j] > pivot) j--;
        if (i<=j) {
          //swap(A, i, j);   //this line works fine if you have a separate swap function
          /* Oddly, on large N, using swap() gives the same time, as
             using the swap code below */
          temp= A[i];
          A[i]= A[j];
          A[j]=temp;
          i++;
          j--;
        }
      } while(i<=j);
      
      if (lo < j) quicksortOpt(A, lo, j);
      if (i < hi) quicksortOpt(A, i, hi);
      
    }
    It requires Insertion sort to handle the smaller sub arrays, but it's definitely worth it. I'm putting together a little example program, showing that. I have tested this against 4 other versions of Quicksort, and it beats them all.


    #2) You should use a Binary or Indexed search, if a counting sort is not possible. What size is the smaller and larger array's, typically?

    And very important, WHAT IS THE RANGE of the numbers in the arrays?

    No repeating numbers in either array, right? That can be useful.

  5. #5
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    So, just clarifying... 'array1' a list of indexes into the master array, which you wish to sort. I.e. 1, 5, 4, 3, 6 refer to the master array elements with values: 3, 6, 32, 1, 5. You then want to generate an array indexes that refer to these items in sorted order.
    I.e. You want the indexes for items with values 1, 3, 5, 6, 32 from the master array, which are: 3, 1, 6, 5, 4

    It appears that you simply need to sort the index array with respect to the items from the master list. Assuming C++:
    Code:
    #include <algorithm>
    #include <iterator>
    #include <iostream>
    
    using namespace std;
    
    static const int master[] = {7, 3, 9, 1, 32, 6, 5, 4, 99, 100, 201, 13};
    
    bool myLessThan(int lhs, int rhs)
    {
        return master[lhs] < master[rhs];
    }
    
    int main()
    {
        int indexes[] = {1, 5, 4, 3, 6};
        sort(indexes, indexes + sizeof(indexes)/sizeof(indexes[0]), myLessThan);
        copy(indexes, indexes + sizeof(indexes)/sizeof(indexes[0]), ostream_iterator<int>(cout, "\t"));
    }
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  6. #6
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    Algorithm should not exceed complexity O(N log N) where N is the size of the indexing array array1.

    Code:
    int master[] = {7, 3, 9, 1, 32, 6, 5, 4, 99, 100, 201, 13};
    int array1[] = {1, 5, 4, 3, 6};
    
    int compare(const void *a, const void *b) {
    return master[*(int *)a] - master[*(int *)b]; }
    
    ...
    qsort(array1, sizeof(array1)/sizeof(int), sizeof(int), compare);
    ...
    Note, qsort was used here for testing. qsort is not guaranteed to be as good as O(N log N).
    Last edited by nonoob; 04-07-2010 at 09:11 AM.

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    This is an example of the two best sorters, in action. Counting sort doesn't actually sort, but has the effect of sorting, if the range of values is not too large.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    
    #define MAX 10000         //the size of A[] and B[], and the maximum integer value
    #define REPEAT 200        //number of times the array is refilled, and resorted
    #define SHOW 0             //SHOW = 1 prints the values, in sorted order 
    #define InsertNum 50      //When the Insertion sort is called for sub array sorting
    /* If InsertNum > 60, the times for sorting begin to increase */
    
    void check(int A[], int B[], int choice);  //checks accuracy of sort
    void countingSort(int A[], int B[]);
    void insertion(int A[], int, int);
    void quicksortOpt(int A[], int l, int r);
    void randomNums(int A[]);                //puts random int's into A[] array
    void showIt(int A[], int B[], int choice);  //shows values
     
    
    int main(void)  {
      int i, l, r, choice;
      clock_t start, stop;
      int A[MAX]; 
      int B[MAX] = { 0 };
      l = 0; r = MAX-1;
    
      printf("\n\n\n   Name of Algorithm         Seconds   Sorting %d Integers %d Times", MAX, REPEAT);
      printf("\n   =========================================================================");
     
      start = clock();
      for(i = 0; i < REPEAT; i++) {
        randomNums(A);
        quicksortOpt(A,l,r);   //OK
      }
      stop = clock();
      check(A, B, 0);
      printf("\n   Quicksort Optimized       %7.4f   Insertion Threshold is: %d", (stop-start)/CLK_TCK, InsertNum);
      if(SHOW) { showIt(A, B, 0); i = getchar(); }
    
      start = clock();
      for(i = 0; i < REPEAT; i++) {
        randomNums(A);
        countingSort(A,B);   //OK
      }
      stop = clock();
      check(A, B, 1);
      printf("\n   Counting Sort             %7.4f",   (stop-start)/CLK_TCK, InsertNum);
      if(SHOW) { showIt(A, B, 1); i = getchar(); }
    
      printf("\n\n\t\t\t     press enter when ready");
      i = getchar();
      return 0;
    }
    //================ End of Main ============================
    
    /* In this example, 0's are excluded as a valid value */
    //Counting Sort
    void countingSort(int A[], int B[]) {
      int i;
      for(i = 0; i < MAX; i++)
        if(A[i])
          B[A[i]] = A[i];
    }
    //Insertion sort
    void insertion(int A[], int lo, int hi) {  //lo and hi are local, not global 
      int value;
      int ohi = hi;                    //ohi is the original value of hi
      for(; lo < ohi; lo++)  {
        value = A[lo];
        hi = lo - 1;
        while(A[hi] > value)  {
          A[hi + 1] = A[hi];
          --hi;
          if(hi < 0) break;
        } 
        A[hi + 1] = value;
      }
    }
    
    //Optimzed Quicksort
    void quicksortOpt(int A[], int lo, int hi) {
      int i, j, pivot, temp;
      
      if(lo == hi) return; 
    
      if((hi - lo) < InsertNum) {
        insertion(A, lo, hi+1); 
        return;
      }
    
      i=lo; 
      j=hi;
      pivot= A[(lo+hi)/2]; 
    
      /* Split the array into two parts */
      do {    
        while (A[i] < pivot) i++; 
        while (A[j] > pivot) j--;
        if (i<=j) {
          //swap(A, i, j);   //this line works fine.
          /* Oddly, on large N, using swap() gives the same time, as
             using the swap code below */
          temp= A[i];
          A[i]= A[j];
          A[j]=temp;
          i++;
          j--;
        }
      } while(i<=j);
      
      if (lo < j) quicksortOpt(A, lo, j);
      if (i < hi) quicksortOpt(A, i, hi);
      
    }
    void randomNums(int A[]) {
      int i;
      for(i = 0; i < MAX; i++)
        A[i] = rand() % MAX;
    }
    
    /* checks A[] (0), or B[] (1) , depending on the value of choice */
    void check(int A[], int B[], int choice) {
      int i;
      if(!choice) {  //check A's contents directly
        for(i = 0; i < MAX - 2; i++) {
          if(A[i] > A[i+1]) {   //an error
            printf("\n\nError!: %d ", A[i]);
            return;
          }
        }
      }
      else {   //check A's contents through the index array
        for(i = 0; i < MAX - 2; i++) {
          if(B[i] > B[i+1] && B[i+1]) {   //an error
            printf("\n\nError!: %d ", B[i]);
            return;
          }
        }
      }
    }
    
    /* shows either A[]'s (0), or B[]'s (1) value, depending on choice */
    void showIt(int A[], int B[], int choice) {
       int i;
       printf("\n\n");
       if(!choice) {    //regular "direct" display
         for(i = 0; i < MAX; i++)
           printf("%7d ", A[i]);
       }
       else {
         for(i = 0; i < MAX; i++)
           if(B[i] > 0)
             printf("%7d ", B[i]);
       }
    }

    Note on Counter Sort:
    Any check for a number N in the array B[] is nearly instantaneous:

    if(B[N]) (if B was originally set for all 0's), or if(B[N] == N], otherwise.

    Search is eliminated. It's just a lookup.

    The system can sort strings very quickly - faster than anything I've ever seen, but with integers, this Quicksort is faster. It is much faster than C's qsort().

    But Counter Sort blows everything away, *IF* it can be used on your problem.

  8. #8
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    If I could confirm that this wasn't homework then I would have given an answer using C. Hopefully I gave you enough of an idea with what I posted. The advantage of std::sort vs qsort is that it is guaranteed to be O(n*logn).
    As noted, to optimise furthur it would be useful to know what restrictions we can assume on the data in the master array. For example, will all values be positive?
    Is there some minimum and maximum possible value for entries in the master array?
    Are you able to assume there are no duplicates in the master array? What about the index arrays?
    What about the distribution of the values in the master array, can we assume they are relatively evenly distributed?
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  9. #9
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    He said there were no duplicates, anywhere.

    I don't know if it's homework or not, but I believe whatever we put up, it should help further knowledge, to him, and maybe to his classroom, as well. It's dreadful having all this bubble sort type of sorting programs, and so little time and few posts are focused on optimized sorting programs.

  10. #10
    Registered User
    Join Date
    Apr 2010
    Posts
    3
    Just for clarification, this is just part of a much bigger project I am working on for a class....(writing a fault simulator), so using other people's code for a fast sort is fine....however I figured out a fast algorithm!!!!

    Thanks to nonoob!!! Although the comparison was wrong, it led me to the right idea:

    n lg(n) algorithm;

    I created a new array called location[].
    While I am creating my master list, I store the location of each data value into location. For example, if in the master list, data value 5 is in location index 10, I do location[5] = 10. Now, for any given data value in array1, I can look up what its index in the master list is in constant time.

    I then did a qsort(array1, length of array1, sizeof(int), compare);

    Where compare is:
    {
    return location[a] - location[b] //not master!!!
    }

    qsort here gives n log n (array1 is pretty random and has unique values) and no dependency on M anymore.

    Thanks for all the responses!
    Last edited by tc18a; 04-07-2010 at 10:14 PM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Array Program
    By emmx in forum C Programming
    Replies: 3
    Last Post: 08-31-2003, 12:44 AM
  2. Replies: 2
    Last Post: 05-06-2003, 08:34 AM
  3. Replies: 5
    Last Post: 12-05-2002, 02:27 AM
  4. Interface Question
    By smog890 in forum C Programming
    Replies: 11
    Last Post: 06-03-2002, 05:06 PM
  5. whats wrong with my code?
    By Unregistered in forum C Programming
    Replies: 2
    Last Post: 10-31-2001, 08:44 PM