# Thread: Algorithm for sorting array of integers w/respect to another array c code would be g8

1. ## Algorithm for sorting array of integers w/respect to another array c code would be g8

I'm looking for a fast algorithm for sorting an array of integers with respect to a master list.

For example:

Master List: 7 3 9 1 32 6 5 4 99 100 201 13
Array that needs to be sorted array1: 1 5 4 3 6

The "sorted" array would be: 3 1 6 5 4

The order of the array that needs to sorted is originally random. Neither the master list nor the array that needs to be sorted will have repeating elements.

The fastest algorithm I have right now is:
Sort array1 with quicksort (n lg n)
foreach element in master list{
search for it in array1 //(lg n)
}

array2 has the "sorted" list I am looking for

Unfortunately this gives worst case time n*lg(n) + m*lg(n) where n=number of elements in array 1 and m = number of elements in master list

and m is really big

2. That's because you're doing a linear search in the second stage.
Instead do a binary search which has a performance of O(log N).

3. I am doing a binary search in the second stage, but I'm looking looking for each instance of the master list in the second array.

So second stage still ends up being m*lg(n) with binary search

4. Originally Posted by tc18a
I'm looking for a fast algorithm for sorting an array of integers with respect to a master list.

For example:

Master List: 7 3 9 1 32 6 5 4 99 100 201 13
Array that needs to be sorted array1: 1 5 4 3 6

The "sorted" array would be: 3 1 6 5 4

The order of the array that needs to sorted is originally random. Neither the master list nor the array that needs to be sorted will have repeating elements.

The fastest algorithm I have right now is:
Sort array1 with quicksort (n lg n)
foreach element in master list{
search for it in array1 //(lg n)
}

array2 has the "sorted" list I am looking for

Unfortunately this gives worst case time n*lg(n) + m*lg(n) where n=number of elements in array 1 and m = number of elements in master list

and m is really big
Oh what a fun post!

If the range of numbers in the arrays can fit into one array's index, then you can use "counting sort". Counting sort is quite special, and beats any other sorting algorithm by a huge amount IF the range of the numbers is not too large.

I'm making up a little program for you, to show what I mean.

If Counting sort can't be used, then:

#1) Let's make it an optimized Quicksort - here's one

Code:
```//Optimzed Quicksort
void quicksortOpt(int A[], int lo, int hi) {
int i, j, pivot, temp;

if(lo == hi) return;

if((hi - lo) < InsertNum) {
insertion(A, lo, hi+1);
return;
}

i=lo;
j=hi;
pivot= A[(lo+hi)/2];

/* Split the array into two parts */
do {
while (A[i] < pivot) i++;
while (A[j] > pivot) j--;
if (i<=j) {
//swap(A, i, j);   //this line works fine if you have a separate swap function
/* Oddly, on large N, using swap() gives the same time, as
using the swap code below */
temp= A[i];
A[i]= A[j];
A[j]=temp;
i++;
j--;
}
} while(i<=j);

if (lo < j) quicksortOpt(A, lo, j);
if (i < hi) quicksortOpt(A, i, hi);

}```
It requires Insertion sort to handle the smaller sub arrays, but it's definitely worth it. I'm putting together a little example program, showing that. I have tested this against 4 other versions of Quicksort, and it beats them all.

#2) You should use a Binary or Indexed search, if a counting sort is not possible. What size is the smaller and larger array's, typically?

And very important, WHAT IS THE RANGE of the numbers in the arrays?

No repeating numbers in either array, right? That can be useful.

5. So, just clarifying... 'array1' a list of indexes into the master array, which you wish to sort. I.e. 1, 5, 4, 3, 6 refer to the master array elements with values: 3, 6, 32, 1, 5. You then want to generate an array indexes that refer to these items in sorted order.
I.e. You want the indexes for items with values 1, 3, 5, 6, 32 from the master array, which are: 3, 1, 6, 5, 4

It appears that you simply need to sort the index array with respect to the items from the master list. Assuming C++:
Code:
```#include <algorithm>
#include <iterator>
#include <iostream>

using namespace std;

static const int master[] = {7, 3, 9, 1, 32, 6, 5, 4, 99, 100, 201, 13};

bool myLessThan(int lhs, int rhs)
{
return master[lhs] < master[rhs];
}

int main()
{
int indexes[] = {1, 5, 4, 3, 6};
sort(indexes, indexes + sizeof(indexes)/sizeof(indexes[0]), myLessThan);
copy(indexes, indexes + sizeof(indexes)/sizeof(indexes[0]), ostream_iterator<int>(cout, "\t"));
}```

6. Algorithm should not exceed complexity O(N log N) where N is the size of the indexing array array1.

Code:
```int master[] = {7, 3, 9, 1, 32, 6, 5, 4, 99, 100, 201, 13};
int array1[] = {1, 5, 4, 3, 6};

int compare(const void *a, const void *b) {
return master[*(int *)a] - master[*(int *)b]; }

...
qsort(array1, sizeof(array1)/sizeof(int), sizeof(int), compare);
...```
Note, qsort was used here for testing. qsort is not guaranteed to be as good as O(N log N).

7. This is an example of the two best sorters, in action. Counting sort doesn't actually sort, but has the effect of sorting, if the range of values is not too large.

Code:
```#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX 10000         //the size of A[] and B[], and the maximum integer value
#define REPEAT 200        //number of times the array is refilled, and resorted
#define SHOW 0             //SHOW = 1 prints the values, in sorted order
#define InsertNum 50      //When the Insertion sort is called for sub array sorting
/* If InsertNum > 60, the times for sorting begin to increase */

void check(int A[], int B[], int choice);  //checks accuracy of sort
void countingSort(int A[], int B[]);
void insertion(int A[], int, int);
void quicksortOpt(int A[], int l, int r);
void randomNums(int A[]);                //puts random int's into A[] array
void showIt(int A[], int B[], int choice);  //shows values

int main(void)  {
int i, l, r, choice;
clock_t start, stop;
int A[MAX];
int B[MAX] = { 0 };
l = 0; r = MAX-1;

printf("\n\n\n   Name of Algorithm         Seconds   Sorting %d Integers %d Times", MAX, REPEAT);
printf("\n   =========================================================================");

start = clock();
for(i = 0; i < REPEAT; i++) {
randomNums(A);
quicksortOpt(A,l,r);   //OK
}
stop = clock();
check(A, B, 0);
printf("\n   Quicksort Optimized       %7.4f   Insertion Threshold is: %d", (stop-start)/CLK_TCK, InsertNum);
if(SHOW) { showIt(A, B, 0); i = getchar(); }

start = clock();
for(i = 0; i < REPEAT; i++) {
randomNums(A);
countingSort(A,B);   //OK
}
stop = clock();
check(A, B, 1);
printf("\n   Counting Sort             %7.4f",   (stop-start)/CLK_TCK, InsertNum);
if(SHOW) { showIt(A, B, 1); i = getchar(); }

i = getchar();
return 0;
}
//================ End of Main ============================

/* In this example, 0's are excluded as a valid value */
//Counting Sort
void countingSort(int A[], int B[]) {
int i;
for(i = 0; i < MAX; i++)
if(A[i])
B[A[i]] = A[i];
}
//Insertion sort
void insertion(int A[], int lo, int hi) {  //lo and hi are local, not global
int value;
int ohi = hi;                    //ohi is the original value of hi
for(; lo < ohi; lo++)  {
value = A[lo];
hi = lo - 1;
while(A[hi] > value)  {
A[hi + 1] = A[hi];
--hi;
if(hi < 0) break;
}
A[hi + 1] = value;
}
}

//Optimzed Quicksort
void quicksortOpt(int A[], int lo, int hi) {
int i, j, pivot, temp;

if(lo == hi) return;

if((hi - lo) < InsertNum) {
insertion(A, lo, hi+1);
return;
}

i=lo;
j=hi;
pivot= A[(lo+hi)/2];

/* Split the array into two parts */
do {
while (A[i] < pivot) i++;
while (A[j] > pivot) j--;
if (i<=j) {
//swap(A, i, j);   //this line works fine.
/* Oddly, on large N, using swap() gives the same time, as
using the swap code below */
temp= A[i];
A[i]= A[j];
A[j]=temp;
i++;
j--;
}
} while(i<=j);

if (lo < j) quicksortOpt(A, lo, j);
if (i < hi) quicksortOpt(A, i, hi);

}
void randomNums(int A[]) {
int i;
for(i = 0; i < MAX; i++)
A[i] = rand() % MAX;
}

/* checks A[] (0), or B[] (1) , depending on the value of choice */
void check(int A[], int B[], int choice) {
int i;
if(!choice) {  //check A's contents directly
for(i = 0; i < MAX - 2; i++) {
if(A[i] > A[i+1]) {   //an error
printf("\n\nError!: %d ", A[i]);
return;
}
}
}
else {   //check A's contents through the index array
for(i = 0; i < MAX - 2; i++) {
if(B[i] > B[i+1] && B[i+1]) {   //an error
printf("\n\nError!: %d ", B[i]);
return;
}
}
}
}

/* shows either A[]'s (0), or B[]'s (1) value, depending on choice */
void showIt(int A[], int B[], int choice) {
int i;
printf("\n\n");
if(!choice) {    //regular "direct" display
for(i = 0; i < MAX; i++)
printf("%7d ", A[i]);
}
else {
for(i = 0; i < MAX; i++)
if(B[i] > 0)
printf("%7d ", B[i]);
}
}```

Note on Counter Sort:
Any check for a number N in the array B[] is nearly instantaneous:

if(B[N]) (if B was originally set for all 0's), or if(B[N] == N], otherwise.

Search is eliminated. It's just a lookup.

The system can sort strings very quickly - faster than anything I've ever seen, but with integers, this Quicksort is faster. It is much faster than C's qsort().

But Counter Sort blows everything away, *IF* it can be used on your problem.

8. If I could confirm that this wasn't homework then I would have given an answer using C. Hopefully I gave you enough of an idea with what I posted. The advantage of std::sort vs qsort is that it is guaranteed to be O(n*logn).
As noted, to optimise furthur it would be useful to know what restrictions we can assume on the data in the master array. For example, will all values be positive?
Is there some minimum and maximum possible value for entries in the master array?
Are you able to assume there are no duplicates in the master array? What about the index arrays?
What about the distribution of the values in the master array, can we assume they are relatively evenly distributed?

9. He said there were no duplicates, anywhere.

I don't know if it's homework or not, but I believe whatever we put up, it should help further knowledge, to him, and maybe to his classroom, as well. It's dreadful having all this bubble sort type of sorting programs, and so little time and few posts are focused on optimized sorting programs.

10. Just for clarification, this is just part of a much bigger project I am working on for a class....(writing a fault simulator), so using other people's code for a fast sort is fine....however I figured out a fast algorithm!!!!

Thanks to nonoob!!! Although the comparison was wrong, it led me to the right idea:

n lg(n) algorithm;

I created a new array called location[].
While I am creating my master list, I store the location of each data value into location. For example, if in the master list, data value 5 is in location index 10, I do location[5] = 10. Now, for any given data value in array1, I can look up what its index in the master list is in constant time.

I then did a qsort(array1, length of array1, sizeof(int), compare);

Where compare is:
{
return location[a] - location[b] //not master!!!
}

qsort here gives n log n (array1 is pretty random and has unique values) and no dependency on M anymore.

Thanks for all the responses!