Natural Mergesort

**KONI** · 04-10-2007

iMalc wrote that you need 200% more space than the original array, since you need two temporary arrays, that you alternately write to (depending on the condition he explained). When you arrive at the end of your original array, you merge the two temporary arrays according to his criteria and then start again, at the beginning of the input array, until you only write to one temporary array only (in which case the numbers are sorted).

**KONI** · 04-10-2007

To complete my sorting algorithm collection, I tried implementing the natural mergesort that iMalc suggested, following the exact description he gave in this thread (code is below here). The thing is that this is incredibly slow, 0.87 seconds on an array of size 10k compared to bubbleSort that performed in 0.86 seconds on the same values. A classic mergeSort takes 0.0 seconds (value to small to be visible) for the same values.

I wonder if I have a mistake in my code (I tested it with debugging and it works just as expected) or is that thing really that slow ??

Code:

static void naturalMerge(int **in, int *out, size_t size, int *tmpSize);

void naturalMergeSort(int *array, int size)
{
    int i = 0;
    /* switchArray points alternately to our 2 tmp arrays */
    int switchArray;

    /* two temporary arrays */
    int **tmp;

    /* two index, one for each temporary array */
    int index[2];

    tmp = malloc(2 * sizeof(*tmp));
    tmp[0] = malloc(size * sizeof(**tmp));
    tmp[1] = malloc(size * sizeof(**tmp));

    do
    {
        /* reset everything to the start */
        i = 0;
        switchArray = 0;
        index[0] = 0;
        index[1] = -1;

        /* write the first element of our array into the first tmp array */
        tmp[switchArray][index[switchArray]] = array[i];
        for (i = 1; i < size; i++)
        {
            /* check if we can still build a monotonic sequence */
            if (array[i] >= tmp[switchArray][index[switchArray]])
            {
                tmp[switchArray][++index[switchArray]] = array[i];
            }
            else
            {
                switchArray = (++switchArray)&#37;2;
                tmp[switchArray][++index[switchArray]] = array[i];
            }
        }
        /* merge the two temporary arrays into our original array */
        naturalMerge(tmp, array, size, index);
    } while (index[1] != -1);
    free(tmp[0]);
    free(tmp[1]);
    free(tmp);
}

/**
 * in[0] and in[1] are our two temporary arrays we want to merge
 * out is the original array the two temporary arrays are merged into
 * size is the size of the original array
 * tmpSize is the size of in[0], in[1]
 */

void naturalMerge(int **in, int *out, size_t size, int *tmpSize)
{
    int i, j = 0, k = 0;
    for (i = 0; i < size; i++)
    {
        if (j > tmpSize[0] || k > tmpSize[1])
            break;
        if(in[0][j] < in[1][k])
        {
            out[i] = in[0][j++];
        }
        else
        {
            out[i] = in[1][k++];
        }
    }

    while (j <= tmpSize[0])
        out[i++] = in[0][j++];

    while (k <= tmpSize[1])
        out[i++] = in[1][k++];
}

**iMalc** · 04-10-2007

Actually I left one thing out of my description.
When merging the two arrays back into one, you have 3 cases:

1. Both items in the front of the queues are higher than the last item added to the merged queue. In this case it is obviously best to add the smaller of the two to the merged queue.

2. Both items in the front of the queues are lower than the last item added to the merged queue. In this case it is also best to add the smaller of the two to the merged queue so that the next run is as long as possible.

3. One of the queues has a front item less than the last item in the merged queue, but the other has a front item greater than the last item in the merged queue. However it currently puts the smaller item in the queue, causing it to prematurely start a new run. Correcting this actually requires adding two extra comparisons, but it brings it back to being an O(nlogn) sort instead of an O(n*n) which you will currently be experiencing. Sorry this was missing from my original description.
What you have so far looks pretty good.

I actually found this problem in the implementation on Prelude's site some time ago and let her know, and she promptly corrected it. So it was silly of me to not remember it this time.

Once again, sorry for attacking you, I definitely got the wrong impression and can't have been in the best of moods or something.

I actually didn't have the natural bitonic sort in my sorting demo program (not sure why), but now I do. Would you believe that it now contains over 65 array sorting algorithms! And I stil haven't got the darn thing on my site yet...

**wuzzo87** · 04-11-2007

KONI,
I wanted to test your algorithm, but, well, i failed my own test....

I changed it a bit to suit my record structure.

Code:

void mergesortN(Record *a, int n, int (*comp_func)(const, const))
{
    printf("Debug 0\n");
    Record *b = malloc(sizeof(Record)*n);
    Record *c = malloc(sizeof(Record)*n);

    int i=0;
    int j=0;
    int k=-1;

    *(b+j) = *(a+i);

    printf("Debug 1\n");

    while(i<=n)
    {
            for(i=1; i<n; i++)
            {
                if(comp_func((a+i),(b+j)) >= 0)
                {
                        *(b+(++j)) = *(a+i);
                }
                else
                {
                        *(c+(++k)) = *(a+i);
                }
            }
            printf("Debug 2\n");
            mergeN(a, b, c,  n, j, k, comp_func);
    }

    free(b);
    free(c);
}


void mergeN(Record *a, Record *b, Record *c, int n, int index1, int index2, int (*comp_func)(const, const))
{
        int i, j=0, k=0;

        printf("Debug 3\n");

        for(i=0; i<n; i++)
        {
                if((j > index1) || (k > index2))
                        break;
                if(comp_func((b+j), (c+k)) < 0)
                        *(a+i) = *(b+j++);
                else
                        *(a+i) = *(c+k++);
        }

        printf("Debug 4\n");

        while(j <= index1)
                *(a+i++) = *(b+j++);
        while(k <= index2)
                *(a+i++) = *(c+k++);
}

I hope i'm not doing it terribly wrong, but why do i get a 'Segmentation Fault' ?
Do you think that there's a more efficient method than using 200% space?

**wuzzo87** · 04-11-2007

Originally Posted by KONI

Seriously, I tried to make sense of what you wrote but I just don't understand it. Just split your initial array recursively in half, cutting at the middle.

KONI, you mention about splitting it in half recursively at the middle, and to exploit naturally present bitonic sequences,

So, would it be that i split it recursively like a top-down mergesort, but,
merge(function) it differently?

**KONI** · 04-11-2007

Originally Posted by wuzzo87

KONI, you mention about splitting it in half recursively at the middle, and to exploit naturally present bitonic sequences,

So, would it be that i split it recursively like a top-down mergesort, but,
merge(function) it differently?

Ignore that post, that was about the classic merge sort that doesn't exploit any natural order. Please refer to the code I posted above and also to the comment iMalc wrote that fixed the O(n*n) into O(n*log n) complexity.

**wuzzo87** · 04-12-2007

Hey guys, i wanted to the performance in terms of comparisons of this algorithm from http://www.inf.fh-flensburg.de/lang/...e/natmerge.htm
against my other bottom-up mergesort program.
I implemented a static int that's pass from main to the functions.
But the result is always 0.... why?

Here's my translation of C++ to C:

Code:

#include<stdio.h>
#include<stdlib.h>
#include<stdbool.h>

void sort(int array[],int size, int comp);
static bool mergeruns(int a[], int b[], int n, int comp);
void merge(int a[], int b[], int lo, int hi, bool asc, int comp);
void naturalmergesort(int a[], int b[], int n, int comp);


int main(int argc, char **argv)
{
        int array[50];
        int item,j, i=0;
        static int comp=0;

        while(scanf("%d", &item) > 0)
        {
                array[i] = item;
                i++;
        }

        sort(array,i,comp);

        for(j=0; j < i; j++)
                printf("%d ", array[j]);

        printf("\n");
        printf("%d comparisons made\n", comp);

        return 0;
}

void sort(int array[],int size, int comp)
{
        int b[50];
        int n=size;
        naturalmergesort(array, b, n, comp);
}


static bool mergeruns(int a[], int b[], int n, int comp)
{
        int i=0, k=0, x;
        bool asc=true;

        while (i<n)
        {
            k=i;
            do x=a[i++]; while (i<n && x<=a[i]);
            while (i<n && x>=a[i]) x=a[i++];
            merge (a, b, k, i-1, asc, comp);
            asc=!asc;
        }
        return k==0;
}


void merge(int a[], int b[], int lo, int hi, bool asc, int comp)
{
        int l = lo; int r = hi;
        int k=asc ? l : r;
        int c=asc ? 1 : -1;
        int i=lo, j=hi;

        while (i<=j)
        {
            if (a[i]<=a[j])
            {
                b[k]=a[i++];
                comp++;
            }
            else
            {
                b[k]=a[j--];
                comp++;
            }
            k+=c;
        }
}


void naturalmergesort(int a[], int b[], int n, int comp)
{
        while (!mergeruns(a, b, n, comp) & !mergeruns(b, a, n, comp));
}

**quzah** · 04-12-2007

What is it you think you're getting by using the keyword static? Explain what you think it's doing for you, and we'll explain what it's actually doing for you.

It will also give us an idea as to what you think your code should be doing. We may be able to locate the source of your confusion.

Quzah.

**wuzzo87** · 04-12-2007

Hmm...
Ok, I used static thinking that, it'll be initialised to 0 once, in the start of the main func,
then it get passed to the functions, [to sort() then naturalmergesort () then mergeruns() then merge() - where it gets increamented] and since it's static so the value to its final increament should hold where i then print it out once all the sorting is done.
Then i should know how many comparisons were made when merging.

Tht's what i think.....at least

**quzah** · 04-12-2007

No, the static keyword doesn't persist through function calls. It's only local to that function (or as a file-scope limiter if used on a "global"). Its use is so that repeat calls to the same function will retain its value. Since you don't actually call main, there's no point in it being static.

What you're looking for is a pointer.

Quzah.

**KONI** · 04-13-2007

If you want to get a performance reference other than time, you could simply measure the clock() difference between the start and the end of the program and not divide that by CLOCKS_PER_SEC. In that case you will know exactly how many clocks the algorithm needed.

**dwks** · 04-13-2007

If you want to get a performance reference other than time, you could simply measure the clock() difference between the start and the end of the program and not divide that by CLOCKS_PER_SEC. In that case you will know exactly how many clocks the algorithm needed.

Some compilers however make CLOCKS_PER_SEC different values; this means that (time_1)1 might mean one second or one millisecond or one 18.2th of a second. It could mean anything. I would suggest time*1000/CLOCKS_PER_SEC to get milliseconds; that way, if CLOCKS_PER_SEC is 1, you don't lose any data, but if it's 1000 or something else, you don't get rid of the milliseconds.

Code:

double t = difftime(end, start) * 1000.0 / CLOCKS_PER_SEC;

Actually, if I wanted that kind of detail, I'd probably use platform-specific functions. Here's some code I wrote a long time ago for Windows:

Code:

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <windows.h>

enum {LOOPS = 100};

LARGE_INTEGER stime;

void to_time(void);
void start_time(void);
long end_time(void);

int main(void) {
    int x;
    long ttime = 0, t;

    for(x = 0; x < LOOPS; x ++) {
        start_time();
        to_time();

        t = end_time();
        ttime += t;
        printf("\r(&#37;5i/%5i) %li ms", x, LOOPS, t);
    }

    printf("\rAverage: %.4f ms \n", (double)ttime/LOOPS);

    return 0;
}

void to_time(void) {
    int x;

    for(x = 0; x < 1000000; x ++);
}

void start_time(void) {
    QueryPerformanceCounter(&stime);
}

long end_time(void) {
    LARGE_INTEGER diff;
    LARGE_INTEGER freq;

    QueryPerformanceCounter(&diff);
    diff.QuadPart -= stime.QuadPart;
    diff.QuadPart *= 1000;  /* Adjust to milliseconds, shouldn't overflow */
    QueryPerformanceFrequency(&freq);

    return ((long)(diff.QuadPart / freq.QuadPart));
}

**wuzzo87** · 04-13-2007

Hi guys,
I actually need to also count the comparisons, but I thank you for the timing suggestion too , of which i will be using and is of great help.

However, what units will the result be for the time difference of clock()? My result is always 0.0000 .... I'm running it on Mandrake 10.1....
It's under >> man clock? ...because they said, clock ticks are miliseconds.

And apart from that, can i get a lil help on pointers again? My comparison pointer keeps on printing 0....

Here's what i did..

Code:

int main(int argc, char **argv)
{
        int array[50];
        int item,j, i=0;
        int *comp = malloc(sizeof(int));
        int comp_result;
        clock_t start, stop;
        double t;

        while(scanf("%d", &item) > 0)
        {
                array[i] = item;
                i++;
        }

        start = clock();

        sort(array,i,comp);

        stop = clock();

        t = (double)(stop-start);

        comp_result = *comp;

        for(j=0; j < i; j++)
                printf("%d ", array[j]);

        printf("\n");
        printf("%d comparisons made\n", comp_result);
        printf("Time taken : %.10lf\n", t);


        return 0;
}

void sort(int array[],int size, int *comp)
{
        int b[50];
        int n=size;
        naturalmergesort(array, b, n, comp);
}


static bool mergeruns(int a[], int b[], int n, int *comp)
{
        int i=0, k=0, x;
        bool asc=true;

        while (i<n)
        {
            k=i;
            do x=a[i++]; while (i<n && x<=a[i]);
            while (i<n && x>=a[i]) x=a[i++];
            merge (a, b, k, i-1, asc, comp);
            asc=!asc;
        }
        return k==0;
}

void merge(int a[], int b[], int lo, int hi, bool asc, int *comp)
{
        int l = lo; int r = hi;
        int k=asc ? l : r;
        int c=asc ? 1 : -1;
        int i=lo, j=hi;
        int comparisons = 0;

        while (i<=j)
        {
            if (a[i]<=a[j])
            {
                b[k]=a[i++];
                comparisons++;
            }
            else
            {
                b[k]=a[j--];
                comparisons++;
            }
            k+=c;
        }
        comp = &comparisons;
}


void naturalmergesort(int a[], int b[], int n, int *comp)
{
        while (!mergeruns(a, b, n, comp) & !mergeruns(b, a, n, comp));
}

**quzah** · 04-13-2007

Originally Posted by wuzzo87

Hi guys,
I actually need to also count the comparisons, but I thank you for the timing suggestion too , of which i will be using and is of great help.

However, what units will the result be for the time difference of clock()? My result is always 0.0000

Yes, we were discussing this very thing.

Quzah.

**wuzzo87** · 04-14-2007

Originally Posted by wuzzo87

And apart from that, can i get a lil help on pointers again? My comparison pointer keeps on printing 0....

Someone help me with pointers plz...
it's the last bit of my prog and it's all done. waahhaahaa...

Thread: Natural Mergesort

Thread Tools

Search Thread

Display

static counter for comparisons

pointer help! & clock timing units

Similar Threads

Mergesort in C

How do I do MergeSort versus QuickSort instead?

Mergesort

Linked Lists + MergeSort....(how?)

natural leg movement