Thread: Natural Mergesort

  1. #16
    Lean Mean Coding Machine KONI's Avatar
    Join Date
    Mar 2007
    Location
    Luxembourg, Europe
    Posts
    444
    iMalc wrote that you need 200% more space than the original array, since you need two temporary arrays, that you alternately write to (depending on the condition he explained). When you arrive at the end of your original array, you merge the two temporary arrays according to his criteria and then start again, at the beginning of the input array, until you only write to one temporary array only (in which case the numbers are sorted).

  2. #17
    Lean Mean Coding Machine KONI's Avatar
    Join Date
    Mar 2007
    Location
    Luxembourg, Europe
    Posts
    444
    To complete my sorting algorithm collection, I tried implementing the natural mergesort that iMalc suggested, following the exact description he gave in this thread (code is below here). The thing is that this is incredibly slow, 0.87 seconds on an array of size 10k compared to bubbleSort that performed in 0.86 seconds on the same values. A classic mergeSort takes 0.0 seconds (value to small to be visible) for the same values.

    I wonder if I have a mistake in my code (I tested it with debugging and it works just as expected) or is that thing really that slow ??

    Code:
    static void naturalMerge(int **in, int *out, size_t size, int *tmpSize);
    
    void naturalMergeSort(int *array, int size)
    {
        int i = 0;
        /* switchArray points alternately to our 2 tmp arrays */
        int switchArray;
    
        /* two temporary arrays */
        int **tmp;
    
        /* two index, one for each temporary array */
        int index[2];
    
        tmp = malloc(2 * sizeof(*tmp));
        tmp[0] = malloc(size * sizeof(**tmp));
        tmp[1] = malloc(size * sizeof(**tmp));
    
        do
        {
            /* reset everything to the start */
            i = 0;
            switchArray = 0;
            index[0] = 0;
            index[1] = -1;
    
            /* write the first element of our array into the first tmp array */
            tmp[switchArray][index[switchArray]] = array[i];
            for (i = 1; i < size; i++)
            {
                /* check if we can still build a monotonic sequence */
                if (array[i] >= tmp[switchArray][index[switchArray]])
                {
                    tmp[switchArray][++index[switchArray]] = array[i];
                }
                else
                {
                    switchArray = (++switchArray)&#37;2;
                    tmp[switchArray][++index[switchArray]] = array[i];
                }
            }
            /* merge the two temporary arrays into our original array */
            naturalMerge(tmp, array, size, index);
        } while (index[1] != -1);
        free(tmp[0]);
        free(tmp[1]);
        free(tmp);
    }
    
    /**
     * in[0] and in[1] are our two temporary arrays we want to merge
     * out is the original array the two temporary arrays are merged into
     * size is the size of the original array
     * tmpSize is the size of in[0], in[1]
     */
    
    void naturalMerge(int **in, int *out, size_t size, int *tmpSize)
    {
        int i, j = 0, k = 0;
        for (i = 0; i < size; i++)
        {
            if (j > tmpSize[0] || k > tmpSize[1])
                break;
            if(in[0][j] < in[1][k])
            {
                out[i] = in[0][j++];
            }
            else
            {
                out[i] = in[1][k++];
            }
        }
    
        while (j <= tmpSize[0])
            out[i++] = in[0][j++];
    
        while (k <= tmpSize[1])
            out[i++] = in[1][k++];
    }

  3. #18
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Actually I left one thing out of my description.
    When merging the two arrays back into one, you have 3 cases:

    1. Both items in the front of the queues are higher than the last item added to the merged queue. In this case it is obviously best to add the smaller of the two to the merged queue.

    2. Both items in the front of the queues are lower than the last item added to the merged queue. In this case it is also best to add the smaller of the two to the merged queue so that the next run is as long as possible.

    3. One of the queues has a front item less than the last item in the merged queue, but the other has a front item greater than the last item in the merged queue. However it currently puts the smaller item in the queue, causing it to prematurely start a new run. Correcting this actually requires adding two extra comparisons, but it brings it back to being an O(nlogn) sort instead of an O(n*n) which you will currently be experiencing. Sorry this was missing from my original description.
    What you have so far looks pretty good.

    I actually found this problem in the implementation on Prelude's site some time ago and let her know, and she promptly corrected it. So it was silly of me to not remember it this time.

    Once again, sorry for attacking you, I definitely got the wrong impression and can't have been in the best of moods or something.

    I actually didn't have the natural bitonic sort in my sorting demo program (not sure why), but now I do. Would you believe that it now contains over 65 array sorting algorithms! And I stil haven't got the darn thing on my site yet...
    Last edited by iMalc; 04-10-2007 at 01:29 PM.

  4. #19
    Registered User
    Join Date
    Sep 2006
    Posts
    51
    KONI,
    I wanted to test your algorithm, but, well, i failed my own test....
    I changed it a bit to suit my record structure.
    Code:
    void mergesortN(Record *a, int n, int (*comp_func)(const, const))
    {
        printf("Debug 0\n");
        Record *b = malloc(sizeof(Record)*n);
        Record *c = malloc(sizeof(Record)*n);
    
        int i=0;
        int j=0;
        int k=-1;
    
        *(b+j) = *(a+i);
    
        printf("Debug 1\n");
    
        while(i<=n)
        {
                for(i=1; i<n; i++)
                {
                    if(comp_func((a+i),(b+j)) >= 0)
                    {
                            *(b+(++j)) = *(a+i);
                    }
                    else
                    {
                            *(c+(++k)) = *(a+i);
                    }
                }
                printf("Debug 2\n");
                mergeN(a, b, c,  n, j, k, comp_func);
        }
    
        free(b);
        free(c);
    }
    
    
    void mergeN(Record *a, Record *b, Record *c, int n, int index1, int index2, int (*comp_func)(const, const))
    {
            int i, j=0, k=0;
    
            printf("Debug 3\n");
    
            for(i=0; i<n; i++)
            {
                    if((j > index1) || (k > index2))
                            break;
                    if(comp_func((b+j), (c+k)) < 0)
                            *(a+i) = *(b+j++);
                    else
                            *(a+i) = *(c+k++);
            }
    
            printf("Debug 4\n");
    
            while(j <= index1)
                    *(a+i++) = *(b+j++);
            while(k <= index2)
                    *(a+i++) = *(c+k++);
    }
    I hope i'm not doing it terribly wrong, but why do i get a 'Segmentation Fault' ?
    Do you think that there's a more efficient method than using 200% space?
    Last edited by wuzzo87; 04-11-2007 at 04:17 AM. Reason: mistake

  5. #20
    Registered User
    Join Date
    Sep 2006
    Posts
    51
    Quote Originally Posted by KONI View Post
    Seriously, I tried to make sense of what you wrote but I just don't understand it. Just split your initial array recursively in half, cutting at the middle.
    KONI, you mention about splitting it in half recursively at the middle, and to exploit naturally present bitonic sequences,

    So, would it be that i split it recursively like a top-down mergesort, but,
    merge(function) it differently?

  6. #21
    Lean Mean Coding Machine KONI's Avatar
    Join Date
    Mar 2007
    Location
    Luxembourg, Europe
    Posts
    444
    Quote Originally Posted by wuzzo87 View Post
    KONI, you mention about splitting it in half recursively at the middle, and to exploit naturally present bitonic sequences,

    So, would it be that i split it recursively like a top-down mergesort, but,
    merge(function) it differently?
    Ignore that post, that was about the classic merge sort that doesn't exploit any natural order. Please refer to the code I posted above and also to the comment iMalc wrote that fixed the O(n*n) into O(n*log n) complexity.

  7. #22
    Registered User
    Join Date
    Sep 2006
    Posts
    51

    static counter for comparisons

    Hey guys, i wanted to the performance in terms of comparisons of this algorithm from http://www.inf.fh-flensburg.de/lang/...e/natmerge.htm
    against my other bottom-up mergesort program.
    I implemented a static int that's pass from main to the functions.
    But the result is always 0.... why?

    Here's my translation of C++ to C:
    Code:
    #include<stdio.h>
    #include<stdlib.h>
    #include<stdbool.h>
    
    void sort(int array[],int size, int comp);
    static bool mergeruns(int a[], int b[], int n, int comp);
    void merge(int a[], int b[], int lo, int hi, bool asc, int comp);
    void naturalmergesort(int a[], int b[], int n, int comp);
    
    
    int main(int argc, char **argv)
    {
            int array[50];
            int item,j, i=0;
            static int comp=0;
    
            while(scanf("%d", &item) > 0)
            {
                    array[i] = item;
                    i++;
            }
    
            sort(array,i,comp);
    
            for(j=0; j < i; j++)
                    printf("%d ", array[j]);
    
            printf("\n");
            printf("%d comparisons made\n", comp);
    
            return 0;
    }
    
    void sort(int array[],int size, int comp)
    {
            int b[50];
            int n=size;
            naturalmergesort(array, b, n, comp);
    }
    
    
    static bool mergeruns(int a[], int b[], int n, int comp)
    {
            int i=0, k=0, x;
            bool asc=true;
    
            while (i<n)
            {
                k=i;
                do x=a[i++]; while (i<n && x<=a[i]);
                while (i<n && x>=a[i]) x=a[i++];
                merge (a, b, k, i-1, asc, comp);
                asc=!asc;
            }
            return k==0;
    }
    
    
    void merge(int a[], int b[], int lo, int hi, bool asc, int comp)
    {
            int l = lo; int r = hi;
            int k=asc ? l : r;
            int c=asc ? 1 : -1;
            int i=lo, j=hi;
    
            while (i<=j)
            {
                if (a[i]<=a[j])
                {
                    b[k]=a[i++];
                    comp++;
                }
                else
                {
                    b[k]=a[j--];
                    comp++;
                }
                k+=c;
            }
    }
    
    
    void naturalmergesort(int a[], int b[], int n, int comp)
    {
            while (!mergeruns(a, b, n, comp) & !mergeruns(b, a, n, comp));
    }

  8. #23
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    What is it you think you're getting by using the keyword static? Explain what you think it's doing for you, and we'll explain what it's actually doing for you. It will also give us an idea as to what you think your code should be doing. We may be able to locate the source of your confusion.


    Quzah.
    Hope is the first step on the road to disappointment.

  9. #24
    Registered User
    Join Date
    Sep 2006
    Posts
    51
    Hmm...
    Ok, I used static thinking that, it'll be initialised to 0 once, in the start of the main func,
    then it get passed to the functions, [to sort() then naturalmergesort () then mergeruns() then merge() - where it gets increamented] and since it's static so the value to its final increament should hold where i then print it out once all the sorting is done.
    Then i should know how many comparisons were made when merging.

    Tht's what i think.....at least

  10. #25
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    No, the static keyword doesn't persist through function calls. It's only local to that function (or as a file-scope limiter if used on a "global"). Its use is so that repeat calls to the same function will retain its value. Since you don't actually call main, there's no point in it being static.

    What you're looking for is a pointer.


    Quzah.
    Hope is the first step on the road to disappointment.

  11. #26
    Lean Mean Coding Machine KONI's Avatar
    Join Date
    Mar 2007
    Location
    Luxembourg, Europe
    Posts
    444
    If you want to get a performance reference other than time, you could simply measure the clock() difference between the start and the end of the program and not divide that by CLOCKS_PER_SEC. In that case you will know exactly how many clocks the algorithm needed.

  12. #27
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    If you want to get a performance reference other than time, you could simply measure the clock() difference between the start and the end of the program and not divide that by CLOCKS_PER_SEC. In that case you will know exactly how many clocks the algorithm needed.
    Some compilers however make CLOCKS_PER_SEC different values; this means that (time_1)1 might mean one second or one millisecond or one 18.2th of a second. It could mean anything. I would suggest time*1000/CLOCKS_PER_SEC to get milliseconds; that way, if CLOCKS_PER_SEC is 1, you don't lose any data, but if it's 1000 or something else, you don't get rid of the milliseconds.
    Code:
    double t = difftime(end, start) * 1000.0 / CLOCKS_PER_SEC;
    Actually, if I wanted that kind of detail, I'd probably use platform-specific functions. Here's some code I wrote a long time ago for Windows:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>
    #include <windows.h>
    
    enum {LOOPS = 100};
    
    LARGE_INTEGER stime;
    
    void to_time(void);
    void start_time(void);
    long end_time(void);
    
    int main(void) {
        int x;
        long ttime = 0, t;
    
        for(x = 0; x < LOOPS; x ++) {
            start_time();
            to_time();
    
            t = end_time();
            ttime += t;
            printf("\r(&#37;5i/%5i) %li ms", x, LOOPS, t);
        }
    
        printf("\rAverage: %.4f ms \n", (double)ttime/LOOPS);
    
        return 0;
    }
    
    void to_time(void) {
        int x;
    
        for(x = 0; x < 1000000; x ++);
    }
    
    void start_time(void) {
        QueryPerformanceCounter(&stime);
    }
    
    long end_time(void) {
        LARGE_INTEGER diff;
        LARGE_INTEGER freq;
    
        QueryPerformanceCounter(&diff);
        diff.QuadPart -= stime.QuadPart;
        diff.QuadPart *= 1000;  /* Adjust to milliseconds, shouldn't overflow */
        QueryPerformanceFrequency(&freq);
    
        return ((long)(diff.QuadPart / freq.QuadPart));
    }
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  13. #28
    Registered User
    Join Date
    Sep 2006
    Posts
    51

    pointer help! & clock timing units

    Hi guys,
    I actually need to also count the comparisons, but I thank you for the timing suggestion too , of which i will be using and is of great help.

    However, what units will the result be for the time difference of clock()? My result is always 0.0000 .... I'm running it on Mandrake 10.1....
    It's under >> man clock? ...because they said, clock ticks are miliseconds.

    And apart from that, can i get a lil help on pointers again? My comparison pointer keeps on printing 0....

    Here's what i did..

    Code:
    int main(int argc, char **argv)
    {
            int array[50];
            int item,j, i=0;
            int *comp = malloc(sizeof(int));
            int comp_result;
            clock_t start, stop;
            double t;
    
            while(scanf("%d", &item) > 0)
            {
                    array[i] = item;
                    i++;
            }
    
            start = clock();
    
            sort(array,i,comp);
    
            stop = clock();
    
            t = (double)(stop-start);
    
            comp_result = *comp;
    
            for(j=0; j < i; j++)
                    printf("%d ", array[j]);
    
            printf("\n");
            printf("%d comparisons made\n", comp_result);
            printf("Time taken : %.10lf\n", t);
    
    
            return 0;
    }
    
    void sort(int array[],int size, int *comp)
    {
            int b[50];
            int n=size;
            naturalmergesort(array, b, n, comp);
    }
    
    
    static bool mergeruns(int a[], int b[], int n, int *comp)
    {
            int i=0, k=0, x;
            bool asc=true;
    
            while (i<n)
            {
                k=i;
                do x=a[i++]; while (i<n && x<=a[i]);
                while (i<n && x>=a[i]) x=a[i++];
                merge (a, b, k, i-1, asc, comp);
                asc=!asc;
            }
            return k==0;
    }
    
    void merge(int a[], int b[], int lo, int hi, bool asc, int *comp)
    {
            int l = lo; int r = hi;
            int k=asc ? l : r;
            int c=asc ? 1 : -1;
            int i=lo, j=hi;
            int comparisons = 0;
    
            while (i<=j)
            {
                if (a[i]<=a[j])
                {
                    b[k]=a[i++];
                    comparisons++;
                }
                else
                {
                    b[k]=a[j--];
                    comparisons++;
                }
                k+=c;
            }
            comp = &comparisons;
    }
    
    
    void naturalmergesort(int a[], int b[], int n, int *comp)
    {
            while (!mergeruns(a, b, n, comp) & !mergeruns(b, a, n, comp));
    }

  14. #29
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by wuzzo87 View Post
    Hi guys,
    I actually need to also count the comparisons, but I thank you for the timing suggestion too , of which i will be using and is of great help.

    However, what units will the result be for the time difference of clock()? My result is always 0.0000
    Yes, we were discussing this very thing.


    Quzah.
    Hope is the first step on the road to disappointment.

  15. #30
    Registered User
    Join Date
    Sep 2006
    Posts
    51
    Quote Originally Posted by wuzzo87 View Post

    And apart from that, can i get a lil help on pointers again? My comparison pointer keeps on printing 0....
    Someone help me with pointers plz...
    it's the last bit of my prog and it's all done. waahhaahaa...

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Mergesort in C
    By Oduig in forum C Programming
    Replies: 2
    Last Post: 09-14-2008, 11:30 AM
  2. How do I do MergeSort versus QuickSort instead?
    By dxfist in forum C++ Programming
    Replies: 9
    Last Post: 03-06-2008, 12:12 PM
  3. Mergesort
    By swanley007 in forum C++ Programming
    Replies: 6
    Last Post: 10-26-2005, 11:45 PM
  4. Linked Lists + MergeSort....(how?)
    By JoshR in forum C++ Programming
    Replies: 4
    Last Post: 06-03-2005, 02:40 PM
  5. natural leg movement
    By DavidP in forum Game Programming
    Replies: 32
    Last Post: 01-11-2004, 09:01 AM