Merge-Sort Algorithm in C

**Quant89** · 07-24-2013

Hi,

I'm trying to implement the Merge-Sort algorithm. I only had the pseudocode for it and have some problems coding this into C.

I'm still relatively new to C, have only covered pointers recently and I tried using them, which did not work. I started with the code for the merge algorithm and only used a 10 element array, which was already divided into two sorted subarrays:

Code:

#include <stdio.h>#include <stdlib.h>




int main()
{
    
    int a[5]={1,5,6,10,13}, b[5]={4,8,9,10,14},c[10], *i,*j,k;
    i=&a[0];
    j=&b[0];
    for (k=0;k<10;k++){
        if (*i<*j){
            c[k]=*i;
            i++;}
        else {
            c[k]=*j;
            j++;}
        }
    for (k=0;k<10;k++)
        printf("%d ",c[k]);
    return 0;
}

This is the result that I get:

Code:

1 4 5 6 8 9 10 10 13 0

So I think the problem occurs because in the second to last loop i is incremented again, but the end of the array is already reached, and the compiler has no element a[6] to compare with *j in the last run of the loop.

Does anybody know how I can solve this problem? Is there generally a better way to implement Merge?

**std10093** · 07-24-2013

I had implemented the algorithm here. You can take a look if you like

**Malcolm McLean** · 07-24-2013

With merge sort you have a set of N unsorted elements. A list of 1 is by definition sorted, so that gives you N sorted lists. Join each one with a neighbour, so that gives you N/2 unsorted lists, maybe plus one if N is odd. Now we run N/2 routines to join two sorted lists, which you do by keeping two pointers i and j to the current positions (you can only insert from the head of the lists, because lista[i+1] and listb[j+1] must always be bigger than lists[i] and listb[j]). Whilst there are more efficient answers, just malloc() a temporary buffer for now to build the new list, then copy it.
Now we;ve got N/2 sorted lists, so join each list to a neighbour, and go through the process again. The number of lists gradually decreases, until you have only one list.

So the main control routine should be while(Nlists > 1). You always know the lengths of the lists, but get the code working first on exact powers of two, then add the fiddly code to get the length of the last list when it's not a power of two when everything else is OK.

**Quant89** · 07-24-2013

Originally Posted by Malcolm McLean

With merge sort you have a set of N unsorted elements. A list of 1 is by definition sorted, so that gives you N sorted lists. Join each one with a neighbour, so that gives you N/2 unsorted lists, maybe plus one if N is odd. Now we run N/2 routines to join two sorted lists, which you do by keeping two pointers i and j to the current positions (you can only insert from the head of the lists, because lista[i+1] and listb[j+1] must always be bigger than lists[i] and listb[j]). Whilst there are more efficient answers, just malloc() a temporary buffer for now to build the new list, then copy it.
Now we;ve got N/2 sorted lists, so join each list to a neighbour, and go through the process again. The number of lists gradually decreases, until you have only one list.

So the main control routine should be while(Nlists > 1). You always know the lengths of the lists, but get the code working first on exact powers of two, then add the fiddly code to get the length of the last list when it's not a power of two when everything else is OK.

Thanks for your answer.

Why should I use lists and not arrays? Can't I do the same with arrays?

This is the pseudocode I got:

Code:

C = output [N]
A = 1st sorted array [N/2]
B = 2nd sorted array [N/2]
i = 1
j = 1

for k=1 to N
 if A(i)<B(j)
  C(k) = A(i)
  i++
 else
  C(k) = B(i)
  j++
end

where i and j are pointers

So I would like to do that this way, but I got the problem that once I reach the end of one array, I cannot find a way to automatically insert the remaining elements of the array for which the pointer has not yet reached the end of the array. I don't know much of the malloc() function besides that you can somehow create memory space that way and let a pointer point to that memory. Do I somehow have to use this here?

**vart** · 07-24-2013

Code:

for k=1 to N
 if (j == N/2) || (i < N/2 && A(i)<B(j))
  C(k) = A(i)
  i++
 else
  C(k) = B(j) //typo
  j++
end

**guraknugen** · 07-24-2013

What is going wrong here, seems to be that you don't check for when you reach the end of those arrays (a and b). In this case a ends first and i points to a place with an undefined content. In your case it was 0, in my case when testing your code, it was 4…
When one of the arrays a and b reach is end, you only need to fill c with the rest of the other one.

For example:

Code:

int a[5]={1,13,15,17,18};
int b[5]={7,19,25,27,28};

Now, when we reach the value of 18 in a, there are still four variables left in b, so all we need to do is to fill the rest of c with these…

I wrote my own test and it worked eventually (I'm kind of a beginner too…), but I found the code quite clumsy so I won't bother you with it…

**rcgldr** · 07-24-2013

A quick fix would just add checks for end of arrays to the if used to compare the values in the arrays:

Code:

/* ... */
        if((j >= &b[5]) || (i < &a[5]) && (*i < *j)){
            c[k] = *i;
            i++;}
        else{
            c[k] = *j;
            j++;}

This adds the unneeded overhead of checking for the end of arrays on every loop. Note that std100093's example code also has this same unneeded overhead. You could put the checks after to reduce this. Since a large array can be sorted in a few seconds, the overhead of the extra checks isn't a big deal, but I provided example code with post checks below.

Originally Posted by Quant89

Why should I use lists and not arrays?

You can implement "lists" as parts of an array.

Example merge array code with the checks after a move, using indexes instead of pointers. This code assumes that both a[] and b[] have at least one value. To handle the case where a[] or b[] start off with zero values, do a one time check at the start of the merge array code.

Code:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int a[5]={1,5,6,10,13}, b[5]={4,8,9,10,14},c[10], i,j,k;
    i = 0;
    j = 0;
    k = 0;
    while(1){
        if(a[i] < b[j]){
            c[k++] = a[i++];
            if(i >= 5){
                do
                    c[k++] = b[j++];
                while(j < 5);
                break;
            }
        }
        else{
            c[k++] = b[j++];
            if(j >= 5){
                do
                    c[k++] = a[i++];
                while(i < 5);
                break;
            }
        }
    }
    for (k=0; k<10; k++)
        printf("%d ",c[k]);
    return 0;
}

**Quant89** · 07-25-2013

Originally Posted by rcgldr

A quick fix would just add checks for end of arrays to the if used to compare the values in the arrays:

Code:

/* ... */
        if((j >= &b[5]) || (i < &a[5]) && (*i < *j)){
            c[k] = *i;
            i++;}
        else{
            c[k] = *j;
            j++;}

This adds the unneeded overhead of checking for the end of arrays on every loop. Note that std100093's example code also has this same unneeded overhead. You could put the checks after to reduce this. Since a large array can be sorted in a few seconds, the overhead of the extra checks isn't a big deal, but I provided example code with post checks below.

You can implement "lists" as parts of an array.

Example merge array code with the checks after a move, using indexes instead of pointers. This code assumes that both a[] and b[] have at least one value. To handle the case where a[] or b[] start off with zero values, do a one time check at the start of the merge array code.

Code:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int a[5]={1,5,6,10,13}, b[5]={4,8,9,10,14},c[10], i,j,k;
    i = 0;
    j = 0;
    k = 0;
    while(1){
        if(a[i] < b[j]){
            c[k++] = a[i++];
            if(i >= 5){
                do
                    c[k++] = b[j++];
                while(j < 5);
                break;
            }
        }
        else{
            c[k++] = b[j++];
            if(j >= 5){
                do
                    c[k++] = a[i++];
                while(i < 5);
                break;
            }
        }
    }
    for (k=0; k<10; k++)
        printf("%d ",c[k]);
    return 0;
}

Hey thanks a lot guys, got my code to run as intended now, really appreciate your answers.

One more question though, I got my code to run using both pointers and simply integer variables i and j as indices. Is there any particular benefit using pointers in arrays, besides speed?

**rcgldr** · 07-25-2013

Originally Posted by Quant89

Is there any particular benefit using pointers in arrays, besides speed?

For most programs, the main benefit is slightly faster speed using pointers (on X86 systems, performance would be affected more on a processor that did not have native indexing). You can look at your two programs and the number of lines of code will be about the same.

Note the example code I posted merges two pre-sorted arrays. A merge sort would start off with an unsorted array. You could post your code if you want, either the indexing version or the pointer version.

**Adak** · 07-25-2013

Indices are simply offsets from the array's starting location. Before you say "pointers are faster than indices", be sure to test that idea thoroughly. There are a LOT of good sounding theories about what is faster, that simply don't hold up under testing.

**rcgldr** · 07-25-2013

Originally Posted by Adak

Before you say "pointers are faster than indices", be sure to test that idea thoroughly.

On an X86, indexing into a static array specifies an index register and an offset. The offset requires slightly more code space than a pointer instruction. For a dynamic array, if the offset to the array is already in a register, then the indexing instruction won't need an offset. The worst case scenario is the offset of the array had to be loaded, and the index also loaded. In the case of a pointer, no offset is needed, so many instructions will take fewer bytes. The worst case scenario only requires one load. It's probably not much difference. In a test program I wrote to sort a vector of 1 million 20 character records, using indexes was about 5% slower than using pointers.

Thread: Merge-Sort Algorithm in C

Thread Tools

Search Thread

Display

Merge-Sort Algorithm in C

Similar Threads

Merge sort fails to sort

Merge Sort the best sorting algorithm?

Quick Sort or Merge Sort???

merge sort and selection sort and time spent on both

sort algorithm (merge?)