# Thread: Merge-Sort Algorithm in C

1. ## Merge-Sort Algorithm in C

Hi,

I'm trying to implement the Merge-Sort algorithm. I only had the pseudocode for it and have some problems coding this into C.

I'm still relatively new to C, have only covered pointers recently and I tried using them, which did not work. I started with the code for the merge algorithm and only used a 10 element array, which was already divided into two sorted subarrays:

Code:
```#include <stdio.h>#include <stdlib.h>

int main()
{

int a[5]={1,5,6,10,13}, b[5]={4,8,9,10,14},c[10], *i,*j,k;
i=&a[0];
j=&b[0];
for (k=0;k<10;k++){
if (*i<*j){
c[k]=*i;
i++;}
else {
c[k]=*j;
j++;}
}
for (k=0;k<10;k++)
printf("%d ",c[k]);
return 0;
}```
This is the result that I get:

Code:
`1 4 5 6 8 9 10 10 13 0`
So I think the problem occurs because in the second to last loop i is incremented again, but the end of the array is already reached, and the compiler has no element a[6] to compare with *j in the last run of the loop.

Does anybody know how I can solve this problem? Is there generally a better way to implement Merge?

2. I had implemented the algorithm here. You can take a look if you like

3. With merge sort you have a set of N unsorted elements. A list of 1 is by definition sorted, so that gives you N sorted lists. Join each one with a neighbour, so that gives you N/2 unsorted lists, maybe plus one if N is odd. Now we run N/2 routines to join two sorted lists, which you do by keeping two pointers i and j to the current positions (you can only insert from the head of the lists, because lista[i+1] and listb[j+1] must always be bigger than lists[i] and listb[j]). Whilst there are more efficient answers, just malloc() a temporary buffer for now to build the new list, then copy it.
Now we;ve got N/2 sorted lists, so join each list to a neighbour, and go through the process again. The number of lists gradually decreases, until you have only one list.

So the main control routine should be while(Nlists > 1). You always know the lengths of the lists, but get the code working first on exact powers of two, then add the fiddly code to get the length of the last list when it's not a power of two when everything else is OK.

4. Originally Posted by Malcolm McLean
With merge sort you have a set of N unsorted elements. A list of 1 is by definition sorted, so that gives you N sorted lists. Join each one with a neighbour, so that gives you N/2 unsorted lists, maybe plus one if N is odd. Now we run N/2 routines to join two sorted lists, which you do by keeping two pointers i and j to the current positions (you can only insert from the head of the lists, because lista[i+1] and listb[j+1] must always be bigger than lists[i] and listb[j]). Whilst there are more efficient answers, just malloc() a temporary buffer for now to build the new list, then copy it.
Now we;ve got N/2 sorted lists, so join each list to a neighbour, and go through the process again. The number of lists gradually decreases, until you have only one list.

So the main control routine should be while(Nlists > 1). You always know the lengths of the lists, but get the code working first on exact powers of two, then add the fiddly code to get the length of the last list when it's not a power of two when everything else is OK.

Why should I use lists and not arrays? Can't I do the same with arrays?

This is the pseudocode I got:

Code:
```C = output [N]
A = 1st sorted array [N/2]
B = 2nd sorted array [N/2]
i = 1
j = 1

for k=1 to N
if A(i)<B(j)
C(k) = A(i)
i++
else
C(k) = B(i)
j++
end

where i and j are pointers```
So I would like to do that this way, but I got the problem that once I reach the end of one array, I cannot find a way to automatically insert the remaining elements of the array for which the pointer has not yet reached the end of the array. I don't know much of the malloc() function besides that you can somehow create memory space that way and let a pointer point to that memory. Do I somehow have to use this here?

5. Code:
```for k=1 to N
if (j == N/2) || (i < N/2 && A(i)<B(j))
C(k) = A(i)
i++
else
C(k) = B(j) //typo
j++
end```

6. What is going wrong here, seems to be that you don't check for when you reach the end of those arrays (a and b). In this case a ends first and i points to a place with an undefined content. In your case it was 0, in my case when testing your code, it was 4…
When one of the arrays a and b reach is end, you only need to fill c with the rest of the other one.

For example:
Code:
```int a[5]={1,13,15,17,18};
int b[5]={7,19,25,27,28};```
Now, when we reach the value of 18 in a, there are still four variables left in b, so all we need to do is to fill the rest of c with these…

I wrote my own test and it worked eventually (I'm kind of a beginner too…), but I found the code quite clumsy so I won't bother you with it…

7. A quick fix would just add checks for end of arrays to the if used to compare the values in the arrays:

Code:
```/* ... */
if((j >= &b[5]) || (i < &a[5]) && (*i < *j)){
c[k] = *i;
i++;}
else{
c[k] = *j;
j++;}```
This adds the unneeded overhead of checking for the end of arrays on every loop. Note that std100093's example code also has this same unneeded overhead. You could put the checks after to reduce this. Since a large array can be sorted in a few seconds, the overhead of the extra checks isn't a big deal, but I provided example code with post checks below.

Originally Posted by Quant89
Why should I use lists and not arrays?
You can implement "lists" as parts of an array.

Example merge array code with the checks after a move, using indexes instead of pointers. This code assumes that both a[] and b[] have at least one value. To handle the case where a[] or b[] start off with zero values, do a one time check at the start of the merge array code.

Code:
```#include <stdio.h>
#include <stdlib.h>

int main()
{
int a[5]={1,5,6,10,13}, b[5]={4,8,9,10,14},c[10], i,j,k;
i = 0;
j = 0;
k = 0;
while(1){
if(a[i] < b[j]){
c[k++] = a[i++];
if(i >= 5){
do
c[k++] = b[j++];
while(j < 5);
break;
}
}
else{
c[k++] = b[j++];
if(j >= 5){
do
c[k++] = a[i++];
while(i < 5);
break;
}
}
}
for (k=0; k<10; k++)
printf("%d ",c[k]);
return 0;
}```

8. Originally Posted by rcgldr
A quick fix would just add checks for end of arrays to the if used to compare the values in the arrays:

Code:
```/* ... */
if((j >= &b[5]) || (i < &a[5]) && (*i < *j)){
c[k] = *i;
i++;}
else{
c[k] = *j;
j++;}```
This adds the unneeded overhead of checking for the end of arrays on every loop. Note that std100093's example code also has this same unneeded overhead. You could put the checks after to reduce this. Since a large array can be sorted in a few seconds, the overhead of the extra checks isn't a big deal, but I provided example code with post checks below.

You can implement "lists" as parts of an array.

Example merge array code with the checks after a move, using indexes instead of pointers. This code assumes that both a[] and b[] have at least one value. To handle the case where a[] or b[] start off with zero values, do a one time check at the start of the merge array code.

Code:
```#include <stdio.h>
#include <stdlib.h>

int main()
{
int a[5]={1,5,6,10,13}, b[5]={4,8,9,10,14},c[10], i,j,k;
i = 0;
j = 0;
k = 0;
while(1){
if(a[i] < b[j]){
c[k++] = a[i++];
if(i >= 5){
do
c[k++] = b[j++];
while(j < 5);
break;
}
}
else{
c[k++] = b[j++];
if(j >= 5){
do
c[k++] = a[i++];
while(i < 5);
break;
}
}
}
for (k=0; k<10; k++)
printf("%d ",c[k]);
return 0;
}```
Hey thanks a lot guys, got my code to run as intended now, really appreciate your answers.

One more question though, I got my code to run using both pointers and simply integer variables i and j as indices. Is there any particular benefit using pointers in arrays, besides speed?

9. Originally Posted by Quant89
Is there any particular benefit using pointers in arrays, besides speed?
For most programs, the main benefit is slightly faster speed using pointers (on X86 systems, performance would be affected more on a processor that did not have native indexing). You can look at your two programs and the number of lines of code will be about the same.

Note the example code I posted merges two pre-sorted arrays. A merge sort would start off with an unsorted array. You could post your code if you want, either the indexing version or the pointer version.

10. Indices are simply offsets from the array's starting location. Before you say "pointers are faster than indices", be sure to test that idea thoroughly. There are a LOT of good sounding theories about what is faster, that simply don't hold up under testing.