# Thread: Any optimizations I could make to my generic merge sort?

1. ## Any optimizations I could make to my generic merge sort?

Besides using insertion sort when my array becomes nearly sorted, what other optimizations could I make to my code that would improve the run time of my program? Doesn't have to be the merge sort algorithm itself. Maybe there are some areas in my code that aren't as optimized and I would really appreciate if someone were to take a quick look at it.

main.c

Code:
```#include <stdio.h>
#include <stdlib.h>
#include "MergeSort.h"

int cmp(const void *, const void *);

int main()
{
size_t i;
int arr[] = {19, 56, 25, 17, 100, 58, 72, 1000, 98, 36, 14, 78, 125};

size_t nitems = sizeof(arr) / sizeof(arr);

if(nitems > 1 && !isSorted(arr, nitems, sizeof(int), cmp))
{
mergeSort(arr, nitems, sizeof(int), cmp);
}

printf("Sorted Array: ");

for( i =0; i < nitems; i++)
{
printf("%d ", arr[i]);
}

printf("\n");

return 0;
}

/*
A > B = 1, A < B = -1, (A == B) = 0
*/
int cmp(const void *a, const void *b)
{
const int *A = a;
const int *B = b;
return (*A > *B) - (*A < *B);
}```
MergeSort.h

Code:
```#ifndef MERGESORT_H_INCLUDED
#define MERGESORT_H_INCLUDED

/*
This threshold signifies when we should stop merge sort and call insertion sort. Say for example we try to sort 10000 items, by the time
we have a sub array of 10 elements, the array is short and sorted enough that insertion sort would perform better than merge sort.
*/
#define THRESHOLD 10

typedef int (*cmp_t)(const void *, const void *);

void mergeSort(void *, size_t, size_t, cmp_t);
int isSorted(void *, size_t, size_t, cmp_t);
void byteSwap(void *, void *, size_t);

#endif```
MergeSort.c

Code:
```#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "MergeSort.h"

static void mergeSort_recurse(void *, void *, int, int, size_t, cmp_t);
static void merge(void *, void *, int, int, int, size_t, cmp_t);
static void insertionSort(void *, size_t, size_t, cmp_t);

void mergeSort(void *base, size_t nitems, size_t memSize, cmp_t cmp)
{
/*
Merge sort requires a temporary array to write the values that it sorted back to the original array.
Rather than mallocing a temporary array each time we decide to merge, we only need to malloc this temp
array once. This will speed up our execution time by a good amount but our space complexity will still be
big-oh(N).
*/
char *tempArr = (char *)malloc(sizeof(char) * memSize * nitems);

if(tempArr != NULL)
{
mergeSort_recurse(base, tempArr, 0, nitems - 1, memSize, cmp);
free(tempArr);
insertionSort(base, nitems, memSize, cmp);
}
}

static void mergeSort_recurse(void *base, void *tempArr, int start, int end, size_t memSize, cmp_t cmp)
{
if(end - start < THRESHOLD) return;

int splitIndex = start + ((end - start) / 2); /*Makes it less likely for an overflow to occur.*/

/*
Break the array into two sub arrays; one which the represents the left most side of the array up
until the splitIndex and one which the represents the right most side of the array up until the end
variable. Then, once we recurse far enough that we only have one array to sort, we start merging each
sub array until our array is sorted.
*/
mergeSort_recurse(base, tempArr, start, splitIndex, memSize, cmp);
mergeSort_recurse(base, tempArr, splitIndex + 1, end, memSize, cmp);
merge(base, tempArr, start, splitIndex, end, memSize, cmp);
}

static void merge(void *base, void *tempArr, int start, int splitIndex, int end, size_t memSize, cmp_t cmp)
{
char *leftElement;
char *rightElement;
char *tempElement;

int leftIndex = start; /*Starting index for the left sub array.*/
int rightIndex = splitIndex + 1;/*Starting index for the right sub array. */
int tempArrIndex = start; /*Starting index for the temp array. */

/*while we haven't reached the end of either the left or right sub array. */
while(leftIndex <= splitIndex && rightIndex <= end)
{
leftElement = ((char *)base + (leftIndex * memSize)); /*offsetting to the element at left and right Index.*/
rightElement = ((char *)base + (rightIndex * memSize));

if(cmp(leftElement, rightElement) < 0)
{
tempElement = ((char *)tempArr + (tempArrIndex * memSize));
memcpy(tempElement, leftElement, memSize);
leftIndex++;
tempArrIndex++;
}
else
{
tempElement = ((char *)tempArr + (tempArrIndex * memSize));
memcpy(tempElement, rightElement, memSize);
rightIndex++;
tempArrIndex++;
}
}

/*
Once the first while loop up above finishes, only one of these loops will execute and copy the elements that were left
over from either the left or right sub array.
*/
while(leftIndex <= splitIndex)
{
leftElement = ((char *)base + (leftIndex * memSize));
tempElement = ((char *)tempArr + (tempArrIndex * memSize));
memcpy(tempElement, leftElement, memSize);
leftIndex++;
tempArrIndex++;
}

while(rightIndex <= end)
{
rightElement = ((char *)base + (rightIndex * memSize));
tempElement = ((char *)tempArr + (tempArrIndex * memSize));
memcpy(tempElement, rightElement, memSize);
rightIndex++;
tempArrIndex++;
}

for(leftIndex = start; leftIndex <= end; leftIndex++)
{
leftElement = ((char *)base + (leftIndex * memSize));
tempElement = ((char *)tempArr + (leftIndex * memSize));
memcpy(leftElement, tempElement, memSize);
}
}

static void insertionSort(void *base, size_t nitems, size_t memSize, int (*cmp)(const void *, const void *))
{
char *start = base;
char *lastPtr = start + (nitems * memSize);
char *top;

for(top = start + memSize; top < lastPtr; top += memSize)
{
char *previous = top;
char *current = previous - memSize;

while(current >= start && cmp(previous, current) < 0)
{
byteSwap(previous, current, memSize);
previous = current;
current -= memSize;
}
}
}

int isSorted(void *base, size_t nitems, size_t memSize, cmp_t cmp)
{
char *start = base;
char *lastPtr = (start + (nitems * memSize)) - memSize;

for(; start < lastPtr; start += memSize)
{
/* Simply check if the current element is greater than the next element. */
if(cmp(start, start + memSize) > 0)
{
return 0;
}
}

return 1;
}

/*
The main way that this function is swapping is by swapping out the bytes in each
iteration.
*/
void byteSwap(void *a, void *b, size_t memSize)
{
char *aa = a;
char *bb = b;

do
{
char tmp = *aa;
*aa++ = *bb;
*bb++ = tmp;
}
while(--memSize > 0);
}``` 2. Get a profile of the number of times memcpy is called, where it is called from, and the size of memory copied.

> if(cmp(leftElement, rightElement) < 0)
If this is true 10 times in a row say (which will be common as the array becomes nearly sorted), then consider counting the run lengths.

> memcpy(tempElement, leftElement, memSize);
Because
memcpy(tempElement, leftElement, memSize * runLength);
is likely to be more efficient than calling memcpy lots of times in a loop copying a small amount of memory. 3. Interesting suggestion salem, I could add an additional two variables called leftSideLength and rightSideLength. Then I would need to count the number of comparisons from the left and right side and use those lengths to extend amount of locations I need to write. The ideal situation that I'm looking for would be to try to get one memcpy from the original to the temp array and one memcpy from the temp array back to the original array but I don't know if that is possible. What do you think? 4. Try something and find out - that's my suggestion. Popular pages Recent additions array, int, size_t, sort, void 