Originally Posted by

**iMalc**
MergeSortAtoA starts with the data in one buffer and ends up with the data in the same buffer. It calls MergeSortAtoB twice which performs the usual merges from the first buffer to the second, then with the two resulting runs of data in the second buffer, it performs a final merge back to the first buffer. I guess you could say that there is mutual recursion between them.

Originally Posted by

**rcgldr**
I didn't mention this before, the bottom up merge has an optimization, if the number of merge passes will be odd, it swaps pairs in a[] (only if needed), replacing the first merge pass, so that the back and forth merge process ends up merging back into a[].

The top down merge handles an odd number of passes by copying single elements from a[] to b[]:

Code:

void MergeSortAtoA(T a[], T b[], const int l, const int r) {
if (r - l > 1) {
const int m = (l + r)>>1; //midpoint
MergeSortAtoB(a, b, l, m);
MergeSortAtoB(a, b, m, r);
MergeAtoB(b, a, l, m, r); //Note a & b reversed (this is B->A)
}
}
void MergeSortAtoB(T a[], T b[], const int l, const int r) {
if (r - l > 1) {
const int m = (l + r)>>1; //midpoint
MergeSortAtoA(a, b, l, m);
MergeSortAtoA(a, b, m, r);
MergeAtoB(a, b, l, m, r);
} else if (r - l == 1) { // if run size == 1
b[l] = a[l]; // copy single element from a[] to b[]
}
}

By changing these functions to duplicate the bottom up optimization to swap in a[] for an odd number of passes, the performance increased by 9.8%, reducing the difference in performance between top down and bottom up to 2% (bottom up only slightly faster).

Code:

template <class T>
void MergeSortAtoA(T a[], T b[], const int l, const int r) {
if (r - l == 2) { // if run size == 2
if( a[l] > a[r-1]){ // swap in a[]
T x = a[l];
a[l] = a[r-1];
a[r-1] = x;
}
} else if (r - l > 1) {
const int m = (l + r)>>1; //midpoint
MergeSortAtoB(a, b, l, m);
MergeSortAtoB(a, b, m, r);
MergeAtoB(b, a, l, m, r); //Note a & b reversed (this is B->A)
}
}
template <class T>
void MergeSortAtoB(T a[], T b[], const int l, const int r) {
if (r - l > 1) {
const int m = (l + r)>>1; //midpoint
MergeSortAtoA(a, b, l, m);
MergeSortAtoA(a, b, m, r);
MergeAtoB(a, b, l, m, r);
}
}