I'm sorry, I didn't read your code too carefully. You are absolutely right.
As to the speed difference, I don't know. The function is working practically directly on the array too? Are you sure you tested with exactly the same set of data? Did you test multiple times (may-be your computer was busy doing something else at the same time)?
Anyway, qsort is much quicker basically because it does a whole lot less comparisons and a whole lot less swaps.
In bubble-sort, let's suppose the first member is the largest: it takes size-1 swaps to get it to the right place. In qsort, it is swapped into the second half of the array (approximately the right place ) in just one swap (iteration).
I recently implemented merge sort, if you are interested. The good thing about it is that most work is done by STL inplace_merge algorithm
 One thing you could do: you got to set changed to false each time through the outer loop. And may-be you could initialize changed to true, and replace the outer for loop with while (changed) to avoid an additional check for if (!changed).
//a - array to be sorted
//begin - index of first element of the range
//end - one past end of the range
template <typename T>
void MergeSort(T* a, int begin, int end)
int middle = (begin+end)/2;
if (begin+1 < middle) MergeSort(a, begin, middle);
if (middle+1 < end) MergeSort(a, middle, end);
std::inplace_merge(a+begin, a+middle, a+end); //#include <algorithm>