Perhaps the issue is related to both threads accessing data from the same cache line. This would become slightly less of an issue as the objects themselves are larger.
Also, when the time taken for a comparisons or asignments/swaps is larger, then multithreading will help more.
So both the size and speed of comparison make sorting an array of ints very slanted towards single-threading being best. Sorting a structure containing a string, and several other non-key fields, might just as easily show the threaded approach to be far superrior.
Lastly, I don't know what the overhead of creating a second thread is, but it is probably much more than I had guessed and the cutoff I arbitrarily picked may be much too low. Threads aren't really meant to be created and destroyed so rapidly. Some things maintain a thread-pool for this reason.
I'm very thorough with converting recursive code to iterative code, whenever it doesn't involve maintaining an explicit stack. All you have to do is modify the input variables to the state they should be for the recursive call and then loop instead. Usually you don't need to change all of the input variables either, as in the case here where either begin or end stays the same.