Several algorithms for fast sorting of strings in internal memory have been described in recent years,
including improvements to the standard quicksort (Bentley and McIlroy, 1993), multikey quicksort (Bentley
and Sedgewick, 1997), and variants of radix sort (Andersson and Nilsson, 1998). The fastest such
algorithm is our burstsort (Sinha and Zobel, 2004a), in which strings are processed depth-first rather than
breadth-first and a complete trie of common prefixes is used to distribute strings amongst buckets. In all of
these methods, the strings themselves are left in place, and only pointers to them are moved into sort
order. This pointerized approach to string sorting has become a standard solution to the problems posed
by the length and variability of strings, as expressed by Andersson and Nilsson (1998):
"to implement a string sorting algorithm efficiently we should not move the strings themselves
but only pointers to them. In this way each string movement is guaranteed to take constant
time."
Under this argument, moving the variable-length strings in the course of sorting is considered inefficient:
sorting operates on an array of pointers to the strings, and at the completion of sorting only the pointers
need to be in sort order.
However, evolutionary developments in computer architecture mean that the efficiency of sorting strings
via pointers needs to be reexamined. Use of pointers is efficient if they are smaller than strings and if the
costs of accessing and copying strings and pointers are uniform, but the speed of memory access has fallen
behind processor speeds and most computers now have significant memory latency. Each string access
can potentially result in a cache miss; that is, if a collection is too large to fit in cache memory, the cost
of out-of-cache references may dominate other factors, making it worthwhile to move even long strings in
order to increase their spatial locality. It is the cost of accessing strings via pointers that gives burstsort
(which uses no fewer instructions than other methods) a speed advantage.