I haven't currently run any tests on it, but the performance of the method you suggest varies based on a few factors:
- How your C library implements qsort. C doesn't specify which sorting algorithm must be used, the choice is up to the implementation. The results may be significantly different if one implementation uses insertion sort and another uses quick sort.
- How you choose to allocate memory for the array and store it. This isn't likely to be significant performance-wise, but it's an extra problem to worry about.
And there's also the amount of data as well as the order it's presented in, which is likely to vary regardless of what algorithms/data structures you decide to use.
With that said, I decided to compare it against the following code:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINES 6000000
static char *dup(const char *s)
{
char *r;
if (!(r = malloc(strlen(s) + 1)))
exit(EXIT_FAILURE);
return strcpy(r, s);
}
static int compar(const void *a, const void *b)
{
return strcmp(*(char *const *) a, *(char *const *) b);
}
static void print(char *const *s, size_t n)
{
while (n--)
fputs(*s++, stdout);
}
int main(void)
{
static char *a[MAX_LINES];
char buf[8192];
size_t i = 0;
while (fgets(buf, sizeof buf, stdin) && i < sizeof a / sizeof a[0])
a[i++] = dup(buf);
qsort(a, i, sizeof a[0], compar);
print(a, i);
return 0;
}
and I had the following results (btree.c is the code I linked in my previous post, and array.c is the code pasted above):
~ $ gcc -std=c99 -march=native -Wall -pedantic array.c -o array
~ $ gcc -std=c99 -march=native -Wall -pedantic btree.c -o btree
~ $ time ./array < test > tsta
real 0m5.601s
user 0m5.111s
sys 0m0.210s
~ $ time ./btree < test > tstb
real 0m3.616s
user 0m3.197s
sys 0m0.289s
~ $ diff tsta tstb
~ $ time ./array < test > /dev/null
real 0m5.154s
user 0m5.025s
sys 0m0.117s
~ $ time ./btree < test > /dev/null
real 0m3.369s
user 0m3.200s
sys 0m0.163s
Note that those were compiled without optimisations, and the file test is basically just the plaintext version of the C89 standard catted 360 times (5,129,640 lines, with each line < 80 characters.) The array version has shorter and perhaps simpler code, but it comes at the price of a fixed maximum number of lines. The times made by both programs are fairly good, at least in comparison with other methods, but I'd still say that a binary tree is the most practical choice in this case for its ease of implementation, insertion, and speed.