For anyone else thinking along similar lines, the K&R C book (ANSI C edition) has a pretty good example (page 145 in my copy) to get started.
This hashing function seems to produce a very even distribution (for my data), even with very few hash table buckets.
Code:
#define HASHSIZE 10   /* number of buckets (10 for the test below) */

/* K&R-style string hash: multiply-and-add over each character */
unsigned hash(char *s) {
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}
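A quick tally along these lines (the count_buckets name is just illustrative, not from K&R; it uses hash() and HASHSIZE from above) shows how many keys land in each bucket:
Code:
#include <stdio.h>

/* tally how many of n keys land in each bucket and print the spread */
void count_buckets(char *keys[], int n) {
    int counts[HASHSIZE] = {0};
    for (int i = 0; i < n; i++)
        counts[hash(keys[i])]++;
    for (int i = 0; i < HASHSIZE; i++)
        printf("[%d] %d items\n", i, counts[i]);
}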
Inserting 100,000 char[32] GUIDs into a 10-bucket hash table gives this distribution:
Code:
[0] 9787 items
[1] 10067 items
[2] 10064 items
[3] 10061 items
[4] 10108 items
[5] 9962 items
[6] 10044 items
[7] 9999 items
[8] 9988 items
[9] 9920 items
This takes about 8 seconds on my iMac/Vagrant, but increasing the bucket count to 1000 and running the same test takes only 0.1 seconds, so it's easy to bring down the time to insert and then walk the records.
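If anyone wants to reproduce something similar, a minimal chained-table insert along the lines of K&R's lookup/install example might look like this (struct and function names are just illustrative, not a definitive implementation; it reuses hash() and HASHSIZE from above):
Code:
#include <stdlib.h>
#include <string.h>

struct nlist {               /* one node per key, chained within a bucket */
    struct nlist *next;
    char *key;
};

static struct nlist *hashtab[HASHSIZE];   /* the bucket array */

/* push a copy of key onto the front of its bucket's chain */
struct nlist *insert(char *key) {
    struct nlist *np = malloc(sizeof *np);
    if (np == NULL)
        return NULL;
    np->key = strdup(key);
    if (np->key == NULL) {
        free(np);
        return NULL;
    }
    unsigned h = hash(key);
    np->next = hashtab[h];
    hashtab[h] = np;
    return np;
}
If the insert does a lookup first (as K&R's install does), the average chain length drops from roughly 10,000 to roughly 100 entries when going from 10 to 1000 buckets, which would explain the drop from ~8 s to ~0.1 s.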
cheers,