I'd use a C99 structure for each string, something like
Code:
struct string {
unsigned long hash;
size_t size; /* Allocated */
size_t used; /* Used */
unsigned char data[];
};
and the XOR variant of the Bernstein hash function, hashn = (33 hashn-1) XOR charn, with hash-1 = 0. For example,
Code:
static inline unsigned long hash_data(const void *const data, const size_t size)
{
const unsigned char *head = (const unsigned char *)data;
const unsigned char *const tail = (const unsigned char *)data + size;
unsigned long curr = 0UL;
while (head < tail)
curr = (33UL * curr) ^ (*(head++));
return curr;
}
The hash values stored for each string never change, unless the string changes. If you have a hash table with N entries, the index for the entry with hash H is H modulo N, i.e. (H % N) in C. Thus, to grow or shrink a hash table, you only need to move and relink the entries, not rehash any data. (And, when looking for a match, you don't need to compare the contents of most of the strings, as the unsigned long hash is likely to differ, even though the hashes map to the same hash table entry. Very efficient, well worth the 4 or 8 bytes per string it costs.)
If you wish to use a linked list for the hash entries, you could just add a next pointer to the string struct.
You can use the hash value zero, 0UL, as a special "unknown" hash value. Then, your string manipulation functions can simply reset the hash to zero, instead of rehashing the string. Whenever you do need the hash function, you recalculate the hash if it is zero. (The zero hash result is so rare you don't need to worry about it. If you want, you can have your hash functions return a fixed constant instead of zero for non-empty data, but I would not bother.) If you append to a string, you don't need to rehash the existing string; you can just rehash the appended part if you know the hash of the existing part.
When comparing two strings for equivalence, first check if their lengths match. If they don't, then they do not match. If both have a nonzero hash, and the hashes do not match, then the strings do not match. If the string data compares nonzero using memcmp(), the strings do not match. Otherwise, they do match.
Note: this approach supports any binary data, including embedded NUL bytes ('\0'). Very useful if you use POSIX getline() or getdelim() to read input lines.