
Originally Posted by
Aslaville
What's the best way to generate keys ?
Ouch.
Obviously, the "best" way would be the one that generates values 0 .. N-1 for the N entries you are going to use. Since we do not know the entries beforehand, there is no "best" way.
That's probably why you haven't gotten any suggestions yet.
If you were to ask for reasonable suggestions for hashing these entries, I could perhaps suggest something.
(Hey, I'm not being snarky, I'm trying to be helpful here! No smirking, either. I'm serious!
)
Ahem.
If I had those structures, and I'd always have fully filled structures -- that means no searches like "all structures where hwaddr is ..." -- I'd most likely map them to three 32-bit uint32_t's, add a fourth fixed one (maybe 0xDEADBEEF or 0xB0B5CAFE or something similarly "funny"), and use that 128-bit state as a seed to Xorshift128, and use the, say, 13th through 16th generated 32-bit unsigned integers, exclusive-ored together, as the hash. (Which would be 32-bit. If you look at the function itself, you'll see it all unrolls very nicely, too.)
The Xorshift generator passes the diehard tests, and even just four rounds should create a nice bit avalanche. (Meaning, even a single bit change should change many bits in the result. Which is exactly what you want in a hash function, in addition to equal distributions per output bit and whatnot, what Xorshift should already provide to a pretty good degree.)
This is a very fast operation, involving just bit shifts and exclusive-ors (you could even have an SSE2/SSE3-optimized version, since four elements are massaged the exact same way in parallel, if efficiency is paramount).
I prefer my hash table entries to be linked lists, so my hash table entries would likely be
Code:
struct item {
struct item *next;
uint64_t hwaddr;
uint32_t hash;
uint16_t domid;
uint16_t devid;
};
struct hashtable {
uint32_t size;
struct item **item;
};
where the index to the item array of pointers in struct hashtable is always (hash % size). This allows hash table expansion without recalculating any hashes, as well as fast comparison for inequality (if hashes differ, the entries cannot be the same). Efficient.
If your hash function always returns a nonzero hash (returns say ~0U if the hash itself computes to zero), you can use the struct item in other places, too, like returning lists; with zero hash meaning "unknown" or "not relevant". (Just remember to add the zero check to the function where you compare two items for equality, too.)