Thread: Generate unique hashtable keys

  1. #1
    Tweaking master Aslaville's Avatar
    Join Date
    Sep 2012
    Location
    Rogueport
    Posts
    528

    Generate unique hashtable keys

    Am creating a hashtable whose contents are something like:

    Code:
    struct{
       uint16_t domid;
       uint16_t devid;
       uint64_t hwaddr;
    }
    I will have the above three value when it comes to looking up the item.

    What's the best way to generate keys ?

  2. #2
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Aslaville View Post
    What's the best way to generate keys ?
    Ouch.

    Obviously, the "best" way would be the one that generates values 0 .. N-1 for the N entries you are going to use. Since we do not know the entries beforehand, there is no "best" way.

    That's probably why you haven't gotten any suggestions yet.

    If you were to ask for reasonable suggestions for hashing these entries, I could perhaps suggest something.

    (Hey, I'm not being snarky, I'm trying to be helpful here! No smirking, either. I'm serious! )


    Ahem.

    If I had those structures, and I'd always have fully filled structures -- that means no searches like "all structures where hwaddr is ..." -- I'd most likely map them to three 32-bit uint32_t's, add a fourth fixed one (maybe 0xDEADBEEF or 0xB0B5CAFE or something similarly "funny"), and use that 128-bit state as a seed to Xorshift128, and use the, say, 13th through 16th generated 32-bit unsigned integers, exclusive-ored together, as the hash. (Which would be 32-bit. If you look at the function itself, you'll see it all unrolls very nicely, too.)

    The Xorshift generator passes the diehard tests, and even just four rounds should create a nice bit avalanche. (Meaning, even a single bit change should change many bits in the result. Which is exactly what you want in a hash function, in addition to equal distributions per output bit and whatnot, what Xorshift should already provide to a pretty good degree.)

    This is a very fast operation, involving just bit shifts and exclusive-ors (you could even have an SSE2/SSE3-optimized version, since four elements are massaged the exact same way in parallel, if efficiency is paramount).

    I prefer my hash table entries to be linked lists, so my hash table entries would likely be
    Code:
    struct item {
        struct item  *next;
        uint64_t      hwaddr;
        uint32_t      hash;
        uint16_t      domid;
        uint16_t      devid;
    };
    
    struct hashtable {
        uint32_t      size;
        struct item **item;
    };
    where the index to the item array of pointers in struct hashtable is always (hash % size). This allows hash table expansion without recalculating any hashes, as well as fast comparison for inequality (if hashes differ, the entries cannot be the same). Efficient.

    If your hash function always returns a nonzero hash (returns say ~0U if the hash itself computes to zero), you can use the struct item in other places, too, like returning lists; with zero hash meaning "unknown" or "not relevant". (Just remember to add the zero check to the function where you compare two items for equality, too.)

  3. #3
    Tweaking master Aslaville's Avatar
    Join Date
    Sep 2012
    Location
    Rogueport
    Posts
    528
    Hi Nominal and thanks for the very informative reply

    Quote Originally Posted by Nominal Animal View Post

    Ouch.

    Obviously, the "best" way would be the one that generates values 0 .. N-1 for the N entries you are going to use. Since we do not know the entries beforehand, there is no "best" way.

    That's probably why you haven't gotten any suggestions yet.

    If you were to ask for reasonable suggestions for hashing these entries, I could perhaps suggest something.

    (Hey, I'm not being snarky, I'm trying to be helpful here! No smirking, either. I'm serious! )
    Am trying to generate cache for an emulated memory management unit (mainly for testing purposes) meaning I don't have to keep all the entries.

    I could have a maximum hashtable size whereby if the hashtable 'fills' I dump all the contents and start caching afresh.

    I haven't actually written my own hashtable from scratch, am using glib GHashTable.

    Before considering the 'second' solution let's hear 'reasonable suggestions'.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. hashtable with seg fault
    By yen in forum C Programming
    Replies: 1
    Last Post: 10-16-2010, 02:34 AM
  2. Initializing a hashtable?
    By mrsirpoopsalot in forum C++ Programming
    Replies: 3
    Last Post: 11-19-2009, 01:19 AM
  3. HashTable help?
    By -EquinoX- in forum C Programming
    Replies: 66
    Last Post: 03-31-2008, 12:15 AM
  4. HashTable in a struct
    By Shotgun in forum C Programming
    Replies: 1
    Last Post: 05-31-2006, 10:00 PM
  5. game hashtable
    By disks86 in forum C++ Programming
    Replies: 1
    Last Post: 12-16-2005, 01:42 PM