Thread: Help understanding hash function

  1. #1
    Registered User
    Join Date
    May 2009
    Location
    Look in your attic, I am there...
    Posts
    92

    Help understanding hash function

    I am looking over code that implements a linked list where one of its members is a string value -- to deal with this they used a hash table to quickly find elements. Although I do understand hash tables I am unable to fully understand the reasoning behind certain sections of code particularly the Com_HashKey.

    cvar_t is an element of the global linked list of cvar_t named cvar_vars

    Code declaring some size constants and the global variables
    Code:
    #define CVARS_HASH_SIZE  1024
    #define MAX_LIST_CVARS   4096
    cvar_t  *cvar_varsHashTable[CVARS_HASH_SIZE];
    cvar_t  *cvar_vars;
    Code used to initialize / find a cvar_t
    Code:
    /*
     =================
     Cvar_Get
    
     If the variable already exists, the value will be set to the latched
     value (if any).
     The flags will be OR'ed in if the variable exists.
     =================
    */
    cvar_t *Cvar_Get (const char *name, const char *value, int flags, const char *description){
    	cvar_t *cvar;
    	unsigned hash;
    
    	cvar = Cvar_FindVariable(name);
    	if (cvar){
    		// Update latched variables
    		...
    		// Reset value is always set internally
    		...
    		// Description is always set internally
    		...
    		return cvar;
    	}
    	// Check for command override
    	...
    	// Check for invalid name
    	...
    	// Check for invalid value
    	...
    	// Allocate a new cvar
    	...
    	// Link the variable in
    	cvar->next = cvar_vars;
    	cvar_vars = cvar;
    
    	// Add to hash table
    	hash = Com_HashKey(name, CVARS_HASH_SIZE);
    
    	cvar->nextHash = cvar_varsHashTable[hash];
    	cvar_varsHashTable[hash] = cvar;
    	return cvar;
    }
    The function Com_HashKey which is where most of my confusion stems
    Code:
    /*
     =================
     Com_HashKey
    
     Returns a hash value for string
     =================
    */
    unsigned Com_HashKey (const char *string, unsigned hashSize){
    
    	unsigned	hash = 0;
    	int			i;
    
    	for (i = 0; string[i]; i++)
    		hash = (hash + i) * 37 + tolower(string[i]);
    
    	return (hash % hashSize);
    }
    Code used to find a cvar_t from cvar_vars using the hash table
    Code:
    /*
     =================
     Cvar_FindVariable
     =================
    */
    cvar_t *Cvar_FindVariable (const char *name){
    	cvar_t		*cvar;
    	unsigned	hash;
    	hash = Com_HashKey(name, CVARS_HASH_SIZE);
    	for (cvar = cvar_varsHashTable[hash]; cvar; cvar = cvar->nextHash){
    		if (!Q_stricmp(cvar->name, name))
    			return cvar;
    	}
    	return NULL;
    }

  2. #2
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    I can't give you the exact reasoning that Com_HashKey is written the way it is. All I can tell you is designing a good hash function is an art best left to people with fancy degrees in math and comp sci. The basic idea is take some data, and run it through a function to produce an index into your hash table. The big thing you want to do is avoid collisions, a collision being two different inputs that produce the same hash key. Collisions make hash tables inefficient.

    My guess is that Com_HashKey is a simple hash function they cobbled together because they weren't too worried about performance. The fact that they tolower the string means they want "ABCDE", "Abcde", "abcde", etc to not only have the same hash key, but to actually refer to the exact same element (this is backed up by the use of Q_stricmp in Cvar_FindVariable). The modulus at the end is to make sure your hash key is within bounds for the hash table. I'm guessing they picked 37 as a multiplier because it's a prime. Maybe somebody else can give a better analysis. Otherwise, I think it's best to just take it for what it is -- a quick and dirty hash function.

  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I agree with the above analysis.
    If you are in the position to change this code then I recommend switching to the FNV-1a hash instead. It is very similiar but simpler and better at the same time. You would have no trouble switching to it.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    Registered User
    Join Date
    May 2009
    Location
    Look in your attic, I am there...
    Posts
    92
    Thanks for the suggestions and analysis , this is mostly a personal project that I am doing for educational purposes so I would like to create my own that improves on the given code.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Hash function to hash key into geographic coordinate
    By dominic_tran201 in forum C++ Programming
    Replies: 2
    Last Post: 02-11-2011, 10:03 AM
  2. Help understanding Hash Table
    By boblettoj99 in forum C Programming
    Replies: 3
    Last Post: 12-29-2009, 01:23 PM
  3. Hash Function
    By Elkvis in forum C++ Programming
    Replies: 4
    Last Post: 10-14-2008, 03:59 AM
  4. Hash function
    By g_p in forum C Programming
    Replies: 1
    Last Post: 05-29-2007, 05:49 AM
  5. about hash function
    By Abdi in forum C Programming
    Replies: 1
    Last Post: 05-22-2002, 03:37 PM