Thread: Hash algorithm suggestions

  1. #1
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218

    Hash algorithm suggestions

    I wrote a wrapper for malloc() that tracks each allocation. For each call it records the name of the file that the calling function resides in. (i.e. A function in a file called foo.c calls my_alloc(10);. my_alloc() is a macro that looks like this:
    Code:
    #define my_alloc(x) (my_alloc_internal(__FILE__, __LINE__, x))
    Anyway, here's my get_filename_hash() function that's called by my_alloc_internal():
    Code:
    struct filename_hash_bucket
    {
      char *name;
      struct filename_hash_bucket *next;
    };
    
    #define MVM_HASH_BUCKETS         50
    static struct filename_hash_bucket *filename_hash[MVM_HASH_BUCKETS] = { NULL };
    
    static struct filename_hash_bucket *get_filename_hash(char *str)
    {
      char *s;
      unsigned int hash = 0;
      struct filename_hash_bucket *hb;
    
      for(s = str;*s;s++)
        hash += *s;
      hash %= MVM_HASH_BUCKETS;
    
      for(hb = filename_hash[hash];hb;hb = hb->next)
        if(!strcmp(hb->name, str))
          break;
    
      if(hb)
        return hb;
    
      hb = malloc(sizeof(struct filename_hash_bucket));
      hb->name = malloc(strlen(str)+1);
      strcpy(hb->name, str);
      hb->next = filename_hash[hash];
      filename_hash[hash] = hb;
    
      return hb;
    }
    The filename to be hashed is passed to the function. What I'm looking to do is generate a very fast hash algorithm that produces as few collisions as possible. The one I'm using now (add the value of each character in the filename) doesn't seem too good. Anyone have any ideas? Or any ideas on how to optomize this function for speed as much as possible?

    EDIT: BTW, the program I wrote this for has the abillity to load "plugins" via dl_open() calls so building a list of filenames for the hashtable at program startup isn't possible unless the plugin loading routine added the filename to the hashtable, but I'd like to keep the internals of this malloc() wrapper as invisible as possible to outside functions.

    EDIT2: I'm really only looking at the section of the function where it successfully finds a match since that opens far more frequently than needing to add a filename to the table. Should I dump the linked list and go with an array of strings? What's a better way to get a hash value than what I'm doing (read: creates fewer collisions and is fast)?

    Thanks for any help.
    Last edited by itsme86; 08-05-2004 at 09:08 PM.

  2. #2
    vae victus! skorman00's Avatar
    Join Date
    Nov 2003
    Posts
    594
    My suggestion would be to simply hash them based on the first letter. More than likely, this file and line tracking will be something you only want in debug, so you can use some preprocessor magic that will disable it in a release build.

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    http://www.nist.gov/dads/HTML/hash.html

    > My suggestion would be to simply hash them based on the first letter
    All of my __FILE__ strings begin with either . or /
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    vae victus! skorman00's Avatar
    Join Date
    Nov 2003
    Posts
    594
    you're correct salem, but those aren't letters

  5. #5
    Registered User
    Join Date
    Aug 2004
    Posts
    4
    Why dont you try the djb2 algorithm? It's really good. Heres the function:
    Code:
        unsigned long hash(unsigned char *str)
        {
            unsigned long hash = 5381;
            int c;
    
            while (c = *str++)
                hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    
            return hash;
        }

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Group Project Help/Volunteer
    By DarkDot in forum C++ Programming
    Replies: 3
    Last Post: 04-24-2007, 11:36 PM
  2. Hash Table algorithm
    By matott in forum C Programming
    Replies: 2
    Last Post: 09-14-2005, 09:23 AM
  3. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM
  4. Request for comments
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 15
    Last Post: 01-02-2004, 10:33 AM