Hash algorithm suggestions

**itsme86** · 08-05-2004

I wrote a wrapper for malloc() that tracks each allocation. For each call it records the name of the file that the calling function resides in. (i.e. A function in a file called foo.c calls my_alloc(10);. my_alloc() is a macro that looks like this:

Code:

#define my_alloc(x) (my_alloc_internal(__FILE__, __LINE__, x))

Anyway, here's my get_filename_hash() function that's called by my_alloc_internal():

Code:

struct filename_hash_bucket
{
  char *name;
  struct filename_hash_bucket *next;
};

#define MVM_HASH_BUCKETS         50
static struct filename_hash_bucket *filename_hash[MVM_HASH_BUCKETS] = { NULL };

static struct filename_hash_bucket *get_filename_hash(char *str)
{
  char *s;
  unsigned int hash = 0;
  struct filename_hash_bucket *hb;

  for(s = str;*s;s++)
    hash += *s;
  hash %= MVM_HASH_BUCKETS;

  for(hb = filename_hash[hash];hb;hb = hb->next)
    if(!strcmp(hb->name, str))
      break;

  if(hb)
    return hb;

  hb = malloc(sizeof(struct filename_hash_bucket));
  hb->name = malloc(strlen(str)+1);
  strcpy(hb->name, str);
  hb->next = filename_hash[hash];
  filename_hash[hash] = hb;

  return hb;
}

The filename to be hashed is passed to the function. What I'm looking to do is generate a very fast hash algorithm that produces as few collisions as possible. The one I'm using now (add the value of each character in the filename) doesn't seem too good. Anyone have any ideas? Or any ideas on how to optomize this function for speed as much as possible?

EDIT: BTW, the program I wrote this for has the abillity to load "plugins" via dl_open() calls so building a list of filenames for the hashtable at program startup isn't possible unless the plugin loading routine added the filename to the hashtable, but I'd like to keep the internals of this malloc() wrapper as invisible as possible to outside functions.

EDIT2: I'm really only looking at the section of the function where it successfully finds a match since that opens far more frequently than needing to add a filename to the table. Should I dump the linked list and go with an array of strings? What's a better way to get a hash value than what I'm doing (read: creates fewer collisions and is fast)?

Thanks for any help.

**skorman00** · 08-05-2004

My suggestion would be to simply hash them based on the first letter. More than likely, this file and line tracking will be something you only want in debug, so you can use some preprocessor magic that will disable it in a release build.

**Salem** · 08-06-2004

http://www.nist.gov/dads/HTML/hash.html

> My suggestion would be to simply hash them based on the first letter
All of my __FILE__ strings begin with either . or /

**skorman00** · 08-06-2004

you're correct salem, but those aren't letters

**lonewarrior** · 08-06-2004

Why dont you try the djb2 algorithm? It's really good. Heres the function:

Code:

    unsigned long hash(unsigned char *str)
    {
        unsigned long hash = 5381;
        int c;

        while (c = *str++)
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

        return hash;
    }

Thread: Hash algorithm suggestions

Thread Tools

Search Thread

Display

Hash algorithm suggestions

Similar Threads

Group Project Help/Volunteer

Hash Table algorithm

Binary Search Trees Part III

Request for comments