I wrote a wrapper for malloc() that tracks each allocation. For each call it records the name of the file that the calling function resides in. (i.e. A function in a file called foo.c calls my_alloc(10);. my_alloc() is a macro that looks like this:
Code:
#define my_alloc(x) (my_alloc_internal(__FILE__, __LINE__, x))
Anyway, here's my get_filename_hash() function that's called by my_alloc_internal():
Code:
struct filename_hash_bucket
{
char *name;
struct filename_hash_bucket *next;
};
#define MVM_HASH_BUCKETS 50
static struct filename_hash_bucket *filename_hash[MVM_HASH_BUCKETS] = { NULL };
static struct filename_hash_bucket *get_filename_hash(char *str)
{
char *s;
unsigned int hash = 0;
struct filename_hash_bucket *hb;
for(s = str;*s;s++)
hash += *s;
hash %= MVM_HASH_BUCKETS;
for(hb = filename_hash[hash];hb;hb = hb->next)
if(!strcmp(hb->name, str))
break;
if(hb)
return hb;
hb = malloc(sizeof(struct filename_hash_bucket));
hb->name = malloc(strlen(str)+1);
strcpy(hb->name, str);
hb->next = filename_hash[hash];
filename_hash[hash] = hb;
return hb;
}
The filename to be hashed is passed to the function. What I'm looking to do is generate a very fast hash algorithm that produces as few collisions as possible. The one I'm using now (add the value of each character in the filename) doesn't seem too good. Anyone have any ideas? Or any ideas on how to optomize this function for speed as much as possible?
EDIT: BTW, the program I wrote this for has the abillity to load "plugins" via dl_open() calls so building a list of filenames for the hashtable at program startup isn't possible unless the plugin loading routine added the filename to the hashtable, but I'd like to keep the internals of this malloc() wrapper as invisible as possible to outside functions.
EDIT2: I'm really only looking at the section of the function where it successfully finds a match since that opens far more frequently than needing to add a filename to the table. Should I dump the linked list and go with an array of strings? What's a better way to get a hash value than what I'm doing (read: creates fewer collisions and is fast)?
Thanks for any help.