Hashing Functions

**Bella_Rosa** · 05-21-2005

Ok I know how to write the actuall hash function what I'm having trouble with is the techniqe to resolve the collision there is a way called Dynamic Hashing I'm having trouble on how to write it. I understand how it works I just don't know were to start .

here is the srtucture of the HashTable

Code:

 typedef struct {
   int key ; 
   char FName[10]; //First Name //
   char LName[10]; // Last name // 
   char Tel[7]; //telephone number // 
   char city[10] ; 
} element ;

and I've decided to use the mid square hash function

Code:

int hash( element Civil )
{

 unsigned int v , x , y , w ;

 
  w = Civil.key * Civil.key ; // this squares the Civil Id //
  y = w << 16 ;   // this takes the three bits in the key 
  x = y >> 29 ;  // which are in the middle // 
   printf("x = %d \n" , x);
 return (x);
}

in the main program it's a normal input of data using scanf then calling the function

Code:

 int main()
{ 
 int i , size ;  // size of the elements we want to enter // 
 element Civil  ; 
 element hash_table[TABLE_SIZE];
 // Inserting data //
 printf("Enter number of records you want to read: ");
 scanf("%d" , & size);
 
 for ( i = 0 ; i < size ;i ++)
 {
   printf("Enter civil ID:\n");
   scanf("%d" , &Civil.key);
   printf("Enter first name:\n");
   scanf("%s" , &Civil.FName);
   printf("Enter Last name :\n");
   scanf("%s" , &Civil.LName);
   printf("Enter telephone # :");
   scanf("%s" , &Civil.Tel);
   printf("Enter City:");
   scanf("%s" , & Civil.city);
 
   Dynamic_insert( Civil , hash_table);  // this is the function  I //
                                                               // want to create //
 }
return(0);
}

Any one got any ideas on what I should do next . Is it a good idea to strat making it look like a binary tree ???

**Mezzano** · 05-21-2005

Originally Posted by Bella_Rosa

Ok I know how to write the actuall hash function what I'm having trouble with is the techniqe to resolve the collision there is a way called Dynamic Hashing I'm having trouble on how to write it. I understand how it works I just don't know were to start .

here is the srtucture of the HashTable

Code:

 typedef struct {
   int key ;
   char FName[10]; //First Name //
   char LName[10]; // Last name //
   char Tel[7]; //telephone number //
   char city[10] ;
} element ;

To be technical that is the structure of an element in your hash table, the hash table itself is likely just an array of these things, this is what gives it O(1) lookup, on average.

Originally Posted by Bella_Rosa

Any one got any ideas on what I should do next . Is it a good idea to strat making it look like a binary tree ???

No making it look like a binary tree is a horrible idea

That would lead to possible O(N^2) lookups...bad bad bad. Basically there are two collision resolution ideas. The first, the one I prefer, is chained hashing. In this scheme each entry in the hast table is really a linked list and collisions are handled by adding to the end of it (which can be done in O(1) if you keep a tail pointer). I think I remember reading with this scheme the average look up time turns out to be O(N/2) where N is the length of the linked list in this case. There are of course drawbacks to this approach. Traversing a linked list can be bad, since you are dealing with pointers ever pointer traversal can be horrendous because you could get cache misses, or worse yet memory that has been paged out. This strategy tends to avoid primary clustering based on how good your hash function is.

The other kind of resolution strategy, the one you mention, is dynamic hashing in which, when you have a collision you resize the table and use a different hashing function. This helps avoid primary clustering but it can lead to secondary clustering (there is no free lunch

. It has been awhile since I have used this method, hell it has been awhile since I have made a hash table, but the basic idea is this. When you get a collision you want to resize the hash table (since in some ways a collision is a sign of dwindling space...that or a sign your hash function sucks

, and rehash everything using a different hashing function. The theory is the bigger size table will lead to less collisions. There is a hashlib, called hashlib interestingly enough, that implements this approach I believe, you can get the source here Again I am not as knowledgeable in this kind of has table but you function basically need to do this (assuming the hash table is initially full of nulls so you can tell if there is a collision or not). Take the object you want to hash, generate a position from your hash function. If it is not null you have a collision. At this point you need to resize your hash table. Now try to insert the item again using a different hash function, if you get a collision, try a different one yet again

The idea is that eventually you will find a slot, and you can place it there. Then on lookup you simply repeat the same sequence of hash function until you find it, or find null, which would mean it is not in the table. A good treatment of this area can be found here and it has pretty pictures.

Sorry for the long ass response, just wanted to get all the info out at once to make it more clear for you....not sure if I succeeded

Mezzano

**Salem** · 05-22-2005

> and I've decided to use the mid square hash function
Why?
It seems a very poor choice to me (unless you know something about civil keys you're not sharing).
For example, all keys from 0 to 90 return the same value.

> TABLE_SIZE
You need to try and evenly distribute your input data over this range.
A simple result would be civil.key % TABLE_SIZE;

In any event, the hash function has to incorporate the current table size, otherwise you're simply not using the space available. Your function with its >>29 has only 16 possible values no matter what your input data is.

> Dynamic_insert( Civil , hash_table);
A couple of points
1. TABLE_SIZE can no longer be a constant.
http://www.nist.gov/dads/HTML/dynamicHashing.html suggests that the table changes size.

So to start with, you would need

Code:

 int hash_size;
 element *hash_table;
 // Inserting data //
 printf("Enter number of records you want to read: ");
 scanf("%d" , & size);
 hash_size = size * 3;
 hash_table = calloc( hash_size, sizeof *hash_table );

Since dynamic insert can change the hash table, you'd need a call which would allow you to modify the hash table itself.
Say

Code:

 hash_table = Dynamic_insert( Civil, hash_table, &hash_size );

When you get a collision, you
- determine a new hash_size
- allocate a larger table
- rehash all the entries in the old table into the new table
- hash the new (collision) entry into the table

**Bella_Rosa** · 05-22-2005

Thanx For the Help

. I'm starting to get a better idea of what I should do . ( the point of this program is to try diffrent load factors to see the average number of collisions & the average length of collided sequence ) . I forgot to mention that my hash table is fixed ( it's size is 8) so that I can change the load factors I want by entering the # of elements . For example in my first run I'm gonna enter 2 elements Load factor = .25 then I'm gonna try it with 5 elements load factor should be .62 and so on and yes the civil.key is in certain range

**Mezzano** · 05-22-2005

Originally Posted by Bella_Rosa

Thanx For the Help

. I'm starting to get a better idea of what I should do . ( the point of this program is to try diffrent load factors to see the average number of collisions & the average length of collided sequence ) . I forgot to mention that my hash table is fixed ( it's size is 8) so that I can change the load factors I want by entering the # of elements . For example in my first run I'm gonna enter 2 elements Load factor = .25 then I'm gonna try it with 5 elements load factor should be .62 and so on and yes the civil.key is in certain range

Hashing on a fixed size table without chaining seems a bit impossible. If you ever get up to entering 8 elements you are done, if you don't chain or rehash your table is full, game over. I realize this is just a school exercise and you aren't intending to make a fully general hash table, it just seems odd that is all.

Also I think there is a way to avoid re-hashing the table after resizing. All you would need is a list containing all the past sizes of the hash table, ie it would start out as a list just containing 8 and then as you resize he hash table you add the "new" size to it.

The theory is this, you use hash function #1 until you get a collision, you then resize the table (leaving the old entries in their same places) and rehash the collided element based on a new hash function (more likely just the old one plus a constant, say i^2 for i = 0 to inf ...in theory

Then on look up you start with the base hash function using the original table size, hash function #1, if you find a null in the location it gives you you are done as you know it can't be in the table. If you find an item there are two possibilities. First if it is the item you are looking for you are done. Second, if it is NOT the item you are looking for then you start probing, ie you start doing the same thing you did on insert (adding i^2 for example to the result of every hash and remember to use the next table size in the list as your mod value), and looking in those slots. As soon as you find a null you are done because that would mean there is no way for the item to be in the table. If you find an item just repeat what you did in the case of the first collision, check if it is what you are looking for, if not continue probing.

Mezzano

**Bella_Rosa** · 05-24-2005

I was just wondering if I use a midsquare function and I use only 3 bits woldn't there be a problem if I have the table size more than 8 caus 2^3 = 8 which should be the table size ???

Thread: Hashing Functions

Thread Tools

Search Thread

Display

Hashing Functions

Thanx

Thanx 4 all the input ppl

Similar Threads

Functions and Classes - What did I do wrong?

calling functions within functions

Factory Functions HOWTO

Shell functions on Win XP

functions - please help!!!!