# data structures / hash functions

• 11-11-2001
rickc77
data structures / hash functions
I need to understand the hashing function of data structures.

With the help of other members on this site, I have been supplied with links to good references, but for some reason I cannot grasp the concept of hashing.

For example I need to understand how to determine a hash address, and how to determine if a collision has occurred, and of course how to handle the collision.

My books are really not helping me, so i'm hoping that someone can either point me in a good direction of reading material, or explain to me what is going on.

Any help would be greatly appreciated.
• 11-11-2001
Salem
• 11-11-2001
rickc77
hash functions
thanks salem.......i previously looked at this write-up per stoned-coder, but I even though i'm reading it, i'm just having a problem comprehending it.

what i see it as is:

an array of size of ??(whatever)

if i have 3 values that need to be placed into this array (2,4,6)
it would take the first value and using some formula it would place the vlue into a specific position. It would then take the next value (4), and test to make sure that the position is available, if not it just created a colision, and will go to the next available position. At this time it would take the last value (6) and again test to make sure that the position is empty if yes, it would enter the value, if no, it would take the next position and perform the test again until it finds an empty position.

My first piece of confusion is how does it know what position to place it in for the first attempt?
#2 If it uses the same logic, wouldn't the first attempt always cause a colision.
#3 If you have set a value for the array..say [5] when it gets through the array would it go back to one, or would you want to send the user a message to expand the array?
• 11-11-2001
C_Coder
A hashing algoritm is just an expression to create a unique value for given data(most hashing algorithms i have seen only produce a limited amount of unique values).

an example of a hashing algorithm I have uses vehicle registrations as its data:

say you had the reg number B123BCD:

the algorithm is, take the integer value of the first letter 'B' and subtract 'A' from it, then multiply the result by 1000. So far we have a value of 1000. Next we add the next 3 digits 123, so now we have a value of 1123. This is our unique value(actually this method only produces 26000 different values). So you would store the registration number in position 1123 of the array/file.

You need to make sure your data is in the correct format and if data already exists in the position 1123, i would write the new data to an error file or just tell the user, i wouldn't try to store it elsewhere because retriving it using the same algorithm would get the wrong data.

Hope this helps.
:D
• 11-11-2001
Stoned_Coder
1)It knows where to put the value because you have written a hash function that tries to give every value a unique index into the array.Writing this function is the most difficult part of implementing a hash table.
2) you are inputting 2,4, and 6 so a good hash will always allocate a different index for each of them so there should be no collisions here. Problems come when you have many more used slots in the array. That is why a hash table is normally kept below 70% full.
3)if the tables approaching 70% full then expand.
• 11-11-2001
QuestionC
A has table basically revolves around having a hash function which is.. well, let's look at a linked list implementation of a hash table (see 6.6 of K&R for a more complete discussion). Obviously, you'll need to understand linked lists to do this code.
Code:

```#define HASHSIZE 1000 typedef struct _node {  char * name;  int score;  struct _node * next; } NODE; // returns a value from 0 to 999 int hash (char * name); void addVal (char * name, int score); int readVal (char * name); void add2tail (NODE * list, NODE * addMe); NODE * makeNode (char * name, int score); // Global for simplicity NODE * hTable [HASHSIZE];```
The use of this program is going to be to keep track of the grades of several students in a class.
Code:

```int main (void) {  int JonnyScore;  addVal ("Jonny", 68);  addVal ("Susie", 24);  addVal ("QuestionC", 101);  JonnyScore = readVal ("Jonny");  return 0; } void addVal (char * name, int score) {  NODE * p;  int hashval;  p = makeNode (name, score);  hashval = hash (name);  add2tail (hTable[hashval], p);  return; } int readVal (char * name) {  int hashval;  NODE * p;  hashval = hash (name);  // Search untill you find the node with the same name.  for (p = hTable[hashval]; strcmp (p -> name, name) != 0; p = p -> next);  return p -> score; }```
No need to post the makeNode and add2tail functions I hope. Hopefully the code is complete enought to get the idea across. Let's discuss what the hash function should do. Obviously, as long as this function returns a value from 0 to 999, the program is going to work. In theory, the function could just return 0 every time, and the program would work, it would just be a rather bulky linked list program then. The point of the hash however is to replace what we'd normally do with a linked list with a table so that we don't have to perform the same old linear search. So ideally, we want our hash function to not return any duplicate values. If it does return a duplicate value, then we'll try to store two values in the same hash table entry. This program handles that by treating each hash table entry as a linked list. There are other ways.

Anyhow, it all revolves around our function hash, which we would like to have a fairly unbiased distribution (this is from K&R)
Code:

```int hash (char * name) {  int hashval;  for (hashval = 0; *name != '\0'; name++)      hashval = *name + 31 * hashval;   return hashval % HASHSIZE; }```
This function pretty much generates a number which is random with respect to name (obviously from 0 to HASHSIZE - 1 because of the return statement), which is what the function needs to be efficient.

Incidentally, I should say that the code I posted wasn't tested to see if it would compile, and even if it does, it still has an awful lot of problems with it (like, what happens if you try to add duplicate names to the table?, or try to read the score of a name not in the table), so just take if for explanatory value, and try and patch it up if you feel like making a hash table program. :)

Different hash table implementations pretty much differ on two points.
1) Their hash function.
2) How they handle collisions. Looking at that posted page, this example uses 'chaining'.

EDIT: Now to think of, the code certainly won't compile since it doesn't have the linked list functions.