Help me check my hash function

**lancetky** · 04-17-2010

how do u decide ur hash value??
#include <stdio.h>
#include <stdlib.h>
#define HASH 53//how u determine this value??
int hashfunc (unsigned int ID);
void hash (int user);
int main()
{
struct rec *head=NULL,*curr=head;
int i, user;
printf("Enter Student ID:");
scanf("%d", &user);
hash(user);
return 0;
}
void hash (int user)
{
int k[2], i, sum, index, temp=user;
unsigned int ID;
for(i=0;i<2;i++)
{
k[i]=temp%100000;
temp=temp/100000;
}
sum=k[0]+k[1];
ID=sum%1000;
index=hashfunc(ID);
printf("The index of %d is %d", user, index);
}
int hashfunc (unsigned int ID)
{
int remain;
remain = ID%HASH;
return remain;
}

**tabstop** · 04-17-2010

Probably came to him in a dream.

(Although, with that particular hash function, the value of HASH sets how many different bins there are.)

**lancetky** · 04-17-2010

Originally Posted by tabstop

Probably came to him in a dream.

(Although, with that particular hash function, the value of HASH sets how many different bins there are.)

how u determine the bins required??? i not really sure how to get the value

**tabstop** · 04-17-2010

Look at your hashfunc. What does it do? What possible values can come out of that function?

**lancetky** · 04-17-2010

Originally Posted by tabstop

Look at your hashfunc. What does it do? What possible values can come out of that function?

find the index value for hash table?? 0 till 52

**tabstop** · 04-17-2010

Originally Posted by lancetky

find the index value for hash table?? 0 till 52

And so that's how you know how there are 53 bins required.

**MK27** · 04-17-2010

How many bins do you want? These are also called "buckets".

A hash table is a fixed size array (of buckets). What the buckets are depends on the implementation, they could be linked lists. In any case, while the table is of a fixed size, the buckets need to hold zero or more items.

Each item goes into a specific bucket determined by the hashing formula. So, if the array has 53 buckets (0-52), you would determine this:

some_value%53

That will yield an answer in the range 0-52. That's the items "hash key". For a discussion about "ideal" hash table sizes, you might look at this short thread: String hashing

**lancetky** · 04-17-2010

Originally Posted by tabstop

And so that's how you know how there are 53 bins required.

i not sure whether it is 53 cause in books in write there it should be prime number that 4 times the number of records. i dun really get wat it mean http://cboard.cprogramming.com/images/icons/icon9.gif

**MK27** · 04-17-2010

The number of bins is arbitrary and decided by the programmer. "A prime number 4 times the number of records" is a very wacky idea IMO. Generally, you would want the number of buckets to be substantially less than the number of records.

I'll admit all the descriptions of hash tables I've seen are totally atrocious and do not describe anything, so it would not surprise me to hear there is one more out there that has someone confused

**tabstop** · 04-17-2010

You can make the number of bins be whatever you want it to be. There's not a number out there that is The One True Number of Bins. You just need to use what you know about your data to try to choose an appropriate size.

**lancetky** · 04-17-2010

oo i see. thx for ur guidance =D

**~~Overworked_PhD~~** · 04-17-2010

This might not be on topic, but here is one of my old co-workers speel on hashes

"It's all about the distribution of entries in the table. The idea is that
you want to spread them out throughout the table as best as you can, so that
you minimize collisions. If the multiplier and the table size are relatively
prime, then you end up distributing entries across every "slot" in the table,
assuming "sufficiently random" data.

The easiest way to get the multiplier and hash table size relatively prime,
meaning they share no common factors, or that if n = the table size is a
positive integer greater than 1 and m = the multiplier is likewise a
positive integer greater than 1, then gcd(n, m) = 1, where gcd(n, m) is the
positive greatest common divisor of n and m, is to make one or both prime
numbers. Recall that a mod n = r (a, n positive integers) means that for
some integer s, a = ns + r. We assume further that n > m.

To show why this relatively prime business is important, suppose we are
doing open hashing as in my earlier example with some other method for
collision resolution (e.g., chaining). Now, let d = gcd(n, m) >= 1.
Given an input datum consisting of the string a_1 a_2 a_3 ... a_k, where
0 <= a_k <= B for some constant positive integer B, the hash becomes
the nonnegative integer,

k
---
r = ( > m a ) mod n
--- i
i=1

Let u be the sum over i = 1 to k of a_i; then the above becomes:

r = (mu) mod n,

since m is constant with respect to the sum, and thus can be factored out.

But, since m has d as a factor, we know there exists some positive integer q
such that m = dq and thus the hash is equivalent to r = (dqu) mod n which
implies that dqu = sn + r => dqu - r = sn => s | (dqu - r), where ``a | b''
stands for "a divides b" (or b has a as a factor), and s is as before. But,
recalling that n also has d as a factor, we see that there exists some
positive integer t such that n = dt. Then, dqu = sdt + r which implies that
dqu - r = sdt => sd | (dqu - r) => there exists some integer v such that
v = (dqu - r) / sd => vs = (dqu - r) / d is also an integer (by closure of
multiplication in the integers). Then, vs = (dqu - r) / d = dqu/d - r/d.
But, clearly d | dqu and dqu/d = qu is clearly an integer by closure, and
since vs is an integer, it follows that r/d is also an integer and hence
d | r => every hash value must be a multiple of d. It follows that every
slot in the hash table that is not an integer multiple of d is unused.

If d > 1 => n, m are not relatively prime, then only every d'th slot in
the table will be used. Consider the case where n = 8 and m = 4 and then
gcd(n, m) = 4 implies that only the 0'th and 4'th slots in the table will
be used, and the hash table is only as effective as one with two slots.

On the other hand, if n and m are relatively prime, then by definition,
gcd(n, m) = 1 and every slot in the table will be used.

Of course, this still all assumes that the data we're hashing cooperates and
doesn't give us some weird values. For instance, if the data were always
some multiple of the modulus when shifted and summed, you'd end up in the
same boat. You can try and ameliorate this case by careful choice of
collision resolution techniques, using double hashing with hash algorithms,
etc. It gets into some real black magic.

In practice, however, for most string data the function I posted is quite
good.
"

**jwenting** · 04-17-2010

they had hashing algorithms in Ur?

**~~Overworked_PhD~~** · 04-17-2010

And some more of his speel on hashes

More properly, if the hash size is a prime p, you have a
finite field F that isomorphic to Z/pZ, but any finite ring of order n not a
prime cannot be a field, which is really what you'd like to have here. Of
course, even with a field, I can choose a multiplier that is, say, a
multiple of p and still get bad results (even though that would be a pretty
dumb thing to do). When you choose a multiplier and a table size that
aren't relatively prime, you still get a useful ring, but it's smaller than
the table size.

But it would be better to think of this in terms of finite Abelian groups.
Any finite Abelian group is, of course, isomorphic to the direct product of
a set of finite cyclic groups, and what you want to do here is minimize the
number of groups in the product (which I agree means maximizing the size of
the field, which, again, means making the underlying group of prime order).

And the ring explanation isn't the end of the story. What you really want
to do is minimize the number of times individual elements of the ring pop
up in the hash function, not just worry about 0 divisors. For instance,
how many 2 and 3 divisors are there? It's mostly the same problem, though.

**Subsonics** · 04-17-2010

Since the key in this case is an integer, student ID, that supposedly is unique. Then you could use the ID as a hash directly and have a perfect hash table. If I'm not missing something here.

Thread: Help me check my hash function

Thread Tools

Search Thread

Display

Help me check my hash function

Similar Threads

Compiling sample DarkGDK Program

Undefined Reference Compiling Error

We Got _DEBUG Errors

Calling a Thread with a Function Pointer.

Problem with Visual C++ Object-Oriented Programming Book.