Thread: Help me check my hash function

  1. #1
    Registered User
    Join Date
    Apr 2010
    Posts
    8

    Help me check my hash function

    how do u decide ur hash value??
    #include <stdio.h>
    #include <stdlib.h>
    #define HASH 53//how u determine this value??
    int hashfunc (unsigned int ID);
    void hash (int user);
    int main()
    {
    struct rec *head=NULL,*curr=head;
    int i, user;
    printf("Enter Student ID:");
    scanf("%d", &user);
    hash(user);
    return 0;
    }
    void hash (int user)
    {
    int k[2], i, sum, index, temp=user;
    unsigned int ID;
    for(i=0;i<2;i++)
    {
    k[i]=temp%100000;
    temp=temp/100000;
    }
    sum=k[0]+k[1];
    ID=sum%1000;
    index=hashfunc(ID);
    printf("The index of %d is %d", user, index);
    }
    int hashfunc (unsigned int ID)
    {
    int remain;
    remain = ID%HASH;
    return remain;
    }

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Probably came to him in a dream.

    (Although, with that particular hash function, the value of HASH sets how many different bins there are.)

  3. #3
    Registered User
    Join Date
    Apr 2010
    Posts
    8
    Quote Originally Posted by tabstop View Post
    Probably came to him in a dream.

    (Although, with that particular hash function, the value of HASH sets how many different bins there are.)
    how u determine the bins required??? i not really sure how to get the value

  4. #4
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Look at your hashfunc. What does it do? What possible values can come out of that function?

  5. #5
    Registered User
    Join Date
    Apr 2010
    Posts
    8
    Quote Originally Posted by tabstop View Post
    Look at your hashfunc. What does it do? What possible values can come out of that function?
    find the index value for hash table?? 0 till 52

  6. #6
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by lancetky View Post
    find the index value for hash table?? 0 till 52
    And so that's how you know how there are 53 bins required.

  7. #7
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    How many bins do you want? These are also called "buckets".

    A hash table is a fixed size array (of buckets). What the buckets are depends on the implementation, they could be linked lists. In any case, while the table is of a fixed size, the buckets need to hold zero or more items.

    Each item goes into a specific bucket determined by the hashing formula. So, if the array has 53 buckets (0-52), you would determine this:

    some_value%53

    That will yield an answer in the range 0-52. That's the items "hash key". For a discussion about "ideal" hash table sizes, you might look at this short thread: String hashing
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  8. #8
    Registered User
    Join Date
    Apr 2010
    Posts
    8

    Unhappy

    Quote Originally Posted by tabstop View Post
    And so that's how you know how there are 53 bins required.
    i not sure whether it is 53 cause in books in write there it should be prime number that 4 times the number of records. i dun really get wat it mean http://cboard.cprogramming.com/images/icons/icon9.gif

  9. #9
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    The number of bins is arbitrary and decided by the programmer. "A prime number 4 times the number of records" is a very wacky idea IMO. Generally, you would want the number of buckets to be substantially less than the number of records.

    I'll admit all the descriptions of hash tables I've seen are totally atrocious and do not describe anything, so it would not surprise me to hear there is one more out there that has someone confused
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  10. #10
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    You can make the number of bins be whatever you want it to be. There's not a number out there that is The One True Number of Bins. You just need to use what you know about your data to try to choose an appropriate size.

  11. #11
    Registered User
    Join Date
    Apr 2010
    Posts
    8
    oo i see. thx for ur guidance =D

  12. #12
    Banned
    Join Date
    May 2007
    Location
    Berkeley, CA
    Posts
    329
    This might not be on topic, but here is one of my old co-workers speel on hashes

    "It's all about the distribution of entries in the table. The idea is that
    you want to spread them out throughout the table as best as you can, so that
    you minimize collisions. If the multiplier and the table size are relatively
    prime, then you end up distributing entries across every "slot" in the table,
    assuming "sufficiently random" data.

    The easiest way to get the multiplier and hash table size relatively prime,
    meaning they share no common factors, or that if n = the table size is a
    positive integer greater than 1 and m = the multiplier is likewise a
    positive integer greater than 1, then gcd(n, m) = 1, where gcd(n, m) is the
    positive greatest common divisor of n and m, is to make one or both prime
    numbers. Recall that a mod n = r (a, n positive integers) means that for
    some integer s, a = ns + r. We assume further that n > m.

    To show why this relatively prime business is important, suppose we are
    doing open hashing as in my earlier example with some other method for
    collision resolution (e.g., chaining). Now, let d = gcd(n, m) >= 1.
    Given an input datum consisting of the string a_1 a_2 a_3 ... a_k, where
    0 <= a_k <= B for some constant positive integer B, the hash becomes
    the nonnegative integer,

    k
    ---
    r = ( > m a ) mod n
    --- i
    i=1

    Let u be the sum over i = 1 to k of a_i; then the above becomes:

    r = (mu) mod n,

    since m is constant with respect to the sum, and thus can be factored out.

    But, since m has d as a factor, we know there exists some positive integer q
    such that m = dq and thus the hash is equivalent to r = (dqu) mod n which
    implies that dqu = sn + r => dqu - r = sn => s | (dqu - r), where ``a | b''
    stands for "a divides b" (or b has a as a factor), and s is as before. But,
    recalling that n also has d as a factor, we see that there exists some
    positive integer t such that n = dt. Then, dqu = sdt + r which implies that
    dqu - r = sdt => sd | (dqu - r) => there exists some integer v such that
    v = (dqu - r) / sd => vs = (dqu - r) / d is also an integer (by closure of
    multiplication in the integers). Then, vs = (dqu - r) / d = dqu/d - r/d.
    But, clearly d | dqu and dqu/d = qu is clearly an integer by closure, and
    since vs is an integer, it follows that r/d is also an integer and hence
    d | r => every hash value must be a multiple of d. It follows that every
    slot in the hash table that is not an integer multiple of d is unused.

    If d > 1 => n, m are not relatively prime, then only every d'th slot in
    the table will be used. Consider the case where n = 8 and m = 4 and then
    gcd(n, m) = 4 implies that only the 0'th and 4'th slots in the table will
    be used, and the hash table is only as effective as one with two slots.

    On the other hand, if n and m are relatively prime, then by definition,
    gcd(n, m) = 1 and every slot in the table will be used.

    Of course, this still all assumes that the data we're hashing cooperates and
    doesn't give us some weird values. For instance, if the data were always
    some multiple of the modulus when shifted and summed, you'd end up in the
    same boat. You can try and ameliorate this case by careful choice of
    collision resolution techniques, using double hashing with hash algorithms,
    etc. It gets into some real black magic.

    In practice, however, for most string data the function I posted is quite
    good.
    "

  13. #13
    chococoder
    Join Date
    Nov 2004
    Posts
    515
    they had hashing algorithms in Ur?

  14. #14
    Banned
    Join Date
    May 2007
    Location
    Berkeley, CA
    Posts
    329
    And some more of his speel on hashes

    More properly, if the hash size is a prime p, you have a
    finite field F that isomorphic to Z/pZ, but any finite ring of order n not a
    prime cannot be a field, which is really what you'd like to have here. Of
    course, even with a field, I can choose a multiplier that is, say, a
    multiple of p and still get bad results (even though that would be a pretty
    dumb thing to do). When you choose a multiplier and a table size that
    aren't relatively prime, you still get a useful ring, but it's smaller than
    the table size.

    But it would be better to think of this in terms of finite Abelian groups.
    Any finite Abelian group is, of course, isomorphic to the direct product of
    a set of finite cyclic groups, and what you want to do here is minimize the
    number of groups in the product (which I agree means maximizing the size of
    the field, which, again, means making the underlying group of prime order).

    And the ring explanation isn't the end of the story. What you really want
    to do is minimize the number of times individual elements of the ring pop
    up in the hash function, not just worry about 0 divisors. For instance,
    how many 2 and 3 divisors are there? It's mostly the same problem, though.

  15. #15
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Since the key in this case is an integer, student ID, that supposedly is unique. Then you could use the ID as a hash directly and have a perfect hash table. If I'm not missing something here.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Compiling sample DarkGDK Program
    By Phyxashun in forum Game Programming
    Replies: 6
    Last Post: 01-27-2009, 03:07 AM
  2. Undefined Reference Compiling Error
    By AlakaAlaki in forum C++ Programming
    Replies: 1
    Last Post: 06-27-2008, 11:45 AM
  3. We Got _DEBUG Errors
    By Tonto in forum Windows Programming
    Replies: 5
    Last Post: 12-22-2006, 05:45 PM
  4. Calling a Thread with a Function Pointer.
    By ScrollMaster in forum Windows Programming
    Replies: 6
    Last Post: 06-10-2006, 08:56 AM
  5. Problem with Visual C++ Object-Oriented Programming Book.
    By GameGenie in forum C++ Programming
    Replies: 9
    Last Post: 08-29-2005, 11:21 PM