Thread: Stupid newbie Q on hash functions.

  1. #1
    Registered User
    Join Date
    Apr 2008
    Posts
    14

    Stupid newbie Q on hash functions.

    Array to hash function.

    Hi im trying to get my head around reading into a hash function.

    I know i must be doing something stupid so please excuse the dumb questions...............

    Now i want to read from an array of ints that have a value between 90000-99999. and put them in an array of 64 ints.

    Now lets just say for talkings sake i create the index's by simply adding all the numbers(i know this isnt a good way of going about it). I.E 99998 would = 9+9+9+9+8, index 44. How on earth do i go about getting the value 99998 back from index 44?

    Any help would be greats, thanks.

  2. #2
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by iAmFedor View Post
    Now lets just say for talkings sake i create the index's by simply adding all the numbers(i know this isnt a good way of going about it). I.E 99998 would = 9+9+9+9+8, index 44. How on earth do i go about getting the value 99998 back from index 44?
    You can't. That hash function is not reversible.

  3. #3
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    How on earth do i go about getting the value 99998 back from index 44?
    You can't, for the same reason that you can't solve this equation.
    Code:
    x + y = z
    In your case, 44 could be 99998 or 99989. You can't tell.

    Note that it is indeed a bad hash function because you have duplicate values -- 99998 is the same as 99989, so if you stored both of those numbers you'd run into trouble.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  4. #4
    Registered User
    Join Date
    Apr 2008
    Posts
    14
    ok thank god, im not going mad!

    would any of you nice folks be able to direct me to where i can get info on creating hash functions i can reverse?

    thanks for the replys

  5. #5
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    >> so if you stored both of those numbers you'd run into trouble.
    Not really. That's just a collision. You run into trouble if you have too many collisions, and too many collisions is a sign of a bad hash function (or a hash table that is too small).

    When storing something in a hash table, you store the actual value in the table. So you don't have to go from 44 to 99998, you just look at the value(s) listed at index 44 and see what they are.

    It's not necessarily a good idea to use a hash function that you can reverse, since that kind of defeats the purpose. A hash table takes a large range of possible values and converts them to a smaller range.

  6. #6
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by iAmFedor View Post
    ok thank god, im not going mad!

    would any of you nice folks be able to direct me to where i can get info on creating hash functions i can reverse?
    That depends on the sort of data you are hashing. There are well-known reversible hash functions for integers, because integers are limited in size. For something like a string, though, the string could be any arbitrary length, and so the pigeonhole principle dictates that the hash can't possibly be reversible if it of fixed size. So in general, there is no such thing as a reversible hash for a string, but there is for integers.

    For integers, this page has lots of good information:

    http://www.concentric.net/~Ttwang/tech/inthash.htm

  7. #7
    Registered User
    Join Date
    Apr 2008
    Posts
    14
    Quote Originally Posted by Daved View Post
    >> so if you stored both of those numbers you'd run into trouble.
    Not really. That's just a collision. You run into trouble if you have too many collisions, and too many collisions is a sign of a bad hash function (or a hash table that is too small).

    When storing something in a hash table, you store the actual value in the table. So you don't have to go from 44 to 99998, you just look at the value(s) listed at index 44 and see what they are.

    It's not necessarily a good idea to use a hash function that you can reverse, since that kind of defeats the purpose. A hash table takes a large range of possible values and converts them to a smaller range.
    Of course, ha told you it was a stupid question.

    i run the method that creates the index of my number, store that index number as a temp int, and put it into my array at myArray[temp] = number.

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Daved View Post
    It's not necessarily a good idea to use a hash function that you can reverse, since that kind of defeats the purpose. A hash table takes a large range of possible values and converts them to a smaller range.
    I don't think it defeats the purpose. In fact, it guarantees that the hash function will have no collisions, since if it is reversible, it must be one-to-one.

    That doesn't mean you'll have no collisions in the hash table but there will be no collisions of raw hash values.

  9. #9
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    >> That doesn't mean you'll have no collisions in the hash table but there will be no collisions of raw hash values.

    I would consider the work you do between getting your "raw hash value" and finding the appropriate location in the table to still be part of the overall hash function.

    True, in the general sense a hash function can be reversible, but only if the result range is at least as big as the input range. My thought was that if you're attempting to use it for a hash table then it doesn't make much sense for that to be the case.

  10. #10
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Daved View Post
    True, in the general sense a hash function can be reversible, but only if the result range is at least as big as the input range. My thought was that if you're attempting to use it for a hash table then it doesn't make much sense for that to be the case.
    There seems to be a correlation between reversibility and overall quality of the hash function, although this is just empirical. The discussion on the page I linked is pretty complete on the topic.

    EDIT: Also, I'm not sure I consider "bucketizing" the hash value to be part of the hash function itself, since this happens differently when the table is resized, even though the underlying function doesn't change.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Memory handling functions (newbie questions)
    By PaulBlay in forum C Programming
    Replies: 6
    Last Post: 03-10-2009, 06:37 AM
  2. Newbie Q? how do u turn bool functions off?!!
    By rderiu in forum C++ Programming
    Replies: 6
    Last Post: 01-11-2003, 10:34 PM
  3. a Stupid C Newbie Question
    By Unregistered in forum C Programming
    Replies: 8
    Last Post: 03-08-2002, 05:35 PM
  4. data Structures / Hash functions
    By rickc77 in forum A Brief History of Cprogramming.com
    Replies: 4
    Last Post: 11-11-2001, 03:46 PM
  5. data structures / hash functions
    By rickc77 in forum C Programming
    Replies: 5
    Last Post: 11-11-2001, 01:54 PM