Thread: Stupid newbie Q on hash functions.

1. Stupid newbie Q on hash functions.

Array to hash function.

Hi im trying to get my head around reading into a hash function.

I know i must be doing something stupid so please excuse the dumb questions...............

Now i want to read from an array of ints that have a value between 90000-99999. and put them in an array of 64 ints.

Now lets just say for talkings sake i create the index's by simply adding all the numbers(i know this isnt a good way of going about it). I.E 99998 would = 9+9+9+9+8, index 44. How on earth do i go about getting the value 99998 back from index 44?

Any help would be greats, thanks.

2. Originally Posted by iAmFedor
Now lets just say for talkings sake i create the index's by simply adding all the numbers(i know this isnt a good way of going about it). I.E 99998 would = 9+9+9+9+8, index 44. How on earth do i go about getting the value 99998 back from index 44?
You can't. That hash function is not reversible.

3. How on earth do i go about getting the value 99998 back from index 44?
You can't, for the same reason that you can't solve this equation.
Code:
`x + y = z`
In your case, 44 could be 99998 or 99989. You can't tell.

Note that it is indeed a bad hash function because you have duplicate values -- 99998 is the same as 99989, so if you stored both of those numbers you'd run into trouble.

4. ok thank god, im not going mad!

would any of you nice folks be able to direct me to where i can get info on creating hash functions i can reverse?

5. >> so if you stored both of those numbers you'd run into trouble.
Not really. That's just a collision. You run into trouble if you have too many collisions, and too many collisions is a sign of a bad hash function (or a hash table that is too small).

When storing something in a hash table, you store the actual value in the table. So you don't have to go from 44 to 99998, you just look at the value(s) listed at index 44 and see what they are.

It's not necessarily a good idea to use a hash function that you can reverse, since that kind of defeats the purpose. A hash table takes a large range of possible values and converts them to a smaller range.

6. Originally Posted by iAmFedor
ok thank god, im not going mad!

would any of you nice folks be able to direct me to where i can get info on creating hash functions i can reverse?
That depends on the sort of data you are hashing. There are well-known reversible hash functions for integers, because integers are limited in size. For something like a string, though, the string could be any arbitrary length, and so the pigeonhole principle dictates that the hash can't possibly be reversible if it of fixed size. So in general, there is no such thing as a reversible hash for a string, but there is for integers.

http://www.concentric.net/~Ttwang/tech/inthash.htm

7. Originally Posted by Daved
>> so if you stored both of those numbers you'd run into trouble.
Not really. That's just a collision. You run into trouble if you have too many collisions, and too many collisions is a sign of a bad hash function (or a hash table that is too small).

When storing something in a hash table, you store the actual value in the table. So you don't have to go from 44 to 99998, you just look at the value(s) listed at index 44 and see what they are.

It's not necessarily a good idea to use a hash function that you can reverse, since that kind of defeats the purpose. A hash table takes a large range of possible values and converts them to a smaller range.
Of course, ha told you it was a stupid question.

i run the method that creates the index of my number, store that index number as a temp int, and put it into my array at myArray[temp] = number.

8. Originally Posted by Daved
It's not necessarily a good idea to use a hash function that you can reverse, since that kind of defeats the purpose. A hash table takes a large range of possible values and converts them to a smaller range.
I don't think it defeats the purpose. In fact, it guarantees that the hash function will have no collisions, since if it is reversible, it must be one-to-one.

That doesn't mean you'll have no collisions in the hash table but there will be no collisions of raw hash values.

9. >> That doesn't mean you'll have no collisions in the hash table but there will be no collisions of raw hash values.

I would consider the work you do between getting your "raw hash value" and finding the appropriate location in the table to still be part of the overall hash function.

True, in the general sense a hash function can be reversible, but only if the result range is at least as big as the input range. My thought was that if you're attempting to use it for a hash table then it doesn't make much sense for that to be the case.

10. Originally Posted by Daved
True, in the general sense a hash function can be reversible, but only if the result range is at least as big as the input range. My thought was that if you're attempting to use it for a hash table then it doesn't make much sense for that to be the case.
There seems to be a correlation between reversibility and overall quality of the hash function, although this is just empirical. The discussion on the page I linked is pretty complete on the topic.

EDIT: Also, I'm not sure I consider "bucketizing" the hash value to be part of the hash function itself, since this happens differently when the table is resized, even though the underlying function doesn't change.