Thread: How to use string hash function

  1. #1
    Registered User
    Join Date
    Jul 2012
    Posts
    87

    How to use string hash function

    For some reason, it's hashing to a very large number like
    1241200051
    for an index (i.e. bucket). I'm building a simple hash table via open addressing w/ linear probing and it's phonebooks where the keys are string names, with 7-digit phone nums as values.

    here is the tested-and-tried string hash function FNV I found @:
    Eternally Confuzzled - The Art of Hashing
    Code:
    unsigned fnv_hash ( void *key, int len )//Q: how do I use generic ptr key?
    {
       unsigned char* p = (unsigned char*)key;
       unsigned h = 2166136261U;
       int i;
     
       for ( i = 0; i < len; i++ )
         h = ( h * 16777619 ) ^ p[i];
     
       return h;
    }
    Code:
    int main()
    {
       int phonebook_1[10] = {0,0,0,0,0,0,0,0,0,0};//always initialize size since specific compilers say its ok so don't get comfy w/ bad practices
        
       size_t size_1 = 10;
        
       string nam_1 = "Cooke";
       void* p_to_nam_1 = &nam_1;
        
       unsigned int bucket= fnv_hash( p_to_nam_1, size_1 );
        
       cout << bucket << endl;
       return 0;
    }
    EDIT: actually I found the official website on how to set the params and values @ FNV Hash, so I'lll take a look first before posting further...
    Last edited by monkey_c_monkey; 07-11-2012 at 09:11 PM.

  2. #2
    Registered User
    Join Date
    Jul 2012
    Posts
    87
    based on the FNV algorithm, what causes the hash value that is returned to keep changing. I thought of using the returned value then mod size_1but the unstable hash function prevents this...

  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    You're telling it to hash 10 characters but the string is only 5 bytes long (well 6 including the nul-terminator, but you don't need to hash that).

    So you're hashing whatever random garbage comes after that, as well.

    Worse still, you're hashing the std::string object itself. But a std::string is itself not a char array. It holds one internally through pointers, but it is not itself actually just a char array. You need to call .c_str() to get at that.
    Last edited by iMalc; 07-11-2012 at 10:45 PM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    You should be doing,
    auto bucket = fnv_hash( nam_1.c_str(), nam_1.size() );
    Also,
    This
    int phonebook_1[10] = {0,0,0,0,0,0,0,0,0,0};
    is the same as
    int phonebook_1[10] = {};
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  5. #5
    Registered User
    Join Date
    Jul 2012
    Posts
    87
    I tried:
    Code:
    auto int bucket= fnv_hash( &nam_1, nam_1.size() );
    b/c
    Code:
    auto int bucket= fnv_hash( nam_1.c_str(), nam_1.size() );
    doesn't compile since first argument expects a generic ptr BUT the former statement still outputs a very large value...

  6. #6
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Any pointer can be converted to void* and back, so the second should compile. Otherwise post the error.
    You've already been told not to use the former since it's wrong, wrong, wrong and utterly wrong in all sorts of ways. You should not hash C++ objects. The internals of an object should not be hashed. Furthermore, the size of the string object is not num_1.size(), so the hash function will even read bytes which do not exist!

    Oh, and remove the "int".
    It should be
    auto bucket = fnv_hash( nam_1.c_str(), nam_1.size() );
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  7. #7
    Registered User
    Join Date
    Jul 2012
    Posts
    87
    This is the errors compiler shows:
    hash_tables.cpp:287: error: ISO C++ forbids declaration of 'bucket' with no type
    hash_tables.cpp:287: error: invalid conversion from 'const void*' to 'void*'
    hash_tables.cpp:287: error: initializing argument 1 of 'unsigned int fnv_hash(void*, int)'


    BTW: line 287 is:
    Code:
    auto bucket= fnv_hash( nam_1.c_str(), nam_1.size() );
    My compiler version:
    Thread model: win32
    gcc version 4.4.1 (TDM-2 mingw32)

  8. #8
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Well, the hashing function is broken.
    Will attempt to fix:
    Code:
    unsigned int fnv_hash(const void* key, int len)
    {
       const unsigned char* p = static_cast<const unsigned char*>(key);
       unsigned int h = 2166136261U;
      
       for (int i = 0; i < len; i++)
         h = (h * 16777619) ^ p[i];
      
       return h;
    }
    As for the first compile error, be sure to use C++11 (or C++0x). Add -std=c++0x if you are compiling manually or find the option to enable it in the compiler options of your IDE.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  9. #9
    Registered User
    Join Date
    Jul 2012
    Posts
    87
    Now it compiles, but it still outputs a large value like 205991382. I did manually use standard c++0x so: g++ -std=c++0x -o hash_tables hash_tables.cpp.

    The parameters to use are corrrect according to the official site @ http://www.isthe.com/chongo/tech/comp/fnv/ ... I'll look around for other string hash functions and try to resolve this...

    Btw, have you tried to implement the code on your end?

  10. #10
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Why do you think it shouldn't output a large value? AFAIK, it's pretty normal with hash functions.
    Also, what is wrong with large values?
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  11. #11
    Registered User
    Join Date
    Jul 2012
    Posts
    87
    You're right, I just reduce the large value from the FNV hash with mod size and now the large value doesn't keep changing (that was my problem b/c of the garbage bits which I'll look into).

    Btw, my compiler it must be:
    Code:
    auto int bucket = fnv_hash(nam_1.c_str(), nam_1.size() );

  12. #12
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Don't mod the hash value unless you want collisions. In a hash table, it is inevitable that you must do so, but remember the consequences: you will get collisions.
    >>auto int bucket = fnv_hash(nam_1.c_str(), nam_1.size() );
    That's wrong. Remove the int. If you compile it properly with the C++11 standard, it will work. Otherwise the auto keyword is useless.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  13. #13
    Registered User
    Join Date
    Jul 2012
    Posts
    87
    I added the param: std=c++0x and now I can do w/o the int, and the only reason I am compressing it w/ mod size is b/c my phonebook_1 hashtable is size 10. I understand collision is inevitable and that perfect hashing is only acheivable as long as you know from the get-go the size of the table and all the keys so that you can tinker like a mad scientist (black art) to figure out how to distribute all the keys w/o collisions. But I'm just building a simple open addressing hash table w/ linear probing method for practice.

  14. #14
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Alright, that is fine. That is typically how hash tables are implemented.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  15. #15
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Quote Originally Posted by Elysia
    Don't mod the hash value unless you want collisions.
    The implication here is that hash algorithms don't produce collisions alone. They do.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Hash function to hash key into geographic coordinate
    By dominic_tran201 in forum C++ Programming
    Replies: 2
    Last Post: 02-11-2011, 10:03 AM
  2. hash function - int to char string + vice versa
    By kerrymaid in forum C Programming
    Replies: 2
    Last Post: 05-29-2010, 11:01 AM
  3. String input looping my hash function
    By iAmFedor in forum C++ Programming
    Replies: 6
    Last Post: 04-25-2008, 04:20 PM
  4. Need help with string hash function
    By pityocamptes in forum C Programming
    Replies: 1
    Last Post: 04-21-2006, 10:23 PM
  5. Performing a SHA-512 hash on a string
    By mjt in forum C Programming
    Replies: 8
    Last Post: 10-23-2005, 12:19 AM