Thread: Hash function in C, need to port it to C++

  1. #16
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    1. You use C++ strings but did not #include <string>.
    2. As pointed out earlier, you should pass key.c_str() to hash().
    3. h is uninitialised in hash().

    I think that if you are going to alter your original code anyway, then it makes sense to just change it to use C++ strings, e.g.,
    Code:
    unsigned int hash(const std::string& key)
    {
    	unsigned int h = 0;
    	const unsigned char* p = reinterpret_cast<const unsigned char*>(key.c_str());
    
    	int len = key.length();
    	for(int i=0; i<len; i++)
    	{
    		h = 33 * h + p[i];
    	}
    	return h;
    }
    This way, you can just call hash(key) in your main function.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  2. #17
    Advanced Novice linucksrox's Avatar
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    198
    Just curious: is one better than the other as far as using the address of the first element or using the c_str() function? My answer would be that using c_str() is easier to read.
    "What are all you parallelograms doing here?" - Peter Griffin (to Joe and his wheelchair buddies)

  3. #18
    the hat of redundancy hat nvoigt's Avatar
    Join Date
    Aug 2001
    Location
    Hannover, Germany
    Posts
    3,130
    Code:
    #include <iostream>
    #include <string>
    
    using namespace std;
    
    unsigned hash( const void* key, int len );
    
    int main()
    {
    	string text;
    
    	cout << "String: ";
    	cin >> text; 
    	
    	cout << "Hashed: " << hash( text.c_str(), text.length() );
    
    	cin.ignore();
    
    	return 0;
    }
    
    unsigned hash( const void* key, int len)
    {
    	const unsigned char* p = (const unsigned char*)key;
    
    	unsigned h=0;
    
    	int i;
    
    	for(i=0; i<len; i++)
    	{
    		h=33*h+p[i];
    	}
    
    	return h;
    }
    Those are minimal changes to your code, take a close look and find out what has changed and why. Changes are in the function declaration, it now takes a const pointer, because aside from the compiler error, it's the correct way of doing stuff.
    hth
    -nv

    She was so Blonde, she spent 20 minutes looking at the orange juice can because it said "Concentrate."

    When in doubt, read the FAQ.
    Then ask a smart question.

  4. #19
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    >> Just curious: is one better than the other as far as using the address of the first element or using the c_str() function? My answer would be that using c_str() is easier to read.

    Yes, c_str() is better. Passing an address to the first element is not guaranteed to provide a valid C style string. It is legal and possible (although somewhat unlikely) that parts of the string are stored in separate places rather than a contiguous array of characters. Only vector is guaranteed to have contiguous storage, so taking the address of the first element is valid only for vector.

    The other, more practical issue is that the C++ string's internal data might not be null terminated. There are implementations that do not null terminate the string internally all the time. In those cases you will get garbage after the string until the first null is encountered in memory that doesn't belong to you.

    So c_str() is easier to read, and taking the address of the first element of the string is not guaranteed to work.

  5. #20
    Registered User
    Join Date
    Nov 2004
    Location
    Pennsylvania
    Posts
    434
    Thanks nvoight your code works.
    "Anyone can aspire to greatness if they try hard enough."
    - Me

  6. #21
    Advanced Novice linucksrox's Avatar
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    198
    So c_str() is easier to read, and taking the address of the first element of the string is not guaranteed to work.
    Why is it possible for c_str() to be guaranteed if passing the address of the first element is not? You say that the underlying storage of a string may not be stored contiguously, so what does c_str() really do different besides return an address? Does it make sure that the characters are arranged in order and null terminated?
    "What are all you parallelograms doing here?" - Peter Griffin (to Joe and his wheelchair buddies)

  7. #22
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    Yes. It is allowed for an implementation to move the string (or make a copy) to somewhere else so that c_str() returns contiguous and null terminated characters. The implementation can wait to do this until c_str() is actually called, which would be an optimization if you don't call c_str().

    In practice, at least one library I've seen leaves the null-terminator off the end, and then just adds it if c_str() is called. I don't remember which one, but I've seen issues where people get garbage when they use data(), which is the same as c_str() but doesn't null terminate automatically.

  8. #23
    Advanced Novice linucksrox's Avatar
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    198
    Thanks Daved, that makes sense.
    "What are all you parallelograms doing here?" - Peter Griffin (to Joe and his wheelchair buddies)

  9. #24
    Registered User
    Join Date
    Nov 2004
    Location
    Pennsylvania
    Posts
    434
    Could someone explain what type of data h is?

    It's just declared: unsigned. What does that mean?

    Thanks!
    "Anyone can aspire to greatness if they try hard enough."
    - Me

  10. #25
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    h is an unsigned int.

  11. #26
    Registered User
    Join Date
    Nov 2004
    Location
    Pennsylvania
    Posts
    434
    So the compiler just assumes an int then?
    "Anyone can aspire to greatness if they try hard enough."
    - Me

  12. #27
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    unsigned if a modifier of an integral type. If there is no length specifier (short, long, char, int) given, the compiler assumes int. So "unsigned x;" is functionally equivalent to "unsigned int x;"

  13. #28
    Registered User Joelito's Avatar
    Join Date
    Mar 2005
    Location
    Tijuana, BC, México
    Posts
    310
    Hi... does this gives the same results:
    Code:
    #include <iostream>
    #include <string>
    
    using std::cout;
    using std::cin;
    using std::endl;
    using std::string;
    
    int Hash(const string& key) {
    	int h = 0, i = 0;
    	for(i=0; i<key.size(); i++) {
    		h=33*h+key.c_str()[i];
    	}
    	return h;
    }
    
    int main(void) {
    	string data;
    	cout << "String: "; cin >> data; // Joel
    	cout << "Hashed: " << Hash(data) << endl; // 2783658
    	return 0;
    }
    * PC: Intel Core 2 DUO E6550 @ 2.33 GHz with 2 GB RAM: Archlinux-i686 with xfce4.
    * Laptop: Intel Core 2 DUO T6600 @ 2.20 GHz with 4 GB RAM: Archlinux-x86-64 with xfce4.

  14. #29
    The larch
    Join Date
    May 2006
    Posts
    3,573
    Probably not, because now the return type is signed int.

    Code:
    33*h+key.c_str()[i];
    Because this wraps over (to negative values) if h becomes too large.

    But why would you go through all the pain to get the character if [] is meant for that?

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 4
    Last Post: 05-13-2011, 08:28 AM
  2. Hash function
    By g_p in forum C Programming
    Replies: 1
    Last Post: 05-29-2007, 05:49 AM
  3. Replies: 28
    Last Post: 07-16-2006, 11:35 PM
  4. Dikumud
    By maxorator in forum C++ Programming
    Replies: 1
    Last Post: 10-01-2005, 06:39 AM
  5. Replies: 5
    Last Post: 02-08-2003, 07:42 PM