Thread: Hash function... translated from C++

  1. #1
    Registered User
    Join Date
    Nov 2008
    Posts
    83

    Hash function... translated from C++

    Hey,
    I am trying to translate a C++ implementation of the DJB Hash function into C.
    Here is the original:
    Code:
    unsigned int DJBHash(const std::string& str)
    {
       unsigned int hash = 5381;
    
       for(std::size_t i = 0; i < str.length(); i++)
       {
          hash = ((hash << 5) + hash) + str[i];
       }
    
       return (hash & 0x7FFFFFFF);
    }
    would the following code in C work, I'm confused about a few features...

    Code:
    unsigned int DJBHash(unsigned char *str)
    {
       unsigned int hash = 5381;
    
       for(unsigned int i = 0; i < strlen(str); i++)
       {
          hash = ((hash << 5) + hash) + str[i];
       }
    
       return (hash & 0x7FFFFFFF);
    }
    My main confusion is the "<<" operator. Thanks

  2. #2
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    The '<<' operator is left shift, and it's the same in C and C++. The only problem with your translation is that you call strlen() every pass through the loop, which seriously hurts performance. Call strlen() once up front and store its value.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  3. #3
    Registered User
    Join Date
    Aug 2008
    Location
    Belgrade, Serbia
    Posts
    163
    The << operator is a stl instruction in assembly and it's valid for both C and C++.
    EDIT: Too late..
    Vanity of vanities, saith the Preacher, vanity of vanities; all is vanity.
    What profit hath a man of all his labour which he taketh under the sun?
    All the rivers run into the sea; yet the sea is not full; unto the place from whence the rivers come, thither they return again.
    For in much wisdom is much grief: and he that increaseth knowledge increaseth sorrow.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Couple of points.
    1. C++ str allows \0 to be stored (IIRC), and str.length() does the right thing. C strlen() would stop at the first \0
    2. Calling strlen() every time around the loop is wasteful, especially as the answer is constant. C++ str.length() is less of a problem since it's declared as 'const', meaning it's use can be optimised knowing that the result doesn't change so long as str itself doesn't change.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Registered User
    Join Date
    Nov 2008
    Posts
    83
    Code:
    unsigned int DJBHash(unsigned char *str)
    {
       unsigned int hash = 5381;
       int n = strlen(str);
    
       for(unsigned int i = 0; i < n; i++)
       {
          hash = ((hash << 5) + hash) + str[i];
       }
    
       return (hash & 0x7FFFFFFF);
    }
    So this should fix the inefficiency problem.
    Not sure exactly what you meant in Point 1, Salem. This function in C will stop at the first null terminator. ...?

    Is this code now right?
    Thanks

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Not sure exactly what you meant in Point 1, Salem. This function in C will stop at the first null terminator. ...?
    strlen() takes the length of a null terminated string. However, C++ std::string can contain embedded null characters. This means that the functionality of the original hash function and your C version may be different if null embedded strings are valid input. One solution is to take the length of the string as a second argument to the function.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #7
    Registered User
    Join Date
    Nov 2008
    Posts
    83
    Oh ok, I see. But this should not be a problem if I know my string does not contain embedded null characters?

    Also, what does 0x7FFFFFFF add to the return?
    return (hash & 0x7FFFFFFF);

  8. #8
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > int n = strlen(str);
    Being pedantic, n (and by extension i) should be size_t

    & & is bitwise-&
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  9. #9
    Registered User
    Join Date
    Nov 2008
    Posts
    83
    what do you mean they should be size_t... sorry, not familiar.
    Do you mean rather than declaring them as int, they should be size_t?

  10. #10
    Registered User
    Join Date
    Nov 2008
    Posts
    83
    what would that look like in context?

    Thanks for the help!

  11. #11
    Registered User
    Join Date
    Nov 2008
    Posts
    83
    Code:
    unsigned int DJBHash(unsigned char *str)
    {
       unsigned int hash = 5381;
    size_t i;   
    size_t n = strlen(str);
    
       for(i = 0; i < n; i++)
       {
          hash = ((hash << 5) + hash) + str[i];
       }
    
       return (hash & 0x7FFFFFFF);
    }
    Is this what you mean?

  12. #12
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Yes, something like that.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Getting an error with OpenGL: collect2: ld returned 1 exit status
    By Lorgon Jortle in forum C++ Programming
    Replies: 6
    Last Post: 05-08-2009, 08:18 PM
  2. Screwy Linker Error - VC2005
    By Tonto in forum C++ Programming
    Replies: 5
    Last Post: 06-19-2007, 02:39 PM
  3. Hash function
    By g_p in forum C Programming
    Replies: 1
    Last Post: 05-29-2007, 05:49 AM
  4. Change this program so it uses function??
    By stormfront in forum C Programming
    Replies: 8
    Last Post: 11-01-2005, 08:55 AM
  5. Please Help - Problem with Compilers
    By toonlover in forum C++ Programming
    Replies: 5
    Last Post: 07-23-2005, 10:03 AM