Thread: Question about Hash (once more)

  1. #1
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489

    Question Question about Hash (once more)

    Anyone know how important Hash is? Hash table, etc.

    I've read in many (Compiler/Interpreter Construction)/Parsing books, including The Dragon Book, that to match series of string with one string, we must use hashes.

    For instance, matching the current token with symbols in symbol table.

    Because of my stupid-ness and I don't know how those hashing algorithm works.
    The only one that can I do is just TEST them. And that spend 3 days (3 x 24 hours) to test those algorithm. Fortunately I have 3 computers in my house with 2 cores each (so I can do multithreading).

    http://cboard.cprogramming.com/showthread.php?t=110908

    Somebody has made a program like that, but with random strings and I absolutely don't trust to those random things. I just need the exact one that guarantee anything.

    And what is the result?
    RS, JS, PJW, ELF, BKDR, SDBM, DJB, DJB2, DEK and AP hash algorithms still have collisions even with C variable naming character set: A..Z, a..z, 0..9, _, $

    What's wrong with linear-strings searching?

    Any comments would be appreciated.

    Thanks in advance.
    Last edited by audinue; 01-12-2009 at 06:50 AM.
    Just GET it OFF out my mind!!

  2. #2
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Hashing is a very important technique in many problems.

    However, unless you have a perfect hasher (and finding an formulaic perfect hasher is nearly impossible), hash comparison can only be used as a fast negative chooser! A search in a hash table first looks up candidates by hash, and then does a normal comparison to find a definite match.

    This means that a few collisions don't matter. Unless it's a cryptographic hash, but that's a completely different area.

    Comparison of a series of known strings with one unknown string means that you have precomputed the hashes of the known strings. So you compute the hash of the new string and then can do just a single comparison with every string of the series. If you do normal string comparison, you have to compare the full strings every time. This is a lot slower.
    Or you do a hash lookup. Since hashes are numbers, you can modulo them to an array index and quickly find possible candidates for matching. Thus, you avoid comparing against every string in the set. If you use normal string comparison, the best you can do is a tree of the strings.

    So, for M strings and a string length of N (for simplicity, we assume that all strings are the same length), the complexity of completely naive search is M*N. For naive comparison in a map, it's log(M) * N. For hashing with naive search, assuming no collisions in the particular string set you have (which is likely), the cost is M+N (N for calculating the hash of the search string, M for comparing it against every hash, and N again (but that doesn't matter) for doing the validation comparison). For a hash map, the cost is N, for calculating the hash of the search string and doing the validation comparison.

    So as you can easily see, the hash map has by far the best complexity.

    You're just hung up on the details.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489
    However, unless you have a perfect hasher (and finding an formulaic perfect hasher is nearly impossible), hash comparison can only be used as a fast negative chooser! A search in a hash table first looks up candidates by hash, and then does a normal comparison to find a definite match.
    I got it! Thanks.

    You're just hung up on the details.
    Thank you for remind me, CornedBee.

    By the way, do you have any suggestion for choosing hash algorithm?

    I quiet don't understand about them...
    Last edited by audinue; 01-12-2009 at 10:31 AM.
    Just GET it OFF out my mind!!

  4. #4
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Hash collisions are no big deal. If one occurs you can decide what to do but it is my opinion that instead of re-hashing or doing something else that it is the user of the hash table, not the hash table object itself, to use a different string. This works in most applications and is far simpler to code.

    Hashes are extremely important when it comes to strings. Sure you could use a map but you are looking up a string value every time you access the map. With a hash you just hash the string once and return it to the caller. Now the caller accesses the object via the hash and you get O(1) access as opposed to O(log N) with a map.

    I have made hash tables that will compute a random string and re-hash and all in all it really does not take long to create random strings to resolve collisions. In either case the user of the hash table will receive it's ID back from the hash table object just the same. There is another approach where you can add the object to a list at the hashed index. This gives O(1) to find the index of the object and O(n) to resolve collisions.


    There are some excellent hash algorithms available on the internet.

  5. #5
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    It depends on the situation. When the strings you add to the table come from some external source, you cannot well ask it to use a different string, so you'll go with the link bucket hash map.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  6. #6
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489
    link bucket hash map
    What is it, CornedBee?

    Btw, I'm quiet traumatic with hashes since I use JScript regularly to maintain my computer and I know that JScript compares 2 string with hash, there are a lot of unexpected result with it.

    For instance, checking files existence, based on one huge file name list, one-by-one is better than loading the file name list into array of string and then check them using standard linear searching.
    Just GET it OFF out my mind!!

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    There are basically two ways to react to collisions in a hash map. One is to use chained hashing - if you get a collision, you apply further calculations (often based on a second hash function) to find a different place for the element. This can take a very long time if the hash map is nearly full, and you can't have more elements than buckets. (So such a hash map would be forced to rehash.)

    The other way is to make every bucket (every slot of the hash map) a linked list. If two elements hash to the same value they both get inserted into this linked list. This has advantages (especially in simplicity of implementation and use), but also disadvantages (allocation is more complicated). This is what I referred to as 'link bucket hash map'.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. File I/O - Hash files
    By I BLcK I in forum C++ Programming
    Replies: 3
    Last Post: 04-29-2009, 01:30 AM
  2. another do while question
    By kbpsu in forum C++ Programming
    Replies: 3
    Last Post: 03-23-2009, 12:14 PM
  3. Question...
    By TechWins in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 07-28-2003, 09:47 PM
  4. opengl DC question
    By SAMSAM in forum Game Programming
    Replies: 6
    Last Post: 02-26-2003, 09:22 PM
  5. Hash tables
    By PJYelton in forum C++ Programming
    Replies: 0
    Last Post: 11-27-2002, 10:42 AM