Thread: Doing proper hashing in a hash table?

  1. #16
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    I see. So I should consider this equal to an average hash table solution. Not too shabby; not perfect.
    Maybe I should try to find out the average chain length.
    Right now, I am just treating all elements as buckets. One bucket keeps one element in the perfect scenario.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  2. #17
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Chaining is only one way of dealing with collisions. Another option is linear probing -- if the hash of the element is H, and the table[H] already exists, then insert it at table[H+1] instead (and if that's filled, go to the next and the next until an empty spot is found).

    To generalize linear probe, if table[H] collides, set H = collide(H) and try again. Linear probing means collide(H) = H+1.

    Or, multi-hashing. If Hash(element) is H, and H is a collision, then use Hash2(element), and if that collides go to Hash3(element), for as many hash functions as you've got.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  3. #18
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    I know, but it's not like it's always better. It all depends on the circumstances, so I am simply going to keep it simple and go with the chaining method.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  4. #19
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by brewbuck View Post
    Chaining is only one way of dealing with collisions. Another option is linear probing -- if the hash of the element is H, and the table[H] already exists, then insert it at table[H+1] instead (and if that's filled, go to the next and the next until an empty spot is found).

    To generalize linear probe, if table[H] collides, set H = collide(H) and try again. Linear probing means collide(H) = H+1.

    Or, multi-hashing. If Hash(element) is H, and H is a collision, then use Hash2(element), and if that collides go to Hash3(element), for as many hash functions as you've got.
    The only hash tables I've implemented used either a link list or a b-tree in the bucket, so I'm curious: how do you then find this item later? If I insert a key "abracadabra" and it gets multi-hashed, when someone retrieves abracadabra, do you go thru the same process (try hash1, if that doesn't return abracadabra, try hash2, etc)? Seems to me this is no advantage over a linked list but has a clear disadvantage (the number of multi-hashes must still be finite).

    WRT linear probing, isn't it problematic? What if abracadabra wasn't actually added and someone asked for it? When do you know when to stop looking? At the end of the table?? So now instead of an O(1) lookup you have a O(n) array.

    I presume there is a clever solution here
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #20
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by MK27 View Post
    The only hash tables I've implemented used either a link list or a b-tree in the bucket, so I'm curious: how do you then find this item later? If I insert a key "abracadabra" and it gets multi-hashed, when someone retrieves abracadabra, do you go thru the same process (try hash1, if that doesn't return abracadabra, try hash2, etc)?
    Pretty much.

    Seems to me this is no advantage over a linked list but has a clear disadvantage (the number of multi-hashes must still be finite).
    No overhead for inserting into a linked list, and potentially faster, depending on speed of hashing and, of course, how much it avoids collisions.

    WRT linear probing, isn't it problematic? What if abracadabra wasn't actually added and someone asked for it? When do you know when to stop looking? At the end of the table?? So now instead of an O(1) lookup you have a O(n) array.
    I assume that's why linear probing is a dumb algorithm only worth studying for academic purposes, like insertion sort.
    The same thing happens on insert: it can take forever (theoretically) to find an empty bucket.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  6. #21
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Linear probing is rarely implemented in such a way as to result in actual linear performance. Expansion (increase in the number of buckets) is a necessary evil to get good performance out of simple linear probing as with any hypothetical hash table that always expands on collision.

    Repeated hashing for hash table implementations mostly come in two forms.

    "Double Hashing" (common) uses two entirely different hash algorithms. If the first search fails the second hashing algorithm is recursively called for the next attempts. If the number of tries is the same as the number of buckets you get loose linear behavior, but in practice the number of attempts is significantly less than the number of buckets.

    "Multi-hashing" (uncommon) uses a hash algorithm that requires the hash to be finalized in some specific way. These implementations hash for a specific number of possibilities while only being finalized if a given search doesn't succeed. Generally the values added after processing the actual key are chosen to be uniformly distributed. The number of possibilities is generally fixed for the algorithm instead of the number of buckets so this is rarely linear.

    In practice, most any hash table using simple probing may increment a bucket counter on rehashing to account for the existence of the key at a different location. Insertion suffers a constant cost for the already non-constant collision handling algorithm allowing searching to be constant if the key would never be found. Removal of a pair requires walking the table so this strategy isn't really acceptable for a general purpose hash table. This is possible with some repeated hashing strategies (mostly multi-hashing) but problematic or prohibitively expensive depending on the hash algorithms chosen.

    That said, the above only applies to fairly simple strategies. If expansion occurs at a given load and the appropriate hash algorithms are used constant time search and insert can be achieved with linear probing.

    In other words, linear probing and repeated hashing aren't the algorithm itself; they are strategies that guide the choice of algorithm. Those algorithms range from exceedingly simple (If a bucket isn't available try the very next bucket.) to alarmingly complex (Use this family of hashing functions associated with this mutation box for this number of buckets.).

    Soma

  7. #22
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    There is also quadratic probing and other forms of probing that address some of the shortcomings of linear probing all the while introducing their own set of unique issues.

  8. #23
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Yeah, I'm aware. Anyway, this project has served my needs now, so I'm archiving it.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  9. #24
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    On a side note, the most important advantage to probing hash maps (as opposed to chains or trees or other data structures rooted in the buckets) is its caching and allocation behavior. Dangling a linked list from a bucket means you have to allocate nodes, and they're gonna be all over memory. With probing, the only allocations you do are for the core array.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Hash table help
    By lemon-ice-tea in forum C++ Programming
    Replies: 1
    Last Post: 05-18-2011, 09:31 AM
  2. hash table
    By mexx in forum C++ Programming
    Replies: 0
    Last Post: 06-30-2009, 12:23 PM
  3. Replies: 8
    Last Post: 09-11-2006, 11:26 AM
  4. Hash Table
    By siavoshkc in forum C++ Programming
    Replies: 4
    Last Post: 08-29-2006, 04:29 PM
  5. Hash table
    By Unregistered in forum C Programming
    Replies: 3
    Last Post: 01-14-2002, 07:06 PM