What is hashing?

This is a discussion on What is hashing? within the A Brief History of Cprogramming.com forums, part of the Community Boards category; Hi, What is hashing? What are they used for? Pls explain in a bit more detail coz i really dont' ...

  1. #1
    Registered User Nutshell's Avatar
    Join Date
    Jan 2002
    Posts
    1,020

    What is hashing?

    Hi,

    What is hashing? What are they used for?

    Pls explain in a bit more detail coz i really dont' know what it really is, purpose, and usage.

    thnx a lot

  2. #2
    Registered User
    Join Date
    Jan 2002
    Posts
    381
    Firstly isn't the Genreal Discussion board for "Non-programming related topics"?

    Secondly hashing is a sorting algorithm. There are two types of hashing algorithms:

    Open Hashing:
    To quote a text book: "The essential idea is that the (possibly infinate) set of potential set members is partitioned into a finite number of classes."
    In english this means that if you have a stream of data (this could be anything from a short array to the constant stream of telephone billing information) you have a few set ranges (called buckets) that each element in the stream can fall into. As you process the stream you simply place each element into one of these buckets and move on.
    For example sorting a huge dictionary file into alphabetic order, you can pass through the first time just testing the first letter of each word and then placing them into different files based on that. This means you would end up with all the A's in one place, all the B's in another and so on. That can dramaticly reduce the future number of operations you need to perform.

    Closed Hashing:
    Closed hashing basicly means that each bucket can only store one element. In text book language "A closed hash keeps the members of the dictionary in the bucket table itself, rather than using the table to store list headers. As a consequance, it appears we can only put one element in any bucket. However, associated with closed hashing is a rehash strategy. If we try to place x into bucket h(x) and find it already holds an element, a situation called a collision, the rehash stratgy choses a sequance of alternate locations, h1(x), h2(x) . . . within the bucket table, in which we could place x. We try each of these location untill we find an empty one. If none is empty then the table is full and we cannot place x."
    This basicly just means that each bucket is now limited to one element. If you try to put another element into an already full bucket then there is a logical way that you go about searching the other buckets until you find an empty bucket. When you can nolonger find any empty buckets your whole thing is full and you have to start deleting elements.
    If you own a piece of land and there is an volcano on it and it ruins a
    nearby town, do you have to pay for the property damage?

  3. #3
    Blank
    Join Date
    Aug 2001
    Posts
    1,034
    Here's a fast example I did which uses hashing with seperate
    chaining. It's in c++ but there's no classes or templates.
    It works basically works by hashing a string which returns a number n and then it inserts it into A[n]. When it trys to find a string it hashes a string and then it looks for the string
    in A[n]. For seperate chaining having the table size being
    the smallest prime number greater than the table size is good. The code to hash a string was from weiss's book, basically
    it uses horner's algorithm.


    Code:
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <iostream>
    #include <fstream>
    #include <string>
    using namespace std;
    
    const int TABLE_SIZE = 45007;
    
    struct Node {
        char* s;
        Node* next;
    };
    
    struct Hash_table {
        Node* buckets[TABLE_SIZE];
    };
    
    int hash(const char* s, int tbl_size);
    void hash_init(Hash_table* h);
    void hash_insert(Hash_table* h, const char* s);
    Node* hash_find(Hash_table* h, const char* s);
    void build_dictionary(Hash_table* h, char* filename);
    
    int main(void)
    {
        Hash_table ht;
        string s;
    
        hash_init(&ht);
    
        build_dictionary(&ht, "/usr/share/dict/words");
    
        while(getline(cin, s)) {
            Node* n = hash_find(&ht, s.c_str());
            if (n != 0)
                cout << "the string = " << n->s << endl;
            else
                cout << "not in the dictionary" << endl;
        }
    
        return 0;
    }
    
    
    int hash(const char* s, int tbl_size)
    {
        int r = 0;
    
        for (int i = 0; s[i] != '\0'; ++i)
            r = 37 * r + s[i];
    
        r %= tbl_size;
    
        if (r < 0)
            r += tbl_size;
        return r;
    }
    
    void hash_init(Hash_table* h)
    {
        for (int i = 0; i < TABLE_SIZE; ++i)
            h->buckets[i] = 0;
    
    }
    
    void hash_insert(Hash_table* h, const char *s)
    {
        int hash_nr = hash(s, TABLE_SIZE);
        Node** lst = &h->buckets[hash_nr];
        Node* curr = *lst;
        bool found = false;
    
    
        while(curr != 0 && !found) {
            if (!strcmp(curr->s, s))
                found = true;
            curr = curr->next;
        }
    
        // if not found insert at the beginning of the list
       if (!found) {
            Node* tmp = new Node;
            tmp->s = new char[strlen(s) + 1];
            strcpy(tmp->s, s);
    
            tmp->next = *lst;
            *lst = tmp;
        }
    }
    
    Node* hash_find(Hash_table* h, const char* s)
    {
        int hash_nr = hash(s, TABLE_SIZE);
        Node* curr = h->buckets[hash_nr];
        Node* n = 0;
    
        while(curr != 0 && n == 0) {
            if (!strcmp(curr->s, s))
                n = curr;
            curr = curr->next;
        }
    
        return n;
    }
    
    void build_dictionary(Hash_table* h, char* filename)
    {
        ifstream inf(filename);
        string s;
    
        if (!inf) {
            cout << "file not found" << endl;
            exit(0);
        }
    
        while(getline(inf, s))
            hash_insert(h, s.c_str());
    }

  4. #4
    Blank
    Join Date
    Aug 2001
    Posts
    1,034
    Code:
    Secondly hashing is a sorting algorithm. There are two types of hashing algorithms:
    Sorting is when you have a sequence
    (a1, a2, a3, a4 ...) and you permute the sequence so
    that you have
    (b1, b2, b3, b4 ...) with
    b1 <= b2 <= b3 <= b4 so hashing really isn't
    a sorting algorithm. I think you mean searching.

  5. #5
    Registered User
    Join Date
    Jan 2002
    Posts
    381
    >>a sorting algorithm. I think you mean searching.

    Thanks, the words never seem to come out propperly at 3 in the morning.
    If you own a piece of land and there is an volcano on it and it ruins a
    nearby town, do you have to pay for the property damage?

  6. #6
    Unregistered
    Guest
    Hashing produces an ideal situation where the speed of the search is the same for everything you might search for. Of course this is not easy, and collisions have to be resolved through the use of hashing functions.

  7. #7
    geek SilentStrike's Avatar
    Join Date
    Aug 2001
    Location
    NJ
    Posts
    1,141
    I don't think these are ever deallocated...

    Code:
            Node* tmp = new Node;
            tmp->s = new char[strlen(s) + 1];
    Prove you can code in C++ or C# at TopCoder, referrer rrenaud
    Read my livejournal

  8. #8
    Registered User Nutshell's Avatar
    Join Date
    Jan 2002
    Posts
    1,020
    How do u test for algorithms that produces no or very little collisions? Everybody uses their own version of hashing functions?

    thnx

  9. #9
    Blank
    Join Date
    Aug 2001
    Posts
    1,034
    I don't think these are ever deallocated...
    When the program exits, I wrote it kind of fast.

    You can count the average number of collisions
    for each insert. It should be close to one. Other
    than that, the hash algorithm should usually use every bit of what your hashing and generate an integer [0, table_size - 1]. each key should be equally likely. Once you have the hash
    algorithm you have several techniques for what
    happens when you have collisions such as seperate chaining,
    linear probing etc.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Creating Linked List Using Hashing and Stack
    By m0ntana in forum C Programming
    Replies: 2
    Last Post: 04-07-2007, 07:11 AM
  2. Hashing, indexing and phonetic normalization for approximate str matching...
    By biterman in forum A Brief History of Cprogramming.com
    Replies: 0
    Last Post: 11-21-2006, 08:42 AM
  3. Replies: 8
    Last Post: 09-11-2006, 11:26 AM
  4. Doubt about Hashing
    By louis_mine in forum C Programming
    Replies: 1
    Last Post: 11-10-2005, 11:00 PM
  5. Double Hashing
    By carrja99 in forum C++ Programming
    Replies: 1
    Last Post: 03-28-2003, 07:36 AM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21