Thread: hash tables

  1. #1
    Hamster without a wheel iain's Avatar
    Join Date
    Aug 2001
    Posts
    1,385

    hash tables

    I am writing a langauge filter for network chat system im writing, i want to search the message for any disallowed words. At the moment each word is hard coded in an array, though efficient it doesnt allow modifications, i need to use them in a file. I have been recommended to use hash tables, can someone explain what these are and how i might go about using them
    thanks in adv.
    Monday - what a way to spend a seventh of your life

  2. #2
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    Okay, a hash table is essentially an efficient way to store data in an array according to a unique identifier number. The efficient part about it is that searching a hash table is incredibly fast. Ready to have some fun? I prefer to use a dynamic array template for the table:
    Code:
    template <class T>
    class HashTable{
    	public:
    		HashTable(int n) : size(n) {
    		array = new T[n];
    		for( int i = 0; i < n; ++i)
    			array[i] = T();
    		int size(){return size;}
    		T& operator[](int x){return array[x];}
    	protected:
    		T *array;
    		int size;
    };
    Let's say that the table will consist of the first and last names of people, just to keep it simple. You would use a struct similar to this:
    Code:
    struct Person{
    	bool isNull(){return bool(last.length() == 0);}
    	string last;
    	string first;
    };
    Then there's the part of the main program that actually creates the hash table and puts stuff in it.
    Code:
    int main(void){
    	ifstream IN("people.txt");
    	Person person;
    	HashTable<Person> table(SIZE);
    	while(get(person, IN)){
    		int i = hash(person.last + person.first);
    		while(!table[i].isNull())
    			i = (i+1) % SIZE;
    		table[i] = person;
    	}
    	print(table);
    }
    Then you define the actual hash function that turns the two concatenated names into a number.
    Code:
    int hash(string s){
    	int h = 0;
    	for(int i = 0; i < s.length(); ++i)
    		h += int(s[i]);
    	return h % SIZE;
    }
    And then of course you need the rest of the functions for the program
    Code:
    bool get(Person& person, ifstream IN){
    	char buffer[BUFSIZE], temp[BUFSIZE];
    	IN.getline(buffer, BUFSIZE);
    	if(IN.fail()) return false;
    	istrstream s(buffer);
    	s.getline(temp, BUFSIZE, '\t'); //names separated by tab
    	person.last = temp;
    	s.getline(temp, BUFSIZE, '\t');
    	person.first = temp;
    	s.getline(temp, BUFSIZE, '\t'); //eat the tab :)
    	return true;
    }
    
    void print(Person& person){
    	cout<<person.last<<", "<<person.first<<endl;
    }
    
    template <class T>
    void print(HashTable<T>& t){
    	for(int i = 0; i < t.size(); ++i){
    		cout<<setiosflags(ios::right)<<setw(4)<<i<<". "; //print index #
    		if(!t[i].isNull()) printf(t[i]);
    		else cout<<endl;
    	}
    }
    Having fun yet? Be sure to run it and toy with it, I'm not sure if there are errors because I wrote this off the top of my head, so be ready for debugging and please accept my appologies. You'll need to include <fstream>, <iomanip>, <iostream>, and <sstream>.

    -Prelude
    Last edited by Prelude; 12-08-2001 at 11:09 AM.
    My best code is written with the delete key.

  3. #3
    Hamster without a wheel iain's Avatar
    Join Date
    Aug 2001
    Posts
    1,385

    Talking

    Thanks Prelude, thats great - i understand now
    Monday - what a way to spend a seventh of your life

  4. #4
    geek SilentStrike's Avatar
    Join Date
    Aug 2001
    Location
    NJ
    Posts
    1,141
    It needs a destructor to get rid of the allocated memory.

    Also, a more efficient implementation of a hash table for containing x uses an array of x uses an array of linked lists of x rather than an array of x.

    This hash tables performance degenerates very quickly, even with a good hash function, as the number of elements in the table approaches the size of the array. When the number of elements is the size, a linear search of half the table is required on average, when there is one less element in the list than a the size of the array, a linear search of a quarter of the array is employed on average. This is because things that hash to 7 can effect the search for things that hash to 6.

    For theoritical example, consider adding the following
    blah hashes to 6, at index 6
    word hashes to 7, at index 7
    etc hashes to 6, at index 8

    Search for etc
    start at index 6
    does blah equal etc? no, go to 7
    does word equal etc? no, go to 8
    does etc equal etc? yes, return etc

    If this was implemented using a linked list, searching for items at index 6 requires searching the list for 6, and nothing else. So, even with a large amount of insertions at links slightly greater than 6, the speed of searching for an element at 6 will not slow down.

    Then again, surely the guys who wrote the freely availible (SGI) STL hash_map have me beat in performance .
    Prove you can code in C++ or C# at TopCoder, referrer rrenaud
    Read my livejournal

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Iterable hash tables
    By OnionKnight in forum Tech Board
    Replies: 5
    Last Post: 07-21-2008, 02:02 AM
  2. developing hash tables in the C Programming language
    By w108dab in forum C Programming
    Replies: 1
    Last Post: 05-20-2008, 11:20 AM
  3. need help with hash tables with direct chaining
    By agentsmith in forum C Programming
    Replies: 4
    Last Post: 01-05-2008, 04:19 AM
  4. Group Project Help/Volunteer
    By DarkDot in forum C++ Programming
    Replies: 3
    Last Post: 04-24-2007, 11:36 PM
  5. Problems with Hash Tables
    By Zildjian in forum C Programming
    Replies: 6
    Last Post: 11-06-2003, 08:53 PM