Thread: Help with beginning hash tables, please.

  1. #1
    Registered User
    Join Date
    Dec 2001
    Posts
    28

    Help with beginning hash tables, please.

    recently my mom has become obsessed with "word jumbles" in our local paper, which are scrambled letters that you have to unscramble to form a word. for example, the correct answer to "btior" is "orbit."

    sometimes she gets to one she can't solve though, (and annoys the family with requests for help) and so I decided to write a program to just do them automatically.

    I'm planning to compare the permutations that it generates to a word list, but for now, I'm just printing out all the possible combinations and letting the user sift through them.

    someone suggested setting up a hash table with the word list stored in it. I don't know anything about hash tables, however, so this would be a good learning opportunity. from what I have looked up online, I should make an array of linked lists to prevent collisions.

    here is where I get confused though. how could I dynamically create linked lists? what I mean is, you obviously can't do something like
    Code:
    for (int X = 1; X < 20; X++)
    {
         list<string> listX;
    }
    I think I know (knew) some method on how to do something like this, but it's hazy now.

    another question: will I have to create a linked list class from scratch? I've previously used linked lists, but they were the STL kind. I've never actually created my own linked list structure.

    also, if anyone has any good sites for introductions to hash tables, or any personal advice, I'd be appriciative.

    finally, here's my current code:
    Code:
    #include <iostream>
    #include <string>
    #include <algorithm>
    using namespace std;
    
    int main()
    {
    	string jumbled;
    	int i=0;
    	int loop=0;
    	int garbage = 1;
    
    	cout << "enter your string (no capitals)"<<endl;
    	cin >> jumbled;
    
    	sort(jumbled.begin(), jumbled.end());
    	
    	cout<<endl;
    	cout<< "all possible permutations of your string:"<<endl;
    
    	do 	
    	{
    		
    		for (i = 0; i < jumbled.size(); i++)
    				cout << jumbled[i];
    	
    		loop++;
    		cout << "\n";
    	
    		if (loop%20 == 0)
    		{
    			cout << "press 0 to quit, any other number to continue"<<endl;
    			cin >> garbage;
    		}
    
    		if (garbage == 0)
    			return 0;
    	}
    	while(next_permutation(jumbled.begin(), jumbled.end()));
    	
    	cout <<endl << "to exit, enter any number."<<endl;
    	cin >> garbage;
    
    	return 0;
    }

  2. #2
    Blank
    Join Date
    Aug 2001
    Posts
    1,034
    If you want to do hashing by seperate chaining you would
    create an array of links. Probably easiest to something like
    this

    Code:
    struct Node {
          char* s;
          Node* next;
    };
    
    vector<Node*> chain(N);  The chain would be initialized
    like this
    
    for (int i = 0; i < N; ++i) {
          chain[i] = 0;
    You then need to find a function that maps a string
    to an integer in [0, N - 1]. Once this is done you insert
    strings into your hash table by hashing them to an index k and if
    they arn't in chain[k] you insert them into chain[k].

  3. #3
    Registered User
    Join Date
    Dec 2001
    Posts
    28
    now I have another version that is integrated with the dictionary file. the only problem is that it takes about 2 minutes to solve a five character word.

    to speed this up, I decided to try to use the STL multimap container. now my program has no errors, but it generates about 100 warnings and gives an illegal operation half-way through. I think my code is right, but obviously something is up. I had to break standard programming practice and use global variables (because I had no idea how to pass a multimap as an argument), so maybe that has something to do with it.

    also, I realize that my generate_key function is crappy, but I just threw it together as a test, mostly.

    Code:
    #include <iostream>
    #include <string>
    #include <algorithm>
    #include <fstream>
    #include <time.h>
    #include <map>
    using namespace std;
    
    /////////////////global variables
    multimap<int, string> dict_map;	  
    multimap<int,string>::iterator dict_iter;
    /////////////////
    
    /////////////////function prototypes
    void init_map();
    int generate_key(string);
    /////////////////
    
    int main()
    {
    	string jumbled, dictword;
    	int garbage = 1;       //crap variable
    	int permute_key = 0;
    	
    	cout << "enter your string (no capitals)"<<endl;
    	cin >> jumbled;
    
    	sort(jumbled.begin(), jumbled.end());
    	init_map();
    
    	do 	                            //loop creates permutations
    	{
    		permute_key = generate_key(jumbled);
    		dict_iter = dict_map.find(permute_key);
    		
    		if (dict_iter != dict_map.end())
    			cout << "a possible answer is " << dict_iter->second <<  endl;
    
    	}while(next_permutation(jumbled.begin(), jumbled.end()));
    	
    	cout << "program execution time = " << static_cast<double>(clock())/1000 << " seconds. "<< endl;
    	cout <<endl << "to exit, enter any number."<<endl;
    	cin >> garbage;
    	return 0;
    }
    
    void init_map()
    {
    	ifstream dict("dict.txt");
    	string word;
    	int key=0;
    	int garbage = 0;
    	
    	if (!dict)
    	{
    		cout<<"dictionary file could not be opened."<<endl << " enter a number to quit the program.";
    		cin>>garbage;
    		return;
    	}
    
    	while(dict)
    		{
    			getline(dict, word);
    			transform(word.begin(), word.end(), word.begin(), tolower);
    			
    			key = generate_key(word);
    			dict_map.insert(pair<int,string>(key, word));
    		}
    	dict.close();
    }
    
    int generate_key(string word)
    {
    	int len=0;
    	len = word.length();
    	return (300%len);
    }

  4. #4
    Blank
    Join Date
    Aug 2001
    Posts
    1,034
    Code:
    unsigned int hash_str(const char* name, int n)
    {
        unsigned int h = 0;
        unsigned int t;
    
        while(*name)
        {
            h ^= (h << 4) + *name;
            t = h & 0xf0000000;
            if (t)
            {
                h ^= (t >> 24);
                h ^= t;
            }
            name++;
        }
    
        return h % n;
    }
    I don't understand what your code is trying to do.
    This is a hash function for string. It returns an integer
    between 0 and n - 1. n is the size of table. I don't
    get what your generate_key is supposed to do.
    What you can do is create a hash structure
    like this

    Code:
    const int N = 7691;
    struct Hash_table {
           list<string*> bucket[N];
    };
    For insert maybe something like this
    Code:
    void insert(Hash_table* hash_table,  const string& name)
    {
            // returns a number in [0..N-1]
            int h = hash_str(name.c_str(), N);
       
            // since a dictionaries words should be unique and
            // doesn't matter too much if we have duplicates
            // all that's neccessary is to insert them into the list
            // of words that hash to h
            string* s = new String(name);
            hash_table->bucket[h].push_back(s);
    }
    find is done similary. You want to call hash_str to get a
    number between 0 to N - 1 and then check to see if the string
    is in hash_table->bucket[h]

  5. #5
    Registered User
    Join Date
    Dec 2001
    Posts
    28
    generate_key just generates a key. I forgot to explicitly check that it's within [0,n-1], but it is.

    find and insert are taken care of by STL.

  6. #6
    Blank
    Join Date
    Aug 2001
    Posts
    1,034
    Code:
    int len=0;
    len = word.length();
    return (300%len);
    This returns an integer in [0..len-1].

    You could modify this code

    Code:
    #include <algorithm>
    #include <list>
    #include <iostream>
    #include <fstream>
    #include <string>
    using namespace std;
    
    const int N = 7691;
    struct Hash_table {
        list<string> bucket[N];
    };
    
    Hash_table hash_table;
    
    
    static unsigned int hash_str(const char* name, int n)
    {
        unsigned int h = 0;
        unsigned int t;
    
        while(*name)
        {
            h ^= (h << 4) + *name;
            t = h & 0xf0000000;
            if (t)
            {
                h ^= (t >> 24);
                h ^= t;
            }
            name++;
        }
    
        return h % n;
    }
    
    
    void insert(Hash_table* hash_table, const string& s)
    {
        // returns a number in [0..N-1]
        int h = hash_str(s.c_str(), N);
    
        // since a dictionaries words should be unique and
        // doesn't matter too much if we have duplicates
        // all that's neccessary is to insert them into the list
        // of words that hash to h
        hash_table->bucket[h].push_back(s);
    }
    
    bool lookup(const Hash_table* hash_table, const string& name)
    {
        int h = hash_str(name.c_str(), N);
        const list<string>& bucket = hash_table->bucket[h];
    
    
        return find(bucket.begin(), bucket.end(), name) != bucket.end();
    }
    void init_map()
    {
        ifstream dict("/usr/dict/words");
        string word;
        int key=0;
        int garbage = 0;
    
        if (!dict)
        {
            cout<<"dictionary file could not be opened."<< endl;
            exit(0);
        }
    
        while(dict)
        {
            getline(dict, word);
            insert(&hash_table, word);
        }
        dict.close();
    }
    
    int main(void)
    {
    
    
        init_map();
    
        cout << lookup(&hash_table, "hello") << endl;
        cout << lookup(&hash_table, "foo") << endl;
    
        return 0;
    }
    This will run pretty fast. If you need it faster you could
    try increasing the size of the table or trying a different hash
    function.

  7. #7
    I lurk
    Join Date
    Aug 2002
    Posts
    1,361
    Here's an algorithm which will print out all combinations of a given word: http://www.flashdaddee.com/Tutorials/prelude2.html (You can thank Prelude)

  8. #8
    Registered User
    Join Date
    Dec 2001
    Posts
    28
    thanks for the code, Nick. I modified it to fit my program, and it works really well (execution time went from around 2 minutes to around 6 seconds).

    looking at it though, I think I was confused as to what a hash table really does. here's the pseudocode I had while doing my attempt; sorry to keep bothering you, but could you possibly look it over to see if my thinking was correct?

    Code:
    global
    --------
    make linked list of strings (node)
    make vector of nodes (dict_map)
    
    int hash(string)
    -------------------
    generate a key for the string
    return key
    
    void init_map()
    -------------------
    read word from dictionary
    call hash(word)
    insert word into dict_map, at the location returned by hash(word)
    
    main()
    --------
    call init_map()
    
    read in word from user (jumbled)
    
    for each permutation
         look at dict_map[hash(jumbled)]
         if (dict_map[hash(jumbled)] == jumbled)
              print jumbled
    generate new permutation

  9. #9
    Blank
    Join Date
    Aug 2001
    Posts
    1,034
    What you are doing is associating each string such
    as "hello" with a number between 0 and N-1 where
    N is the size of the hash table. Some hash functions
    are better at this than others. All you really need
    to know about hash_str is that it returns for each
    string what appears to be a random number between
    0 and N-1.

    The problem with hash table occurs when some strings will hash to the same value.
    You want to minimize this but when it does happen you need
    some sort of collision stragedy. If you use seperate
    chaining all you do is store all of the strings that
    hash to the same value in a chain. You can kind of think
    of it as an array of buckets. Words that hash to 0 are
    put in the 0th bucket, words that hash to 1 are put in
    1th bucket and so on. To find a word you hash the word
    to find what bucket it is in. Then it's necessary to search
    for the word in that bucket.

    stl isn't really that good when your trying to create
    your own primitative data structures. For example
    you could write similar code that uses a red black
    tree. This is with map. I don't know what a multimap
    uses. At six seconds I suspect most of the time is spend
    reading the file and the dynamic memory allocation
    of inserting strings into the hash table. There are ways
    to even get around the slowness of dynamic memory
    allocation for the most part.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Iterable hash tables
    By OnionKnight in forum Tech Board
    Replies: 5
    Last Post: 07-21-2008, 02:02 AM
  2. developing hash tables in the C Programming language
    By w108dab in forum C Programming
    Replies: 1
    Last Post: 05-20-2008, 11:20 AM
  3. need help with hash tables with direct chaining
    By agentsmith in forum C Programming
    Replies: 4
    Last Post: 01-05-2008, 04:19 AM
  4. Group Project Help/Volunteer
    By DarkDot in forum C++ Programming
    Replies: 3
    Last Post: 04-24-2007, 11:36 PM
  5. Problems with Hash Tables
    By Zildjian in forum C Programming
    Replies: 6
    Last Post: 11-06-2003, 08:53 PM