Thread: Dynamic Hash Table

  1. #1
    Registered User
    Join Date
    Oct 2012
    Posts
    1

    Dynamic Hash Table

    Hello,

    I am attempting to code a dynamic hash table. The number of buckets is doubled when one of the buckets contains 20 or more items. If I ignore this (and thus run the program as a statically sized hash table) the program runs and completes as expected with zero memory leaks.

    However, if I attempt to resize the table, crazy things happen. I'm not attempting an in-place resize but instead creating a second hash table then iterating through every bucket in the first hash table, recalculating the new position, and placing the items into the new hash table. Afterwards I free the old hash table.

    Due to the fact that the statically sized hash table works, I have narrowed down the error to the only new function, expand(...). The code for which is below.

    This is my first post on this forum and the program is part of an exercise for University (of which I'm about two weeks ahead and thus haven't been receiving assistance from tutors). If further code/context is required I will be able to produce the code.

    Code:
    ..............
    
    
    typedef struct bucketItem {
        struct bucketItem *next;
        MEntry *entry;
    } entry;
    
    
    typedef struct bucket {
        int entries;
        struct bucketItem *head;
        struct bucketItem *tail;
    } bucket;
    
    
    struct mlist {
        unsigned long size;
        struct bucket **buckets;
    };
    
    ..............
    
    static MList *expand(MList **ml, MEntry *me) {    MList *oldTable, *newTable;
        unsigned long i, oldPosition, newPosition;
        bucket *oldBucket, *newBucket;
        entry *oldEntry, *newEntry;
        
        oldTable = *ml;
        /* Create a new hash table */
        if( (newTable=(MList *)malloc(sizeof(MList))) == NULL) {
            fprintf(stderr, "%s\n", "Unable to allocate memory for mailing list.");
            return NULL;
        }
        /* Make the new hash table twice the size of the old one */
        newTable->size = oldTable->size * 2;
        if( (newTable->buckets=malloc(sizeof(bucket)*(newTable->size))) == NULL) {
            fprintf(stderr, "%s\n", "Unable to allocate memory for mailing list.");
            return NULL;
        }
        /* Initialise the new buckets */
        for(i=0; i<(newTable->size); i++) {
            if((newTable->buckets[i]=malloc(sizeof(bucket))) == NULL) {
                fprintf(stderr, "%s\n", "Unable to allocate memory for entry.");
                return NULL;
            }
            newTable->buckets[i]->entries = 0;
            newTable->buckets[i]->head = NULL;
            newTable->buckets[i]->tail = NULL;
        }
        /* Move every entry in the old hash table over to new position in new hash table */
        for(oldPosition = 0; oldPosition < (oldTable->size); oldPosition++) {
            oldBucket = oldTable->buckets[oldPosition];
            oldEntry = oldBucket->head;
            while(oldEntry != NULL) {
                if((newEntry=(entry *)malloc(sizeof(entry))) == NULL) {
                    fprintf(stderr, "%s\n", "Unable to allocate memory for new entry");
                    return NULL;
                }
                newEntry->entry = oldEntry->entry;
                newEntry->next = NULL;
                oldEntry->entry = NULL;
                oldEntry = oldEntry->next;
                newPosition = me_hash(newEntry->entry, newTable->size);
                newBucket = newTable->buckets[newPosition];
                if(newBucket->head == NULL) {
                    newBucket->head = newEntry;
                } else {
                    newBucket->tail->next = newEntry;
                }
                newBucket->tail = newEntry;
                newBucket->entries += 1;
            }
        }
        /* Now add new entry into the new hash table */
        if((newEntry=(entry *)malloc(sizeof(entry))) == NULL) {
            fprintf(stderr, "%s\n", "Unable to allocate memory for new entry");
            return NULL;
        }
        newEntry->entry = me;
        newEntry->next = NULL;
        newPosition = me_hash(me, newTable->size);
        newBucket = newTable->buckets[newPosition];
        if(newBucket->head == NULL) {
            newBucket->head = newEntry;
        } else {
            newBucket->tail->next = newEntry;
        }
        newBucket->tail = newEntry;
        newBucket->entries += 1;
        ml_destroy(oldTable);
        return newTable;
    }
    
    ..............

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > oldTable = *ml;
    Where is
    *ml = newTable;

    > if( (newTable=(MList *)malloc(sizeof(MList))) == NULL)
    There is no need to cast malloc in a C program.
    Indeed, some of your malloc calls are cast-less, so clearly it isn't an issue for you.

    > if( (newTable->buckets=malloc(sizeof(bucket)*(newTable->size))) == NULL) {
    This should be sizeof(bucket*)
    You're allocating an array of pointers here.

    > MList **ml, MEntry *me
    You seem to have several styles for naming structs and typedef's.
    Applying a bit of consistency to the whole naming of things will keep things clearer.

    Lines 75 to 90 and 56 to 72 are basically the same.
    You should be able to factor this out into a helper function.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Do you realize that if you saved the original hash (not mapped to a bucket, the full hash) for each hashed item, you could trivially resize the hash table?

    Consider these types for a start.
    Code:
    typedef unsigned long     hash_t;
    typedef struct hash_item  hash_item_t; /* Clbuttic. Just remove the first _. */
    typedef struct hash_entry hashentry_t;
    typedef struct hash_table hashtable_t;
    
    /* Fixed-size hash table item: */
    struct hash_item {
    	hash_t             hash;
    	/* Hashed data follows: */
    	size_t             size;
    	void              *data;
    };
    
    /* Dynamically allocated, dynamically sized hash table entry: */
    struct hash_entry {
    	size_t             size;   /* Number of items allocated */
    	size_t             used;   /* Number of items used */
    	struct hash_item   item[];
    };
    
    /* Dynamically allocated, dynamically sized and resized hash table: */
    struct hash_table {
    	size_t             size;
    	hash_t             modulus;  /* Same as size */
    	struct hash_entry *entry[];
    };
    and for the hash function, for example
    Code:
    /* Bernstein hash function (djb2), XOR variant.
    */
    static inline hash_t hash(const void *const dataptr, const size_t bytes)
    {
    	const unsigned char       *data = (const unsigned char *)dataptr;
    	const unsigned char *const ends = bytes + (const unsigned char *)dataptr;
    	hash_t                     result = 5381UL;
    
    	while (data < ends)
    		result = (result * 33UL) ^ (hash_t)(*(data++));
    
    	return result;
    }
    The hash table type is dynamically allocated, to sizeof (struct hash_table) + entries * sizeof (struct hash_entry *) bytes. Each pointer is initialized to NULL. For hash h, the entry in table is table->entry[ h % table->modulus ], and it will be NULL if the entry contains no hashed items.

    A hash entry can also be reallocated at any time, to size sizeof (struct hash_entry) + size * sizeof (struct hash_item) chars. The used field is initialized to the number of items used, and size the number of items allocated. A good strategy for the size is a bit complex issue, but usually you want to allocate only a few items first (to reduce memory use). I'd guess 2 to 5 to 10 to 20 might work.

    Each hash item contains the original hash. I'd recommend keeping the hashes in sorted order, so you can do a binary search (on the original hash for the hashed items mapped to the same hash bucket/entry). Remember to compare hash and size; only if those two match, do a binary comparison using e.g. memcpy() .)

    You can resize each entry by simply reallocating the entry pointer, and updating the new size to match. The number of entries used is used.

    To resize the hash table, you allocate a new table, of size sizeof (struct hash_table) + new_size * sizeof (struct hash_entry) entries, and clear all entries to NULL. Allocate a temporary array of size_t's, one for each entry, and you can use that to preallocate each entry so you don't need to do any allocations while copying items from the old table to the new table. Remember that the items in an entry in the old table are likely to be spread to different entries; but, when you have counted them beforehand, you know how many there will be in the end. Copy each item from the old table entries to the new table, free()ing each entry in the old table when you have scanned it, then free() the old table, and the temporary array.

    Note that resizing the table will usually change the pointer to the table. I typically use an interface similar to
    Code:
    hashtable_t *hashtable_rehash(hashtable_t *table, const size_t new_size);
    where the new size may be any positive integer (that does not overflow the size calculation); the pointer to the "rehashed" table is returned, and the original pointer is no longer valid.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Hash table help
    By lemon-ice-tea in forum C++ Programming
    Replies: 1
    Last Post: 05-18-2011, 09:31 AM
  2. Dynamic Hash Table Implementation
    By james123 in forum C Programming
    Replies: 1
    Last Post: 12-30-2009, 11:07 AM
  3. Hash Table
    By mrsirpoopsalot in forum C++ Programming
    Replies: 11
    Last Post: 11-14-2009, 09:10 PM
  4. Hash Table
    By Cpro in forum C++ Programming
    Replies: 3
    Last Post: 03-20-2008, 02:14 PM
  5. dynamic hash table resizing
    By agentsmith in forum C Programming
    Replies: 1
    Last Post: 01-19-2008, 04:59 PM