-
Dynamic Hash Table
Hello,
I am attempting to code a dynamic hash table. The number of buckets is doubled when one of the buckets contains 20 or more items. If I ignore this (and thus run the program as a statically sized hash table) the program runs and completes as expected with zero memory leaks.
However, if I attempt to resize the table, crazy things happen. I'm not attempting an in-place resize but instead creating a second hash table then iterating through every bucket in the first hash table, recalculating the new position, and placing the items into the new hash table. Afterwards I free the old hash table.
Due to the fact that the statically sized hash table works, I have narrowed down the error to the only new function, expand(...). The code for which is below.
This is my first post on this forum and the program is part of an exercise for University (of which I'm about two weeks ahead and thus haven't been receiving assistance from tutors). If further code/context is required I will be able to produce the code.
Code:
..............
typedef struct bucketItem {
struct bucketItem *next;
MEntry *entry;
} entry;
typedef struct bucket {
int entries;
struct bucketItem *head;
struct bucketItem *tail;
} bucket;
struct mlist {
unsigned long size;
struct bucket **buckets;
};
..............
static MList *expand(MList **ml, MEntry *me) { MList *oldTable, *newTable;
unsigned long i, oldPosition, newPosition;
bucket *oldBucket, *newBucket;
entry *oldEntry, *newEntry;
oldTable = *ml;
/* Create a new hash table */
if( (newTable=(MList *)malloc(sizeof(MList))) == NULL) {
fprintf(stderr, "%s\n", "Unable to allocate memory for mailing list.");
return NULL;
}
/* Make the new hash table twice the size of the old one */
newTable->size = oldTable->size * 2;
if( (newTable->buckets=malloc(sizeof(bucket)*(newTable->size))) == NULL) {
fprintf(stderr, "%s\n", "Unable to allocate memory for mailing list.");
return NULL;
}
/* Initialise the new buckets */
for(i=0; i<(newTable->size); i++) {
if((newTable->buckets[i]=malloc(sizeof(bucket))) == NULL) {
fprintf(stderr, "%s\n", "Unable to allocate memory for entry.");
return NULL;
}
newTable->buckets[i]->entries = 0;
newTable->buckets[i]->head = NULL;
newTable->buckets[i]->tail = NULL;
}
/* Move every entry in the old hash table over to new position in new hash table */
for(oldPosition = 0; oldPosition < (oldTable->size); oldPosition++) {
oldBucket = oldTable->buckets[oldPosition];
oldEntry = oldBucket->head;
while(oldEntry != NULL) {
if((newEntry=(entry *)malloc(sizeof(entry))) == NULL) {
fprintf(stderr, "%s\n", "Unable to allocate memory for new entry");
return NULL;
}
newEntry->entry = oldEntry->entry;
newEntry->next = NULL;
oldEntry->entry = NULL;
oldEntry = oldEntry->next;
newPosition = me_hash(newEntry->entry, newTable->size);
newBucket = newTable->buckets[newPosition];
if(newBucket->head == NULL) {
newBucket->head = newEntry;
} else {
newBucket->tail->next = newEntry;
}
newBucket->tail = newEntry;
newBucket->entries += 1;
}
}
/* Now add new entry into the new hash table */
if((newEntry=(entry *)malloc(sizeof(entry))) == NULL) {
fprintf(stderr, "%s\n", "Unable to allocate memory for new entry");
return NULL;
}
newEntry->entry = me;
newEntry->next = NULL;
newPosition = me_hash(me, newTable->size);
newBucket = newTable->buckets[newPosition];
if(newBucket->head == NULL) {
newBucket->head = newEntry;
} else {
newBucket->tail->next = newEntry;
}
newBucket->tail = newEntry;
newBucket->entries += 1;
ml_destroy(oldTable);
return newTable;
}
..............
-
> oldTable = *ml;
Where is
*ml = newTable;
> if( (newTable=(MList *)malloc(sizeof(MList))) == NULL)
There is no need to cast malloc in a C program.
Indeed, some of your malloc calls are cast-less, so clearly it isn't an issue for you.
> if( (newTable->buckets=malloc(sizeof(bucket)*(newTable->size))) == NULL) {
This should be sizeof(bucket*)
You're allocating an array of pointers here.
> MList **ml, MEntry *me
You seem to have several styles for naming structs and typedef's.
Applying a bit of consistency to the whole naming of things will keep things clearer.
Lines 75 to 90 and 56 to 72 are basically the same.
You should be able to factor this out into a helper function.
-
Do you realize that if you saved the original hash (not mapped to a bucket, the full hash) for each hashed item, you could trivially resize the hash table?
Consider these types for a start.
Code:
typedef unsigned long hash_t;
typedef struct hash_item hash_item_t; /* Clbuttic. Just remove the first _. */
typedef struct hash_entry hashentry_t;
typedef struct hash_table hashtable_t;
/* Fixed-size hash table item: */
struct hash_item {
hash_t hash;
/* Hashed data follows: */
size_t size;
void *data;
};
/* Dynamically allocated, dynamically sized hash table entry: */
struct hash_entry {
size_t size; /* Number of items allocated */
size_t used; /* Number of items used */
struct hash_item item[];
};
/* Dynamically allocated, dynamically sized and resized hash table: */
struct hash_table {
size_t size;
hash_t modulus; /* Same as size */
struct hash_entry *entry[];
};
and for the hash function, for example
Code:
/* Bernstein hash function (djb2), XOR variant.
*/
static inline hash_t hash(const void *const dataptr, const size_t bytes)
{
const unsigned char *data = (const unsigned char *)dataptr;
const unsigned char *const ends = bytes + (const unsigned char *)dataptr;
hash_t result = 5381UL;
while (data < ends)
result = (result * 33UL) ^ (hash_t)(*(data++));
return result;
}
The hash table type is dynamically allocated, to sizeof (struct hash_table) + entries * sizeof (struct hash_entry *) bytes. Each pointer is initialized to NULL. For hash h, the entry in table is table->entry[ h % table->modulus ], and it will be NULL if the entry contains no hashed items.
A hash entry can also be reallocated at any time, to size sizeof (struct hash_entry) + size * sizeof (struct hash_item) chars. The used field is initialized to the number of items used, and size the number of items allocated. A good strategy for the size is a bit complex issue, but usually you want to allocate only a few items first (to reduce memory use). I'd guess 2 to 5 to 10 to 20 might work.
Each hash item contains the original hash. I'd recommend keeping the hashes in sorted order, so you can do a binary search (on the original hash for the hashed items mapped to the same hash bucket/entry). Remember to compare hash and size; only if those two match, do a binary comparison using e.g. memcpy() .)
You can resize each entry by simply reallocating the entry pointer, and updating the new size to match. The number of entries used is used.
To resize the hash table, you allocate a new table, of size sizeof (struct hash_table) + new_size * sizeof (struct hash_entry) entries, and clear all entries to NULL. Allocate a temporary array of size_t's, one for each entry, and you can use that to preallocate each entry so you don't need to do any allocations while copying items from the old table to the new table. Remember that the items in an entry in the old table are likely to be spread to different entries; but, when you have counted them beforehand, you know how many there will be in the end. Copy each item from the old table entries to the new table, free()ing each entry in the old table when you have scanned it, then free() the old table, and the temporary array.
Note that resizing the table will usually change the pointer to the table. I typically use an interface similar to
Code:
hashtable_t *hashtable_rehash(hashtable_t *table, const size_t new_size);
where the new size may be any positive integer (that does not overflow the size calculation); the pointer to the "rehashed" table is returned, and the original pointer is no longer valid.