Thread: A few questions about my modular code using void* as dynamic data type in C

  1. #1
    Registered User
    Join Date
    Mar 2006
    Posts
    158

    A few questions about my modular code using void* as dynamic data type in C

    Hi,

    I'm not going to post all my code where cause it's quite large, instead, I'll post the things that I think are important and hopefully are enough for you to help me out.

    My hash table structure is like this:

    Code:
    typedef void * HashKey;
    typedef void * HashValue;
    
    typedef struct sHa........em {
        HashKey key;
        HashValue value;
    
        char status;
    } Ha........em;
    
    typedef struct sHashTable {
        Ha........em *items;
    
        int count;
        float load;
        int size;
    
        Bool (*compare)(HashKey, HashKey);
        unsigned (*hash)(void *);
    } HashTable;
    The signature for my insert function goes like this:

    Code:
    Bool hashInsert(HashTable * const table, HashKey key, HashValue value);
    And somewhere inside that function, when I find a free bucket in the hash table, I do this:

    Code:
    table->items[index].key = key;
    table->items[index].value = value;
    table->items[index].status = USED;
    table->load = ++table->count / (float)table->size;
    All this presents a few problems:

    1) As you can see above I'm simply setting each key/value pair of the free bucket to the same pointer passed as the key/value hashInsert function arguments. This presents a problem as you may have already noticed... For instance, doing something like this:

    Code:
    char str[50];
    scanf("%s%*c", str);
    hashInsert(t1, (HashKey)str, (HashValue)5);
    scanf("%s%*c", str);
    hashInsert(t1, (HashKey)str, (HashValue)3);
    And if the input is "KeyA" and then "KeyB", both will have "KeyB" as the buckets keys. The same thing applies to the value and not just the key since they are basically the same type because I want to have my code fully modular, for any data type.

    How could I solve this?

    My first though is to use strdup(str) and pass that to the hashInsert function. That would solve the problem. And since this was handled in the main code, I could easily use malloc() too for any other data type I need to pass as the value (the key will probably always be a string or an int).

    But this solution presents another problem...

    2) How should I free this allocated memory? Sure, it was allocated by the "main programmer" and not the "hash table module programmer" so, the "main programmer" should free it in the main code, right? However, that doesn't look much like modular code to me.

    My code also has a hashDestroy function, to free all the allocated memory. But how can I use this function to free everything? I can't just iterate over every key/value and use free() on them cause maybe some of them weren't malloc'd by any programmer in the first place and I don't need to free them.

    How can I find out which ones my hashDestroy must free and which ones it shouldn't?

    3) To finish, I guess I can also throw this issue into the mix... In point one, my suggestion was to use strdup() or malloc to "fix" that specific problem (while introducing another) but that also don't look very modular to me. This memory allocation should be done in the hash table module code and not in the main code by the "main programmer".

    How do you suggest I solve this? I mean, the data types can be anything and while the use of strdup() helps a lot, it only works for strings. What if I need to allocate memory for some specific structure or just an int?

    Sorry for the big post but I think these questions are all related and I need some help figuring them out since my C knowledge is not that extreme. I only recently learned about void* so...

    EDIT:
    Lol, my code above as a forbidden word and it's being represented as dots. I guess it doesn't matter and you guys still understand the code just fine.

  2. #2
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    My first glance thru this (more than just) implies to me that you believe each key should have a unique bucket. Wrong! A single bucket can hold any number of unique keys.

    This is how a hash table works: You have a fixed number of buckets (the array indexes). The array should be an array of pointers, generally each pointer is to the head of a linked list. You don't look for an empty bucket -- you compute which bucket to use based on the value of the key. Then you add that to the list pointed to by the bucket.

    Here's a normative way to compute the bucket:
    Code:
    int getBucket(int table_size, char *key) {
            int i, slen = strlen(key), index = 0;
            for (i=0;i<slen;i++) index += key[i]-32;
            return (index%table_size);
    }
    "table_size" is the number of buckets in the table. This is fixed when the table is created and must not be changed. Notice, you use this to find the bucket to put a key into, and to find the bucket a key is already in. Then you just iterate thru the list in the bucket using strcmp (well, you could make a hierarchy of hash tables too).

    Generally, if the keys are alpha-numeric, there is not much point in using more than 500 or 1000 buckets because the value of most keys will be less than that. So if you have 500 buckets and only a few thousand keys, there will usually be < 5 keys in a bucket.

    Vis, free'ing everything -- set all the bucket pointers to NULL initially. If a bucket is null, don't free it. Otherwise, treat that bucket as the head of a LL and traverse the list freeing each node.

    And yeah, you have to copy values, not pointers.
    Last edited by MK27; 03-07-2010 at 05:43 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  3. #3
    Registered User
    Join Date
    Mar 2006
    Posts
    158
    Maybe so but I don't have time to rewrite the code and do it differently, I have to keep my Hash Table approach and need help with that, solving my 3 questions above.

  4. #4
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Fair enough, then this IS NOT a hash table: if it is for an assignment, you will fail because you have VERY fundamentally misunderstood the concept. It does not matter how many "problems" you fix after that. So you were warned.

    Vis, the problems:

    1) copy the data, not the pointer! Yes, that may mean malloc'ing a pointer and copying into it!
    2) set all pointers NULL when not allocated and it is easy to tell which ones need free'ing.
    Last edited by MK27; 03-07-2010 at 06:20 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #5
    Registered User
    Join Date
    Mar 2006
    Posts
    158
    Quote Originally Posted by MK27 View Post
    Fair enough, then this IS NOT a hash table:
    Why not? I'm just doing it differently I suppose...

    Can you explain to me with code what I'm doing wrong and what should I do? Not talking about the full code of an hashtable of course, but just the structures so I can try to understand what you are saying. Cause I'm not exactly understanding what's the problem and what should I do...

    And I know you talked about chaining and I know it's the recommended way. But for for now, I'm going with linear probing and when everything's working, I'll move to chaining.

  6. #6
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Well, I am no hash table expert, I just had to look up linear probing. It just looks the number of items will be limited to the array size (which still needs to be fixed), and that the use of a "step" value will in the end just mean iterating thru the array -- which why bother in the first place?* Just use an array and iterate thru it.

    Of course, that really would not be a hash table I guess if linear probing counts, then it's okay. I stand corrected. The linked list method is not much more complicated and actually simpler in a certain sense (finding the bucket).

    Hopefully you've found your way thru the other issues.

    * like, so to find something now you compute the index and then check sequential buckets one step at a time?
    Last edited by MK27; 03-08-2010 at 08:05 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  7. #7
    Registered User
    Join Date
    Mar 2006
    Posts
    158
    I understand what you're saying but it has very little to do with my main issues...

    I just did linear probing in the first place because it was simpler and I just wanted to have the hash table working and then move to optimize it and make it better. One step at a time...

    What "bothered" me about your first post is that you are saying that what I'm doing is not an HashTable while I think it is...

    Quote Originally Posted by MK27 View Post
    You have a fixed number of buckets (the array indexes).
    That's what I have, the *items in the hash table is the array which will be dynamically allocated. Each element of that array, has a key/value pair.

    Quote Originally Posted by MK27 View Post
    The array should be an array of pointers, generally each pointer is to the head of a linked list.
    It is in array of pointers to each element.

    Quote Originally Posted by MK27 View Post
    You don't look for an empty bucket -- you compute which bucket to use based on the value of the key.
    And this is exactly what I'm doing... You can say that I'm looking for an empty bucket but that's because of open addressing using linear probing. But at first, I'm computing which is the bucket I'm supposed to use based on the key value and if it's free, good, if it's not, I move to the next one. That's the basis for linear probing. But like I said, I'll be moving to chaining in the future, I just need to fix my problems first.

    But there's no problem in using linear probing. It's not the best method, I know that, but as long as I justify my choices.

    But once again, this is very little to do with my main issues... I don't even know how you started pointing those things out when I didn't even posted my hashInsert code. Ok, I know you realized there was no *next pointer for the next bucket when dealing with collisions, but I didn't asked anything about that cause that was not my problem :P

    Anyway, back into the issues...

    So far, I think I've solved them, I just don't know if this is the way to go...

    For testing purposes, I'm using char* as key and simple struct as value:

    Code:
    typedef struct sTest {
    	char string[10];
    	int num; 
    } Test;
    Then I changed my hashInsert function to this:

    Code:
    Bool hashInsert(HashTable * const table, HashKey key, size_t key_sz, HashValue value, size_t val_sz)
    And I call it like this, for instance:

    Code:
    char str[50];
    Test hVal;
    
    strcpy(hVal.string, "ABC");
    hVal.num = 123;
    
    scanf("%s%*c", str);
    hashInsert(t1, str, strlen(str)+1, &hVal, sizeof(Test));
    And then, when I find an empty bucket (using linear probing) I do something like this:

    Code:
    table->items[index].key = xmalloc(key_sz);
    memcpy(table->items[index].key, key, key_sz);
    
    table->items[index].value = xmalloc(val_sz);
    memcpy(table->items[index].value, value, val_sz);
    Note: xmalloc() calls malloc and does some error handling.

    And to finish, my hashDestroy function does something like this to free everything:

    Code:
    for(i = 0; i < (*table)->size; i++) {
    	if((*table)->items[i].status != EMPTY) {
    		free((*table)->items[i].key);
    		free((*table)->items[i].value);
    	}
    }
    So far, all my testing worked fine but do you think there's something wrong with this approach on allocating/freeing memory?

    I can only think of one real problem for this solution, if my Test struct had some pointers inside, that would be an issue. Right? The solutions for this would be either to provide a function pointer in the hashInitialize function to free whatever it needs or don't bother it and that's the main programmer's job, to free whatever he allocated.

    But since this is a project where I'll be coding everything and where one of the objectives is modular code, I simply won't use any pointers inside the struct and call it a day. If I have a time in the end, I might adapt the code to provide the "free function" pointer on the initialization.

  8. #8
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Nazgulled View Post
    I understand what you're saying but it has very little to do with my main issues...

    I just did linear probing in the first place because it was simpler and I just wanted to have the hash table working and then move to optimize it and make it better. One step at a time...
    Hey I'm sorry for that but I honestly thought you were making a *big* mistake. I've only done a hash table once, I turned it into a library (like you are supposed to do) and it's worked fine for me ever since. I am not too sure that the linear probing is "objectively" simpler -- unless you are afraid of linked lists -- but if that's what you've read and have in your mind, then it will be to you

    I don't even know how you started pointing those things out when I didn't even posted my hashInsert code. Ok, I know you realized there was no *next pointer for the next bucket when dealing with collisions, but I didn't asked anything about that cause that was not my problem :P
    It was because you were assigning directly to the bucket pointer and there did not seem to be a way for a bucket to hold any more values (this was not a list head). Like I said, this just seemed fundamentally wrong -- I thought I was doing you a favour by pointing that out, but again, I apologize.

    Anyway, back into the issues...
    So far, I think I've solved them, I just don't know if this is the way to go...
    For testing purposes, I'm using char* as key and simple struct as value:

    Code:
    typedef struct sTest {
    	char string[10];
    	int num; 
    } Test;
    Then I changed my hashInsert function to this:

    Code:
    Bool hashInsert(HashTable * const table, HashKey key, size_t key_sz, HashValue value, size_t val_sz)
    And I call it like this, for instance:

    Code:
    char str[50];
    Test hVal;
    
    strcpy(hVal.string, "ABC");
    hVal.num = 123;
    
    scanf("%s%*c", str);
    hashInsert(t1, str, strlen(str)+1, &hVal, sizeof(Test));
    And then, when I find an empty bucket (using linear probing) I do something like this:

    Code:
    table->items[index].key = xmalloc(key_sz);
    memcpy(table->items[index].key, key, key_sz);
    This is good -- it means doing more work for each submitted value, but then you can use any kind of key. I always use char* for the key, so my add is:

    Code:
    void ht_add(hashtable *t, char *key, void *val);
    Since the key is always a string, it can be copied and it's length determined internally. "val" is a void pointer so could be anything.



    And to finish, my hashDestroy function does something like this to free everything:

    Code:
    for(i = 0; i < (*table)->size; i++) {
    	if((*table)->items[i].status != EMPTY) {
    		free((*table)->items[i].key);
    		free((*table)->items[i].value);
    	}
    }
    So far, all my testing worked fine but do you think there's something wrong with this approach on allocating/freeing memory?
    I will say one thing -- the status field is unnecessary if you always set unallocated pointers NULL (including after you free them). That's a very normal and useful practice. It means any non-null pointer is allocated and needs free'ing at some point.

    I can only think of one real problem for this solution, if my Test struct had some pointers inside, that would be an issue. Right? The solutions for this would be either to provide a function pointer in the hashInitialize function to free whatever it needs or don't bother it and that's the main programmer's job, to free whatever he allocated.
    I have the same issue with the void* value pointer. It gets freed with the table, but obviously there is no way to "inform" the library that it (could be) a pointer to a struct with pointers inside it. However, if you understand the library, then you understand that if the values are structs containing pointers, you need to iterate thru and free those *before* you destroy the whole table. So yeah, that's the user's (main programmer's) job.

    Handy function for that and other things:
    Code:
    htnode *ht_each(hashtable *t);
    The hashtable struct has a current bucket index and current element pointer that are incremented here, skipping empty buckets and traversing through the list in used ones to return one item at a time. Returns NULL at the end so you can:
    Code:
    table->cur_bucket = 0;   /* set these to the beginning */
    table->cur_elem = NULL;
    htnode *it = ht_each(table);
    while (it) {
         [do stuff such as free pointers inside value struct]
         it = ht_each(table);
    }
    Of course, if you only have one thing in each bucket that's a bit simpler. Good Luck!
    Last edited by MK27; 03-08-2010 at 10:01 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  9. #9
    Registered User
    Join Date
    Mar 2006
    Posts
    158
    Quote Originally Posted by MK27 View Post
    Hey I'm sorry for that but I honestly thought you were making a *big* mistake. I've only done a hash table once, I turned it into a library (like you are supposed to do) and it's worked fine for me ever since. I am not too sure that the linear probing is "objectively" simpler -- unless you are afraid of linked lists -- but if that's what you've read and have in your mind, then it will be to you



    It was because you were assigning directly to the bucket pointer and there did not seem to be a way for a bucket to hold any more values (this was not a list head). Like I said, this just seemed fundamentally wrong -- I thought I was doing you a favour by pointing that out, but again, I apologize.
    No hard feelings

    Quote Originally Posted by MK27 View Post
    I always use char* for the key, so my add is:

    Code:
    void ht_add(hashtable *t, char *key, void *val);
    Since the key is always a string, it can be copied and it's length determined internally. "val" is a void pointer so could be anything.
    I could do that and I know that char* will probably be the key 99% of the times someone uses an Hash Table but like I said, I'm trying to make this completely modular so that I (or anyone else) can use whatever he/she wants for the key. For instance, on this project I'll need to have two Hash Tables, one with char* as key and the other with int as key. So, my code will (hopefully) work for both


    Quote Originally Posted by MK27 View Post
    I will say one thing -- the status field is unnecessary if you always set unallocated pointers NULL (including after you free them). That's a very normal and useful practice. It means any non-null pointer is allocated and needs free'ing at some point.
    Yes, but that's only for chaining. With open addressing (linear probing) and I need to have that field. I'm not going to explain why because it doesn't matter for the subject at hand, and you'll have to trust me on that one. But once again, that will be gone too when I move to chaining has collision resolution.


    Quote Originally Posted by MK27 View Post
    I have the same issue with the void* value pointer. It gets freed with the table, but obviously there is no way to "inform" the library that it (could be) a pointer to a struct with pointers inside it. However, if you understand the library, then you understand that if the values are structs containing pointers, you need to iterate thru and free those *before* you destroy the whole table.
    Yeah, I understand that, but I won't be bother with it right now.

    Thanks.

  10. #10
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Nazgulled View Post
    Quote Originally Posted by MK27
    I will say one thing -- the status field is unnecessary if you always set unallocated pointers NULL (including after you free them). That's a very normal and useful practice. It means any non-null pointer is allocated and needs free'ing at some point.
    Yes, but that's only for chaining. With open addressing (linear probing) and I need to have that field. I'm not going to explain why because it doesn't matter for the subject at hand, and you'll have to trust me on that one.
    Are you sure about that? That does not quite make sense -- you are malloc'ing this:
    Code:
    table->items[index].key = xmalloc(key_sz);
    Meaning it could be set NULL and tested rather than requiring an additional "status" field. Like I said, the use of NULL is a normal general practice,* it's not derived from hash table chaining or any other particular scenario.

    * it's also a "best" practice since it foolproofs a double free:
    Code:
    if (ptr) {
         free(ptr);
         ptr = NULL;
    }
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  11. #11
    Registered User
    Join Date
    Mar 2006
    Posts
    158
    I'm sure about it

    I'll explain then...

    Open addressing (be it linear probing or quadratic probing) finds the next free bucket by going continuously looking for that free bucket starting from the first computed index (from the hash) key.

    Let's say you have two keys, "KeyA" and "KeyB" and the hash from both computes to the same index, let's say 5. But "KeyB" will be stored at index 6, cause 5 is taken by "KeyA". What happens when you delete "KeyA"? How will you access "KeyB"? When you lookup for the "KeyB", it will compute to index 5 which is now empty since we deleted "KeyA".

    This where the status field plays it's part... If you just use the NULL pointer to know if a field is empty or now, you have no way to find "KeyB" cause index 5 is free, is NULL, what teels you to keep looking? Nothing... The status field will hold 3 values, EMPTY, DELETED and USED.

    EMPTY = Empty bucket, it's free to use
    USED = Used bucket, it's not free to use
    DELETED = Bucked that was being used but not anymore, the data may be still be there but it's free to use

    When you delete something you don't actually delete the data (but you could if you wanted to) but mark the field as DELETED. Then, when you search for "KeyB", you'll reach index 5, which is DELETED so you know you need to go to to next bucket and so forth to find exactly what you want.

    That's how open addressing works. Maybe there's some other way to implement this, as with everything in this area, there are 1001 ways to do something. But this is the method that was lectured to us and if you search for it, you'll find many similar implementations.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. program terminates abruptly
    By roaan in forum C Programming
    Replies: 3
    Last Post: 08-28-2009, 03:53 PM
  2. parent in a binary search tree
    By roaan in forum C Programming
    Replies: 4
    Last Post: 08-26-2009, 07:08 PM
  3. [question]Analyzing data in a two-dimensional array
    By burbose in forum C Programming
    Replies: 2
    Last Post: 06-13-2005, 07:31 AM
  4. Binary Tree, couple questions
    By scoobasean in forum C Programming
    Replies: 3
    Last Post: 03-12-2005, 09:09 PM
  5. Learning OpenGL
    By HQSneaker in forum C++ Programming
    Replies: 7
    Last Post: 08-06-2004, 08:57 AM