Thread: Storing key, value pairs

  1. #1
    Registered User
    Join Date
    Mar 2013
    Posts
    7

    Storing key, value pairs

    Hey, I have a project to do in C and coming form Java I miss all the included features that are missing from C! I need to be able to store key value pairs in an quick and memory efficient manner. I've looked up using hash maps but I'm very new to C so don't really understand even the basic ones. I've looked at using a multi-dimensional array as I'm more comfortable with arrays but I'm unsure if that would count as a memory efficient and quick method? Any suggestions? Thanks for any help, Tim.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    How many pairs are you planning to store?

    How many searches (per second) are you planning to run though them?

    What kind of hardware are you running on (desktop, smartphone, microcontroller)?

    Since you're so new in C, I would suggest you first work on getting a fixed array working. Efficiency in storage (and then efficiency in time) are things that can be worked on later.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Mar 2013
    Posts
    7
    I'd like to be able to spend time first learning to use C but seeing as this is for a project, I don't have the time to spend ages learning C before starting it which is pretty annoying.

    1. The array is going to be storing content from a text file passed to it, so it could be any number of elements in the array each time.
    2. I need to be able to read a text file into an array with each word as a element, but the list will need ordering at some point so trying to find a way of doing that most efficiently as if I do it as I'm planning at the moment I'll have to check every item in the array to sort it.
    3. Programming it on my laptop but will eventually be ran on a different computer, so trying not to do much hardware specific.

  4. #4
    Registered User
    Join Date
    Mar 2013
    Posts
    7
    Quote Originally Posted by Salem View Post
    How many pairs are you planning to store?

    How many searches (per second) are you planning to run though them?

    What kind of hardware are you running on (desktop, smartphone, microcontroller)?

    Since you're so new in C, I would suggest you first work on getting a fixed array working. Efficiency in storage (and then efficiency in time) are things that can be worked on later.
    Answered in above post, been doing some research and have read about people using an array of structs to accomplish similar tasks. I just need to make sure it's an efficient way of doing it as the "values" to the "keys" will constantly need to be updated as the file is searched through. Pretty much as a counter.

  5. #5
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Have you considered using a key/value store database, or maybe use SQLite as such a database (e.g., an in-memory database)?
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #6
    Registered User
    Join Date
    Mar 2013
    Posts
    7
    Quote Originally Posted by laserlight View Post
    Have you considered using a key/value store database, or maybe use SQLite as such a database (e.g., an in-memory database)?
    I'd rather do it entirely through coded data structures if possible. It's counting how many times values appears in a file, so I will need to keep checking if it already exists in the array (thus searching the whole array each time) so I'm thinking that surly can't count as "efficient" ?

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by tim8
    I'd rather do it entirely through coded data structures if possible.
    Using SQLite basically means downloading the amalgamation source file, setting it to be compiled along with your other source files, learning the API and including the header in your source files where needed. The library effectively provides "coded data structures" for you to use. You can use an in-memory database if having a separate database file is undesirable. The advantage here is that you will be spending time learning how to use a well tested library instead of writing your own and debugging it.

    Quote Originally Posted by tim8
    I've looked up using hash maps but I'm very new to C so don't really understand even the basic ones.
    So you would "rather do it entirely through coded data structures", but learning how is not feasible because you don't have the time:
    Quote Originally Posted by tim8
    I'd like to be able to spend time first learning to use C but seeing as this is for a project, I don't have the time to spend ages learning C before starting it which is pretty annoying.
    Yet you want to do it efficiently:
    Quote Originally Posted by tim8
    It's counting how many times values appears in a file, so I will need to keep checking if it already exists in the array (thus searching the whole array each time) so I'm thinking that surly can't count as "efficient" ?
    But you don't want to use a library. I suggest that you don't use C, but since that is not an option... something has to give. You either get cracking with a data structures and algorithms book, or you get cracking by learning the API of a suitable library.

    And yes, repeated linear search over a large array is inefficient compared to the alternatives.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #8
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    You can allocate char array[MAX_WORDS][MAX_LEN + 1] but that's not very efficient because it would need to accommodate the longest word, thus the entire array has the other words wasting space. But it's simplest.

    Or you could allocate a char *array and store all words sequentially (null separated) and a separate array of pointers to those words. A third array that corresponds to the counts.

    Either way you may have to realloc() when the number of words increases beyond the anticipated limit.

    You would sort the word array using qsort(), keeping the count array in synch if needed initially. Then use bsearch() and increment the corresponding count element.

    That's if I understand correctly that the "key" is nothing more than a counter. It's all very basic in plain C as far as complexity goes. No need for a honking SQL & API bloat.
    Last edited by nonoob; 03-27-2013 at 01:09 PM.

  9. #9
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by nonoob
    That's if I understand correctly that the "key" is nothing more than a counter.
    I understood it to be effectively a std::map<std::string, int> in C++ (or std::unordered_map, but apparently there may be ordering required), i.e., the key is a string that maps to a count.

    Quote Originally Posted by nonoob
    You would sort the word array using qsort(), keeping the count array in synch if needed initially. Then use bsearch() and increment the corresponding count element.
    That could be inefficient since you'll be calling qsort for each new element. If you change to insert the new element at the given location, then you'll effectively be doing insertion sort, which means the same time complexity as repeated linear search.

    Of course, if the array can be built up and sorted just once then this isn't a problem, but it would also mean that you don't need bsearch: a single pass through the array would be sufficient after sorting. On the other hand, this means that repeated elements have to be stored, rather than merely causing a count to be incremented.

    Basically, I think that a hash table or balanced binary tree would be suitable here. If a hash table is used, then a final sort may be necessary, but that should be no big deal.

    Quote Originally Posted by nonoob
    It's all very basic in plain C as far as complexity goes. No need for a honking SQL & API bloat.
    The "SQL" part is only necessary to utilise it as a key/value store. As for API bloat... possibly, but the overhead of SQLite is small anyway.
    Last edited by laserlight; 03-27-2013 at 01:42 PM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  10. #10
    Registered User
    Join Date
    Mar 2013
    Posts
    7
    I'll need to incorporate memory management into the equation somehow, but hoping to get that sorted after I get an idea of how to store it. The key would be the word found and the value the count. I've read about using bsearch() and qsort() to sort them, but I'll still have to search every element to check if it's already in the array.

    Quote Originally Posted by laserlight View Post
    Using SQLite basically means downloading the amalgamation source file, setting it to be compiled along with your other source files, learning the API and including the header in your source files where needed. The library effectively provides "coded data structures" for you to use. You can use an in-memory database if having a separate database file is undesirable. The advantage here is that you will be spending time learning how to use a well tested library instead of writing your own and debugging it.
    Unfortunately I'm only able to use standard C libraries. So it looks like my best bet is to write my own basic hashmap implementation if it's going to be much more efficient than my array method.

  11. #11
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by tim8
    I've read about using bsearch() and qsort() to sort them, but I'll still have to search every element to check if it's already in the array.
    Using bsearch should be faster because it should be implementing binary search (though I believe this is not actually guaranteed). The problem though is that if you use qsort repeatedly, then you may end up worse off.

    Quote Originally Posted by tim8
    Unfortunately I'm only able to use standard C libraries. So it looks like my best bet is to write my own basic hashmap implementation if it's going to be much more efficient than my array method.
    Ah yes, in that case you do indeed have to implement something

    By the way, I made an edit to my reply to nonoob that may be of interest to you, i.e., the idea of storing everything, sorting once, then making a single pass through the array to count the repeated elements (which will then be adjacent). If this is feasible, then you would not need to implement any data structures more complex than an array, and qsort will suffice. The disadvantage is that there may be a significant cost in space if many elements are repeated many times.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  12. #12
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    Yeah, if you need to ensure that you only store unique words then yes, you'd have to do a search each time. Which means linear search is likely just as fast/bad as doing a qsort() each time and using bsearch().
    Or a periodic qsort() & merge after some interval of binary & linear searches for recent entrants, depending on a weight factor that optimizes the balance.

    Another way is to use a fancier data structure such as some tree thing.

    Or as you suggested. some custom hash function. Collisions and you're back to some ugly linear linked list.

  13. #13
    Registered User
    Join Date
    Mar 2013
    Posts
    7
    Quote Originally Posted by laserlight View Post
    Of course, if the array can be built up and sorted just once then this isn't a problem, but it would also mean that you don't need bsearch: a single pass through the array would be sufficient after sorting. On the other hand, this means that repeated elements have to be stored, rather than merely causing a count to be incremented.
    Exactly what I need! Building up once and then sorting would be fine. Although I don't want to store repeated elements, I want to increase a counter if spotted more than once. I'm also trying to make it as memory efficient as possible so storing the whole file as an array word by word then doing the count wouldn't really work :/ As the point is the text file isn't set there could be one with hundreds of the same word passed in.

  14. #14
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    Your goals while reasonable individually, or even in sets, are not something you will match.

    Pick the thing you think is least important: space, speed, or development cost (time).

    Because you are new, I'd advise simplicity. What nonoob suggests isn't going to be fast (Here meaning only efficient use of time; I'd be surprised if it wasn't fast enough for what you are likely to need. You can start developing a more complex version if your current goal, getting the project done it seems is most important, is matched.), but it is simple and something you could manage.

    But no, as a newbie you aren't going to develop the kind of data structures you need to do this without sacrificing something of either speed or space.

    To build a dictionary "on the fly" that is always searched and updated quickly will see you delving into data structures that will take a bit of time to understand. That is to say nothing of the time it will take you to debug your implementation after you understand the data structures.

    I'm not trying to be discouraging either; I'd be pleased if you took the time to learn the data structures needed, but as I've asked, isn't getting the project "out the door" more important for now?

    If that, the development cost (time), is less important, by all means, please feel free to search "Red-Black Tree" or similar structure.

    Soma

  15. #15
    Registered User
    Join Date
    Mar 2013
    Posts
    7
    Quote Originally Posted by phantomotap View Post
    O_o
    I'm not trying to be discouraging either; I'd be pleased if you took the time to learn the data structures needed, but as I've asked, isn't getting the project "out the door" more important for now?
    Yes the most important thing is to have something working for the project. I'd probably say space is more important than speed for this exercise. So it looks like "Or you could allocate a char *array and store all words sequentially (null separated) and a separate array of pointers to those words. A third array that corresponds to the counts." is the most realistic option for me? I only need to sort the array once after all the data has been entered which is a sort by the value, so bsearch() each time a word is processed and a qsort() at the end.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. stl for pairs
    By dpp in forum C++ Programming
    Replies: 14
    Last Post: 05-18-2009, 09:46 AM
  2. vector and pairs
    By dpp in forum C++ Programming
    Replies: 1
    Last Post: 01-28-2009, 11:24 AM
  3. Pairs & Constructors
    By Nereus in forum C++ Programming
    Replies: 3
    Last Post: 04-04-2006, 10:55 AM
  4. prime pairs from 3 - 10,000
    By david999 in forum C Programming
    Replies: 4
    Last Post: 11-09-2005, 12:47 AM
  5. pairs of vector..
    By Luigi in forum C++ Programming
    Replies: 4
    Last Post: 12-20-2002, 02:09 AM