Thread: Help me check my hash function

  1. #16
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Subsonics View Post
    Since the key in this case is an integer, student ID, that supposedly is unique. Then you could use the ID as a hash directly and have a perfect hash table. If I'm not missing something here.
    As long as you do modulus on the ID by the number of buckets (which is pre-determined), yeah. You cannot just use the ID itself, if I have 5 records and the ids are:

    1234523533
    9922883
    42
    1
    23456234134

    I suppose I could be safe and use a 10000000000 element array, but...
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  2. #17
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by MK27 View Post
    As long as you do modulus on the ID by the number of buckets (which is pre-determined), yeah. You cannot just use the ID itself, if I have 5 records and the ids are:

    1234523533
    9922883
    42
    1
    23456234134

    I suppose I could be safe and use a 10000000000 element array, but...
    That's obvious MK but, first of all. No one in their right mind would implement a database for student record from scratch using c and hash tables. But, if that was the case you would create ID's from 1 and up, so list of students of 5 would have ID 1-5 (or 0-4). Then perhaps if some rules are involved in creating the ID like year, and so on, you could easily make a formula such that you end up at 0 for the first student in the database.

  3. #18
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Subsonics View Post
    No one in their right mind would implement a database for student record from scratch using c and hash tables.
    Speak for yourself! I've never looked at the source code for (eg) mySQL and would refuse to do so , but it's in C and I bet there's some very hash tablish stuff in there.

    There's a difference between an ID number you can make up (meaning it is internal) and one which is a given (part of what you are responsible for recording).
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #19
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by MK27
    Speak for yourself! I've never looked at the source code for (eg) mySQL and would refuse to do so , but it's in C and I bet there's some very hash tablish stuff in there.
    I never looked at MySQL source code either, and only superficially at SQLite source code, but according to the documentation, SQLite uses B-trees.

    But I suspect Subsonics might have intended to say that a sane person would use a database engine like SQLite instead of coding the nitty gritty database details from scratch
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  5. #20
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by MK27 View Post
    Speak for yourself! I've never looked at the source code for (eg) mySQL and would refuse to do so , but it's in C and I bet there's some very hash tablish stuff in there.
    Yes but that was sort of my point, you would choose some working database solution, mySQL, Oracle etc. And I'm sure there are hash tables involved somewhere in their source, but I'm not going to look into it to confirm it either haha.

    Quote Originally Posted by MK27 View Post
    There's a difference between an ID number you can make up (meaning it is internal) and one which is a given (part of what you are responsible for recording).
    Absolutely, but if you are given these requirements, and probably some other (like being able to search on all terms in the record) then you would probably not choose a hash table to begin with. My point was that a numeric key that is an ID, (that is, it's unique and serve only as identification) can be arbitrarily chosen and still serve this same purpose. If you are going to search on name or make a dictionary where a word is a key and the definition is the value, then you don't have the option to choose the key.

    Quote Originally Posted by laserlight View Post
    But I suspect Subsonics might have intended to say that a sane person would use a database engine like SQLite instead of coding the nitty gritty database details from scratch
    Yes, that was what I had in mind. SQLite seems like a nice solution for c, I have just looked over it briefly yet but.
    Last edited by Subsonics; 04-17-2010 at 05:17 PM.

  6. #21
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Subsonics View Post
    My point was that a numeric key that is an ID, (that is, it's unique and serve only as identification) can be arbitrarily chosen and still serve this same purpose. If you are going to search on name or make a dictionary where a word is a key and the definition is the value, then you don't have the option to choose the key.
    Hmmm. There's a bit of a catch here. If you assign an ID number, internally, to each record, how can you search for a given record (based on content) by using the ID number you have arbitrarily assigned? I have five students:

    Bob
    Mickey
    Troubadour
    Ruffian
    Sally-Anne

    So I give them ID numbers 1-5. Now the user wants to find the record for "Troubadour"...unless I have a separate hash table (to look up the keys for "the real" hash table), we have an issue.

    Anyway, I think hash tables do have a tremendous purpose in general programming. I don't actually use them in C that much, but I use the "built in" equivalent else where (eg, perl hashes). If you make the "key" a sequential number, you are not talking about a hash table -- that's just an array.

    Quote Originally Posted by Subsonics View Post
    Yes, that was what I had in mind. SQLite seems like a nice solution for c, I have just looked over it briefly yet but.
    SQlite is also great for general purpose programming because you don't need a server to run it, you have a discrete file. MySQL and postgres work in a very similar way, but require a running server. The server model (presumably) has higher performance.
    Last edited by MK27; 04-17-2010 at 05:30 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  7. #22
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by MK27 View Post
    Hmmm. There's a bit of a catch here. If you assign an ID number, internally, to each record, how can you search for a given record (based on content) by using the ID number you have arbitrarily assigned? I have five students:

    Bob
    Mickey
    Troubadour
    Ruffian
    Sally-Anne

    So I give them ID numbers 1-5. Now the user wants to find the record for "Troubadour"...unless I have a separate hash table (to look up the keys for "the real" hash table), we have an issue.
    Yes but that applies to hash tables in general, there is one key and one value, but suppose you want to search for name, phone number or grades, or a combination of all of these, or only students enrolled this year who got at least one A and are male.

    Quote Originally Posted by MK27 View Post
    Anyway, I think hash tables do have a tremendous purpose in general programming. I don't actually use them in C that much, but I use the "built in" equivalent else where (eg, perl hashes). If you make the "key" a sequential number, you are not talking about a hash table -- that's just an array.
    I agree, but on that last part. If you have an array of pointers to structs, each struct containing the information. Then if you make an ID that is the array index or calculate the index using a hashing function should not matter the way I see it. Other than that a direct relation should be faster and simpler to implement since you don't have to search thru a linked list at the index your hashing function gives you, and you also don't need to calculate the hash.

  8. #23
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by MK27
    MySQL and postgres work in a very similar way, but require a running server. The server model (presumably) has higher performance.
    If you don't actually need a database server, then an embedded database engine could actually provide higher performance as there would be no need to communicate with the database server.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  9. #24
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Subsonics View Post
    Yes but that applies to hash tables in general, there is one key and one value, but suppose you want to search for name, phone number or grades, or a combination of all of these, or only students enrolled this year who got at least one A and are male.
    You would need to maintain a set of hash tables, one for each criteria. But in this case SQL is definitely the way to go.

    I think a one to one key/value relationship is a much more common "task" tho. Eg, you parse a directory tree of uniquely named files (I keep all my mp3's this way). You want to know the exact path of "songx", the key is "songx" -- the value associated with the key can be a pointer containing more info. Chances are, there is only one criteria that's unique (in this case the name), the other details are just...details. You *can* do operations on those other criteria (show me only files under a certain size) but I still think just using the primary key (name) and rejecting long ones is how to do it.

    If you use interpreted languages (perl, python, ruby, php, etc) hashes (or whatever they are called) are a staple, that kind of thing is totally normative and gives you a *HUGE* amount of functionality over an array. I use hashes about as much as arrays. C++ maps are the same but C++ is pretty awkward, comparatively. IMO.

    Quote Originally Posted by laserlight View Post
    If you don't actually need a database server, then an embedded database engine could actually provide higher performance as there would be no need to communicate with the database server.
    Within a single process, yeah, that makes sense. I have never tried this. But if you have a forked/threaded web server, it's relationship to the SQL server amounts to the same thing I think. Currently these aren't choices I get to make
    Last edited by MK27; 04-17-2010 at 05:59 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Compiling sample DarkGDK Program
    By Phyxashun in forum Game Programming
    Replies: 6
    Last Post: 01-27-2009, 03:07 AM
  2. Undefined Reference Compiling Error
    By AlakaAlaki in forum C++ Programming
    Replies: 1
    Last Post: 06-27-2008, 11:45 AM
  3. We Got _DEBUG Errors
    By Tonto in forum Windows Programming
    Replies: 5
    Last Post: 12-22-2006, 05:45 PM
  4. Calling a Thread with a Function Pointer.
    By ScrollMaster in forum Windows Programming
    Replies: 6
    Last Post: 06-10-2006, 08:56 AM
  5. Problem with Visual C++ Object-Oriented Programming Book.
    By GameGenie in forum C++ Programming
    Replies: 9
    Last Post: 08-29-2005, 11:21 PM