Hash tables - open adressing for collision resolution

This is a discussion on Hash tables - open adressing for collision resolution within the C Programming forums, part of the General Programming Boards category; I have up to 10,000 such data: Code: typedef struct { char authors[5][40]; // first and last names of the ...

  1. #1
    Registered User
    Join Date
    Mar 2010
    Posts
    31

    Hash tables - open adressing for collision resolution

    I have up to 10,000 such data:
    Code:
    typedef struct
    {
    	char authors[5][40]; // first and last names of the author
    	char title[50]; // title of the publication
    	unsigned int year; // year of the publication
    	char isbn[15]; // publication code
    } PubEntryT;
    I have to use a hash table with open adressing for collision resolution to store this data and retrieve them effectively, on the basis of the isbn.

    I'm having problems understanding how to store this data.
    I can't figure out what is the key of the hash table here?
    When does the collision happens?
    Give me examples using this data please.

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    Quote Originally Posted by Delia View Post

    I'm having problems understanding how to store this data.
    I can't figure out what is the key of the hash table here?
    I like it when people answer their own question:
    Quote Originally Posted by Delia View Post
    I have to use a hash table with open adressing for collision resolution to store this data and retrieve them effectively, on the basis of the isbn.
    Collisions will happen when two ISBNs hash to the same thing. How often that happens depends on what you choose for the hash.

  3. #3
    Registered User
    Join Date
    Mar 2010
    Posts
    31
    Quote Originally Posted by tabstop View Post
    I like it when people answer their own question:


    Collisions will happen when two ISBNs hash to the same thing. How often that happens depends on what you choose for the hash.
    But the isbn is unique for one book, a data can't have two isbns. Or two data, the same isbn.

    Or what "hash the same thing/choose for the hash" means?

  4. #4
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,047
    Quote Originally Posted by Delia View Post
    But the isbn is unique for one book, a data can't have two isbns. Or two data, the same isbn.
    However, the hashing function may resolve two different isbns to the same location, hence the collision, and the reason for techniques like open addressing to resolve 'em.

  5. #5
    Registered User
    Join Date
    Mar 2010
    Posts
    31
    Quote Originally Posted by itCbitC View Post
    However, the hashing function may resolve two different isbns to the same location, hence the collision, and the reason for techniques like open addressing to resolve 'em.
    What can I use as a hash function here?

  6. #6
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,832
    No disrespect to the poster, but how come so many of the questions raised in these forums indicate that the "C" course being taught mixes algorithms / data structures and C-syntax issues leaving the student hopelessly confused as to which is which?

    I mean, things like hashing or linked lists should be properly covered on a white board, or with nice animated illustrations, way before the student is sent out to code. Instead, the student is left to pull out their hair trying to resolve two different areas of knowledge at the same time.

  7. #7
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    Quote Originally Posted by nonoob View Post
    No disrespect to the poster, but how come so many of the questions raised in these forums indicate that the "C" course being taught mixes algorithms / data structures and C-syntax issues leaving the student hopelessly confused as to which is which?

    I mean, things like hashing or linked lists should be properly covered on a white board, or with nice animated illustrations, way before the student is sent out to code. Instead, the student is left to pull out their hair trying to resolve two different areas of knowledge at the same time.
    You picked a weird thread to post this in, since I see no questions about C syntax here. (I think you're also making the assumption that "passed Programming I/II implies skill at programming", which is not always a valid assumption for various reasons.)

  8. #8
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,304
    Quote Originally Posted by nonoob View Post
    I mean, things like hashing or linked lists should be properly covered on a white board, or with nice animated illustrations, way before the student is sent out to code.
    They typically are!
    It's just that the students who come here asking about it are the ones who skipped class, didn't pay attention, aren't actually interested in programming, and didn't even want to be in the course to begin with.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  9. #9
    Registered User
    Join Date
    Mar 2010
    Posts
    31
    Quote Originally Posted by iMalc View Post
    They typically are!
    It's just that the students who come here asking about it are the ones who skipped class, didn't pay attention, aren't actually interested in programming, and didn't even want to be in the course to begin with.
    That's not true... The fact that I ask means that I'm interested. And no, they weren't properly explained at school. I bought 3 books written by my proffesor about Data structures and algorithms, so that I could understand his way of solving this. At the Hash tables area I have only theory, not even one example. I don''t want to complain. I don't need the solving of my program. I was just asking for some help to understand what can be a hash function in my case. It's Easter holiday and I'm in trying to solve this. And you say that i'm not interested?

  10. #10
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Maybe this will clarify something for you:

    Code:
    #include <stdio.h>
    #include <string.h>
    
    #define  SIZE 12
    
    int main() {
    	char words[5][10] = { "word", "static", "delete", "goto", "continue" };
    	int i, j, len, base, key;
    
    	for (i=0;i<5;i++) {
    		base = 0;
    		len = strlen(words[i]);
    // total the ascii value:
    		for (j=0;j<len;j++) base += words[i][j];
    // the hash:
    		key = base%SIZE;
    		printf("%s has a key of %d (base was %d)\n", words[i], key, base);
    	}
    
    	return 0;
    }
    Since an isbn number is all numbers, you could calculate "base" differently. But you might as well do it this way if the number is in a string. It doesn't matter; what matters is that the base for calculating the key needs to be a single integer value.

    SIZE is the size of the array you will use for the table, so "key" will always be between 0 and SIZE-1. That's the slot in the array you use for that word (or isbn number).

    The size of the array is pre-determined and set, but the number of items in the table is not. It could be (generally, will be) more than the number of slots, but that is not the only reason for collisions. The reason I chose 12 here is because it creates a collision between "word" and "static". That means they will be going into the same slot (aka bucket).

    There are a few ways to deal with collisions, one of the most common is to make the array an array of pointers to a linked lists. These lists are initially empty. Collisions are not to be avoided, they are an integral aspect of how a hash table works.
    Last edited by MK27; 04-09-2010 at 08:22 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  11. #11
    Registered User
    Join Date
    Mar 2010
    Posts
    31
    It did clarify a lot. Thanks!

  12. #12
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,304
    Quote Originally Posted by Delia View Post
    That's not true...
    Glad to hear you are an exception. I was speaking rather generally there.

    MK27 has given an example of claculating some kind of hash. WHat is the next most useful thing we can show you?
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  13. #13
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    To be fair, I recall thinking that most hash table "tutorials" I found seemed to have been written on the assumption that you already knew how they worked, which is kind of oxymoronic.

    Being short on explanation and long on obfuscation is an unfortunate standard in much programming documentation. If you extended that policy all the way into the academy (not all profs are good profs!), then it doesn't surprise me that the OP needed a basic explanation.

    http://en.wikipedia.org/wiki/Plain_Language_Movement
    ...a very sane concept.
    The movement focuses attention on the information needs and the reading abilities of the reader and opposes writer-based prose, which is the tendency to use long sentences, jargon, and a formal style as a way to acquire authority, power, and credibility.
    Last edited by MK27; 04-09-2010 at 02:45 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Random UDP port open.
    By Yarin in forum Tech Board
    Replies: 7
    Last Post: 02-27-2010, 01:36 AM
  2. Open Source Licenses
    By Mario F. in forum A Brief History of Cprogramming.com
    Replies: 9
    Last Post: 10-10-2006, 08:53 PM
  3. Big help in Astar search code...
    By alvifarooq in forum C++ Programming
    Replies: 6
    Last Post: 09-24-2004, 11:38 AM
  4. Unknown Memory Leak in Init() Function
    By CodeHacker in forum Windows Programming
    Replies: 3
    Last Post: 07-09-2004, 09:54 AM
  5. Ghost in the CD Drive
    By Natase in forum A Brief History of Cprogramming.com
    Replies: 17
    Last Post: 10-12-2001, 05:38 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21