I need to create a hashing function in which I use a name (c string) as the key. Is there any way to do this without treating it as an int?
Printable View
I need to create a hashing function in which I use a name (c string) as the key. Is there any way to do this without treating it as an int?
could you go into more detail?
basically, i need to read a file that contains records. i want to use the NAME as my hashing key. however, i am new to hashing, and i am not sure if i can create a hash table with a string or if i first need to convert that string in to an int, and then perform my hash function? like i said, i am working with a binary file and this is new to me. please help out if you can?
>however, i am new to hashing, and i am not sure if i can create a hash table with a string or if i first need to convert that string in to an int
It's a good idea to convert the string (hash) to an int so that you can use that value for direct access to the table. That's basically what a hash table is:
Here's a good general hash function that converts a string to unsigned long:Code:int index = hash ( string );
index %= TABLE_SIZE;
// Do something with table[index]
My appologies if the algorithm is off, I kind of wrote it from memory.Code:unsigned long elf ( const char *key )
{
const char *it = key;
unsigned long ret = 0, i;
while ( *it != '\0' ) {
ret = ( ret << 4 ) + *it;
i = ret & 0xf0000000;
if ( i != 0 )
ret ^= i >> 24;
ret &= ~i;
it++;
}
return ret;
}
-Prelude
you gave me two sections of code...THANKS!
however, i am unsure where i put each different segment?
also, you stated this:
int index = hash ( string );
where you have string, should i put the name of variable that is a string or just keep it as string?
thanks, i just need a little more explanation.
>thanks, i just need a little more explanation.
I can do better, I'll give you a simple implementation to pore over:
-PreludeCode:#include <iostream>
#include <cstring>
// Hash using ELF format
unsigned long elf ( const char *key )
{
const char *it = key;
unsigned long ret = 0, i;
while ( *it != '\0' ) {
ret = ( ret << 4 ) + *it;
i = ret & 0xf0000000;
if ( i != 0 )
ret ^= i >> 24;
ret &= ~i;
it++;
}
return ret;
}
int main()
{
char table[13][20]; // Small table as an example
for ( int i = 0; i < 13; i++ )
table[i][0] = '\0'; // Initialize table
for ( int i = 0; i < 5; i++ ) {
char name[20];
std::cout<<"Enter a name: ";
std::cin.getline ( name, 20 );
unsigned long index = elf ( name ) % 13; // First hash
// If the index already has an item, rehash to the next index
while ( table[index][0] != '\0' )
index = elf ( name ) % 13 + 1;
strcpy ( table[index], name );
}
for ( int i = 0; i < 13; i++ )
std::cout<< table[i] <<std::endl;
}
awesome, you guys are helping a lot here!
BUT, you know me...I still have some trouble.
the last chunk of code is close to what I think I need. However, the table I need has to have exactly 3479 records. How would I change the code given to make it work? I tried, but I'm having some trouble with the strcpy ALSO, I am inputing my information from a binary file. so how would I account for that as well? something like myInput >> name >> address >> etc;
thanks guys and gals!
ALSO, I noticed that I seem to be getting collissions within my table with the code I have. How do I go about implementing linear probing to avoid collisions? Would it be placed within the code above or outside of it?
>However, the table I need has to have exactly 3479 records.
Just change the size of the first dimension for the table.
>but I'm having some trouble with the strcpy
Could you be more specific?
>I am inputing my information from a binary file. so how would I account for that as well?
It makes your job considerably easier, just read an appropriately sized block into the input array and then insert exactly as shown above.
>How do I go about implementing linear probing to avoid collisions?
This is your lucky day, the code I gave you implements linear probing for collision resolution.
-Prelude
Prelude,
alright, so it is my luck day! Nice!
you said that I just need to change my first dimension--however, do i need the second dimension at all???
so i need something like this:
char table[3479][13]; ??
secondly, with binary files you said:
>It makes your job considerably easier, just read an appropriately sized block into the input array and then insert exactly as shown above.
do i set up a while(!myInput.eof())
{
myInput >> name >> etc;
}
i'm not sure what you mean by block size?
forgive me, i'm new...but i learn quick! :)
>you said that I just need to change my first dimension--however, do i need the second dimension at all???
Yes, the first dimension is the number of elements in the table, the second dimension is the number of characters in an element. Assuming that the table is an array of strings.
>i'm not sure what you mean by block size?
Reading from a binary file is not much different from reading a text file. Here is the input program:
And the program that wrote the binary file:Code:#include <fstream>
#include <iostream>
int main()
{
std::ifstream inb ( "test.txt", std::ios::binary );
char block[13];
while ( inb.read ( block, sizeof block ) )
std::cout<< block <<std::endl;
inb.close();
}
Piece of cake, no harder than text files. :)Code:#include <fstream>
#include <iostream>
int main()
{
std::ofstream outb ( "test.txt", std::ios::binary );
char block[13];
while ( std::cin.getline ( block, sizeof block ) ) {
outb.write ( block, sizeof block );
for ( int i = 0; i < sizeof block; i++ )
block[i] = '\0'; // Clear the block
}
outb.close();
}
-Prelude
once again, results are taking place...just a few questions:
can I do both of those in the same program with no problems?
where should I place this code you just gave me with respect to the code from before dealing with the hashing? should it be inserted where you had the user inputing names?
with my binary file, i need to read 4 values (a struct), and use one value as the hash key. would i just have 4 read statements in a row?
I appreciate the help Prelude!
Oh my, so I thought I was on the right track--then I spoke with my peer advisor and he's what he had to say:
your program is not allowed to create huge arrays to hold the records and your program is not allowed to hold more than 2 or 3 baby names records at any one time. So, making the table array is the wrong thing to do. You must use hash files, not hash table arrays. The concept is the same, but the storage must be on disk. You do not initialize to zero; the disk must hold baby struct images. You must create an initialization baby record to write to the file; that may contain 0 in some fields.
Any help now? Was I going in the complete wrong direction with the code above? This is horrible.
>Any help now?
I think you should consult www.google.com with the search string "hash files". I could spell it out here, but I know there are plenty of web sites that do this already, thus saving me all of that typing. ;)
-Prelude
Hey Prelude...
I'll see what I can figure out, but I did have one question:
how do I initialize a hash file??? with a table, it was easy. Also, do I still need to use an array of any sort or not with hash files?
>how do I initialize a hash file???
Hopefully the file already exists, but we're basically talking about a binary file where every record is exactly the same size. This way you can calculate an offset with the hash function.
>Also, do I still need to use an array of any sort or not with hash files?
You only need to work with a single record at a time, so no.
-Prelude
can i still use the hash function you gave me ?
>can i still use the hash function you gave me ?
Of course. The only real difference between a hash table and a hash file is how you access the data. The actual hashing part remains the same throughout.
-Prelude
I hate to keep bothering, but how do I deal with collissions now.
I have set up a while loop to go through the entire file, and I read on record at a time and hash it. how can i deal with collissions within in my while loop???
that will be the last of my questions!
>how can i deal with collissions within in my while loop???
Since you are using linear probing you don't have to make any changes, handle collisions the same way: If your hashed index is already taken, move to the next one until you find an empty index.
-Prelude