PDA

View Full Version : how to create hashes



B-Con
02-24-2005, 12:35 AM
(I know this is programming related, but it's more of a concept then a direct programming question, so it didn't seem to fit anywhere else nicely....)


I'd like to create a couple of my own custom hashes, basically just for the sheer fun/learning proccess of it....

I don't need anything fancy like MD5, just a simple function that will take a string of characters and spit out a value between 0 and 255 (it needs to be only 1 byte big)....

what's the proccess that I'd have to go through for this? basically all I'm doing is creating a value, in the end, from which the input string is uncalcuable, right? would something simple like below suffice?



unsigned char hash1(char pwd[])
{
int idx,total=0;
for (idx=0; idx < strlen(pwd); idx++)
total += pwd[idx] + (42 - idx);
return(total % 256);
}

but with that you can kinda make loose guesses as to what the input string was because each character's value directly influences the resulting value.... so would something like below be a bit more, er, "random"?


unsigned char hash2(char pwd[])
{
unsigned int idx,total=1;
for (idx=0; idx < strlen(pwd); idx++)
total += (pwd[idx] >> (idx+1)) * (pwd[idx] << (idx+1));
for (idx=0; idx < strlen(pwd); idx++)
total *= pwd[idx] % (pwd[idx] >> (idx+1));
return(total % 256);
}


am I running down the right road here, or just a complete, hopeless moron? ;)

any input or tutorials are the subject would be appreciated.....

sean345
02-24-2005, 01:01 PM
I found http://www.gamedev.net/reference/articles/article2066.asp to be helpful when I was first starting to learn about hashes.

- Sean

pianorain
02-24-2005, 01:26 PM
It depends on what you want your hash to do. Most hashes are used to preserve data integrity. Basically, any change to the input should result in a different output, while two different inputs should not result in the same output.
unsigned char hash1(char pwd[])
{
int idx,total=0;
for (idx=0; idx < strlen(pwd); idx++)
total += pwd[idx] + (42 - idx);
return(total % 256);
}This kind of hash has the unfortunate property of commutivity. Try running the strings "welcome" and "leewcom" through it. They both yield the same value. (Not to mention that strlen is used in the for loop condition. You should assign the size to a temporary variable so that you don't recalc the length every loop.)
unsigned char hash2(char pwd[])
{
unsigned int idx,total=1;
for (idx=0; idx < strlen(pwd); idx++)
total += (pwd[idx] >> (idx+1)) * (pwd[idx] << (idx+1));
for (idx=0; idx < strlen(pwd); idx++)
total *= pwd[idx] % (pwd[idx] >> (idx+1));
return(total % 256);
}Interesting, but unusable. The % operator may (and does with an input of "welcome") hit a divide by 0.

The overall goal of a hash is to create a number that depends on both the order and the value of the data passed in. This is most easily accomplished using bitwise operators. Consider the following "hash":
unsigned char hash3(char pwd[])
{
unsigned int idx,total=0;
unsigned int size = strlen(pwd);
for (idx=0; idx < size; idx++)
total = (total << 1) ^ pwd[idx];
return total;
}This is both order- and value-dependent. With inputs "Welcome" and "eWlcome", the output is 187 and 123. Note that with an 8-bit hash, there will be a significant number of collisions (two inputs that result in the same output).

I recommend the following link for a more indepth discussion: http://www.geocities.com/SiliconValley/Pines/8659/crc.htm.

B-Con
02-25-2005, 12:31 AM
Thanx for the links, I'll be sure to read through them :)

I hadn't thought about accounting for the order of the chars in the string influencing the outcome, thx for mentioning that :)

and in what compiler does the '^' operator exist? I've always had to write my own function to do that....

I know that with a mere 256 possibilities there going to be a heckofa lotta collisions, but my aim is to create about 5 different hashes that will be used, that way there will be 256^5 possible combinations of values from the 5 hashes, meaning that there should be very few collisions for an input pwd that would generate all 5 hashes perfectly, just like the actual origonal pwd..... btw, by doing that I wouldn't be comprimising the origonal string, would I? if they have 5 different hashes of the string, would that make it easier to try and guess what it was?

CornedBee
02-25-2005, 02:41 AM
The ^ operator exists in every C compiler, but it's a bit-wise XOR, not a power operator.

Oh, and don't put strlen in the loop condition. Take the value once and save it. We had an interesting discussion about this a while ago.

B-Con
02-25-2005, 03:49 AM
where was the discussion? got a link, please? ;)

CornedBee
02-25-2005, 05:11 AM
http://cboard.cprogramming.com/showthread.php?t=61652

pianorain
02-25-2005, 08:51 AM
btw, by doing that I wouldn't be comprimising the origonal string, would I? if they have 5 different hashes of the string, would that make it easier to try and guess what it was?Well, easier? I guess so. After all, the more data someone can get, the "easier" it is. However, a good hash is typically a one-way street. By a hash's very nature, you will lose a certain amount of data about the original string. Recreating that data is practically impossible.

I suggest using a single 32-bit hash instead of 5 8-bit hashes. The page that I originally posted has some good information on hashes, along with some code that you can download and use.

B-Con
02-26-2005, 02:13 AM
I suggest using a single 32-bit hash instead of 5 8-bit hashes. The page that I originally posted has some good information on hashes, along with some code that you can download and use.
I'm using it to offset bytes, so any value over 256 is pointless.... thus I chose to create several hashes.....