# Thread: Very fast hashing of single integer

1. ## Very fast hashing of single integer

I am looking for a very fast (at most a few CPU cycles) algorithm to hash a 64-bit integer into another 64-bit integer.

The main objective is that similar input values should produce very different output values.

It does not need to be cryptographically secure.

One obvious approach is to pre-compute a random number for each possible input value, and use that as the hash, but the table would be way too big for 64-bit.

It can be done for each part (for example, 16-bit parts) instead, with the results xor-ed together, but that's quite a few instructions -

Code:
```
uint64_t rand0[65536], rand1[65536], rand2[65536], rand3[65536]; // all randomly generated

uint16_t part0, part1, part2, part3;

part0 = x & 0xffff;
part1 = (x >> 16) & 0xffff;
part2 = (x >> 32) & 0xffff;
part3 = (x >> 48) & 0xffff;

hash = rand0[part0] ^ rand1[part1] ^ rand2[part2] ^ rand3[part3];
```
It also has the problem that the big rand table takes up 2MB, and will generate many cache misses.

Does anyone know of a faster way?

Any algorithm with L1 misses will probably be too slow, since an L1 miss is about 10 cycles, or about 40 simple arithmetic instructions (4 instructions per cycle on most modern Intel CPUs).

2. Also, the input domain is numbers with up to 8 arbitrary bits set.

3. O_o

You aren't mixing any bits of the source that way.

*shrug*

I guess one would try using smaller primes to mutate bytes or words within the string if one required a hash in the fewest instructions possible.

Code:
```
unsigned long long Hash
(
    unsigned long long fValue
)
{
    const unsigned long long kPrime = (~0ULL - 83 + 1); // == 2^64 - 83
    const unsigned long long kMultiplier = 0x00F100F100F100F1;
    fValue ^= fValue * kMultiplier;
    fValue ^= fValue * kPrime;
    return(fValue);
}
```
I don't really like such an idea without serious testing, but I like not using the actual value in the mix even less.

Soma

4. Is something like MurmurHash too slow?

5. Originally Posted by phantomotap
O_o

You aren't mixing any bits of the source that way.

*shrug*

I guess one would try using smaller primes to mutate bytes or words within the string if one required a hash in the fewest instructions possible.

Code:
```
unsigned long long Hash
(
    unsigned long long fValue
)
{
    const unsigned long long kPrime = (~0ULL - 83 + 1); // == 2^64 - 83
    const unsigned long long kMultiplier = 0x00F100F100F100F1;
    fValue ^= fValue * kMultiplier;
    fValue ^= fValue * kPrime;
    return(fValue);
}
```
I don't really like such an idea without serious testing, but I like not using the actual value in the mix even less.

Soma
Thanks, will definitely give that a try.

Is there anything fundamentally wrong with not using bits in the input, if it satisfies all the properties of a good hash (or is there a property that it doesn't satisfy)?

6. Originally Posted by MutantJohn
Is something like MurmurHash too slow?
I have never heard of it but just looked it up. Pretty cool!

It does seem to be a little too slow, though.

7. Is there anything fundamentally wrong with not using bits in the input, if it satisfies all the properties of a good hash (or is there a property that it doesn't satisfy)?
O_o

I realize that the circumstances would be unlikely, but the generated values could conceivably cancel out in any of several ways.

Let's pretend that `rand[$1] ^ rand[$2] ^ rand[$3]' evaluates to `$4' regardless of the corresponding values.

If any other `rand[$4] ^ rand[$5] ^ rand[$6]' evaluates to `$4', the lowest bits are solely responsible for changing the hashed value.

Yes. I again do realize the circumstances would be unlikely. I just don't like the idea that any bit could ever be excluded from changing the computed hash.

Soma

8. Originally Posted by phantomotap
O_o

I realize that the circumstances would be unlikely, but the generated values could conceivably cancel out in any of several ways.

Let's pretend that `rand[$1] ^ rand[$2] ^ rand[$3]' evaluates to `$4' regardless of the corresponding values.

If any other `rand[$4] ^ rand[$5] ^ rand[$6]' evaluates to `$4', the lowest bits are solely responsible for changing the hashed value.

Yes. I again do realize the circumstances would be unlikely. I just don't like the idea that any bit could ever be excluded from changing the computed hash.

Soma
Is there a theoretical basis for this being a bad thing?

In this case, I wouldn't say the higher bits are excluded from changing the computed hash. If they were different, the computed hash would be different, so they had just as much of a role in determining the hash as the lower bits.

9. Is there a theoretical basis for this being a bad thing?
O_o

Yes. Of course, the research is focused on cryptography.

Yes. I know that you don't care about the strength of the function.

I only said I don't like the idea.

In this case, I wouldn't say the higher bits are excluded from changing the computed hash. If they were different, the computed hash would be different, so they had just as much of a role in determining the hash as the lower bits.
No. The higher-order bits would not play any role in determining the hash in the situation I described.

You should really just read the explanation again if you don't understand the situation I described, but I will offer an example.

Code:
```
// ...
Table2[0] = 641671;
// ...
Table3[0] = 751345;
// ...
Table4[0] = 179318;
// ...
```
Code:
```
Value = Value & 0xffff;
// ...
Word1 = Value & 0xffff;         // Word1 = Value;
Word2 = (Value >> 16) & 0xffff; // Word2 = 0;
Word3 = (Value >> 32) & 0xffff; // Word3 = 0;
Word4 = (Value >> 48) & 0xffff; // Word4 = 0;
// ...
Fragment = Table2[Word2] ^ Table3[Word3] ^ Table4[Word4]; // Fragment = 0;
// ...
Hash = Fragment ^ Table1[Word1]; // Hash = Table1[Word1];
```
Code:
```
// ...
Table4[(1 << 15)] = 179318;
// ...
```
Code:
```
Value = Value & 0xffff;
// ...
Value |= 1ULL << 63;
// ...
Word1 = Value & 0xffff;         // Word1 = Value;
Word2 = (Value >> 16) & 0xffff; // Word2 = 0;
Word3 = (Value >> 32) & 0xffff; // Word3 = 0;
Word4 = (Value >> 48) & 0xffff; // Word4 = 1 << 15;
// ...
Fragment = Table2[Word2] ^ Table3[Word3] ^ Table4[Word4]; // Fragment = 0;
// ...
Hash = Fragment ^ Table1[Word1]; // Hash = Table1[Word1];
```
Code:
```
Value1 = Value & 0xffff;
Value2 = Value1 | (1ULL << 63);
Hash(Value1) == Hash(Value2);
```
As I said, the circumstances would be unlikely.

I still don't like the idea that any bit could ever be excluded from changing the computed hash.

Soma

10. I have read your example multiple times, and still don't think I get it.

Are you essentially trying to say Table4[1 << 15] can happen to be equal to Table4[0]? Because that's the only case in which the two Fragments can be equal, assuming the other entries of the tables are the same.

Isn't that essentially the same as finding a collision in the three-word hash function (the one that generates Fragment)?

Does this property make collisions more likely? Since all the table entries are randomly generated, I would think that makes this function more uniformly distributed, not less?

11. Out of curiosity, is this too many instructions for you?
Code:
```
int hash( int k )
{
    k *= 357913941;
    k ^= k << 24;
    k += ~357913941;
    k ^= k >> 31;
    k ^= k << 31;

    return k;
}
```

12. Originally Posted by MutantJohn
Out of curiosity, is this too many instructions for you?
Code:
```
int hash( int k )
{
    k *= 357913941;
    k ^= k << 24;
    k += ~357913941;
    k ^= k >> 31;
    k ^= k << 31;

    return k;
}
```
That seems fast enough (extended to 64-bit). What's the story behind that?

13. Oh, it's just some code I found from the authors of gFlip3D, so I don't know much about it, but it seems quite condensed. I also trust the authors' wisdom. Their other code is pretty good, so I'm trusting this as well.

This is the code in pure source form :
Code:
```
// originally CUDA code
__forceinline__ __device__ float hash( int k )
{
    k *= 357913941;
    k ^= k << 24;
    k += ~357913941;
    k ^= k >> 31;
    k ^= k << 31;

    return int_as_float( k );
}
```
Source : 3D Delaunay Triangulation on the GPU

The code is in KerPredicates.cu of gDel3D.

Though I'm not sure about it being originally intended for floating-point numbers. I don't know if that'd make a good integer hash. I guess the only way would be to test it.

14. Originally Posted by MutantJohn
Oh, it's just some code I found from the authors of gFlip3D, so I don't know much about it, but it seems quite condensed. I also trust the authors' wisdom. Their other code is pretty good, so I'm trusting this as well.

This is the code in pure source form :
Code:
```
// originally CUDA code
__forceinline__ __device__ float hash( int k )
{
    k *= 357913941;
    k ^= k << 24;
    k += ~357913941;
    k ^= k >> 31;
    k ^= k << 31;

    return int_as_float( k );
}
```
Source : 3D Delaunay Triangulation on the GPU

The code is in KerPredicates.cu of gDel3D.

Though I'm not sure about it being originally intended for floating point numbers. I don't know if that'd make a good integer hash. I guess the only way would be to test it.
Ah! Good old CUDA.

The constants are 0x15555555, so it basically shifts the number by all even amounts, then xors them together. And to make sure the lower bits aren't too static, the upper bits are mixed into the lower bits.

I suppose that could work.

Will just have to verify experimentally that this doesn't cause too many collisions.

15. Does this property make collisions more likely?
O_o

Let me try a different approach to explaining the reason behind my comment: despite having a 64-bit result, you don't really have a 64-bit hash. The 64-bit result of each component is just an arbitrarily lengthened 16-bit hash. You have four 16-bit hashes which you've combined with the `XOR' operation. I'm basically guaranteed to find a collision in 262144 attempts.

Soma