I am looking for a very fast (at most a few CPU cycles) algorithm to hash a 64-bit integer into another 64-bit integer.

The main objective is that similar input values should produce very different output values.
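
One way to make "very different" concrete (my reading of the requirement, not the only possible definition): flipping a single input bit should change roughly half of the 64 output bits. A minimal sketch of how that could be measured, where `bit_diff` is a helper name made up for illustration:

```c
#include <stdint.h>

/* Number of bit positions in which a and b differ (Hamming distance).
   bit_diff is a hypothetical helper name, used here only for illustration. */
static int bit_diff(uint64_t a, uint64_t b) {
    uint64_t d = a ^ b;   /* set bits mark the positions that differ */
    int count = 0;
    while (d) {
        d &= d - 1;       /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

For a candidate function `hash`, `bit_diff(hash(x), hash(x ^ 1))` would ideally average around 32 over many values of `x`. The identity function, by comparison, always gives 1.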

It does not need to be cryptographically secure.

One obvious approach is to pre-compute a random number for each possible input value and use that as the hash, but the table would be far too big for 64-bit inputs (2^64 entries).

It can be done for each part (for example, 16-bit parts) instead, with the results XORed together, but that takes quite a few instructions:

Code:
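uint64_t rand_table[4][65536]; // all randomly generated (renamed so it does not clash with rand())

uint16_t parts[4];

// split x into four 16-bit parts, lowest bits first
parts[0] = x & 0xffff;
parts[1] = (x >> 16) & 0xffff;
parts[2] = (x >> 32) & 0xffff;
parts[3] = (x >> 48) & 0xffff;

// one lookup per part, XORed together
hash = rand_table[0][parts[0]] ^ rand_table[1][parts[1]]
     ^ rand_table[2][parts[2]] ^ rand_table[3][parts[3]];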

It also has the problem that the big rand table takes up 2 MB and will generate many cache misses.
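
For reference, here is a self-contained sketch of the table-based scheme described above (usually called tabulation hashing), so the baseline can be compiled and timed. The names `init_table`, `next_random` and `table_hash` are made up for illustration, and the xorshift64 generator used to fill the table is just an assumption; any source of random 64-bit values would do.

```c
#include <stdint.h>

/* 4 tables x 65536 entries x 8 bytes = 2 MiB in total */
static uint64_t rand_table[4][65536];

/* One xorshift64 step; used here only to fill the table (assumed PRNG). */
static uint64_t next_random(uint64_t *state) {
    uint64_t s = *state;
    s ^= s << 13;
    s ^= s >> 7;
    s ^= s << 17;
    *state = s;
    return s;
}

/* Fill all four tables with pseudo-random 64-bit values. */
static void init_table(uint64_t seed) {
    uint64_t state = seed ? seed : 1; /* xorshift state must be nonzero */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 65536; j++)
            rand_table[i][j] = next_random(&state);
}

/* XOR together one table entry per 16-bit part of x. */
static uint64_t table_hash(uint64_t x) {
    return rand_table[0][x & 0xffff]
         ^ rand_table[1][(x >> 16) & 0xffff]
         ^ rand_table[2][(x >> 32) & 0xffff]
         ^ rand_table[3][(x >> 48) & 0xffff];
}
```

`init_table` has to be called once before the first `table_hash` call. Each hash then costs four shifts/masks, four loads and three XORs, and it is the four loads into a 2 MiB table that cause the cache misses.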

Does anyone know of a faster way?

Any algorithm with L1 misses will probably be too slow, since an L1 miss costs about 10 cycles, which is the equivalent of about 40 simple arithmetic instructions (at 4 instructions per cycle on most modern Intel CPUs).