Hello,
I am writing a program that generates random graphs given the number of nodes and the number of edges. Right now I just generate two random numbers x and y and add the edge (x, y) if it doesn't already exist. Too check for existence, I use a hash table.
My first idea was to have a hash function that takes x and y as parameters, computes c = x + y then adds an entry in the hash table at position c % hash_table_size (a prime number, I chose 666013). This obviously generates a lot of collisions, and wouldn't generate different values for edges made of nodes that add up to the same value ((1, 3) would have the same hash as (2, 2)).
This idea generated an edge list and printed it to a file of a graph with 10 000 nodes and 1 000 000 edges in about 10 seconds. This is just to make an idea, I realise it's nowhere near a good benchmark.
Then, I quickly came up with this simple hash function:
This sped things up a lot. For the same input, the graph was generated and printed to file in about 3.6 seconds. Almost three times faster. Without printing to file, this approach takes about 1.8 seconds.Code:unsigned int getkey(unsigned int x, unsigned int y) { return ((x << 19) | (y << 7)); }
Now, I realise there are a lot more things involved in the performance of the program, like how I generate the random numbers (I just use x = 1 + (rand() * rand()) % n and y = same until the x and y generated are not present in the hash, then print that edge and hash it...). I just want to know if there's anything that I can improve about the hash function that might speed things up. I did some testing, and it seems my function generates mostly very high numbers, so I think maybe a lot of the 666013 keys go unused. For the same test I mentioned before, the longest chain seems to be around 12 on average, at the end of the program.
So, is there any good hash function for what I am trying to do? Any other place that I could optimize in?