Hi everyone, I did a little benchmark, and I'm trying to figure out why the <unordered_set> (visual studio 2010) hash table performance is so much worse than std::set.
Code:
Time is measured in seconds. Disregard SOHTBST,
that's just another implementation of a tree.
adding to a container stats follow.
the first row is 2^11 randomly generated integers,
doubling each time until the last row,
which is 2^18 randomly generated integers.
std::set SOHTBST HashTable1 HashTable2
0.001210 0.000504 0.005610 0.003793
0.002509 0.000998 0.017613 0.015412
0.005010 0.002214 0.034532 0.014783
0.010273 0.004754 0.048999 0.029705
0.020482 0.009841 0.079733 0.060976
0.042823 0.021546 0.669926 0.140907
0.087980 0.044820 0.793633 0.238754
0.197135 0.103221 1.048396 0.480977
(HashTable1 is a new container, HashTable2 is
that same container after having all elements
removed. The effect is that HashTable2 doesn't need any resizing)
Adding uses:
start = timeSinceEpoch();
for (int i = 0; i < testSize; i++)
container.insert(testData[i]);
end= timeSinceEpoch();
iterating through all the elements:
std::set SOHTBST HashTable
0.000547 0.000017 0.000685
0.001091 0.000032 0.001361
0.002334 0.000059 0.002653
0.004378 0.000126 0.005325
0.008599 0.000226 0.010637
0.017478 0.000467 0.021505
0.034945 0.001034 0.042649
0.071910 0.003394 0.087486
Iterating uses:
start = timeSinceEpoch();
typename Algorithm::iterator iter = container.begin();
while (iter != container.end())
iter++;
end = timeSinceEpoch();
finding each of the elements:
std::set SOHTBST HashTable
0.000929 0.000118 0.001475
0.001918 0.000285 0.003258
0.003887 0.000637 0.005811
0.008105 0.001380 0.012648
0.016566 0.003361 0.027964
0.034278 0.006734 0.042473
0.071597 0.016240 0.087051
0.161150 0.040815 0.209862
Finding uses:
start = timeSinceEpoch();
for (int i = 0; i < testSize; i++)
container.find(testData[i]);
end = timeSinceEpoch();
removing all of the elements, one by one:
std::set SOHTBST HashTable
0.001538 0.000268 0.010862
0.003081 0.000532 0.027662
0.006087 0.001043 0.215371
0.012096 0.002074 0.481068
0.024105 0.004128 1.014633
0.048194 0.008248 13.332217
0.096763 0.016540 30.043754
0.196106 0.034952 62.446652
Removing uses:
start = timeSinceEpoch();
for (iter = container.begin(); iter != container.end(); ) {
typename Algorithm::iterator nextIter = iter;
nextIter++;
container.erase(iter);
iter = nextIter;
}
end = timeSinceEpoch();
Hash function:
struct myhash {
unsigned int operator() (int a) const {
// Robert Jenkins
a = (a+0x7ed55d16) + (a<<12);
a = (a^0xc761c23c) ^ (a>>19);
a = (a+0x165667b1) + (a<<5);
a = (a+0xd3a2646c) ^ (a<<9);
a = (a+0xfd7046c5) + (a<<3);
a = (a^0xb55a4f09) ^ (a>>16);
return a;
}
};
My main question is, why is the hash table so much slower in these use cases than stl::set?
And why is hash table's erase() function so terribly slow (60 seconds!)?
Please, give me some insight into how they may have implemented their hash table, that would lead to this. Or maybe its an error in my benchmarking?
- Evan Ovadia
An extra note, a few other hash tables I've measured against show the same behavior: worse than stl::set, and terrible erase().