I didn't say it was the only solution. I only said it was the only solution I could see so far. That means that I haven't seen any solution yet that would work as well as re-hashing.
Anyway, I'm just testing the waters right now, and my implementation is rather simple (and incomplete). Just for reference:
Code:
#include <cassert>
#include <cstddef>
#include <list>
#include <stdexcept>
#include <utility>
#include <vector>

template<typename Key_t, typename Type_t>
class XHashTable
{
public:
    // LoadFactor is the minimum ratio of buckets to items; when the table
    // drops below it, the bucket array is doubled and everything is re-hashed.
    explicit XHashTable(float LoadFactor = 0.01f)
        : m_Values(100), m_Items(0), m_Collisions(0), m_LoadFactor(LoadFactor)
    {
        assert(LoadFactor > 0);
    }

    void Add(const Key_t& Key, const Type_t& Value)
    {
        if (IFind(Key)) // reject duplicates before touching any counters
            return;
        ++m_Items;
        // Note the float division: the original integer division truncated
        // to 0 as soon as m_Items exceeded the bucket count.
        if (static_cast<float>(m_Values.size()) / m_Items < m_LoadFactor)
        {
            m_Values.resize(m_Values.size() * 2);
            Rehash();
        }
        auto& list = m_Values.at(HashKey(Key));
        if (!list.empty())
            ++m_Collisions;
        list.emplace_back(Key, Value);
    }

    Type_t& Find(const Key_t& Key)
    {
        // Return a reference to the stored value; the original version
        // returned a dangling reference to a local temporary.
        Type_t* Found = IFind(Key);
        if (!Found)
            throw std::runtime_error("Key not found!");
        return *Found;
    }

    const Type_t& Find(const Key_t& Key) const;

    unsigned int GetCollisions() const { return m_Collisions; }
    std::size_t GetStorageSize() const { return m_Values.size(); }
    unsigned int GetMapSize() const { return m_Items; }

protected:
    // Key_t is assumed to be a contiguous, byte-addressable container
    // (e.g. std::string), since SpookyHash takes a raw buffer.
    std::size_t HashKey(const Key_t& Key) const
    {
        return SpookyHash::Hash32(&Key[0], Key.size(), 0) % m_Values.size();
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    Type_t* IFind(const Key_t& Key)
    {
        auto& list = m_Values.at(HashKey(Key));
        for (auto it = list.begin(); it != list.end(); ++it)
            if (it->first == Key)
                return &it->second;
        return nullptr;
    }

    void Rehash()
    {
        for (std::size_t it = 0; it < m_Values.size(); ++it)
        {
            auto& list = m_Values.at(it);
            for (auto list_it = list.begin(); list_it != list.end();)
            {
                // Hash the *key*, not the value -- the original hashed
                // &value, so moved entries could never be found again.
                auto NewHash = HashKey(list_it->first);
                if (NewHash != it)
                {
                    m_Values.at(NewHash).push_back(*list_it);
                    list_it = list.erase(list_it);
                }
                else
                    ++list_it;
            }
        }
    }

    std::vector<std::list<std::pair<Key_t, Type_t>>> m_Values;
    unsigned int m_Items;
    unsigned int m_Collisions;
    float m_LoadFactor;
};
What collision strategy to use is an interesting question, and one I'll think about later. For now, I'm simply using a linked list in each bucket.
SpookyHash is a hashing function that can be found on the web.
But the problem remains: every time I expand the table, the bucket index for the same key changes. How does one fix this without re-hashing? From what I gather, an algorithm that returns the same bucket for every table size is impossible. It can only strive to fulfill it.
All I require is that every insertion can later be retrieved by searching for its key. There can be no misses: if I inserted something under a key, it must be retrievable.
Also, I am aiming for O(log N) lookup time at worst, and O(k*n) memory usage with k as close to 1 as possible.