here's my current code for hasher() and hasher_update():
Code:
size_t phantom::xor_hasher(const std::string &s) {
    size_t len = s.size();
    assert(len > 0);
    size_t result = 0;
    // offset holds the previous character; seeding it with s[0] makes the
    // first term keys[s[0] + s[0]]. Cast through unsigned char: plain char
    // may be signed, and a negative value would index outside keys.
    unsigned int offset = static_cast<unsigned char>(s[0]);
    for (size_t i = 0; i < len; ++i) {
        unsigned int c = static_cast<unsigned char>(s[i]);
        result ^= keys[offset + c];
        offset = c;
    }
    return result;
}
size_t phantom::xor_hasher_update(size_t old_hash, char new_origin, char new_finale,
                                  char old_origin, char old_finale) {
    // pass chars by value (cheaper than const char&) and cast through
    // unsigned char so signed chars can't produce negative indices
    unsigned int oo = static_cast<unsigned char>(old_origin);
    unsigned int no = static_cast<unsigned char>(new_origin);
    unsigned int nf = static_cast<unsigned char>(new_finale);
    unsigned int of = static_cast<unsigned char>(old_finale);
    old_hash ^= keys[oo + oo]; // remove the old seed term keys[s[0] + s[0]]
    old_hash ^= keys[oo + no]; // remove the old first pair
    old_hash ^= keys[no + no]; // add the new seed term
    old_hash ^= keys[of + nf]; // add the new trailing pair
    return old_hash;
}
if we want to improve speed here, why not make the keys array only half its current size (BASE rather than BASE + BASE) and change the hasher to this:
Code:
size_t phantom::xor_hasher(const std::string &s) {
    size_t len = s.size();
    assert(len > 0);
    size_t result = 0;
    // unsigned int offset = static_cast<unsigned char>(s[0]);
    for (size_t i = 0; i < len; ++i) {
        result ^= keys[static_cast<unsigned char>(s[i])]; // offset eliminated
        // offset = s[i];
    }
    return result;
}
then hasher_update() gets much shorter and faster.
ah, ok, that won't work, although i'm leaving the post in here: since XOR is commutative, the offset-free version hashes every permutation of the same string to the same value (and pairs of repeated characters cancel out entirely).
we could maybe cut off 1 line by setting offset = 0 for the first character, but that's only going to be a small speed improvement.