Thread: Testing hash functions

  1. #1
    Registered User
    Join Date
    Aug 2022
    Posts
    6

    Testing hash functions

    Can someone scan through my code and tell me what I'm doing wrong?
    I get 0 as results for my probability value and a 0.151722 for Multiplicative which I'm not sure is correct either..
    Code:
    #include <iostream>
    #include <fstream>
    #include <vector>
    #include <cstdint>
    #include <boost/math/distributions/chi_squared.hpp>
    
    
    int stringLengthHash(std::string str) {
        return str.length() % 65536;
    }
    
    
    int firstCharacterHash(std::string str) {
        if (!str.empty()) {
            return uint16_t(str[0]) % 65536;
        }
        return 0;
    }
    
    
    int additiveChecksumHash(std::string str) {
        uint16_t sum = 0;
        for (const char& c : str) {
            sum += uint16_t(c);
        }
        return sum % 65536;
    }
    
    
    int remainderHash(std::string str) {
        uint16_t sum = 0;
        for (const char& c : str) {
            sum = (sum + uint16_t(c)) % 65413;
        }
        return sum;
    }
    
    
    int multiplicativeHash(std::string str) {
        uint16_t sum = 0;
        for (const char& c : str) {
            sum = (sum * 31 + uint16_t(c)) % 65536;
        }
        return sum;
    }
    
    
    float performChiSquareTest(const std::vector<int>& hashes) {
        double totalWords = 100000.0;
        double expected = totalWords / 65536;
        double chiSquare = 0.0;
    
    
        for (int count : hashes) {
            chiSquare += (count - expected) * (count - expected) / expected;
        }
    
    
        boost::math::chi_squared c2d(65535.0);
        float p = 1.0 - boost::math::cdf(c2d, chiSquare);
    
    
        return p;
    }
    
    
    int main() {
        std::ifstream file("/usr/share/dict/words");
        std::string word;
        std::vector<int> stringLengthHashes(65536, 0);
        std::vector<int> firstCharacterHashes(65536, 0);
        std::vector<int> additiveChecksumHashes(65536, 0);
        std::vector<int> remainderHashes(65536, 0);
        std::vector<int> multiplicativeHashes(65536, 0);
    
    
        // Read words and perform hash functions
        while (file >> word) {
            // Accumulate counts for each hash function
            stringLengthHashes[stringLengthHash(word)]++;
            firstCharacterHashes[firstCharacterHash(word)]++;
            additiveChecksumHashes[additiveChecksumHash(word)]++;
            remainderHashes[remainderHash(word)]++;
            multiplicativeHashes[multiplicativeHash(word)]++;
        }
    
    
        // Perform chi-square tests and print results
        float a,b,c,d,e;
    
    
        a = performChiSquareTest(stringLengthHashes);
        std::cout << "String Length Hash - P Value: " << a << '\n';
    
    
        b = performChiSquareTest(firstCharacterHashes);
        std::cout << "First Character Hash - P Value: " << b << '\n';
    
    
        c = performChiSquareTest(additiveChecksumHashes);
        std::cout << "Additive Checksum Hash - P Value: " << c << '\n';
    
    
        d = performChiSquareTest(remainderHashes);
        std::cout << "Remainder Hash - P Value: " << d << '\n';
    
    
        e = performChiSquareTest(multiplicativeHashes);
        std::cout << "Multiplicative Hash - P Value: " << e << '\n';
    
    
        return 0;
    }

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Why aren't all your modulos a prime number?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Aug 2022
    Posts
    6
    Quote Originally Posted by Salem View Post
    Why aren't all your modulos a prime number?
    The assignment states:
    The hash (non)functions you should test are:


    String length (modulo 2^16)


    First character


    Additive checksum (add all characters together), modulo 2^16


    Remainder (use m = 65413, this is the first prime that is smaller than the table size). Remember that you cannot just add up all the characters and then take the mod of the result; you have to thread the modulo through the loop that computes the sum.


    Note that you may get a better p value from the Pearson’s χ² test (see below) by using m = 65413 there, as well.


    Multiplicative (using the scheme described in class/in the lecture notes). Again, remember that you can’t just use the final sum; you have to incorporate the multiplicative calculation into hashing loop.


    NOTE: Use double when computing the intermediate results inside the multiplicative hash. The extra precision is necessary because most of the work of the mult. hash uses fractional values.

  4. #4
    Registered User
    Join Date
    Dec 2017
    Posts
    1,633
    I don't know anything about cummulative distribution functions ( 4.1: Probability Density Functions (PDFs) and Cumulative Distribution Functions (CDFs) for Continuous Random Variables - Statistics LibreTexts ).
    However, it seems that in your case, when chiSquare is too much higher than your modulus, it yields 0.
    Do you have a link to a page that describes why this is an appropriate test for your problem? How are we supposed to interpret the result? What would you consider to be a "good" value?

    Note that stringLengthHash will never use more than about 20 of the 65000+ bins. firstCharacterHash will only use 127 (at most). additiveChecksumHash will hardly ever use more than 1000. remainderHash is pretty much identical to additiveChecksumHash. So the only one with any chance of using the entire range is multiplicativeHash, which is also the only one that doesn't yield 0.
    A little inaccuracy saves tons of explanation. - H.H. Munro

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. I need some help testing array functions
    By lemmiwinks in forum C Programming
    Replies: 8
    Last Post: 11-10-2013, 09:17 PM
  2. hash functions
    By kerrymaid in forum C Programming
    Replies: 0
    Last Post: 10-07-2010, 07:25 AM
  3. testing your program/functions
    By l2u in forum C++ Programming
    Replies: 6
    Last Post: 11-08-2006, 10:58 AM
  4. data Structures / Hash functions
    By rickc77 in forum A Brief History of Cprogramming.com
    Replies: 4
    Last Post: 11-11-2001, 03:46 PM
  5. data structures / hash functions
    By rickc77 in forum C Programming
    Replies: 5
    Last Post: 11-11-2001, 01:54 PM

Tags for this Thread