Objective: This project will give you experience working with hash tables, as well as working with real data used by complex pieces of software, namely, Web proxy caches.
In class we learned about hash tables, which store pairs of the form p = (k,v), where k is a key and v is a value associated with the key. The Standard Template Library does not provide a hash table as such (although it does provide container classes that implement insert(), find() and erase()). To implement a hash table, you need to decide the following things:
* the size of the hash table
* what hash function you will use
* how you will resolve collisions
To make reasonable decisions you need to consider anything you may know about the population of keys you will be placing in the table. This should become apparent in the following application of hash tables.
A "proxy cache" is a piece of software that handles web requests on
behalf of a population of Web users. Instead of sending requests directly
to the Web server that contains the item, users may point their Web browsers
to a proxy cache that is running near them. If the proxy cache contains
the item requested, it can return that item immediately instead of
requiring the request to travel all the way to a remote (and possibly
very busy) Web server and back. If the cache does not
contain the item it will fetch the item from the remote Web server and
forward it to the requesting browser, while storing a local copy. In this
way, subsequent requests for popular items may be satisfied more quickly.
Implementing a proxy cache requires three operations: insert(p), find(k)
and delete(k), where the keys are strings representing the items being
requested (the portion of a URL *after* http://,
with "www", like "www.example.com/index.html"). Sounds like a job for a
hash table, no? (Go figure!) So your job for this project is to evaluate
three different designs for a hash table.
Each evaluation method will work in the same way:
* Write a program that creates a hash table of a specified size.
* Read input from a file called "requests.dat", each line of which is a
request for an item residing on a server.
* Place each item in the hash table using some hash function. The method
we will use to resolve collisions is similar to separate chaining:
each location of the table will hold a "set" object provided by the STL
(#include <set>). You don't need to know the actual underlying
data structure used by the STL set --- just that you can store multiple
items in it, and ask how many items are in the set.
* Loop through each location in the hash table, printing out the number
of elements stored in that location.
You will do this three times, with a different set of hash table parameters
each time. We observed that the best form of a hash table is one in which
each "bucket" contains the fewest possible items. You will evaluate these
hash table/hash function combinations:
* First, use a hash table of size 256. The hash function you will use
is f(k) = 10. Yes, it's a silly hash function; you should use it just
to "sanity check" your work.
* Second, create a hash table of size 256. The hash function you will use
now is the function given in program 10.14 on page 386 of the text (this
is the one we discussed in class in some detail).
* Third, create a hash table whose size is a prime number of your
choice, less than 400. Select your own hash function. Try to find
one that does a good job of spreading out the data. (Can you do
better than in the second case?)
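We can't reproduce program 10.14 here, but many texts use a polynomial string hash along these lines. Treat this as a stand-in for the second experiment and check it against page 386 of your text before relying on it:

```cpp
#include <string>

// A typical textbook polynomial string hash (an assumption -- verify
// the multiplier and structure against program 10.14 in your text).
unsigned int polyHash(const std::string& key, unsigned int tableSize) {
    unsigned int h = 0;
    for (std::string::size_type i = 0; i < key.size(); ++i)
        h = 37 * h + static_cast<unsigned char>(key[i]);
    return h % tableSize;
}
```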
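For the third experiment, as one illustration (not a recommendation), here is the well-known djb2 string hash paired with 397, one prime below 400; both the function and the table size remain your choice:

```cpp
#include <string>

// One candidate for the third experiment: the djb2 string hash.
// 397 is one prime below 400; picking the prime is up to you.
unsigned long djb2Hash(const std::string& key, unsigned long tableSize) {
    unsigned long h = 5381;
    for (std::string::size_type i = 0; i < key.size(); ++i)
        h = h * 33 + static_cast<unsigned char>(key[i]);
    return h % tableSize;
}
```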
You will turn in the following for each of the three sets of parameters:
1. A printout of the number of items in each location in the hash table
in table form, with 5 elements across in one row and fields of width 8.
For example, if your hash table is of size 14, your table may look
like this:

      12       6      16      13       1
       4       0       8       5      15
      21       0      21       0
2. A discussion of the results, including a discussion of why the table
has the structure it does and how good you think the hash table
parameters are. (This discussion may be brief --- a full analysis
will entail knowledge of probability and statistics beyond the scope
of this course. However, feel free to include information like mean,
median and stddev if you are so inclined.)
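The width-8, five-across layout asked for in item 1 can be produced with std::setw from <iomanip>. A sketch (the function name formatCounts is our own; returning a string keeps the formatting separate from the printing):

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Format bucket counts five per row, each in a field of width 8,
// ending every completed row (and the final partial row) with '\n'.
std::string formatCounts(const std::vector<int>& counts) {
    std::ostringstream out;
    for (std::vector<int>::size_type i = 0; i < counts.size(); ++i) {
        out << std::setw(8) << counts[i];
        if ((i + 1) % 5 == 0 || i + 1 == counts.size())
            out << '\n';
    }
    return out.str();
}
```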