I've created a C++ web counter that only counts unique hits. To do this, I defined a structure called IPStruct:

struct IPStruct
{
    unsigned char ip1;
    unsigned char ip2;
    unsigned char ip3;
    unsigned char ip4;
    unsigned long VisitTime; // timestamp of the last visit; with 4-byte longs this makes each entry 8 bytes
};
Whenever the counter is run, it gets the client's IP address and looks it up in the IP log file using a binary search (the file is stored in binary, containing one structure for each IP that has visited the site, sorted by IP). If the IP is found, it checks the VisitTime to see whether the hit is "unique" and handles each case appropriately. If a matching IP is not found, it has to be inserted in the proper order in the file. The only way I could think of to do that with a binary file was to find the spot where the new IP should go, copy the IP currently at that location and every IP after it to a temp file, write the new IP, and then copy the temp file's contents back in.
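For anyone following along, the lookup step could be sketched roughly like this: a binary search over fixed-size records in the file. The struct layout and the exact field types here are my reconstruction from the description above (the timestamp type in particular is an assumption), not the poster's actual code.

```cpp
#include <cstdio>
#include <cstring>

struct IPStruct
{
    unsigned char ip1, ip2, ip3, ip4;
    unsigned long VisitTime;   // assumed 4-byte timestamp, giving 8-byte records
};

// Compare two IPs octet by octet, i.e. as one big-endian 32-bit number.
int CompareIP(const IPStruct &a, const IPStruct &b)
{
    unsigned char lhs[4] = { a.ip1, a.ip2, a.ip3, a.ip4 };
    unsigned char rhs[4] = { b.ip1, b.ip2, b.ip3, b.ip4 };
    return std::memcmp(lhs, rhs, 4);
}

// Returns true and fills 'found' if the IP is in the (sorted) file;
// otherwise returns false and sets 'insertAt' to the record index
// where the new entry belongs.
bool FindIP(FILE *f, const IPStruct &target, IPStruct &found, long &insertAt)
{
    std::fseek(f, 0, SEEK_END);
    long count = std::ftell(f) / (long)sizeof(IPStruct);
    long lo = 0, hi = count;             // search the half-open range [lo, hi)
    while (lo < hi)
    {
        long mid = lo + (hi - lo) / 2;
        std::fseek(f, mid * (long)sizeof(IPStruct), SEEK_SET);
        if (std::fread(&found, sizeof(IPStruct), 1, f) != 1)
            break;                       // read error; treat as not found
        int cmp = CompareIP(found, target);
        if (cmp == 0) return true;
        if (cmp < 0) lo = mid + 1; else hi = mid;
    }
    insertAt = lo;
    return false;
}
```

Note that `sizeof(IPStruct)` may differ across platforms (padding, 8-byte longs), so the on-disk record size should really be pinned down explicitly if the file is shared between builds.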
That works just fine, but I'm really worried that it'll be slow in practice. Anyone have any other ideas? I thought about having the log file contain an entry for every possible IP address, but each octet has 256 possible values (0-255), so that would mean 256^4, or about 4.3 billion, entries. Since each entry is 8 bytes, the file would come out to roughly 32 gig, which is way too large.
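For the record, here's the back-of-the-envelope math on the "one entry per possible IP" idea (the common slip is using 255 per octet instead of 256):

```cpp
// Each octet has 256 possible values (0-255 inclusive), not 255.
// With 8-byte entries the full table works out to 32 GiB exactly.
unsigned long long FullTableBytes(unsigned long long entrySize)
{
    unsigned long long entries = 256ULL * 256 * 256 * 256;  // 4,294,967,296 IPs
    return entries * entrySize;
}
```

So a "30 gig" estimate was actually in the right ballpark, just slightly low.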
Anyway, any suggestions would be appreciated.
You could use some sort of post-processing scheme. That is, just log every hit to your system on a given day. Then at midnight, or whenever, grab the log you've been writing to and empty it for the next day's activity.
Then take that day's log, parse it, and put it in an SQL table, only adding new IP addresses. Unfortunately, this method would always be a day behind on activity. It's still a lot of data, but it's easy to get to.
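The nightly merge step described above could be sketched like this; I'm using an in-memory std::set in place of the SQL table purely for illustration, and assuming the raw log is one dotted-quad address per line:

```cpp
#include <istream>
#include <set>
#include <string>

// Nightly post-processing sketch: read the day's raw hit log (assumed format:
// one dotted-quad IP per line) and merge any addresses not already known into
// the master set. A std::set stands in for the SQL table here.
std::set<std::string> MergeDailyLog(std::istream &dailyLog,
                                    std::set<std::string> knownIPs)
{
    std::string line;
    while (std::getline(dailyLog, line))
    {
        if (!line.empty())
            knownIPs.insert(line);   // the set ignores duplicates automatically
    }
    return knownIPs;
}
```

With a real database you'd replace the insert with an INSERT guarded by a uniqueness constraint on the IP column, which gets you the same dedup behavior.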
just my two cents...
I had thought about doing that, but as you said, there's that nasty delay: someone could be counted every time they visited during that first day, which just isn't acceptable for a counter. I also thought about having it interface directly with a database, using that to store and check uniques, but even with a fast database (i.e., not a relational database of any kind), I don't think it would be any faster than what I'm doing now...