I think I've mentioned before on here that I wrote some bandwidth monitoring scripts in Python for work. Well, I've been learning C++ lately, and I rewrote a critical part of the Python code in C++ to see how much faster it would be.
I ran both versions and got the following results:
Python: 59 seconds
C++: 34 seconds
These programs process very large files (tables of data from Wireshark dumps), and this is a good reminder that a program can be more I/O-bound than anything else. Honestly, I thought the C++ version would be dramatically faster, but alas...
Here is the code for the two versions I timed; there's nothing in it that would concern my employer:
Code:
# Total bytes seen per IP address (columns 1 and 3 of each row).
bandwidth = {}
fin = open('20110803.txt', 'r')
line = fin.readline()
while line != '':  # readline() returns '' at end of file
    row = line.split('\t')
    bytes = int(row[6])  # byte count is in column 6
    if row[1] in bandwidth:
        bandwidth[row[1]] += bytes
    else:
        bandwidth[row[1]] = bytes
    if row[3] in bandwidth:
        bandwidth[row[3]] += bytes
    else:
        bandwidth[row[3]] = bytes
    line = fin.readline()
fin.close()
Code:
#include <fstream>
#include <string>
#include <unordered_map>
#include <cstdlib>

int main(void)
{
    std::ifstream infile;
    std::unordered_map<std::string, int> bandwidth;  // total bytes per IP
    std::string line, ip1, ip2;
    size_t pos1, pos2;
    int bytes;

    infile.open("20110803.txt");
    while (std::getline(infile, line))
    {
        // Column 1 (between the 1st and 2nd tabs) is the first IP.
        pos1 = line.find('\t');
        pos2 = line.find('\t', pos1 + 1);
        ip1 = line.substr(pos1 + 1, pos2 - pos1 - 1);

        // Column 3 (between the 3rd and 4th tabs) is the second IP.
        pos1 = line.find('\t', pos2 + 1);
        pos2 = line.find('\t', pos1 + 1);
        ip2 = line.substr(pos1 + 1, pos2 - pos1 - 1);

        // Advance pos1 to the 6th tab; column 6 (the byte count) follows it.
        pos1 = line.find('\t', pos1 + 1);
        pos1 = line.find('\t', pos1 + 1);
        pos1 = line.find('\t', pos1 + 1);
        pos2 = line.find('\t', pos1 + 1);
        bytes = std::atoi(line.substr(pos1 + 1, pos2 - pos1 - 1).c_str());

        if (bandwidth.find(ip1) != bandwidth.end())
            bandwidth[ip1] += bytes;
        else
            bandwidth[ip1] = bytes;

        if (bandwidth.find(ip2) != bandwidth.end())
            bandwidth[ip2] += bytes;
        else
            bandwidth[ip2] = bytes;
    }
    infile.close();
    return 0;
}
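One simplification I could probably make to the map update in the loop above: `operator[]` on `std::unordered_map` inserts a value-initialized entry (0 for integer types) when the key is missing, so the `find` check should not be needed. The sketch below shows just that part. The names `Totals` and `add_row` are made up for illustration, and switching the counter to `long long` is my own assumption, in case the totals on a large dump outgrow an `int`.

```cpp
#include <string>
#include <unordered_map>

// Assumption: a wider counter than the original int, to avoid overflow
// when summing byte counts over a very large dump.
using Totals = std::unordered_map<std::string, long long>;

// Hypothetical helper: credits `bytes` to both IPs of one row.
// operator[] creates a missing entry with value 0 before += runs,
// so no separate find() is required.
void add_row(Totals& totals, const std::string& ip1,
             const std::string& ip2, long long bytes)
{
    totals[ip1] += bytes;
    totals[ip2] += bytes;
}
```

This does one hash lookup per IP instead of two, though I have not measured whether it makes a noticeable difference on these files.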
Suggestions/improvements are welcome, especially for the C++ part.