I think I've mentioned before on here that I wrote some bandwidth monitoring scripts in Python for work. Well, I've been learning C++ lately and I rewrote a critical part of the Python code in C++ to see how much faster it would be.
I ran both versions and got the following results:
Python: 59 seconds
C++: 34 seconds
These programs process very large files (tables of data from Wireshark dumps), and this is just a huge reminder of how a job like this can be more I/O bound than anything else. Honestly I thought the C++ version would be dramatically faster, but alas...
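One way to check how I/O bound a job like this really is: time one pass that only reads the lines, then a second pass that also splits and sums them. The following is a minimal sketch, not the script from work. `time_read_only` and `time_read_and_parse` are made-up names, and it assumes the same tab-separated layout as the code in this post (IPs in columns 1 and 3, byte count in column 6).

```python
import time


def time_read_only(path):
    """Return the seconds spent just iterating over every line of the file."""
    start = time.perf_counter()
    with open(path, 'r') as fin:
        for line in fin:
            pass
    return time.perf_counter() - start


def time_read_and_parse(path):
    """Return (seconds, totals) for reading plus splitting and summing per IP."""
    start = time.perf_counter()
    bandwidth = {}
    with open(path, 'r') as fin:
        for line in fin:
            row = line.split('\t')
            nbytes = int(row[6])
            # Credit the byte count to both endpoints of the conversation.
            for ip in (row[1], row[3]):
                bandwidth[ip] = bandwidth.get(ip, 0) + nbytes
    return time.perf_counter() - start, bandwidth
```

If the two times come out close, most of the run is spent getting lines out of the file and a faster language won't buy much. Keep in mind that the OS may serve the second pass from its file cache, so it's worth running each measurement more than once before comparing.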
Here's the code; nothing in it is sensitive as far as my employer is concerned:
Code:
bandwidth = {}
fin = open('20110803.txt', 'r')
line = fin.readline()
while line != '':
    row = line.split('\t')
    bytes = int(row[6])
    if row[1] in bandwidth:
        bandwidth[row[1]] += bytes
    else:
        bandwidth[row[1]] = bytes
    if row[3] in bandwidth:
        bandwidth[row[3]] += bytes
    else:
        bandwidth[row[3]] = bytes
    line = fin.readline()
fin.close()

Suggestions/improvements are welcome, especially for the C++ part.

Code:
#include <fstream>
#include <string>
#include <unordered_map>
#include <cstdlib>

int main(void)
{
    std::ifstream infile;
    std::unordered_map<std::string, int> bandwidth;
    std::string line, ip1, ip2;
    size_t pos1, pos2;
    int bytes;

    infile.open("20110803.txt");
    while (std::getline(infile, line)) {
        pos1 = line.find('\t');
        pos2 = line.find('\t', pos1 + 1);
        ip1 = line.substr(pos1 + 1, pos2 - pos1 - 1);
        pos1 = line.find('\t', pos2 + 1);
        pos2 = line.find('\t', pos1 + 1);
        ip2 = line.substr(pos1 + 1, pos2 - pos1 - 1);
        pos1 = line.find('\t', pos1 + 1);
        pos1 = line.find('\t', pos1 + 1);
        pos1 = line.find('\t', pos1 + 1);
        pos2 = line.find('\t', pos1 + 1);
        bytes = atoi(line.substr(pos1 + 1, pos2 - pos1 - 1).c_str());
        if (bandwidth.find(ip1) != bandwidth.end())
            bandwidth[ip1] += bytes;
        else
            bandwidth[ip1] = bytes;
        if (bandwidth.find(ip2) != bandwidth.end())
            bandwidth[ip2] += bytes;
        else
            bandwidth[ip2] = bytes;
    }
    infile.close();
    return 0;
}







), if you expect that the key will be found in the dictionary more often than not, then consider changing:
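The rest of this suggestion is missing from the post. As a sketch of the kind of change this advice usually points to (an assumption, not the replier's actual code): swap the `if key in bandwidth` check in the Python loop for `try`/`except KeyError`, so the common case where the key is already present skips the separate membership test. `add_bytes` is a made-up helper name.

```python
def add_bytes(bandwidth, ip, nbytes):
    """Add nbytes to bandwidth[ip], creating the entry the first time ip is seen."""
    try:
        # Common case: the IP is already in the dictionary.
        bandwidth[ip] += nbytes
    except KeyError:
        # First time this IP shows up.
        bandwidth[ip] = nbytes


bandwidth = {}
add_bytes(bandwidth, '10.0.0.1', 100)
add_bytes(bandwidth, '10.0.0.1', 50)
add_bytes(bandwidth, '10.0.0.2', 7)
print(bandwidth)
```

The trade-off is that a raised exception costs more than a failed `in` test, so this only helps when misses are rare, which should be the case in a traffic dump where the same IPs keep showing up.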