I'm loading a very long list from disk into an unordered_set. If I use a set of strings, it is very fast. A test list of about 7 MB loads in about 1 second. However, using a set of char pointers takes about 2.1 minutes! For a 59.6 MB list, the string version loads in 10 seconds. The char* version takes 525 minutes!
Here is the code for the string version:
Here is the code for the char* version:Code:unordered_set<string> Set; string key; while (getline(fin, key)) { Set.insert(key); }
The "new(mem)" is because I'm using a custom memory manager so I can allocate big blocks of memory and give them out to tiny objects like c strings. However, I've tested this with regular "new" and the results are identical. I've also used my memory manager in other tools with no problems.Code:struct unordered_eqstr { bool operator()(const char* s1, const char* s2) const { return strcmp(s1, s2) == 0; } }; struct unordered_deref { template <typename T> size_t operator()(const T* p) const { return hash<T>()(*p); } }; unordered_set<const char*, unordered_deref, unordered_eqstr> Set; string key; while (getline(fin, key)) { char* str = new(mem) char[key.size()+1]; strcpy(str, key.c_str()); Set.insert(str); }
The two structs are necessary to make the insert and find hash based on the actual c string and not its address. The unordered_deref I found on stack overflow.
Eventually I need to load multi-gigabyte files. This is why I'm using a custom memory manager, but it's also why this horrible slow down is unacceptable. Any ideas?



1Likes
LinkBack URL
About LinkBacks



