I am faced with a strange performance problem in my code (made it multithreaded, with almost no extra work and no data sharing at all, and while the result was still right, it became slightly slower), and thought I would try to reproduce it in simpler code.
Much to my surprise, I could actually reproduce it.
Here is my test code, using std::thread from C++0x -
Defining MT makes it multithreaded.Code:#include <iostream> #include <cstdlib> #ifdef MT #include <thread> #endif void slow_exp(int base, long long int exp, int *r) { *r = 1; for (long long int i = 0; i < exp; ++i) { *r *= base; } } int f(int base, long long int exp) { #ifndef MT int r; slow_exp(base, exp, &r); return r; #else int r1, r2; std::thread thread1(slow_exp, base, exp / 2, &r1); std::thread thread2(slow_exp, base, exp - exp / 2, &r2); thread1.join(); thread2.join(); return r1*r2; #endif } int main() { std::cout << f(3, 4000000000LL) << std::endl; }
Just needed something that runs O(n) and the compiler is too dumb to optimize out completely.
The result -
The result (taking into account overflows and such) was the same, but the multithreaded version is slightly slower! user time is almost twice, suggesting it's doing twice as much work. How can this be? There isn't even any data sharing.Code:cyberfish@cyberfish-desktop:/tmp/tt$ g++ -O3 -std=gnu++0x tt.cpp -pthread cyberfish@cyberfish-desktop:/tmp/tt$ time ./a.out 1818370049 real 0m2.116s user 0m2.110s sys 0m0.000s cyberfish@cyberfish-desktop:/tmp/tt$ g++ -O3 -DMT -std=gnu++0x tt.cpp -pthread cyberfish@cyberfish-desktop:/tmp/tt$ time ./a.out 1818370049 real 0m2.252s user 0m4.360s sys 0m0.000s
This is run on a Core 2 Duo (dual core) machine, and the single threaded version uses 50% CPU, while the 2 threaded version uses 100%.
Any guesses?
I'm hoping it's something fundamentally wrong with how I am doing it, and can transport the "fix" back to my real program.
Thanks



LinkBack URL
About LinkBacks




. Sorry I didn't mention it.