Today, I am trying to optimize an N-queens solver (finding all solutions, not just one) by parallelizing it. Easy, some might think.
...Except, the problem is that I am getting very different results on different platforms!
Running the code on Windows (with N=11) takes roughly 1.2 seconds with 1 thread. Adding a second thread cuts that roughly in half, and adding a third brings it down to around 450 ms.
That would make sense, since this Windows machine has a dual-core CPU.
However, the goal is to make this run on a Linux machine (I get so tired of always having to use these damn Linux machines -_-), which has 26 or 24 cores, if the info returned by the OS is to be believed.
The problem is that on the Linux machine, whenever I use more than one thread, it gets slower. I cannot fathom why. So I am seeking advice, to see if someone knows something I don't.
So I attached the source to the post. I used Boost for multithreading, since I don't know of any better library for C++ (I can't use the standard's multithreading facilities either).
The code must compile with GCC 4.4. This means I can't use lambdas (which is why I use boost::bind and boost::ref), nor nullptr (but I define that as NULL).
The command list I use for compiling is:
g++44 "Vecka 7-8.cpp" "stdafx.cpp" -std=c++0x --pedantic -O3 -Wall -Wextra -o foo -Dnullptr=NULL -Iboost_1_49_0_beta1 -L/afs/isk.kth.se/home/*snip*/boost_1_49_0_beta1/lib/lib -lboost_thread -lboost_date_time -DBOARD_SIZE=11 -DNUM_THREADS=2
(If you see anything wrong, please let me know. I am not too familiar with gcc's command line.)
I've built Boost as shared libraries (I couldn't get static libraries to work).
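
In case it helps the discussion, here is a minimal sketch of how an all-solutions N-queens search can be split into independent slices. This is not the attached code, and the names count_from, count_slice and count_all are made up for illustration. It assumes the work is divided by the column of the first-row queen, and that each slice counts into its own local variable so nothing is shared until the final sum. To keep it free of any threading library, the slices run one after another here; in a threaded version each call to count_slice would be handed to its own thread (for example through boost::thread with boost::bind).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count all ways to complete a partial placement, starting at `row`.
// cols, d1 and d2 are bitmasks of squares in the current row that are
// attacked by a column or by one of the two diagonal directions.
static unsigned long count_from(int n, int row,
                                unsigned cols, unsigned d1, unsigned d2)
{
    if (row == n)
        return 1;
    unsigned long total = 0;
    const unsigned full = (1u << n) - 1u;          // n low bits set
    unsigned free_sq = full & ~(cols | d1 | d2);   // squares still safe
    while (free_sq) {
        const unsigned bit = free_sq & (~free_sq + 1u); // lowest set bit
        free_sq ^= bit;
        total += count_from(n, row + 1,
                            cols | bit, (d1 | bit) << 1, (d2 | bit) >> 1);
    }
    return total;
}

// Count the solutions whose first-row queen sits in a column c with
// c % num_slices == slice. Uses only local state (hypothetical helper).
unsigned long count_slice(int n, int slice, int num_slices)
{
    assert(n >= 1 && n <= 16);
    assert(num_slices >= 1 && slice >= 0 && slice < num_slices);
    unsigned long local = 0;
    for (int c = slice; c < n; c += num_slices) {
        const unsigned bit = 1u << c;
        local += count_from(n, 1, bit, bit << 1, bit >> 1);
    }
    return local;
}

// Run every slice and add up the partial counts. The loop over slices is
// sequential here; it marks where one thread per slice would be started.
unsigned long count_all(int n, int num_slices)
{
    std::vector<unsigned long> partial(num_slices, 0);
    for (int s = 0; s < num_slices; ++s)
        partial[s] = count_slice(n, s, num_slices);
    unsigned long total = 0;
    for (std::size_t i = 0; i < partial.size(); ++i)
        total += partial[i];
    return total;
}
```

Calling count_all(11, 2) corresponds to BOARD_SIZE=11 and NUM_THREADS=2 from the command line above, and the total is the same for any number of slices. If a threaded version of this shape still slows down with more threads, the usual things to check would be shared counters or containers updated inside the recursion, and locks taken per solution.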