1. ## To multithread or not ?

Hallo all,

I am still very new to C. I wrote a program that basically does evaluations on all binary numbers between 0 and 2^27. When aggressively compiled, the program is already quite fast (takes 40 sec on my 64bit , 2core MacBook). Would multithreading significantly improve the execution speed in a program which basically does through all binary numbers? If yes, is <pthread.h> the library to use? Here it is:
Code:
```#include <bitset>
#include <iostream>
#include <math.h>

int main()
{
static const unsigned n = 27;
static char *c[ n ] = {"aut","bel","bul","cyp","cze","deu","dnk","esp","est","fin","fra","gbr","grc","hun","irl","ita","ltu","lux","lva","mlt","nld","pol","prt","rom","svk","svn","swe"};
static unsigned int w[ n ] = {10,12,10,4,12,29,7,27,4,7,29,29,12,12,7,29,7,4,4,3,13,27,12,14,7,4,10};
static float p[ n ] = {8.34,10.59,7.6,0.8,10.32,82.18,5.47,45.27,1.34,5.31,63.92,61.09,11.18,10.03,4.39,59.21,3.37,0.48,2.27,0.41,16.39,38.09,10.67,21.45,5.4,2.02,9.22};

static const unsigned nc = 1 << n;

double pr = 0.0f;

unsigned int sumv = 0;
unsigned int sumw = 0;
float sump = 0.0f;

double COLEMAN[ 2 ] = {0.0};
double BANZHAF[ n ][ 2 ] = {{0.0}};

bool win_ni = false;
bool win_lb = false;

unsigned int i;
for ( i = nc; i--; )
{
std::bitset < n > v = i;
sumv = v.count();
if ( sumv < 15 ) continue;
sumw = 0;
sump = 0;

pr = pow( 0.5 , sumv ) * pow( 0.5 , n - sumv );

win_ni = false;
win_lb = false;

unsigned int k;
for ( k = n; k--; ) { if ( v[ n - k - 1 ] ) { sumw += w[ k ]; sump += p[ k ]; } }

if ( sumv >= 18 && sumw >= 255 && sump >= 308.01 ) { COLEMAN[ 0 ] += pr; win_ni = true; }
if ( sump >= 322.92 || sumv >= 24 ) { COLEMAN[ 1 ] += pr; win_lb = true; }

if(win_ni || win_lb)
{
unsigned int j;
for ( j = n ; j--; )
{
v.flip( j );
sumv = v.count();
sumw = 0;
sump = 0;

k = 0;
for ( k = n; k--; ) { if ( v[ n - k - 1 ] ) { sumw += w[ k ]; sump += p[ k ]; } }

if ( win_ni && ( sumv < 18 || sumw < 255 || sump < 308.01 ) ) BANZHAF[ j ][ 0 ] += pr;
if ( win_lb && ( ( sumv < 15 || sump < 322.92 ) && sumv < 24 ) ) BANZHAF[ j ][ 1 ] += pr;

v.flip( j );
}
}
}

printf( "Coleman  ; Efficiency\n" );
printf( "Nice 27  ; Lisbon 27\n" );
printf( "%f ; %f\n" , COLEMAN[ 0 ], COLEMAN[ 1 ] );
printf( "Banzhaf ; Voting ; Powers\n" );
printf( "State ; Nice 27  ; Lisbon 27\n" );
i = 0;
for ( i = n; i--; ) { printf("  %s ; %f ; %f\n" , c[ n - i - 1 ], BANZHAF[ i ][ 0 ], BANZHAF[ i ][ 1 ] ); }

return 0;
}```

2. Well, since its a bit late to try to decipher what you are doing, perhaps you could explain what you mean by 'goes through all the binary numbers'.

But in general, anything that can be written as a for loop can be multithreaded for a performance gain. There are exceptions, such as extremely short loops where teh thread setup time exceeds the gain. If you are only doing 2^27 calculations I would suggest that at 40 seconds execution time, that wont be an issue. You can probably get that down to ~16 seconds assumming you have hyperthreading enabled.

3. assuming all calculations can be done in parallel (ie, iterations do not depend/use the results of prior iterations), you will be able to get a speedup proportional to the number of threads. If you make two threads, that would make it run in almost half the time. "Almost" because the two cores share the same link to memory, as well as caches.

As for threading library, I recommend either pthread or Boost::Thread. Boost::Thread seems to be just a bit more portable (I couldn't find an implementation of pthread for 64-bit windows), and is more natively C++. Pthread is a C library. Pthread is tried and true though, being in POSIX OSes since prehistoric times, and it is a bit more feature-rich compared to Boost::Thread in its current state. Also, Boost::Thread will probably be made standard C++ (like the rest of the Boost library) in a few revisions.

anything that can be written as a for loop can be multithreaded for a performance gain
Not true. Not if results depend on prior results.
Code:
```int fib[100] = { 1, 1 };
for (i = 2;  i < 100; ++i) {
fib[i] = fib[i-1] + fib[i-2];
}```
It's not possible to parallelize that.

If you are only doing 2^27 calculations I would suggest that at 40 seconds execution time, that wont be an issue.
Huh...

You can probably get that down to ~16 seconds assumming you have hyperthreading enabled.
That is totally irrelevant. Hyperthreading requires the program to be multi-threaded. It basically simulates two cores on one. The speedup is more like 20&#37;, not >100% (where did that 16 sec estimation come from?). And the Core 2 doesn't even have hyperthreading.

4. ok, thanks for the comments. I will give multithreading a shot

SK

5. Originally Posted by cyberfish
assuming all calculations can be done in parallel (ie, iterations do not depend/use the results of prior iterations), you will be able to get a speedup proportional to the number of threads. If you make two threads, that would make it run in almost half the time. "Almost" because the two cores share the same link to memory, as well as caches.
Wrong. 4 logitcal processors ( on a dual core with HT) do not equal 4 times the procesing power. the additional logical processor from HT in practice increases performance about 30&#37; as much as an actual additional processor. Hence 40/2.6 = ~16

Just adding more threads will not give you a performance increase per se, it depends on several factors, nto the least of which is how many processign units you have and how efficiently the task parallelizes. Mosts tasks decrease in single iteration performance measures when parallelized. Ahmdals law applies here.

hence -
Originally Posted by abachler
But in general, anything that can be written as a for loop can be multithreaded...
I notice you left out the important qualifier statement, are you trolling?

Not true. Not if results depend on prior results.
Code:
```int fib[100] = { 1, 1 };
for (i = 2;  i < 100; ++i) {
fib[i] = fib[i-1] + fib[i-2];
}```
It's not possible to parallelize that.
yes, it is. Its just not a trivial process, I never claimed it to be trivial, only to be generally possible.

Huh...

That is totally irrelevant. Hyperthreading requires the program to be multi-threaded. It basically simulates two cores on one. The speedup is more like 20%, not >100% (where did that 16 sec estimation come from?). And the Core 2 doesn't even have hyperthreading.
2 physical cores * 1.3 = 2.6, 40/2.6 = ~16

Again, you are leaving out qualifiers in order to troll.

Code:
```
i[3] = i[1] + i[2]                          i[4] = i[2] + i[3]
Core A                                       Core B
mov eax , i[2]                             nop
mov edx , i[1]                             mov eax , i[2]
mov i[3] , eax                             nop
nop                                           mov edx , i[3]
nop                                           mov i[4] , eax```
and you just completed in 7 clock cycles what woudl have taken 8 otherwise. Not a huge performance gain, but its can be parallelized. Granted I coudl write an even further optimized single thread version that would blow away the parallel version, but that is for another time.

6. Wrong. 4 logitcal processors ( on a dual core with HT) do not equal 4 times the procesing power. the additional logical processor from HT in practice increases performance about 30&#37; as much as an actual additional processor. Hence 40/2.6 = ~16
My bad. I thought you meant you can speed up a 40sec single-threaded program to 16sec just by enabling Hyperthreading, which is what your first post seems to suggest.

Just adding more threads will not give you a performance increase per se, it depends on several factors, nto the least of which is how many processign units you have and how efficiently the task parallelizes. Mosts tasks decrease in single iteration performance measures when parallelized. Ahmdals law applies here.
I was assuming the OP is interested only in spawning two threads (one for each core).

I notice you left out the important qualifier statement, are you trolling?
My apologies, but I thought loops that depend on results of previous iterations are very common that that ought to be pointed out.

Again, you are leaving out qualifiers in order to troll.
Mind pointing out what qualifiers?

I am not sure what the last part of your post is supposed to point out.

And I repeat: The Core 2, which the OP uses, does not support Hyperthreading.

7. Multi-threading, as has been pointed out, does not always equal better performance. It also introduces several other issues that may or may not be obvious at this point in your design. If you design for multi-thread from the outset then it's not a big deal. However some designs are not well suited to multi-threading. It's not quite as simple as firing off a thread and letting it run. It IS this easy to make a new thread, it's NOT as easy to ensure you don't get any race conditions between threads.