1. ## Euler Problem 10 With pthreads

I'm trying to implement threads with Project Euler problem 10. I am getting different results for the sum of the primes every time I run the program. I realize the code is quite atrocious; this is my first experiment with threads. In addition, if anyone could suggest a better way to distribute the load, I would appreciate it.

Code:
```#include <stdio.h>
#include <stdlib.h>

#define NUM	2000000

int isPrime(long int x)
{
long int y;
int z = 0;
for(y = 1; y <= x; y++)
{
if(x % y == 0)
z++;
}

if(z == 2)
return 1;
else
return 0;
}

void *first(void *sum)
{
long int x;
long int *y;
y = (long int*)sum;
for(x = 2; x < (NUM * 1/4); x++)
{
if(isPrime(x))
*y += x;
}

printf("Sum of primes in first(), %ld\n", *y);
}

void *second(void *sum)
{
long int x;
long int *y;
y = (long int*)sum;
for(x = (NUM * 1/4); x < (NUM * 1/2); x++)
{
if(isPrime(x))
*y += x;
}

}

void *third(void *sum)
{
long int x;
long int *y;
y = (long int*)sum;
for(x = (NUM * 1/2); x < (NUM * 3/4); x++)
{
if(isPrime(x))
*y += x;
}

}

void *fourth(void *sum)
{
long int x;
long int *y;
y = (long int*)sum;
for(x = (NUM * 3/4); x < NUM; x++)
{
if(isPrime(x))
*y += x;
}

}

int main(void)
{
void *sum = malloc(sizeof(long int));
int rc;
long int *z;

if(rc)
printf("ERROR; return code from pthread_create() is %i\n", rc);

if(rc)
printf("ERROR; return code from pthread_create() is %i\n", rc);

if(rc)
printf("ERROR; return code from pthread_create() is %i\n", rc);

if(rc)
printf("ERROR; return code from pthread_create() is %i\n", rc);

z = (long int*)sum;
printf("=======================\n");
printf("%ld\n", *z);
printf("=======================\n");

return 0;
}``` 2. So this problem doesn't really need threads, it only took half a second on my computer without threads, but if you want to learn, then read a good tutorial (https://computing.llnl.gov/tutorials/pthreads/). The biggest problem for speed is your prime-checking algorithm:

• Don't check all the way up to x, you only need to go to the square root of x. This can be y < sqrt(x), or y*y < x in your for loop.
• Don't start at 1. Start your loop at 3 and go up by twos: y += 2
• Check 2 explicitly before your loop: if (x & 1) return 0;

That should save significant time when checking bigger numbers.

An int or a long is not big enough to store the sum. Try a long long or unsigned long long.

You don't need to malloc memory, just declare sum a long long. malloc doesn't initialize the memory anyway, you would need to do sum = 0;

All 4 threads access the same sum variable with no mutex or resource lock. This is problematic since they overwrite eachothers values. Declare an array of 4 sum variables and pass separate ones into each thread function:
Code:
```long long int sums[NUM_THREADS] = {0};
There is no guaranteed execution order for threads in the pthread library. It's totally possible that control returns to the main thread and you print *z and exit before all four threads are done executing. You need to use pthread_join to wait for each one:
Code:
```pthread_join(threads, NULL);
...```
All this repetition with array elements should make you think loops. The only trick here is getting your thread functions into an array:
Code:
```pthread_t threads[NUM_THREADS];
void *(*funcs[NUM_THREADS])(void *) = {first, second, third, fourth};
long long int sum = 0, sums[NUM_THREADS] = {0};

for (i = 0; i < NUM_THREADS; i++) {
if rc
print error and exit
else
}
for (i = 0; i < NUM_THREADS; i++) {
if error
print error and exit
else
sum += sums[i];
}

print sum``` 3. The first thing I'll point out is casting issues: you have casts where you don't need them and don't have casts where you do need them. You needn't cast sum to a void*, because it already is one. You needn't cast a void* to a long* in your thread functions, because void* is implicitly converted to any other object pointer type. You should not use %ld to print out pointer values (your threads), but instead use %p, and if you use %p, you must cast the threads to void*. However, wouldn't it just make sense to print out the thread number, e.g. "Thread 1 created."?

Now, your main problem is that you're not waiting for your threads to finish. You need to call pthread_join() on each thread. Your program is exiting right as the threads are starting, which is why you don't get a consistent output (and why the output is, I presume, much smaller than you'd expect).

To be fully safe, you'll also want to protect access to *y in the threads with a mutex. When you say "*y += x" you're performing two actions on *y: you're reading from it and then writing to it. Another thread could modify *y (since all threads have access to it) in between, causing you problems. 4. Thank you cas and anduril, for your advice. I will look into pthread_join() and implementing mutexes. Anduril, I wasn't sure how to do a loop when initializing the threads to call different functions; thanks for clearing that up. 5. If you must divide up the work amoung mutiple threads, don't divide it up the way you have. smaller numbers take far less time to determine their primeness, therefore the four threads will finish at quite different speeds. Instead do something like:
(Note that you would preprocess the values 1-9 as a special case first)
You don't even need four functions then. Just have one function that takes an integer (either 1, 3, 7, or 9) and start there, adding 10 each time around the loop.

Also, you needn't create four additional threads, because you then have five threads and one of them doing nothing. Instead you would only create three additional threads and for the main thread, just do some of the work there directly, e.g. a direct call to fourth.

Of course, as has been said, you do not need threading for this, not in the slightest.
What it all comes down to is:
"Work smarter, not harder".

Or another relevant quote:
"You have a problem and so added threading. Now you have two problems" 6. I reworked the problem using these optimizations for the algorithm and the results were far more acceptable. However, I suppose its good to get a handle on threads sooner than later. 7. What it all comes down to is:
"Work smarter, not harder".
When you are dealing with larger prime numbers, you might need the Sieve of Eratosthenes. At ideone.com I can get the answer within 0.01 seconds, which I doubt your program gets anywhere close to. Popular pages Recent additions euler, load distribution, parallel, pthread 