# Thread: Using multiThreads to speed up program

1. Originally Posted by Salem
You only need 1 bit per random number (you're only storing a true/false).
So the whole RAND_MAX range would only need 256MB of memory.
But the lookup is O(1) when you're done.

By contrast, an array of just all the known primes upto RAND_MAX would contain approx 100M entries.
Stored as 4-byte integers, this would come to 400MB.
Since it is ordered, bsearch() is possible (in ~26 iterations per lookup) (it's O(log2(N))
Hi again.
So I used the sieve algorithm, initialized a bool array the size of RAND_MAX which is 2147483647
and it is working.

However, looping through the array to intialize with actual content is taking forever and makes my code even worse.
The only time that it works in my favor is with huge numbers like
55555555 randoms numbers. but anything smaller and it starts to lose effect.

I tried using threads to initalize it simultaneously but results were the same .

with the previous code I managed to beat my professor's code, but now he improved his code again , and I am back to the thinking board.

He of course trust us to use the web in the look for answers.
any ideas how to improve the sieves algorithm?

2. Not without seeing your current code.

Sieve of Eratosthenes - Wikipedia
"Sieve of Eratosthenes: algorithm steps for primes below 121 (including optimization of starting from prime's square)."

3. Originally Posted by Salem
Not without seeing your current code.

Sieve of Eratosthenes - Wikipedia
"Sieve of Eratosthenes: algorithm steps for primes below 121 (including optimization of starting from prime's square)."
Code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <math.h>

#define MAX 2147483646
char *primes;

int *amount;
long sum = 0;
long primeCounter = 0;

int isPrime(int num)
{
// if (primes[num - 1])
// 	return 1;

// else
// {
if (num <= 1)
return 0;
if (num <= 3){
primes[num-1]=1;
return 1;
}
if (num % 2 == 0 || num % 3 == 0)
return 0;
int sqr = sqrt(num);
for (int i = 5; i <= sqr; i = i + 6)
{
if (num % i == 0 || num % (i + 2) == 0)
return 0;
}
primes[num-1]=1;
return 1;
// }
}

void *func()
{
long local_sum = 0, local_counter = 0, count = 0;
int limit = (*amount) / (*num_of_threads);
for (int i = 0; i < limit; ++i)
{
int random = rand();
if (isPrime(random))
{
local_sum += random;
local_counter++;
count++;
if (count == (*amount) / (*num_of_threads))
return NULL;
}
}
sum += local_sum;
primeCounter += local_counter;
if (count == (*amount))
return NULL;
return NULL;
}

// void *func1()
// {
// 	// for (int i = 0; i < MAX * 0.25; i++)
// 	// {
// 	// 	primes[i] = 0;
// 	// }

// 	int limit = (sqrt(MAX) + 1) * 0.25;

// 	for (long long i = 2; i < limit; i++)
// 	{
// 		if (primes[i - 1])
// 		{
// 			for (long long j = i * i; j <= MAX; j += i)
// 			{
// 				primes[j - 1] = 0;
// 			}
// 		}
// 	}

// 	return NULL;
// }

// void *func2()
// {
// 	// for (int i = MAX * 0.25; i < MAX * 0.5; i++)
// 	// {
// 	// 	primes[i] = 0;
// 	// }
// 	int limit = (sqrt(MAX) + 1) * 0.5;

// 	for (long long i = MAX * 0.25; i < limit; i++)
// 	{
// 		if (primes[i - 1])
// 		{
// 			for (long long j = i * i; j <= MAX; j += i)
// 			{
// 				primes[j - 1] = 0;
// 			}
// 		}
// 	}

// 	return NULL;
// }

// void *func3()
// {
// 	// for (int i = MAX * 0.5; i < MAX * 0.75; i++)
// 	// {
// 	// 	primes[i] = 0;
// 	// }

// 	int limit = (sqrt(MAX) + 1) * 0.75;

// 	for (long long i = MAX * 0.5; i < limit; i++)
// 	{
// 		if (primes[i - 1])
// 		{
// 			for (long long j = i * i; j <= MAX; j += i)
// 			{
// 				primes[j - 1] = 0;
// 			}
// 		}
// 	}

// 	return NULL;
// }

// void *func4()
// {
// 	// for (int i = MAX * 0.75; i < MAX; i++)
// 	// {
// 	// 	primes[i] = 0;
// 	// }

// 	int limit = (sqrt(MAX) + 1);

// 	for (long long i = MAX * 0.75; i < limit; i++)
// 	{
// 		if (primes[i - 1])
// 		{
// 			for (long long j = i * i; j <= MAX; j += i)
// 			{
// 				primes[j - 1] = 0;
// 			}
// 		}
// 	}
// 	return NULL;
// }

int main(int argc, char *argv[])
{
// primes = malloc((MAX + 1) * sizeof(char));
// primes[MAX]={0};

// pthread_t t1, t2, t3, t4;
// pthread_create(&t1, NULL, &func1, NULL);
// pthread_create(&t2, NULL, &func2, NULL);
// pthread_create(&t3, NULL, &func3, NULL);
// pthread_create(&t4, NULL, &func4, NULL);

// for (int i = 0; i < MAX; i++)
// {
// 	primes[i] = 0;
// }

/* Loop through a portion of the array (up to the square root of MAX). If
* it's a prime, ensure all multiples of it are set to zero (false), as they
* clearly cannot be prime.
*/
// int limit = sqrt(MAX) + 1;
// for (long long i = 2; i < limit; i++)
// {
// 	if (primes[i - 1])
// 	{
// 		for (long long j = i * i; j <= MAX; j += i)
// 		{
// 			primes[j - 1] = 0;
// 		}
// 	}
// }

long number_of_processors = sysconf(_SC_NPROCESSORS_ONLN);
num_of_threads = malloc(number_of_processors * sizeof(int));

//make sure enough arguments were received
if (argc != 3)
{
printf("Too few arguments ");
printf("USAGE: ./primeCalc <prime pivot> <num of random numbers>");
exit(0);
}
//receive arguments from user
int randomPivot = atoi(argv[1]);
int numOfRandomNumbers = atoi(argv[2]);
amount = malloc(numOfRandomNumbers * sizeof(int));
*amount = numOfRandomNumbers;

srand(randomPivot);
//creating multiple threads , executing our function
for (int i = 0; i < *num_of_threads; ++i)
{
if (pthread_create(&threads[i], NULL, &func, NULL) != 0)
{
fprintf(stderr, "error: Cannot create thread # %d\n", i);
break;
}
}
for (int i = 0; i < *num_of_threads; ++i)
{
{
fprintf(stderr, "error: Cannot join thread # %d\n", i);
}
}

free(amount);

printf("%ld,%ld\n", sum, primeCounter);
exit(0);
}
The code that is slashed is my Sieve attempt while using multithreads.

4. Well that was interesting.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <math.h>
#include <unistd.h>

#define TABLE_SIZE      ((INT_MAX/CHAR_BIT/sizeof(int)/2)+1)
#define BITS_PER_ENTRY  (sizeof(int)*CHAR_BIT)

// Halve the length of the table by only storing odd results
// 2 is a special case.
// If the bit is clear, it's a prime
unsigned int table[TABLE_SIZE] = { 0 };

long sum = 0;
long primeCounter = 0;

// map 1,3,5,7,9 etc to 0,1,2,3,4
int valueToIndex(int value)
{
int result = value / 2;
return result;
}

void setComposite(int composite)
{
if (composite % 2 == 0)
return;
int value = valueToIndex(composite);
int pos = value / BITS_PER_ENTRY;
int bit = value % BITS_PER_ENTRY;
unsigned int mask = 1u << bit;
}

#ifdef USE_LOOKUP
int isPrime(int number)
{
if (number <= 2)
return 1;
if (number % 2 == 0)
return 0;
int value = valueToIndex(number);
int pos = value / BITS_PER_ENTRY;
int bit = value % BITS_PER_ENTRY;
unsigned int mask = 1u << bit;
return (table[pos] & mask) == 0;
}
#else
int isPrime(int num)
{
if (num <= 1)
return 0;
if (num <= 3)
return 1;
if (num % 2 == 0 || num % 3 == 0)
return 0;

int limit = sqrt(num);
for (int i = 5; i <= limit; i = i + 6)
{
if (num % i == 0 || num % (i + 2) == 0)
return 0;
}
return 1;
}
#endif

int nextCandidatePrime(int p)
{
do {
p += 2;                     // the next odd number
} while (!isPrime(p));
return p;
}

void createPrimeTable(int limit)
{
int s = sqrt(limit);
for (int p = 3; p < s; p = nextCandidatePrime(p)) {
int end = limit - p;
for (int c = p * p; c < end; c += p) {
setComposite(c);
}
}
}

typedef struct {
unsigned int seed;
int limit;
} funcdata_st;

void *func(void *p)
{
funcdata_st *info = (funcdata_st *) p;
long local_sum = 0, local_counter = 0, count = 0;
for (int i = 0; i < info->limit; ++i) {
int random = rand_r(&info->seed); //!! NB - using rand_r !!
if (isPrime(random)) {
local_sum += random;
local_counter++;
}
}
sum += local_sum;
primeCounter += local_counter;
return NULL;
}

int main(int argc, char *argv[])
{
//make sure enough arguments were received
if (argc != 3) {
printf("Too few arguments ");
printf("USAGE: ./primeCalc <prime pivot> <num of random numbers>");
exit(0);
}
#ifdef USE_LOOKUP
createPrimeTable(INT_MAX);
printf("Table built\n");
#endif

long number_of_processors = sysconf(_SC_NPROCESSORS_ONLN);
int randomPivot = atoi(argv[1]);
int numOfRandomNumbers = atoi(argv[2]);

funcdata_st info[number_of_processors];

for (int i = 0; i < number_of_processors; ++i) {
info[i].seed = randomPivot + i; // makes no sense for all threads to generate the same sequence
info[i].limit = numOfRandomNumbers / number_of_processors;
if (pthread_create(&threads[i], NULL, func, &info[i]) != 0) {
fprintf(stderr, "error: Cannot create thread # %d\n", i);
break;
}
}
for (int i = 0; i < number_of_processors; ++i) {
fprintf(stderr, "error: Cannot join thread # %d\n", i);
}
}

printf("%ld,%ld\n", sum, primeCounter);
}
On my machine, making the sieve for all 32-bit ints takes about 9 seconds.
It might be thread-able if it used bytes rather than bits to store whether a number was prime or not.

Using the lookup table approach is useless for 1M primes.
Code:
\$ gcc -DUSE_LOOKUP -O2 -g foo.c -lm -pthread
\$ time ./a.out 7 1000000
Table built
51378394789491,49097

real	0m9.152s
user	0m9.156s
sys	0m0.024s

\$ gcc -O2 -g foo.c -lm -pthread
\$ time ./a.out 7 1000000
51378394789491,49097

real	0m0.400s
user	0m3.076s
sys	0m0.000s
However, it really comes into it's own at 100M primes.
Code:
\$ gcc -DUSE_LOOKUP -O2 -g foo.c -lm -pthread
\$ time ./a.out 7 100000000
Table built
5114402999725783,4891795

real	0m9.884s
user	0m15.204s
sys	0m0.016s

\$ gcc -O2 -g foo.c -lm -pthread
\$ time ./a.out 7 100000000
5114362197774199,4891775

real	0m33.602s
user	4m12.964s
sys	0m0.092s

5. Originally Posted by Salem
Well that was interesting.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <math.h>
#include <unistd.h>

#define TABLE_SIZE      ((INT_MAX/CHAR_BIT/sizeof(int)/2)+1)
#define BITS_PER_ENTRY  (sizeof(int)*CHAR_BIT)

// Halve the length of the table by only storing odd results
// 2 is a special case.
// If the bit is clear, it's a prime
unsigned int table[TABLE_SIZE] = { 0 };

long sum = 0;
long primeCounter = 0;

// map 1,3,5,7,9 etc to 0,1,2,3,4
int valueToIndex(int value)
{
int result = value / 2;
return result;
}

void setComposite(int composite)
{
if (composite % 2 == 0)
return;
int value = valueToIndex(composite);
int pos = value / BITS_PER_ENTRY;
int bit = value % BITS_PER_ENTRY;
unsigned int mask = 1u << bit;
}

#ifdef USE_LOOKUP
int isPrime(int number)
{
if (number <= 2)
return 1;
if (number % 2 == 0)
return 0;
int value = valueToIndex(number);
int pos = value / BITS_PER_ENTRY;
int bit = value % BITS_PER_ENTRY;
unsigned int mask = 1u << bit;
return (table[pos] & mask) == 0;
}
#else
int isPrime(int num)
{
if (num <= 1)
return 0;
if (num <= 3)
return 1;
if (num % 2 == 0 || num % 3 == 0)
return 0;

int limit = sqrt(num);
for (int i = 5; i <= limit; i = i + 6)
{
if (num % i == 0 || num % (i + 2) == 0)
return 0;
}
return 1;
}
#endif

int nextCandidatePrime(int p)
{
do {
p += 2;                     // the next odd number
} while (!isPrime(p));
return p;
}

void createPrimeTable(int limit)
{
int s = sqrt(limit);
for (int p = 3; p < s; p = nextCandidatePrime(p)) {
int end = limit - p;
for (int c = p * p; c < end; c += p) {
setComposite(c);
}
}
}

typedef struct {
unsigned int seed;
int limit;
} funcdata_st;

void *func(void *p)
{
funcdata_st *info = (funcdata_st *) p;
long local_sum = 0, local_counter = 0, count = 0;
for (int i = 0; i < info->limit; ++i) {
int random = rand_r(&info->seed); //!! NB - using rand_r !!
if (isPrime(random)) {
local_sum += random;
local_counter++;
}
}
sum += local_sum;
primeCounter += local_counter;
return NULL;
}

int main(int argc, char *argv[])
{
//make sure enough arguments were received
if (argc != 3) {
printf("Too few arguments ");
printf("USAGE: ./primeCalc <prime pivot> <num of random numbers>");
exit(0);
}
#ifdef USE_LOOKUP
createPrimeTable(INT_MAX);
printf("Table built\n");
#endif

long number_of_processors = sysconf(_SC_NPROCESSORS_ONLN);
int randomPivot = atoi(argv[1]);
int numOfRandomNumbers = atoi(argv[2]);

funcdata_st info[number_of_processors];

for (int i = 0; i < number_of_processors; ++i) {
info[i].seed = randomPivot + i; // makes no sense for all threads to generate the same sequence
info[i].limit = numOfRandomNumbers / number_of_processors;
if (pthread_create(&threads[i], NULL, func, &info[i]) != 0) {
fprintf(stderr, "error: Cannot create thread # %d\n", i);
break;
}
}
for (int i = 0; i < number_of_processors; ++i) {
fprintf(stderr, "error: Cannot join thread # %d\n", i);
}
}

printf("%ld,%ld\n", sum, primeCounter);
}
On my machine, making the sieve for all 32-bit ints takes about 9 seconds.
It might be thread-able if it used bytes rather than bits to store whether a number was prime or not.

Using the lookup table approach is useless for 1M primes.
Code:
\$ gcc -DUSE_LOOKUP -O2 -g foo.c -lm -pthread
\$ time ./a.out 7 1000000
Table built
51378394789491,49097

real    0m9.152s
user    0m9.156s
sys    0m0.024s

\$ gcc -O2 -g foo.c -lm -pthread
\$ time ./a.out 7 1000000
51378394789491,49097

real    0m0.400s
user    0m3.076s
sys    0m0.000s
However, it really comes into it's own at 100M primes.
Code:
\$ gcc -DUSE_LOOKUP -O2 -g foo.c -lm -pthread
\$ time ./a.out 7 100000000
Table built
5114402999725783,4891795

real    0m9.884s
user    0m15.204s
sys    0m0.016s

\$ gcc -O2 -g foo.c -lm -pthread
\$ time ./a.out 7 100000000
5114362197774199,4891775

real    0m33.602s
user    4m12.964s
sys    0m0.092s
Thank you! This indeed matches my professor's code,
which feels out of my current level. But it helped me understand better how threads work.

6. For the sieve version, have you tried putting the table generator in a separate thread, and allowing the primes to wait on generation? Actually, maybe several threads? (If you're using bool entries you don't need to synchronize, since it's always storing a 1. If you're using a bit-array, you'd need to synchronize, which might make things too slow.)

On the generator side, maybe add the candidates to a min-heap or something, one per nthreads, and let the threads keep checking the lowest candidate value against the progress made on prime generation. After all, you can short-circuit the test whenever you get a positive result.

7. Must you use rand or are you free to use own inlined random generator?
And are you allowed write 64bit version?

Salem,I made some prime number code that stores a zero int in array when it finds a prime,could that work with multiple threads?

Popular pages Recent additions