Using multiThreads to speed up program

**mbn** · 05-05-2021

Hello, I am new to this forum and also a beginner in C.
I was asked to write a program where I need to generate X random numbers and determine if they are prime numbers. The catch is, I should make the program run as fast as I can using threads.

I was hoping you could look at my code and give me advices on how to improve it. For some reason, it doesn't seem like the multi-threading is helping to make the program run faster. ( I tried running the code without threads VS with threads)

SUMMARY OF CODE:
created two functions:
isPrime() - determine if an integer is prime or not
func() - this function i will send to the thread

everytime i find a prime number I increment the prime count;

created an array of 5 threads (not really sure if I should use more or less)

Code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <math.h>

#define NUM_THREADS 5
#define run_n 1000000


pthread_mutex_t lock;
long sum = 0;
long primeCounter = 0;
int count = 0;


int isPrime(int num)
{
    if (num <= 1)
        return 0;
    if (num <= 3)

        return 1;
    if (num % 2 == 0 || num % 3 == 0)

        return 0;
    for (int i = 5; i <= sqrt(num); i = i + 6)

    {
        if (num % i == 0 || num % (i + 2) == 0)

            return 0;
    }
    return 1;
}


void *func(void *var)
{
    for (int i = 0; i < run_n / NUM_THREADS; ++i)


    {
        int random = rand();
        if (isPrime(random))
        {
            sum += random;

            primeCounter++;
        }
        pthread_mutex_lock(&lock);

        count++;

        if (count == run_n)

            return 0;

        pthread_mutex_unlock(&lock);
    }
    return NULL;
}


int main(int argc, char *argv[])


{
    pthread_t tid;

    if (argc != 3)

    {
printf("Too few arguments ");
printf("USAGE: ./primeCalc <prime pivot> <num of random numbers>");
        exit(0);
    }

    int randomPivot = atoi(argv[1]);
    int random;


    pthread_t threads[NUM_THREADS];
    srand(randomPivot);

    for (int i = 0; i < NUM_THREADS; ++i)
    {

        if (pthread_create(&threads[i], NULL, &func, (void *)&randomPivot) != 0)

        {

            fprintf(stderr, "error: Cannot create thread # %d\n", i);

            break;
        }

        else

        {
            printf("thread %d was created\n", i);
        }
    }

    for (int i = 0; i < NUM_THREADS; ++i)

    {

        if (pthread_join(threads[i], NULL) != 0)

        {

            fprintf(stderr, "error: Cannot join thread # %d\n", i);

        }
    }
    pthread_mutex_destroy(&lock);
    printf("%ld,%ld\n", sum, primeCounter);
    pthread_exit(NULL);
    exit(0);
}

I am running this code on Linux , having 4 CPU(s), threads-per-core is 1

**Salem** · 05-05-2021

> for (int i = 5; i <= sqrt(num); i = i + 6)
Precalculate the square root, you're doing this every time around the loop.

> for (int i = 0; i < run_n / NUM_THREADS; ++i)
Not sure why you're not using your var parameter here.

> int random = rand();
1. Beware of the possibly limited range of rand(). What is RAND_MAX on your system?
2. Use rand_r() in a multi-threaded program. rand(3): pseudo-random number generator - Linux man page

> if (count == run_n)
> return 0;
Yeah, you return with the mutex still locked. This isn't good.
It doesn't actually seem to serve any purpose though, as only the last thread on the last iteration would hit this condition, and the loop is going to end anyway.

> sum += random;
> primeCounter++;
You print both of these, but neither are protected by the mutex.

> #define run_n 1000000
Possibly too small for you to notice a difference.
Creating threads isn't free, and your mutex on every iteration in every thread is going to sap performance.

Code:

void *func(void *var)
{
    long local_sum = 0, local_counter = 0;
    unsigned int seed = 1;  //!! use your var in some way
    for (int i = 0; i < run_n / NUM_THREADS; ++i)
    {
        int random = rand_r(&seed);
        if (isPrime(random))
        {
            local_sum += random;
            local_counter++;
        }
    }
    pthread_mutex_lock(&lock);
    sum += local_sum;
    primeCounter += local_counter;
    pthread_mutex_unlock(&lock);
    return NULL;
}

**I C everything** · 05-05-2021

simplest is use secondary thread taking care of slower print to screen or file
while first thread finds primes and outputs to one buffer,the secondary thread outputs from another buffer,that earlier is filled with primes

4 threads output to 4 buffers simultanously?

**mbn** · 05-05-2021

Originally Posted by Salem

> for (int i = 5; i <= sqrt(num); i = i + 6)
Precalculate the square root, you're doing this every time around the loop.

> for (int i = 0; i < run_n / NUM_THREADS; ++i)
Not sure why you're not using your var parameter here.

> int random = rand();
1. Beware of the possibly limited range of rand(). What is RAND_MAX on your system?
2. Use rand_r() in a multi-threaded program. rand(3): pseudo-random number generator - Linux man page

> if (count == run_n)
> return 0;
Yeah, you return with the mutex still locked. This isn't good.
It doesn't actually seem to serve any purpose though, as only the last thread on the last iteration would hit this condition, and the loop is going to end anyway.

> sum += random;
> primeCounter++;
You print both of these, but neither are protected by the mutex.

> #define run_n 1000000
Possibly too small for you to notice a difference.
Creating threads isn't free, and your mutex on every iteration in every thread is going to sap performance.

Code:

void *func(void *var)
{
    long local_sum = 0, local_counter = 0;
    unsigned int seed = 1;  //!! use your var in some way
    for (int i = 0; i < run_n / NUM_THREADS; ++i)
    {
        int random = rand_r(&seed);
        if (isPrime(random))
        {
            local_sum += random;
            local_counter++;
        }
    }
    pthread_mutex_lock(&lock);
    sum += local_sum;
    primeCounter += local_counter;
    pthread_mutex_unlock(&lock);
    return NULL;
}

Thank you very much! I fixed my code accordingly and now it works great.

In theory, is there a way to do improve the current results?
As of now I only used threads and mutex .

**Salem** · 05-05-2021

What are you trying to show?

**mbn** · 05-05-2021

Originally Posted by Salem

What are you trying to show?

a course challenge

**Salem** · 05-05-2021

That's a why, not a what.

If you want more specific help, post the text of the challenge question.

**mbn** · 05-05-2021

Originally Posted by Salem

That's a why, not a what.

If you want more specific help, post the text of the challenge question.

The challenge is simple, given a number X, create X "random" numbers
and count how many of them are prime numbers && sum the totals of primes that we found.
Do this at minimum time.

The X numbers are
10,000,000 with seed 6,7,9
100,000,000 with seed 12
(using the rand function, making sure the numbers are not really random)

for comparison reasons we were given an executable file that holds an efficient code (we can't see the code). we are supposed to try and make an even more efficient program.

hope I was able to explain the assignment !

**aghast** · 05-05-2021

The rand() function typically returns 16 or 32-bit numbers. Those numbers are within the memory capacity of most modern PCs.

You're asked to generate 10E6 or 100E6 numbers at random to check if they are prime. Your current isPrime() function loops over integers up to the sqrt() of the number, making the function O(sqrt N) and making your entire program at least O(N sqrt N).

If you switch to an O(1) version of isPrime() you might get some good performace results. Try using a sieve approach in your isPrime to see if that improves things.

On the other hand, starting threads takes time. So I suspect you should start as few threads as possible, especially considering that you're trying to speed up isPrime() to O(1). So I think you should build yourself a timing framework -- something that starts counting time, then runs the program, then stops counting time and subtracts the end - start time. You'll want to test both large and small numbers of threads (1, 2, 100). I think your best result might be either 1 (no threads other than main) or 2 (sieve, random), or maybe 2 * numcores + 1.

**Salem** · 05-05-2021

> for comparison reasons we were given an executable file that holds an efficient code (we can't see the code).
> we are supposed to try and make an even more efficient program.
How do you compare so far?
Are you comparing like-for-like "debug" / "release" builds?

100M numbers is only 5% of the range of a 32-bit RAND_MAX.
A pre-computed sieve table is going to be a couple of GB in size with very low utilisation and pretty much a cache miss for every lookup.

**I C everything** · 05-07-2021

Originally Posted by mbn

The challenge is simple, given a number X, create X "random" numbers
and count how many of them are prime numbers && sum the totals of primes that we found.
Do this at minimum time.

The X numbers are
10,000,000 with seed 6,7,9
100,000,000 with seed 12
(using the rand function, making sure the numbers are not really random)

for comparison reasons we were given an executable file that holds an efficient code (we can't see the code). we are supposed to try and make an even more efficient program.

hope I was able to explain the assignment !

ok,that explains it,my experience is from similar prime numbers testing ,printing from 1 to max,found out printing together with inttoascii is slowest,so I put this in separate thread and option to write to file

**mbn** · 05-09-2021

Originally Posted by Salem

> for comparison reasons we were given an executable file that holds an efficient code (we can't see the code).
> we are supposed to try and make an even more efficient program.
How do you compare so far?
Are you comparing like-for-like "debug" / "release" builds?

100M numbers is only 5% of the range of a 32-bit RAND_MAX.
A pre-computed sieve table is going to be a couple of GB in size with very low utilisation and pretty much a cache miss for every lookup.

Hi again,
can you further explain the second paragraph ?
If I make a pre-computed sieve table, isn't that cheating? (guessing you meant that i should compute it, and then run my algorithm with O(1) complexity of finding out if a num is prime or not.

another question that I have, it there any way of creating threads and minimizing context-switch delay?

also, if I was to use the sieve algorithm, which number should I insert? which num would be my N assuming rand(55) is the max random seed in this assignment?

btw, I appreciate the replies, very useful!

**Salem** · 05-09-2021

> 100M numbers is only 5% of the range of a 32-bit RAND_MAX.
RAND_MAX is say 2147483647, which is 20x bigger than 100M, hence 5%

> If I make a pre-computed sieve table, isn't that cheating?
Only if your assignment specifically states 'do not use a pre-computed' table.
Besides, you wanted efficiency.
A table only makes sense if you're going to be asking 'is this prime' for the same number on multiple occasions.

> another question that I have, it there any way of creating threads and minimizing context-switch delay?
Yeah, maximise the amount of work each thread does.

> also, if I was to use the sieve algorithm, which number should I insert?
You insert all the primes.

> which num would be my N assuming rand(55) is the max random seed in this assignment?
The seed value makes NO difference to the maximum value that rand() can return, which is always going to be RAND_MAX.
Which in turn has no bearing on whether a given number happens to be prime or not.

**mbn** · 05-09-2021

Originally Posted by Salem

> 100M numbers is only 5% of the range of a 32-bit RAND_MAX.
RAND_MAX is say 2147483647, which is 20x bigger than 100M, hence 5%

> If I make a pre-computed sieve table, isn't that cheating?
Only if your assignment specifically states 'do not use a pre-computed' table.
Besides, you wanted efficiency.
A table only makes sense if you're going to be asking 'is this prime' for the same number on multiple occasions.

> another question that I have, it there any way of creating threads and minimizing context-switch delay?
Yeah, maximise the amount of work each thread does.

> also, if I was to use the sieve algorithm, which number should I insert?
You insert all the primes.

> which num would be my N assuming rand(55) is the max random seed in this assignment?
The seed value makes NO difference to the maximum value that rand() can return, which is always going to be RAND_MAX.
Which in turn has no bearing on whether a given number happens to be prime or not.

Thank you again.
unfortunately from what I gathered , the Sieve of Earthstones algorithms uses an array, where N is the largest number that we can get. the rand function creates numbers greater than 1,000,000,000 so I can't use it. (If I understood the algorithm correctly).

But you are right about needing to create a table that stores the prime that I have found so far so I won't make the algorithm run twice over the same input.

Trying to find a hash function in C that will help me extract that infromation in O(1) complexity.

**Salem** · 05-09-2021

You only need 1 bit per random number (you're only storing a true/false).
So the whole RAND_MAX range would only need 256MB of memory.
But the lookup is O(1) when you're done.

By contrast, an array of just all the known primes upto RAND_MAX would contain approx 100M entries.
Stored as 4-byte integers, this would come to 400MB.
Since it is ordered, bsearch() is possible (in ~26 iterations per lookup) (it's O(log₂(N))

Thread: Using multiThreads to speed up program

Thread Tools

Search Thread

Display

Using multiThreads to speed up program

Similar Threads

Run a program on multithreads and compare speed!

Multithreads for windows

Completion Port and Multithreads :: MFC

multithreads

Multithreads & Pointer to bool :: MFC

Tags for this Thread