Thread: Using multiThreads to speed up program

  1. #1
    Registered User
    Join Date
    May 2021
    Posts
    9

    Using multiThreads to speed up program

    Hello, I am new to this forum and also a beginner in C.
    I was asked to write a program where I need to generate X random numbers and determine if they are prime numbers. The catch is, I should make the program run as fast as I can using threads.

    I was hoping you could look at my code and give me advices on how to improve it. For some reason, it doesn't seem like the multi-threading is helping to make the program run faster. ( I tried running the code without threads VS with threads)

    SUMMARY OF CODE:
    created two functions:
    isPrime() - determine if an integer is prime or not
    func() - this function i will send to the thread

    everytime i find a prime number I increment the prime count;

    created an array of 5 threads (not really sure if I should use more or less)


    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <pthread.h>
    #include <math.h>
    
    #define NUM_THREADS 5
    #define run_n 1000000
    
    
    pthread_mutex_t lock;
    long sum = 0;
    long primeCounter = 0;
    int count = 0;
    
    
    int isPrime(int num)
    {
        if (num <= 1)
            return 0;
        if (num <= 3)
    
            return 1;
        if (num % 2 == 0 || num % 3 == 0)
    
            return 0;
        for (int i = 5; i <= sqrt(num); i = i + 6)
    
        {
            if (num % i == 0 || num % (i + 2) == 0)
    
                return 0;
        }
        return 1;
    }
    
    
    void *func(void *var)
    {
        for (int i = 0; i < run_n / NUM_THREADS; ++i)
    
    
        {
            int random = rand();
            if (isPrime(random))
            {
                sum += random;
    
                primeCounter++;
            }
            pthread_mutex_lock(&lock);
    
            count++;
    
            if (count == run_n)
    
                return 0;
    
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }
    
    
    int main(int argc, char *argv[])
    
    
    {
        pthread_t tid;
    
        if (argc != 3)
    
        {
    printf("Too few arguments ");
    printf("USAGE: ./primeCalc <prime pivot> <num of random numbers>");
            exit(0);
        }
    
        int randomPivot = atoi(argv[1]);
        int random;
    
    
        pthread_t threads[NUM_THREADS];
        srand(randomPivot);
    
        for (int i = 0; i < NUM_THREADS; ++i)
        {
    
            if (pthread_create(&threads[i], NULL, &func, (void *)&randomPivot) != 0)
    
            {
    
                fprintf(stderr, "error: Cannot create thread # %d\n", i);
    
                break;
            }
    
            else
    
            {
                printf("thread %d was created\n", i);
            }
        }
    
        for (int i = 0; i < NUM_THREADS; ++i)
    
        {
    
            if (pthread_join(threads[i], NULL) != 0)
    
            {
    
                fprintf(stderr, "error: Cannot join thread # %d\n", i);
    
            }
        }
        pthread_mutex_destroy(&lock);
        printf("%ld,%ld\n", sum, primeCounter);
        pthread_exit(NULL);
        exit(0);
    }
    I am running this code on Linux , having 4 CPU(s), threads-per-core is 1

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > for (int i = 5; i <= sqrt(num); i = i + 6)
    Precalculate the square root, you're doing this every time around the loop.

    > for (int i = 0; i < run_n / NUM_THREADS; ++i)
    Not sure why you're not using your var parameter here.

    > int random = rand();
    1. Beware of the possibly limited range of rand(). What is RAND_MAX on your system?
    2. Use rand_r() in a multi-threaded program. rand(3): pseudo-random number generator - Linux man page

    > if (count == run_n)
    > return 0;
    Yeah, you return with the mutex still locked. This isn't good.
    It doesn't actually seem to serve any purpose though, as only the last thread on the last iteration would hit this condition, and the loop is going to end anyway.


    > sum += random;
    > primeCounter++;
    You print both of these, but neither are protected by the mutex.

    > #define run_n 1000000
    Possibly too small for you to notice a difference.
    Creating threads isn't free, and your mutex on every iteration in every thread is going to sap performance.

    Code:
    void *func(void *var)
    {
        long local_sum = 0, local_counter = 0;
        unsigned int seed = 1;  //!! use your var in some way
        for (int i = 0; i < run_n / NUM_THREADS; ++i)
        {
            int random = rand_r(&seed);
            if (isPrime(random))
            {
                local_sum += random;
                local_counter++;
            }
        }
        pthread_mutex_lock(&lock);
        sum += local_sum;
        primeCounter += local_counter;
        pthread_mutex_unlock(&lock);
        return NULL;
    }
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User I C everything's Avatar
    Join Date
    Apr 2019
    Posts
    101
    simplest is use secondary thread taking care of slower print to screen or file
    while first thread finds primes and outputs to one buffer,the secondary thread outputs from another buffer,that earlier is filled with primes

    4 threads output to 4 buffers simultanously?
    you tell me you can C,why dont you C your own bugs?

  4. #4
    Registered User
    Join Date
    May 2021
    Posts
    9
    Quote Originally Posted by Salem View Post
    > for (int i = 5; i <= sqrt(num); i = i + 6)
    Precalculate the square root, you're doing this every time around the loop.

    > for (int i = 0; i < run_n / NUM_THREADS; ++i)
    Not sure why you're not using your var parameter here.

    > int random = rand();
    1. Beware of the possibly limited range of rand(). What is RAND_MAX on your system?
    2. Use rand_r() in a multi-threaded program. rand(3): pseudo-random number generator - Linux man page

    > if (count == run_n)
    > return 0;
    Yeah, you return with the mutex still locked. This isn't good.
    It doesn't actually seem to serve any purpose though, as only the last thread on the last iteration would hit this condition, and the loop is going to end anyway.


    > sum += random;
    > primeCounter++;
    You print both of these, but neither are protected by the mutex.

    > #define run_n 1000000
    Possibly too small for you to notice a difference.
    Creating threads isn't free, and your mutex on every iteration in every thread is going to sap performance.

    Code:
    void *func(void *var)
    {
        long local_sum = 0, local_counter = 0;
        unsigned int seed = 1;  //!! use your var in some way
        for (int i = 0; i < run_n / NUM_THREADS; ++i)
        {
            int random = rand_r(&seed);
            if (isPrime(random))
            {
                local_sum += random;
                local_counter++;
            }
        }
        pthread_mutex_lock(&lock);
        sum += local_sum;
        primeCounter += local_counter;
        pthread_mutex_unlock(&lock);
        return NULL;
    }
    Thank you very much! I fixed my code accordingly and now it works great.

    In theory, is there a way to do improve the current results?
    As of now I only used threads and mutex .

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    What are you trying to show?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #6
    Registered User
    Join Date
    May 2021
    Posts
    9
    Quote Originally Posted by Salem View Post
    What are you trying to show?
    a course challenge

  7. #7
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    That's a why, not a what.

    If you want more specific help, post the text of the challenge question.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  8. #8
    Registered User
    Join Date
    May 2021
    Posts
    9

    Smile

    Quote Originally Posted by Salem View Post
    That's a why, not a what.

    If you want more specific help, post the text of the challenge question.
    The challenge is simple, given a number X, create X "random" numbers
    and count how many of them are prime numbers && sum the totals of primes that we found.
    Do this at minimum time.

    The X numbers are
    10,000,000 with seed 6,7,9
    100,000,000 with seed 12
    (using the rand function, making sure the numbers are not really random)

    for comparison reasons we were given an executable file that holds an efficient code (we can't see the code). we are supposed to try and make an even more efficient program.

    hope I was able to explain the assignment !

  9. #9
    Registered User
    Join Date
    Apr 2021
    Posts
    139
    The rand() function typically returns 16 or 32-bit numbers. Those numbers are within the memory capacity of most modern PCs.

    You're asked to generate 10E6 or 100E6 numbers at random to check if they are prime. Your current isPrime() function loops over integers up to the sqrt() of the number, making the function O(sqrt N) and making your entire program at least O(N sqrt N).

    If you switch to an O(1) version of isPrime() you might get some good performace results. Try using a sieve approach in your isPrime to see if that improves things.

    On the other hand, starting threads takes time. So I suspect you should start as few threads as possible, especially considering that you're trying to speed up isPrime() to O(1). So I think you should build yourself a timing framework -- something that starts counting time, then runs the program, then stops counting time and subtracts the end - start time. You'll want to test both large and small numbers of threads (1, 2, 100). I think your best result might be either 1 (no threads other than main) or 2 (sieve, random), or maybe 2 * numcores + 1.

  10. #10
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > for comparison reasons we were given an executable file that holds an efficient code (we can't see the code).
    > we are supposed to try and make an even more efficient program.
    How do you compare so far?
    Are you comparing like-for-like "debug" / "release" builds?

    100M numbers is only 5% of the range of a 32-bit RAND_MAX.
    A pre-computed sieve table is going to be a couple of GB in size with very low utilisation and pretty much a cache miss for every lookup.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  11. #11
    Registered User I C everything's Avatar
    Join Date
    Apr 2019
    Posts
    101
    Quote Originally Posted by mbn View Post
    The challenge is simple, given a number X, create X "random" numbers
    and count how many of them are prime numbers && sum the totals of primes that we found.
    Do this at minimum time.

    The X numbers are
    10,000,000 with seed 6,7,9
    100,000,000 with seed 12
    (using the rand function, making sure the numbers are not really random)

    for comparison reasons we were given an executable file that holds an efficient code (we can't see the code). we are supposed to try and make an even more efficient program.

    hope I was able to explain the assignment !
    ok,that explains it,my experience is from similar prime numbers testing ,printing from 1 to max,found out printing together with inttoascii is slowest,so I put this in separate thread and option to write to file
    you tell me you can C,why dont you C your own bugs?

  12. #12
    Registered User
    Join Date
    May 2021
    Posts
    9
    Quote Originally Posted by Salem View Post
    > for comparison reasons we were given an executable file that holds an efficient code (we can't see the code).
    > we are supposed to try and make an even more efficient program.
    How do you compare so far?
    Are you comparing like-for-like "debug" / "release" builds?

    100M numbers is only 5% of the range of a 32-bit RAND_MAX.
    A pre-computed sieve table is going to be a couple of GB in size with very low utilisation and pretty much a cache miss for every lookup.
    Hi again,
    can you further explain the second paragraph ?
    If I make a pre-computed sieve table, isn't that cheating? (guessing you meant that i should compute it, and then run my algorithm with O(1) complexity of finding out if a num is prime or not.

    another question that I have, it there any way of creating threads and minimizing context-switch delay?

    also, if I was to use the sieve algorithm, which number should I insert? which num would be my N assuming rand(55) is the max random seed in this assignment?

    btw, I appreciate the replies, very useful!
    Last edited by mbn; 05-09-2021 at 01:25 AM.

  13. #13
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > 100M numbers is only 5% of the range of a 32-bit RAND_MAX.
    RAND_MAX is say 2147483647, which is 20x bigger than 100M, hence 5%

    > If I make a pre-computed sieve table, isn't that cheating?
    Only if your assignment specifically states 'do not use a pre-computed' table.
    Besides, you wanted efficiency.
    A table only makes sense if you're going to be asking 'is this prime' for the same number on multiple occasions.

    > another question that I have, it there any way of creating threads and minimizing context-switch delay?
    Yeah, maximise the amount of work each thread does.

    > also, if I was to use the sieve algorithm, which number should I insert?
    You insert all the primes.

    > which num would be my N assuming rand(55) is the max random seed in this assignment?
    The seed value makes NO difference to the maximum value that rand() can return, which is always going to be RAND_MAX.
    Which in turn has no bearing on whether a given number happens to be prime or not.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  14. #14
    Registered User
    Join Date
    May 2021
    Posts
    9
    Quote Originally Posted by Salem View Post
    > 100M numbers is only 5% of the range of a 32-bit RAND_MAX.
    RAND_MAX is say 2147483647, which is 20x bigger than 100M, hence 5%

    > If I make a pre-computed sieve table, isn't that cheating?
    Only if your assignment specifically states 'do not use a pre-computed' table.
    Besides, you wanted efficiency.
    A table only makes sense if you're going to be asking 'is this prime' for the same number on multiple occasions.

    > another question that I have, it there any way of creating threads and minimizing context-switch delay?
    Yeah, maximise the amount of work each thread does.

    > also, if I was to use the sieve algorithm, which number should I insert?
    You insert all the primes.

    > which num would be my N assuming rand(55) is the max random seed in this assignment?
    The seed value makes NO difference to the maximum value that rand() can return, which is always going to be RAND_MAX.
    Which in turn has no bearing on whether a given number happens to be prime or not.
    Thank you again.
    unfortunately from what I gathered , the Sieve of Earthstones algorithms uses an array, where N is the largest number that we can get. the rand function creates numbers greater than 1,000,000,000 so I can't use it. (If I understood the algorithm correctly).

    But you are right about needing to create a table that stores the prime that I have found so far so I won't make the algorithm run twice over the same input.

    Trying to find a hash function in C that will help me extract that infromation in O(1) complexity.

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    You only need 1 bit per random number (you're only storing a true/false).
    So the whole RAND_MAX range would only need 256MB of memory.
    But the lookup is O(1) when you're done.

    By contrast, an array of just all the known primes upto RAND_MAX would contain approx 100M entries.
    Stored as 4-byte integers, this would come to 400MB.
    Since it is ordered, bsearch() is possible (in ~26 iterations per lookup) (it's O(log2(N))
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Run a program on multithreads and compare speed!
    By lordvader in forum C Programming
    Replies: 5
    Last Post: 06-16-2020, 11:01 PM
  2. Multithreads for windows
    By Jablan in forum C Programming
    Replies: 3
    Last Post: 03-09-2014, 05:38 AM
  3. Completion Port and Multithreads :: MFC
    By kuphryn in forum Windows Programming
    Replies: 0
    Last Post: 11-06-2002, 11:37 PM
  4. multithreads
    By JagWire in forum C Programming
    Replies: 1
    Last Post: 06-28-2002, 11:22 AM
  5. Multithreads & Pointer to bool :: MFC
    By kuphryn in forum Windows Programming
    Replies: 7
    Last Post: 06-26-2002, 10:50 AM

Tags for this Thread