Multithreading prime calculations

This is a discussion on Multithreading prime calculations within the C Programming forums, part of the General Programming Boards category; Hi, Currently I am playing with multithreading a bit. I use POSIX threads (header file: pthread.h). I quickly read some ...

  1. #1
    C / C++
    Join Date
    Jan 2006
    Location
    The Netherlands
    Posts
    312

    Multithreading prime calculations

    Hi,

    Currently I am playing with multithreading a bit. I use POSIX threads (header file: pthread.h). I quickly read some tutorials and I wanted to apply what I had learned to a prime calculation program. The user inputs number of threads and maximum number to calculate. This works perfectly, no compiler errors or warnings and no segmentation faults. The prime numbers are correct and in the right order (I roughly checked up to 100).

    However, when I am benchmarking I get longer times when using more threads.

    I think this is the problem area:

    Code:
    for (i = 3; i < primes; i += threads+1) // +1 compensates for isPrime calculation in parent
        {
            for (j = 0; j < threads; ++j)
            {
                temp[j] = i + j;
                pthread_create(&tid[j], NULL, (void *)isPrime, &temp[j]);
            }
    
            *tempParent = i + threads;
            isPrime((unsigned int *)tempParent);
    
            for (j = 0; j < threads; ++j)
            {
                ret[j] = pthread_join(tid[j], NULL);
    
                if(temp[j] != 0)
                {
                    printf ("%d\n", temp[j]);
                }
            }
    
            if(*tempParent != 0)
            {
                printf ("%d\n", *tempParent);
            }
        }
    temp, parentTemp, ret, tid are allocated using malloc(). The isPrime(unsigned int *n) function sets the value of the pointer passed to it to zero if *n isn't a prime number.

    I want the threads and the parent to isPrime() at the same time, but that doesn't happen, because there is no speed gain?

    I calculated up to 1,000,000, roughly benchmarking with time_t and time() in seconds. With 1 thread compared to 5 threads, 5 threads is seconds slower!

    Thanks
    Operating Systems:
    - Ubuntu 9.04
    - XP

    Compiler: gcc

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    What procesor are you using (how many cores does it have), and what else is running on the machine?

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Why would it be any faster? Seriously. Instead of burning all of your CPU time running division calculations, now you're splitting it across threads, adding that overhead. I'm not really seeing the advantage of this being threaded.


    Quzah.
    Hope is the first step on the road to disappointment.

  4. #4
    C / C++
    Join Date
    Jan 2006
    Location
    The Netherlands
    Posts
    312
    @matsp:
    My processor: Intel C2D E6550 (@ 2.3Ghz). So 2 cores (ubuntu detects 2).
    What do you mean what else? :P I measure a few percent of other programs (Netbeans IDE, System Monitor). I am testing this on Ubuntu 8.10.

    @quzah:
    Maybe you're right it can't go any faster with threads. I forgot to mention that my CPU load doesn't go over 20% when using x threads.

    With my unthreaded program to calculate primes, it goes to 50% immediately (1 core 100%).

    So multithreading should be used for different purposes? Not CPU-heavy calculating?
    Operating Systems:
    - Ubuntu 9.04
    - XP

    Compiler: gcc

  5. #5
    The larch
    Join Date
    May 2006
    Posts
    3,573
    If you can use two cores, 2 threads should be faster than one. But with more threads you are just doing more work than necessary.

    I think multithreading is great for CPU-heavy things (if you don't have syncronization issues) but you shouldn't have more threads than you have cores.

    However, when you are just listing primes, a great optimization would be to use results that you have already found (isPrime could do a trial division only with primes found so far) and that is probably harder in multithreading.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  6. #6
    C / C++
    Join Date
    Jan 2006
    Location
    The Netherlands
    Posts
    312
    Ok that's clear. I know about that trick to use existing primes to calculate a next one, that's a nice one .

    But for example if you want to save to a file, output to the screen and do some calculations at the same time threads are also used?
    Operating Systems:
    - Ubuntu 9.04
    - XP

    Compiler: gcc

  7. #7
    Maz
    Maz is offline
    Registered User Maz's Avatar
    Join Date
    Nov 2005
    Location
    Finland
    Posts
    194
    Generally threads are used when we have some blocking things but wish the other parts of a program would still run. But as a rule of thumb, further you can go without threads, easier will your life be - or that's my personal feeling...

    And as is said here, when you use more threads than you have CPU's/cores, context switching will cause overhead. So will synchronizations where threads wait till other threads complete (also in multiprocessor environment).

    If you really want to squeeze all out of your machine, try changing scheduler to SCHED_FIFO, increase priority, and use set_affinity to sit one thread on core1, and the other thread to core2.

    Just note that with SCHED_FIFO, no processes with lower or same priority will not get scheduled, unless your program calls yield. So if you have tight loop with bare calculations/memory accesses, and you set the priority higher than shell's priority for example... Well, you'r shell wont get to run. So beware with sched fifo. You may also want to see SCHED_RR, which will (eventually) let other tasks to gain some time.

    functions to check are

    sched_setaffinity()
    pthread_attr_setschedpolicy()
    pthread_setschedprio()

  8. #8
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,305
    Nobody seems to have picked up on the problem yet.
    Your program is spending most of the time waiting for threads to finish...

    Most numbers are deterined to be composite very quickly. The number of division tests necessary to determine the primeness of a number is basically kinda logarithmic. One third of numbers are divisible by three and one fifth of numbers are divisible by five etc..
    Most of the time a number will be found to be composite after only a few divisions, some will take just a few more, fewer still will require even more than that, and very very few will require even more than that. You get the picture.

    What you've got will actually get worse as you add more threads, even if you had a lot more cores than you actually do.
    Any time that one thread determines that a number is composite much faster than any of the other threads, there is time wasted waiting for other threads to finish.

    E.g. say 4 threads (including this thread):
    On a particular round say Main thread takes 0.5ms
    Second thread takes 0.1ms
    Third thread takes 2.5ms
    Fourth thread takes 0.1ms
    So, how much time does it take for those four prime tests? Best case assume perfect parallelisation = 2.5ms
    There were four threads, so that's 2.5ms*4 = 10ms worth of work that could be done if there was full utilisation on four cores. Actual time spent doing work = sum of each thread = 0.5+0.1+2.5+0.1 = 3.2ms
    So, 3.2 out of 10ms hmm that's 32% utilisation - ouch!
    Now the interesting thing to note is that it actually doesn't matter which thread takes the longest - same result.

    Due to the fact that threads will often take very different amounts of time and yet you don't restart any of them until they have all finished, yes you will get very poor performance indeed, compared to just using one thread.

    Not only that, but the threads being created often have less work to do than what it takes to create the thread to begin with. The threading overhead is the last nail in the coffin for this piece of code.

    If you're going to use threads, you must give them a decent amount of work to do for their lifetime. You need a total redesign. Instead of giving it one number to determine the primeness of each time, give it a batch of say 5000 or more numbers at a time to determine the primeness of. Using locking, each thread can do the chunk of work, store it's results, pick off the next available batch of numbers to test, and begin processing them immediately. You may have to do more to sort the resulting primes output, but you'll still get something that is way faster.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    iMalc: 10 points for a good explanation. I was thinking that doing one prime at a time would be a lot of overhead, but I didn't actually pick up on the fact that most potential primes are quickly found to be non-primes (unless we pre-pick likely primpes, e.g 2^n-1).

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,305
    Oh and if you didn't catch it: Create all threads at the start of the prime testing, and they should run right up until there are no more batches of numbers remaining to be tested.

    On top of that, if you're serious about speed, you might need a better algorithm. Other algorithms could outperform what you're using without them having to do any threading, even if you fully utilise four cores.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  11. #11
    C / C++
    Join Date
    Jan 2006
    Location
    The Netherlands
    Posts
    312
    Thanks for all the explanations .
    Firstly I'm going to make my prime calculation algorithm more efficient. Then I can go speed it up with an extra thread using your tips.
    Operating Systems:
    - Ubuntu 9.04
    - XP

    Compiler: gcc

  12. #12
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,189
    For trivial primes (64 bit or smaller) using the block workunit method gives the best results. As stated, ~1/3 of all composite numbers will be found to be composite by division by 3. If you assign a single prime as a workunit, the overhead in obtaining the lock will far outweigh the actual cost of proving the number composite. Under windows, obtaining a critical section takes approximately 20 processor cycles on a P4. That will completely swamp most primality tests against a single factor. Assigning a large block of numbers, e.g. 65536, to a thread at once, you minimize the time that the threads are waiting for the lock, and in fact most of the time they will not be waiting as they will acquire the lock as soon as requested.
    Until you can build a working general purpose reprogrammable computer out of basic components from radio shack, you are not fit to call yourself a programmer in my presence. This is cwhizard, signing off.

  13. #13
    C / C++
    Join Date
    Jan 2006
    Location
    The Netherlands
    Posts
    312
    Thank you, I'll keep that in mind. And I'm developing on Ubuntu currently
    Operating Systems:
    - Ubuntu 9.04
    - XP

    Compiler: gcc

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Calculating prime factor of a huge number.
    By Bakster in forum C Programming
    Replies: 15
    Last Post: 02-20-2009, 11:06 AM
  2. prime number program with function
    By mackieinva in forum C Programming
    Replies: 17
    Last Post: 09-20-2007, 08:36 AM
  3. prime numbers, counters, help!
    By vege^ in forum C++ Programming
    Replies: 1
    Last Post: 03-10-2003, 03:32 PM
  4. Prime Wonder
    By vasanth in forum A Brief History of Cprogramming.com
    Replies: 2
    Last Post: 10-07-2002, 11:49 PM
  5. Homework help
    By Jigsaw in forum C++ Programming
    Replies: 2
    Last Post: 03-06-2002, 04:56 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21