Thread: C & MPI, Number of computer threads

  1. #1
    Registered User
    Join Date
    Oct 2012
    Posts
    13

    Question C & MPI, Number of computer threads

    Hello guys

    I'm working on a small parallel program that calculates pi from the area of a quarter circle. I'm new to Ubuntu and C programming.

    This is the code.

    Code:
    #include <stdio.h>
    #include <mpi.h>
    #include <math.h>
    #include <time.h>
    
    #define Ndx 600000000
    
    int main (int argc, char* argv[])
    
    {    
        int labindex, nw;
        MPI_Init (&argc, &argv);                  /* starts MPI */
        MPI_Comm_rank (MPI_COMM_WORLD, &labindex);      /* get current process id */
        MPI_Comm_size (MPI_COMM_WORLD, &nw);            /* get number of processes */
    
    
        int i;
        double dx, x, y, sum, pi;
        double range_start;
        double te, te1;
    
    
        dx=1./Ndx;
        range_start=labindex*(1./nw);
        sum=0;
        for(i=0;i<Ndx/nw;i++)
        {
            x=range_start+i*dx+dx/2;
            y=sqrt(1-pow(x,2));
            sum=sum+dx*y;
        }
    
        MPI_Reduce(&sum,&pi,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);
        
        if (labindex==0)    /* only the root receives the reduced sum */
        {    
            pi=pi*4; //quarter to full circle
            te1=clock();
            te=((float)te1)/CLOCKS_PER_SEC;
    
    
            printf("pi=%.20f\n",pi);
            printf("difference is=%.20f\n",pi-acos(-1));    
            printf("total time elapsed %f s\n",te);
        }
    
        MPI_Finalize();
    
        return 0;
    
    }
    The question is:
    My laptop has an i7 processor with hyper-threading, so my guess was that I could use at most 8 workers/processes when running the program. You can see the outputs of the program below. The strange thing is that I'm able to run the code with 80 workers, and it seems quite fast. How can this be?

    Thank you

    Code:
    mpicc pi.c -Wall -o pi -lm
    Code:
    mpirun -np 1 ./pi
    pi=3.14159265358961548031
    difference is=-0.00000000000017763568
    total time elapsed 19.639999

    Code:
    mpirun -np 2 ./pi
    pi=3.14159265358966033332
    difference is=-0.00000000000013278267
    total time elapsed 9.570000

    Code:
    mpirun -np 4 ./pi
    pi=3.14159265358984729488
    difference is=0.00000000000005417888
    total time elapsed 4.800000

    Code:
    mpirun -np 8 ./pi
    pi=3.14159265358979489235
    difference is=0.00000000000000177636
    total time elapsed 2.610000

    Code:
    mpirun -np 80 ./pi
    pi=3.14159265358977268789
    difference is=-0.00000000000002042810
    total time elapsed 0.430000

  2. #2
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Nooby 1 Kenooby
    The question is:
    My laptop has an i7 processor with hyper-threading, so my guess was that I could use at most 8 workers/processes when running the program.
    Really? You expect to be able to use at most 8 processes at the same time?

    No, that guess is completely bonkers, since Linux can run hundreds of processes even on very small embedded devices.

    Quote Originally Posted by Nooby 1 Kenooby
    The strange thing is that I'm able to run the code with 80 workers, and it seems quite fast. How can this be?
    Because the number of processes is not tied to the number of cores. The kernel scheduler time-slices all runnable processes across whatever cores are available, and the overhead of a process or thread is rather small, so you can have lots of them.

    Even if you have only a small number of CPU cores, you can have thousands of threads and not lose much computational efficiency. In fact, the actual limitation tends to be memory use.

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > Strange thing is I'm able to run the code with 80 workers. And it seems quite fast. How can this be?
    Now did the prompt really return in less than half a second?
    Or did you wait for several seconds for this small number to be printed?

    See the clock(3) man page.
    You might want to change your timing code. clock() measures CPU time, not elapsed time as measured by a clock on the wall (or your watch). It also seems to be measuring the time taken by a single MPI process (rank 0), not the whole job, and with 80 processes rank 0 only does 1/80 of the loop iterations.

    You could try using the time command as follows
    Code:
    time mpirun -np 1 ./pi
    which will report the real time, user time and system time of the process.

    You should see a sweet spot close to the number of actual cores on your processor (assuming MPI makes good use of them). Much beyond that, you'll see the total time increasing again as you get hit by more and more context-switching overhead (and swapping, if the processes use a lot of memory).

  4. #4
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Salem
    It seems to be measuring the time taken by a single MPI process (rank 0), not the whole job.
    Salem, it won't change much, because the MPI_Reduce() call acts like a barrier for the root worker (labindex==0): it is the one measured, it has to wait for every other worker's sum, and so it does the most work. Of course, measuring the CPU time taken by one worker is not really relevant anyway.

    The amount of work done is constant: 600 million iterations of the innermost for loop. The larger the number of workers, the fewer iterations each worker does. The MPI_Reduce() call collects one double from each process, sums them together, and saves the result to the root worker (labindex==0), which outputs the result.

    The entire computation uses so little memory it will fit entirely in the cache. In this case, the overhead is really just the thread/process overhead and context switch time (plus the startup overhead, and MPI overhead). It does not amount to that much.

    That said,
    Quote Originally Posted by Salem
    You could try using the time command as follows
    Code:
    time mpirun -np 1 ./pi
    which will report the real time, user time and system time of the process.
    I fully agree, and recommend the same.

    Especially with parallel/distributed programs, the wall clock (real time) is the most important one to measure.
