Thread: Bizarre MPI Behaviour

  1. #1
    Registered User
    Join Date
    Jun 2009
    Posts
    13

    Hi, I'm writing an MPI program and part of this program just needs to do a simple dot product. To do this I wrote the following code:

    Code:
    float innerProduct(float *v1,float *v2){
        ftemp=0.0;
        for(fi=0;fi<mySize;fi++){
            ftemp+=v1[fi]*v2[fi];
        }
        MPI_Allreduce(&ftemp,&ftemp2,1,MPI_FLOAT,MPI_SUM,MPI_COMM_WORLD);
        return ftemp2;
    }
    However, it would not compute the correct dot product, so I got rid of the Allreduce, did things more explicitly, and put in debug statements. It now looks like this:

    Code:
    float innerProduct(float *v1,float *v2){
        ftemp=0.0;
        for(fi=0;fi<mySize;fi++){
            ftemp+=v1[fi]*v2[fi];
        }
        //MPI Gather Step
        cout <<"Server " << myRank << " has ftemp = " << ftemp << endl;
        MPI_Gather(&ftemp,1,MPI_FLOAT,dotReceive,1,MPI_FLOAT,0,MPI_COMM_WORLD);
        ftemp2=0.0;
        if(myRank==0){
            cout << "&&&& Server 0 RECEIVED [";
            for(fi=0;fi<nProcs;fi++){
                ftemp2+=dotReceive[fi];
                cout << dotReceive[fi] << " ftemp2="<<ftemp2<<" | ";
            }
            cout << "] sum = "<<ftemp2<< endl;
        }
        MPI_Bcast(&ftemp2,1,MPI_FLOAT,0,MPI_COMM_WORLD);
        //MPI_Allreduce(&ftemp,&ftemp2,1,MPI_FLOAT,MPI_SUM,MPI_COMM_WORLD);
        return ftemp2;
    }

    Now, for some reason which I can't for the life of me figure out, this code outputs:

    Server 0 has ftemp = 5625216
    &&&& Server 0 RECEIVED [5625216 ftemp2=5625216 | 39245132 ftemp2=44870348 | 106419424 ftemp2=151289776 | 207148304 ftemp2=358438080 | ] sum = 358438080
    Server 3 has ftemp = 207148304
    Server 1 has ftemp = 39245132
    Server 2 has ftemp = 106419424

    WTF?? It appears that the root process is getting the correct values but isn't actually summing them properly?! Those numbers do not add up to that sum. Does anyone have a clue what is going on here? Any help would be greatly appreciated.

  2. #2
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Does anyone have a clue what is going on here?
    Welcome to the wonderful world of floating point arithmetic. Floating point values are stored with a fixed number of significant bits, so arithmetic on values this large gets rounded and will not give the exact result. If you want better precision, you can declare ftemp as a double instead of a float (and pass MPI_DOUBLE instead of MPI_FLOAT to the MPI calls). This will probably give you the answer you are looking for. An even better solution is to use integer arithmetic instead of floating point values (if that's possible).
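
To make the rounding concrete, here is a minimal sketch that adds the four per-rank values printed in the first post, once with a float accumulator and once with a double. It assumes float is IEEE 754 single precision and that every intermediate sum is rounded to float (the usual case on x86-64). The names `kPartials`, `sumInFloat`, `sumInDouble` and `checkThreadNumbers` are made up for this illustration, and no MPI is needed to see the effect.

```cpp
#include <cassert>
#include <vector>

// The four per-rank values printed in the first post.
// Each of them happens to be exactly representable as a float.
const std::vector<float> kPartials = {5625216.0f, 39245132.0f,
                                      106419424.0f, 207148304.0f};

// Sums with a float accumulator, like the gather loop on rank 0.
float sumInFloat(const std::vector<float>& parts) {
    float total = 0.0f;
    for (float p : parts) total += p;  // rounded to 24 significant bits after every add
    return total;
}

// Same loop, but the running total is a double (53 significant bits).
double sumInDouble(const std::vector<float>& parts) {
    double total = 0.0;
    for (float p : parts) total += p;
    return total;
}

// Documents the two results under the assumptions stated above.
void checkThreadNumbers() {
    assert(sumInFloat(kPartials) == 358438080.0f);  // the value rank 0 printed
    assert(sumInDouble(kPartials) == 358438076.0);  // the exact sum
}
```

The float version reproduces the 358438080 from the output above, while the exact sum is 358438076. At that magnitude adjacent floats are 32 apart, so the exact sum simply has no float representation.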

  3. #3
    Registered User
    Join Date
    Jun 2009
    Posts
    13
    Is there any way I can make it work with floats? Unfortunately there are severe storage limitations for my code: it basically runs on the biggest vector the computer it's running on can hold, so if I switch to doubles it'll halve the size.

  4. #4
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    I don't see how you can make it work with floats and get the precision you are looking for. Is there a reason you can't use ints?
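
Worth noting on the storage concern: the suggestion in post #2 only changes ftemp, the accumulator, so the vectors themselves can stay float. The sketch below shows that idea under stated assumptions. `localDot` is a hypothetical name, the vectors are passed as `std::vector<float>` rather than the raw pointers and globals used in the first post, and the MPI call appears only as a comment so the snippet compiles without an MPI installation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local part of the dot product. The inputs stay float (4 bytes per element);
// only the single running sum is a double, which costs 8 bytes in total.
double localDot(const std::vector<float>& v1, const std::vector<float>& v2) {
    assert(v1.size() == v2.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < v1.size(); ++i) {
        // Widen before multiplying so the product is not rounded to float first.
        sum += static_cast<double>(v1[i]) * static_cast<double>(v2[i]);
    }
    return sum;
}

// In the MPI program the reduction would then use MPI_DOUBLE, for example:
//   double local = localDot(v1, v2), global = 0.0;
//   MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
```

This is not a cure for every precision problem, since the float inputs are already rounded, but it removes the accumulation error without doubling the memory footprint of the vectors.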

  5. #5
    Registered User
    Join Date
    Jun 2009
    Posts
    13
    Because this is just a test vector; when I actually finish the code it'll hold decimal numbers.

  6. #6
    Registered User
    Join Date
    Jul 2009
    Posts
    36
    Quote Originally Posted by maverick_starst
    Because this is just a test vector; when I actually finish the code it'll hold decimal numbers.
    It appears you are only using integer values in your floats, so why not use ints? Floats can only represent consecutive integers from 0 through 2^24 (16777216). (They can represent integers exactly beyond that, but only if their significands are at most 24 bits long.)

    If you must use floating-point, then use doubles. They can represent consecutive integers from 0 through 2^53 (900719925474099).
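
The two limits can be checked directly. The following is a small sketch, assuming IEEE 754 float and double. `plusOneIsVisible` and `checkIntegerLimits` are hypothetical helper names, and 2^53 is written out as 9007199254740992.

```cpp
#include <cassert>

// True if x + 1 can be told apart from x in the given type.
// volatile forces the sum to be stored at the type's own precision.
bool plusOneIsVisible(float x) {
    volatile float bumped = x + 1.0f;
    return bumped != x;
}

bool plusOneIsVisible(double x) {
    volatile double bumped = x + 1.0;
    return bumped != x;
}

// Documents where consecutive integers stop being representable.
void checkIntegerLimits() {
    assert(plusOneIsVisible(16777215.0f));          // 2^24 - 1 -> 2^24 still works
    assert(!plusOneIsVisible(16777216.0f));         // 2^24 + 1 rounds back to 2^24
    assert(plusOneIsVisible(9007199254740991.0));   // 2^53 - 1 -> 2^53 still works
    assert(!plusOneIsVisible(9007199254740992.0));  // 2^53 + 1 rounds back to 2^53
}
```

Above 2^24 a float still represents every second integer, then every fourth, and so on, which is why the large partial sums in the first post could be stored exactly while their total could not.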

    There are tons of links on the web that can help you understand floating-point, but here's one of mine that may help: What Powers of Two Look Like Inside a Computer - Exploring Binary

    Rick

  7. #7
    Registered User
    Join Date
    Jun 2009
    Posts
    13
    Like I said, the results here are ints because I'm just using a test vector that is b[i]=i+1. However, the real vector will have components that are randomly generated numbers between 0 and 1. Is there a C++ class that's like a fixed-point decimal number but that's 32 bits, not 64 bits like a double? Because, like I said, I know my values are going to be between 0 and 1, and if I switch to doubles then I'm essentially cutting my maximum vector size in half.

  8. #8
    and the Hat of Guessing (tabstop)
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by maverick_starst
    Like I said, the results here are ints because I'm just using a test vector that is b[i]=i+1. However, the real vector will have components that are randomly generated numbers between 0 and 1. Is there a C++ class that's like a fixed-point decimal number but that's 32 bits, not 64 bits like a double? Because, like I said, I know my values are going to be between 0 and 1, and if I switch to doubles then I'm essentially cutting my maximum vector size in half.
    You get seven (decimal) digits of accuracy in a float (twenty-four bits of mantissa is roughly equal to 7.22 decimal digits). If you've got an answer of several hundred million, then your accuracy runs out in the hundreds place (as you've seen). If you've got an answer in the thousands, then your accuracy runs out in the thousandths place....
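
The "seven digits" rule can be seen as the gap between neighbouring floats at a given magnitude. Here is a minimal sketch using `std::nextafter` from `<cmath>`, assuming IEEE 754 single precision. `spacingAt` and `checkSpacing` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next larger float: the finest step a float
// can resolve at that magnitude.
float spacingAt(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Documents the spacing at a few magnitudes relevant to the thread.
void checkSpacing() {
    static_assert(std::numeric_limits<float>::digits == 24,
                  "sketch assumes a 24-bit significand");
    assert(spacingAt(358438080.0f) == 32.0f);            // the sum from the first post
    assert(spacingAt(1000.0f) == std::ldexp(1.0f, -14)); // about 0.00006
    assert(spacingAt(0.5f) == std::ldexp(1.0f, -24));    // about 0.00000006
}
```

The gap scales with the value: a result in the hundreds of millions can only be stored to the nearest 32, while values between 0 and 1 are resolved far more finely.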

  9. #9
    Registered User
    Join Date
    Jul 2009
    Posts
    36
    Quote Originally Posted by maverick_starst
    However the real vector will have components that are randomly generated numbers between 0 and 1.
    Your problem is your test values then. You are using integers greater than a float can handle accurately. Test with values within the range you desire: between 0 and 1. They won't "sum properly" as you say, but the answer will only differ in the ten-millionths place or so. Will that be accurate enough? Then floats will suffice.
    Last edited by DoctorBinary; 07-19-2009 at 07:46 AM.
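
One way to run that test is to measure how far a float-accumulated dot product drifts from a double-accumulated one on values in the real range. This is a rough sketch, not the poster's code: `randomUnitVector` and `floatVsDoubleError` are hypothetical names, the seeds and sizes are arbitrary, and the generator comes from the standard `<random>` header.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Fills a vector with pseudo-random floats drawn from [0, 1).
std::vector<float> randomUnitVector(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> v(n);
    for (float& x : v) x = dist(gen);
    return v;
}

// Relative difference between a float-accumulated and a
// double-accumulated dot product of the same float inputs.
double floatVsDoubleError(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sumF = 0.0f;
    double sumD = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sumF += a[i] * b[i];                        // product and sum rounded to float
        sumD += static_cast<double>(a[i]) * b[i];   // product and sum kept in double
    }
    if (sumD == 0.0) return 0.0;                    // avoid dividing by zero
    return std::fabs(static_cast<double>(sumF) - sumD) / sumD;
}
```

Calling `floatVsDoubleError(randomUnitVector(n, 1), randomUnitVector(n, 2))` for the intended n gives a number to compare against the accuracy actually required. The rounding errors of a float accumulator add up as the vector grows, so it is worth measuring at the real size rather than relying on a small test.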

  10. #10
    Registered User
    Join Date
    Jul 2009
    Posts
    36
    Quote Originally Posted by DoctorBinary
    2^53 (900719925474099)
    I made a cut and paste error: 2^53 is 9007199254740992.
