Thread: avoiding numeric/truncation errors

  1. #1
    Registered User
    Join Date
    Feb 2011
    Posts
    42

    What is the best way to avoid truncation and rounding errors? I'm implementing some math subprograms, such as finding the volume of a sphere with the formula (4/3)*PI*r^3, but the (4/3) evaluates to 1 rather than the 1.33 I need, which breaks my formula. Any advice? Thanks.

    Another part of the program takes an array of float values, sorts them into ascending order, and displays them along with their sum. The same truncation/rounding error causes problems here too, because the fractional parts of many of the float values are not being represented on output.
    Last edited by philgrek; 04-13-2011 at 09:50 PM.

  2. #2
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    If you do the calculations with a floating type, they'll be (more) precise. 4.0/3, for example, will give a value around 1.3333, as a double.

    I'm not sure what you mean by the decimal values not being represented on output, but floating point is always going to be somewhat imprecise. You can use double, or long double, to try for more precision; but if values are wildly off, you're doing something wrong. Adding 1.4 and 1.4 should give a value close to 2.8, for example. If you get something like 2.799999999999999, that's fine; but if you get, say, 3, you're likely making a mistake somewhere.
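
    As a rough sketch of the 4.0/3 point applied to the sphere formula from the first post: the function names and the hard-coded pi constant below are made up for illustration (standard C does not guarantee M_PI). The second function keeps the integer-division mistake so the two results can be compared.

```c
/* Volume of a sphere: (4/3) * pi * r^3.
   Writing 4.0 / 3.0 forces floating-point division. */
double sphere_volume(double r)
{
    const double pi = 3.14159265358979323846; /* assumed constant, not from the thread */
    return (4.0 / 3.0) * pi * r * r * r;
}

/* Same formula with the mistake from the question, kept for comparison.
   Both operands of 4 / 3 are int, so the division yields 1. */
double sphere_volume_truncated(double r)
{
    const double pi = 3.14159265358979323846;
    return (4 / 3) * pi * r * r * r;
}
```

    Calling both with r = 1.0 should give roughly 4.18879 for the first and just pi for the second, since the (4 / 3) factor has collapsed to 1.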

  3. #3
    Registered User
    Join Date
    Feb 2011
    Posts
    42
    What I meant was that I'm entering float values into an array, sorting them, and printing the array out. The values I'm using for input are .3476789, 1004008.67, .0000099, 1.3435678, 78.345678, 54321678.567, and 22.6. However, when displayed back in ascending order they are:
    0.000010
    0.000000
    1.345678
    22.600000
    78.000000
    1004008.000000
    54321680.000000
    Sum of the array is: 55325788.000000
    The sum of the Array should be : 55325789.8739346

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Since none of us are mind readers... please post the minimum compileable segment of code that demonstrates the problem.

  5. #5
    Registered User
    Join Date
    Feb 2011
    Posts
    42
    I caught an error in my sorting algorithm: I had the temp value defined as int, which was obviously causing a problem. Fixing that repaired some of the problems, but there are still a few:
    xArray[0]: 0.000010
    xArray[1]: 0.347679
    xArray[2]: 1.343568
    xArray[3]: 22.600000
    xArray[4]: 78.345680
    xArray[5]: 1004008.687500
    xArray[6]: 54321680.000000
    Sum of the array is: 55325788.000000
    The sum of the Array should be : 55325789.8739346

    The value in xArray[6] should have been output as 54321678.567, and the two totals are 1.8739346 apart.
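
    For anyone following along, here is a minimal sketch of why an int temp in the swap wipes out the fractional parts. The helper names are hypothetical and are not taken from the actual program.

```c
/* Swap two floats through a temporary of the wrong type (int).
   Converting a float to int discards the fractional part. */
void swap_with_int_temp(float *a, float *b)
{
    int temp = *a;   /* the fraction of *a is lost here */
    *a = *b;
    *b = temp;       /* *b receives the truncated value */
}

/* The same swap with a float temporary keeps both values intact. */
void swap_with_float_temp(float *a, float *b)
{
    float temp = *a;
    *a = *b;
    *b = temp;
}
```

    Swapping 78.345678 and 22.6 with the first version leaves 78.0 in place of 78.345678, which matches the 78.000000 in the earlier output.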

  6. #6
    Registered User
    Join Date
    Feb 2011
    Posts
    42
    Code:
    float sumFloats(float xArray[], int numFloats){
          
            int j, k;
            float sumArray;
            for(j=0; j < numFloats - 1; j++){
                    for(k = j + 1; k < numFloats; k++){
                            if(*(xArray + j) >= *(xArray + k)){
                                    float temp = *(xArray +j);
                                    *(xArray + j) = *(xArray + k);
                                    *(xArray + k) = temp;
                            }
                    }
            }
    
            for(k = 0; k < numFloats; k++){
                    printf("xArray[%d]: %f\n", k, xArray[k]);
                    sumArray = sumArray + xArray[k];
            }
    
            printf("Sum of the array is: %f\n", sumArray);
            printf("The sum of the Array should be : 55325789.8739346\n\n");
    }

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Ok... one more time.... Please post your code!

  8. #8
    Lurking
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    void sumFloats(double xArray[], int numFloats){
    
            int j, k;
            double sumArray = 0.0;
            for(j=0; j < numFloats - 1; j++){
                    for(k = j + 1; k < numFloats; k++){
                            if(*(xArray + j) >= *(xArray + k)){
                                    double temp = *(xArray +j);
                                    *(xArray + j) = *(xArray + k);
                                    *(xArray + k) = temp;
                            }
                    }
            }
    
            for(k = 0; k < numFloats; k++){
                    printf("xArray[%d]: %f\n", k, xArray[k]);
                    sumArray = sumArray + xArray[k];
            }
    
            printf("Sum of the array is: %f\n", sumArray);
            printf("The sum of the Array should be : 55325789.8739346\n\n");
    }
    int main (void)
    {
        double myarray[] = { .3476789, 1004008.67, .0000099, 1.3435678, 78.345678, 54321678.567, 22.6 };
        sumFloats(myarray, sizeof(myarray) / sizeof(*myarray));
        return 0;
    }
    #if 0
    My output: 
    xArray[0]: 0.000010
    xArray[1]: 0.347679
    xArray[2]: 1.343568
    xArray[3]: 22.600000
    xArray[4]: 78.345678
    xArray[5]: 1004008.670000
    xArray[6]: 54321678.567000
    Sum of the array is: 55325789.873935
    The sum of the Array should be : 55325789.8739346
    #endif
    Notice that I'm using double instead of float here. I also initialized sumArray to 0.0, which your version left uninitialized, and made the function return void since it never returns a value. The greater precision of a double seems to fix your problem. You might be pedantic and say that 55325789.873935 != 55325789.8739346, but it's damn close. If you really need a number that precise, though, I recommend using GMP instead.
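
    To see what a float actually keeps of these inputs, here is a small sketch; the helper names are made up for illustration. It narrows a double to float and widens it back so the stored value can be compared with the original.

```c
/* Value that actually ends up in a float when x is stored there,
   widened back to double so it can be compared with the original. */
double stored_as_float(double x)
{
    float narrowed = (float)x;
    return (double)narrowed;
}

/* Absolute error introduced by storing x in a float. */
double float_storage_error(double x)
{
    double diff = stored_as_float(x) - x;
    return diff < 0 ? -diff : diff;
}
```

    Assuming IEEE 754 single precision, 54321678.567 is stored as 54321680 and 1004008.67 as 1004008.6875, the same values the float version printed for xArray[6] and xArray[5]. Floats between 2^25 and 2^26 are spaced 4 apart, so nothing finer than that survives at this magnitude.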
    Last edited by whiteflags; 04-13-2011 at 11:53 PM.

  9. #9
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    I think it should be taught as part of the course that float gives about 7 significant decimal digits and double gives almost 16. That's basic knowledge to have before one consciously chooses float vs. double.

    I don't think installing some third-party high precision math library is what is expected of the student to solve the problem.
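
    A quick way to check those digit counts on your own compiler, as a sketch: the macros come from the standard <float.h>, while decimal_digits and LOG10_2 are hypothetical names introduced here. The figures in the note below assume IEEE 754 float and double.

```c
#include <float.h>

/* log10(2), hard-coded so the sketch does not need the math library. */
#define LOG10_2 0.30102999566398120

/* Approximate number of decimal digits carried by a binary mantissa
   of the given width (pass FLT_MANT_DIG or DBL_MANT_DIG). */
double decimal_digits(int mantissa_bits)
{
    return mantissa_bits * LOG10_2;
}
```

    With a 24-bit float mantissa this comes to about 7.2 digits, and with a 53-bit double mantissa about 15.95. The guaranteed round-trip counts, FLT_DIG and DBL_DIG, are a little lower at 6 and 15.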
    Last edited by nonoob; 04-14-2011 at 11:50 AM.
