Thread: preventing rounding problems with doubles

  1. #1
    Registered User
    Join Date
    Nov 2004
    Posts
    7

    preventing rounding problems with doubles

    Hi there!

    I have a problem with an algorithm I wrote: I must use double representation for the values but different sequences of arithmetic operations which are supposed to result in equal numbers are delivering slightly different numbers, e.g. 4*10^(-12) difference. This is definitely due to the binary representation of the numbers but is it somehow possible to circumvent these problems?
    For instance:
    -1629.000+1629.000 = -4.3200998334214e-012
    Right now I am using a lot of rounding to decimal places but there are still some points I am missing. Is there another way to generally prevent these problems?

    Thanks a lot for considering my question!!!
    --mccoz

  2. #2
    Registered User
    Join Date
    Mar 2004
    Posts
    536
    If you have data values (or intermediate products in your calculations) that cannot be represented exactly as a floating point number in whatever system you are using, you can't guarantee that roundoff error can be circumvented.

    Now as to your example, it turns out that 1629.000 and -1629.000 are exactly representable as floating point numbers in every C compiler that I have access to, so there will be no rounding error.

    Try this:
    Code:
    #include <stdio.h>
    
    int main(void)
    {
      double x, y, z;
    
      /* Both literals are whole numbers, so they are stored exactly. */
      x = -1629.000;
      y =  1629.000;
      z = x + y;
    
      /* Print far more digits than the default %e would show. */
      printf("x = %.20e, y = %.20e\nz = %.20e\n", x, y, z);
      
      return 0;
    }
    I would like to know if you get anything other than

    z = 0.00000000000000000000e+00

    I hate to repeat myself, but if the data items and all intermediate terms in the calculations are exactly representable as floats (or doubles), you won't have round-off errors.


    Maybe the following shows how people can be confused by printouts:

    Code:
    #include <stdio.h>
    
    int main()
    {
      double x, y, z;
    
      x = -1269.0000000002;
      y =  1269.0000000003;
      z = x + y;
    
      printf("With default precision output format, roundoff error is puzzling:\n");
      printf("x = %e, y = %e, z = %e\n", x, y, z);
      printf("\n\n");
      printf("But if you print more significant digits, it makes more sense:\n");
      printf("x = %.20e, y = %.20e\nz = %.20e\n", x, y, z);
      
      return 0;
    }
    The first printf() shows apparent roundoff error in the addition, but the values being added do not actually sum to zero, as the second printf() shows.


    Regards,

    Dave

  3. #3
    Registered User
    Join Date
    Nov 2004
    Posts
    7
    Thanks for the hint, Dave!

    I used the values that the Visual C++ debugger shows in its watch window, but those are obviously not accurate enough. Using printf I indeed got:
    x = 1.62899999999999570000e+003, y = -1.62900000000000000000e+003
    z = -4.32009983342140910000e-012

    Thus the representation error has occurred somewhere before that line. I have large numbers > ULONG_MAX and thus have to use doubles, but 3 decimal places is sufficiently accurate for my purpose.
    I suppose, then, that there is no other way than rounding...?!

  4. #4
    Registered User
    Join Date
    Nov 2004
    Posts
    7
    p.s.: I am rounding so often that it really is an efficiency issue for the program. That's why I am concerned with this question.

  5. #5
    Registered User
    Join Date
    Mar 2004
    Posts
    536
    Stop me if you have heard this before: If any of your data values or any intermediate computational values are not representable exactly as a float (or double) on your system, you can't guarantee that answers will be exact.

    Numbers on any computer system are necessarily limited to rational approximations to real numbers. Built-in variables and their operators have values limited by whatever representation is defined by that compiler. Period.
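
    To make the point above concrete, here is a minimal sketch in C that checks whether the sum of two doubles lands exactly on an expected value. The helper name sums_exactly is made up for this illustration, and the behaviour described afterwards assumes IEEE 754 doubles, which is an assumption about your platform rather than something stated in this thread.

```c
/* Hypothetical helper (not from the thread): returns 1 if a + b compares
   exactly equal to expected, 0 otherwise. */
static int sums_exactly(double a, double b, double expected)
{
    /* volatile forces the sum to be stored as a double before comparing,
       so extra-precision registers cannot hide the rounding */
    volatile double sum = a + b;
    return sum == expected;
}
```

    Under that assumption, sums_exactly(0.5, 0.25, 0.75) holds because all three values are sums of powers of two, while sums_exactly(0.1, 0.2, 0.3) does not, because none of 0.1, 0.2 and 0.3 is exactly representable in binary.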


    Regards,

    Dave

  6. #6
    Registered User
    Join Date
    Nov 2004
    Posts
    7
    Thanks Dave!

    I thought there might be some trick to avoid dealing with these small deviations, since that accuracy is not needed. It's pretty awkward to always round and do range tests for comparison. Larger integers would be a good solution for my case, but I read that they are not ready for general use yet in C++.
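
    For what it is worth, the integer idea could look roughly like the sketch below. It assumes a C99 compiler that provides int64_t in <stdint.h>, and it assumes every value still fits in 64 bits after being scaled by 1000; neither assumption is confirmed in this thread. The names milli_t, milli_from_double and milli_to_double are invented for the example.

```c
#include <stdint.h>

/* Hypothetical fixed-point type: a value stored as a whole number of
   thousandths, so three decimal places are exact by construction. */
typedef int64_t milli_t;

#define MILLI_PER_UNIT 1000

/* Convert a double to thousandths, rounding to nearest.
   Intended to be called once, where the data enters the program. */
static milli_t milli_from_double(double v)
{
    double scaled = v * MILLI_PER_UNIT;
    return (milli_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
}

/* Convert back to double, for printing or for code that needs doubles. */
static double milli_to_double(milli_t m)
{
    return (double)m / MILLI_PER_UNIT;
}
```

    After the conversion, addition, subtraction and == on milli_t values are plain integer operations, so no rounding or tolerance test is needed until a value is converted back. Multiplication and division would need extra care with the scale factor, which this sketch does not cover.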

  7. #7
    Registered User
    Join Date
    Mar 2004
    Posts
    536
    The thing that lots of people get hung up on is when they try something like
    Code:
      if (x == y) {
        // do something
      }
      else {
        // do something else
      }
    The way around this is to recognize that roundoff error is possible, and do something like this to define an acceptable error:

    Code:
      // note: fabs() below requires #include <math.h>
      double tolerance = 1.0e-8;
      
      //...... lots of stuff to calculate doubles x and y
    
      if (fabs(x - y) < tolerance) {
        // do something
      }
      else {
        // do something else
      }
    Regards,

    Dave

  8. #8
    Toaster Zach L.'s Avatar
    Join Date
    Aug 2001
    Posts
    2,686
    If you don't know that your numbers will be small, however, you may wish to define a percent of error which is tolerable rather than a concrete difference.
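
    A relative test along those lines might be sketched as follows. The function name nearly_equal and both tolerance parameters are made up for this example, and suitable tolerance values depend entirely on your data.

```c
#include <math.h>

/* Hypothetical helper: returns 1 if a and b differ by at most abs_tol,
   or by at most rel_tol times the larger of their magnitudes. */
static int nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    double diff  = fabs(a - b);
    double mag_a = fabs(a);
    double mag_b = fabs(b);
    double scale = (mag_a > mag_b) ? mag_a : mag_b;

    /* The absolute test handles values near zero, where a purely
       relative test would reject almost everything. */
    return diff <= abs_tol || diff <= rel_tol * scale;
}
```

    For instance, nearly_equal(x, y, 1.0e-9, 1.0e-12) treats the two values from earlier in the thread (1628.9999999999957 and 1629.0) as equal, and the same call keeps working if the values grow to 1.0e20, where a fixed tolerance of 1.0e-8 would be far smaller than the spacing between adjacent doubles.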

  9. #9
    Registered User
    Join Date
    Nov 2004
    Posts
    7
    Yes, thanks for all your answers! It is clear to me now that there is no other solution than to stick with the rounding and range / tolerance test for comparing doubles.
    Thanks again to all of you!

  10. #10
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Exact comparison for equality on floats and doubles is well defined, but it is rarely useful after a chain of calculations, because two results that should be mathematically equal often differ in their last bits. This kind of comparison should generally be avoided.

    There is a way around this. Pick a fudge value, and if two values are within that distance of each other, treat them as equal. The size of the fudge factor is totally up to you.

    Rounding cannot be avoided on floats and doubles. There are certain numbers that simply CANNOT be represented in binary using the floating point format. In addition, the larger the value, the coarser the representation: floats are extremely accurate with small values, but as the integral part grows, fewer bits are left in the data type for the digits following the decimal point. Between some pairs of numbers there are only a handful of representable values. Mathematically there are infinitely many real numbers between any two numbers, but a fixed-size floating point format cannot represent them all.
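
    The loss of precision at large magnitudes can be seen with a small sketch like the one below. The helper name increment_survives is invented for this illustration, and the behaviour described afterwards assumes IEEE 754 doubles.

```c
/* Hypothetical helper: returns 1 if adding inc to base produces a value
   different from base, 0 if the increment is lost to rounding. */
static int increment_survives(double base, double inc)
{
    /* volatile forces the result to be rounded to double precision */
    volatile double sum = base + inc;
    return sum != base;
}
```

    With IEEE 754 doubles, adding 0.001 to 1.0 changes the value, but adding 0.001 to 1.0e15 does not, because neighbouring doubles near 1.0e15 are 0.125 apart. Three decimal places therefore stop being meaningful well before a double runs out of range.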

    You can use more registers or FPU instructions to combine two floats and gain a larger set of representable values, but you are still always going to face rounding errors.
