# Preventing rounding problems with doubles

• 11-04-2004
mccoz
Hi there!

I have a problem with an algorithm I wrote. I must use doubles for the values, but different sequences of arithmetic operations that are supposed to produce equal numbers deliver slightly different results, e.g. a difference of 4*10^(-12). This is surely due to the binary representation of the numbers, but is it somehow possible to circumvent these problems?
For instance:
-1629.000+1629.000 = -4.3200998334214e-012
Right now I am doing a lot of rounding to a fixed number of decimal places, but there are still some spots I am missing. Is there another way to prevent these problems in general?

Thanks a lot for considering my question!!!
--mccoz
• 11-04-2004
Dave Evans

If you have data values (or intermediate products in your calculations) that cannot be represented exactly as a floating point number in whatever system you are using, you can't guarantee that roundoff error can be circumvented.

Now as to your example, it turns out that 1629.000 and -1629.000 are exactly representable as floating point numbers in every C compiler that I have access to (in an IEEE 754 double, every integer whose magnitude is at most 2^53 is exact), so there will be no rounding error.

Try this:

```
#include <stdio.h>

int main(void)
{
    double x, y, z;

    x = -1629.000;
    y =  1629.000;
    z = x + y;
    printf("x = %.20e, y = %.20e\nz = %.20e\n", x, y, z);

    return 0;
}
```
I would like to know if you get anything other than

z = 0.00000000000000000000e+00

I hate to repeat myself, but if the data items and all intermediate terms in the calculations are exactly representable as floats (or doubles), you won't have round-off errors.

Maybe the following shows how people can be confused by printouts:

```
#include <stdio.h>

int main(void)
{
    double x, y, z;

    x = -1269.0000000002;
    y =  1269.0000000003;
    z = x + y;

    printf("With default precision output format, roundoff error is puzzling:\n");
    printf("x = %e, y = %e, z = %e\n", x, y, z);
    printf("\n\n");
    printf("But if you print more significant digits, it makes more sense:\n");
    printf("x = %.20e, y = %.20e\nz = %.20e\n", x, y, z);

    return 0;
}
```
The first printf() shows apparent roundoff error in the addition, but the values being added actually don't have a zero sum. The second printf() shows this.

Regards,

Dave
• 11-04-2004
mccoz
Thanks for the hint, Dave!

I had used the values that the Visual C++ debugger shows in its watch window, but those are obviously not precise enough. Using printf() I indeed got:
x = 1.62899999999999570000e+003, y = -1.62900000000000000000e+003
z = -4.32009983342140910000e-012

Thus the rounding error must have occurred somewhere before that line. I have numbers larger than ULONG_MAX and thus have to use doubles, but 3 decimal places are accurate enough for my purpose.
I suppose there is no other way than rounding, then...?!
• 11-04-2004
mccoz
P.S.: I am rounding so often that it has become a real efficiency issue for the program. That's why I am concerned with this question.
• 11-04-2004
Dave Evans

Stop me if you have heard this before: If any of your data values or any intermediate computational values are not representable exactly as a float (or double) on your system, you can't guarantee that answers will be exact.

Numbers on any computer system are necessarily limited to rational approximations to real numbers. Built-in variables and their operators have values limited by whatever representation is defined by that compiler. Period.

Regards,

Dave
• 11-04-2004
mccoz
Thanks Dave!

I thought there might be some trick to avoid dealing with these small deviations, since that accuracy is not needed. It's pretty awkward to always use rounding and range tests for comparisons. Larger integers would be a good solution for my case, but I read that they are not ready for general use in C++ yet.
• 11-04-2004
Dave Evans

The thing that lots of people get hung up on is when they try something like
```
  if (x == y) {
    // do something
  }
  else {
    // do something else
  }
```
The way around this is to recognize that roundoff error is possible, and do something like this to define an acceptable error:

```
#include <math.h>   /* for fabs() */

  double tolerance = 1.0e-8;

  //...... lots of stuff to calculate doubles x and y

  if (fabs(x - y) < tolerance) {
    // do something
  }
  else {
    // do something else
  }
```
Regards,

Dave
• 11-04-2004
Zach L.
If you don't know in advance that your numbers will be small, however, you may wish to define a tolerable relative (percentage) error rather than a fixed absolute difference.
• 11-05-2004
mccoz
Yes, thanks for all your answers! It is clear to me now that there is no other solution than to stick with the rounding and range / tolerance test for comparing doubles.
Thanks again to all of you!
• 11-05-2004
VirtualAce
You should not rely on exact equality comparisons of floats and/or doubles. The comparison itself is well defined, even in assembler, but two values that are mathematically equal yet computed by different sequences of operations will often differ in their last bits. This kind of comparison should be avoided.

There is another way around this: use a fudge value. If one value comes within that amount of another, the two are assumed to be equal. The size of the roundoff fudge factor is totally up to you.

Rounding cannot be avoided on floats and doubles. There are certain numbers that simply CANNOT be represented in binary using the floating point format; 0.1 is one of them. Also, the larger the value in a float, the larger the gap between it and the next representable value. The relative precision stays roughly constant (about 7 significant decimal digits for a float, 15 to 16 for a double), so floats are very accurate for small values, but as the integral part of the value increases, fewer bits are left in the data type to represent the digits following the decimal point. Above 2^24 (16777216) a float cannot even represent every integer, and between two nearby large values there may be only a handful of representable numbers. In reality there are infinitely many real numbers between any two distinct numbers, but a fixed-width format can only ever represent finitely many of them.

You can use more registers, or the FPU instructions, to combine two floats into one higher-precision value and so gain many more representable values, but again you are always going to face rounding errors.