# Thread: Quick question on double precision

1. ## Quick question on double precision

Hi,

I am just messing around on some pre-existing project, and I noticed that if I use something like:

Code:
```
double a = 1.354;
double b = 5617963.0 + a;
```
Stepping through the debugger I'm getting:

a = 1.3540000000000001
b = 5617964.5000000000

As if the addition of two doubles is only using single precision?

Yet if I create a dummy test project and copy in those exact same lines I get:

a = 1.3540000000000001
b = 5617964.3540000003

Which seems much more like how a double should perform. Both projects use Visual Studio 2008, so my question is: why is the former losing so much precision, and is there an option in VS that would cause it to do that?

Edit:
Even making sure both numbers are doubles I still get the same results:

Code:
```
double a = 1.354;
double b = 5617963.0;
double c = b + a;
```
Thanks.

2. Does "_controlfp" appear anywhere in your program, or are you using a third-party library that might call it?

Thanks for the reply, iMalc.

It is a fairly large project and it uses lots of 3rd party libs I barely know about. I can't find '_controlfp' anywhere in the code, though. I don't know what else could cause it.

4. You need to find a basic text on numerical analysis.

The addition of two double variables does not use single precision. What you are describing is a fundamental property of all floating point representations. Precision is finite, so floating point types only represent a discrete set of values.

In base 10, for example, it is not possible to represent the value 1/3 in a finite number of decimal places (the decimal representation is 0.3333 <recurring forever>). If you are required to represent that fraction in a finite number of decimal places, you will truncate/round at some point.

The same phenomenon occurs with all numeric bases. The catch is that which values have an infinite representation depends on the base. Floating point generally works with a base 2 (binary) representation of the mantissa. The value 0.1 (decimal) has an infinitely recurring representation in base 2.

When you add two values of reasonably different magnitude, some truncation also occurs. Again, to use an example in decimal, imagine you are limited to 4 significant figures. The value 123.0 + 0.456 is actually 123.456 but, truncated to 4 significant figures, you get 123.5 (with rounding up). The same sort of effect occurs in any base.

5. Originally Posted by McFury
Code:
```
double a = 1.354;
double b = 5617963.0 + a;
```
Stepping through the debugger I'm getting:

a = 1.3540000000000001
b = 5617964.5000000000

As if the addition of two doubles is only using single precision?
The value for b you say you see in the debugger makes no sense. Even if it were rounded to one decimal place, it still wouldn't be 5617964.5.

In any case, it's hard to judge what precision you are getting based on printed output -- in general you don't know how many digits are being printed (see my article Print Precision of Dyadic Fractions Varies by Language - Exploring Binary for details).

Originally Posted by McFury
Code:
```
double a = 1.354;
double b = 5617963.0;
double c = b + a;
```
I put these values in a Python 3.1 program to see their exact double values (Python 3.1 is one of those languages capable of printing all digits):

The value of a is 1.35400000000000009237055564881302416324615478515625, and the value of c is 5617964.35400000028312206268310546875. You can see both match their "true" values up to about 16 or 17 digits, which is about the precision of a double.

6. Originally Posted by DoctorBinary
The value for b you say you see in the debugger makes no sense. Even if it were rounded to one decimal place, it still wouldn't be 5617964.5.
Except that floating point values are not rounded in decimal terms. The rounding is of binary digits. Things are further complicated because floating point types store a mantissa and an exponent (the value is the mantissa times 2 to the power of the exponent), but that's another story.

The net effect, however, is that floating point types support a set of discrete values. Let's say x1 and x2 can both be represented exactly in your floating point type, with no other representable value lying between them (that's what discrete means). If you try to enter a value between x1 and x2, the result is either x1 or x2 (the choice depends on how rounding is implemented).

There is no guarantee that the difference between x1 and x2 is less than 0.1. In practice, for "large" values, it can be greater than 0.1.

As to why things are different in the debugger versus the executable .... lots of reasons. Debuggers can use different floating point representations internally (eg a software emulation of greater precision). There is also the question of how the values are printed. Which brings us to your next comments.
Originally Posted by DoctorBinary
In any case, it's hard to judge what precision you are getting based on printed output -- in general you don't know how many digits are being printed (see my article Print Precision of Dyadic Fractions Varies by Language - Exploring Binary for details).
Nice story, but it's not quite true.

The number of digits output is typically independent of the precision of the underlying data type.

Originally Posted by DoctorBinary
I put these values in a Python 3.1 program to see their exact double values (Python 3.1 is one of those languages capable of printing all digits):

The value of a is 1.35400000000000009237055564881302416324615478515625, and the value of c is 5617964.35400000028312206268310546875. You can see both match their "true" values up to about 16 or 17 digits, which is about the precision of a double.
Sorry, but this is incorrect.

That long string of trailing digits means that (1) Python is using an extended precision type and/or (2) digits have been output until some stopping criterion is met (i.e. the I/O function gives up at some point, to avoid getting into an infinite loop).

Describing that output as a "true" value is a fallacy.

Computer programming languages do things differently (eg use of floating point versus extended precision types) but they can't bypass basic physical or mathematical constraints.

The exact value of many decimal fractions (0.1, 0.2, 0.354) still has an infinite representation in binary (values of 0.5 and powers of it are exceptions to that). It is also a mathematical fact that some values that can be represented to a finite number of binary places (eg some of the values that may be stored in a floating point variable) have an infinite representation in decimal. If the value output by Python was a "true" value, it would be looping forever on some values.

Originally Posted by grumpy
There is no guarantee that the difference between x1 and x2 is less than 0.1. Practically, for "large" values, it can be greater than .1.
This 'fudge factor' for floats and doubles is defined as FLT_EPSILON and DBL_EPSILON respectively in MSVC.

So a float equality could be coded as:

Code:
```
bool FloatEqual(float value1, float value2)
{
    bool result = false;
    float diff = value2 - value1;

    if (diff > -FLT_EPSILON && diff < FLT_EPSILON)
    {
        result = true;
    }

    return result;
}
```
Using 0.001f as the float epsilon is common but is wrong in the general case. One could use fabs() in the code and eliminate one condition, but I opted not to.
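For comparison, here is a sketch of the scaled-tolerance variant that is often suggested instead of a fixed epsilon (the function name is mine, not from any library). Since FLT_EPSILON is only the gap between 1.0f and the next float, widening the tolerance in proportion to the operands keeps it meaningful at large magnitudes:

Code:
```cpp
#include <cfloat>
#include <cmath>
#include <algorithm>

// Scaled-tolerance comparison: a sketch, not a one-size-fits-all answer
// (it still misbehaves near zero, where an absolute tolerance is needed).
bool FloatNearlyEqual(float value1, float value2)
{
    float diff    = std::fabs(value1 - value2);
    float largest = std::max(std::fabs(value1), std::fabs(value2));
    return diff <= FLT_EPSILON * largest;
}
```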

8. Originally Posted by grumpy
Except that floating point values are not rounded in decimal terms. The rounding is of binary digits.
Rounding internally is to binary places; printf rounds to decimal places. Try printing 5617964.35400000028312206268310546875 with a format specifier of "%.1f" (answer: 5617964.4)

Originally Posted by grumpy
That long string of trailing digits means that (1) python is using an extended precision type and/or (2) that digits have been output until some stopping criterion is met (i.e. the I/O function gives up at some point, to avoid getting into an infinite loop).
Those values are correct. They are the exact decimal representations of the binary values in their respective doubles. There is no chance of an infinite loop (see my next response).

Originally Posted by grumpy
It is also a mathematical fact that some values that can be represented to a finite number of binary places (eg some of the values that may be stored in a floating point variable) have an infinite representation in decimal. If the value output by Python was a "true" value, it would be looping forever on some values.
Not true. Every finite binary fraction terminates in decimal.

Originally Posted by grumpy
Describing that output as a "true" value is a fallacy.
I didn't say the output values (the double values) are the "true" values. This is what I said:

Originally Posted by DoctorBinary
You can see both match their "true" values up to about 16 or 17 digits....
The "true" values being the decimal values: 1.354 and 5617964.354

9. Hi,

Thanks for the discussion, guys. Basically my question was just asking why executing the same code, with the same compiler, but in different projects was producing completely different results.

I don't really think comments like this:

[grumpy]
"You need to find a basic text on numerical analysis."
Are really needed? Thanks for your input though.

[DoctorBinary]
"The value for b you say you see in the debugger makes no sense."
This is what I thought too, but that is the value I get stepping through the debugger. I just wanted to try and find out why.

[DoctorBinary]
"in general you don't know how many digits are being printed "
I was just looking at what the debugger has listed in the 'locals' tab.

10. Originally Posted by McFury
Stepping through the debugger i'm getting:

a = 1.3540000000000001
b = 5617964.5000000000
Can you look at the contents of memory for these variables -- b in particular? If so, can you post it? (Just post the 8 byte hex value and we can decode it.) That is the surefire way to see what's in your variable vs. what is being displayed.

11. Hi again,

Turns out iMalc was correct... Somewhere in the 3rd party libs they MUST be calling '_controlfp', although I cannot find it...

When I checked the precision it had been set to '_PC_24 (24 bits)'. So I just put in a call to set the precision back to 53 bits (for double) to restore its normal precision.

Code:
```
unsigned int current;
_controlfp_s(&current, _PC_53, _MCW_PC);
```
Once I set this it seems to stay set, and the code works fine now, so whatever was calling it calls it only once at startup. I can understand wanting certain sections of code to run faster, but doesn't it seem a little silly to silently set this and not restore it after the 'faster' code has completed?

Thanks for all of your posts!

12. Interesting...so it's being treated like a float under the covers!

Now the rounding you see makes sense. 5617964.354 in pure binary is 10101011011100100101100.01011... This has 23 bits before the radix point, leaving only one bit after the radix point. The '.0' is rounded to .1 -- that is, 0.5 in decimal -- since bits 25 and beyond total > 1/4.

Thanks for updating us!

It's not unheard of for some libraries to change it and ignore anything else that might be affected, or even not change it back when deinitialised.
Heck, I do it myself in my software 3D renderer project. Not that it is packaged in any kind of reusable library at the moment (in which case I would certainly make users aware of it if I left it in that state).

Glad to hear you got it sorted!

14. Originally Posted by Bubba
This 'fudge factor' for floats and doubles is defined as FLT_EPSILON and DBL_EPSILON respectively in MSVC.
FLT_EPSILON and DBL_EPSILON are defined by the C standard as "the difference between 1 and the least value greater than 1 that is representable in the given floating point type".

The specification of those values does not mean that all consecutive representable values in a floating point type differ by those _EPSILON values.

Originally Posted by McFury
I don't really think comments like this:
[grumpy]
"You need to find a basic text on numerical analysis."
Are really needed? Thanks for your input though.
The comment was a simple statement of fact. All basic texts on numerical analysis describe a number of the phenomena you have described, and the reasons for them. In fact, several of those phenomena are the reason numerical analysis exists as a field. If you take offense at being pointed to detailed discussions relevant to your question (or to parts of it; the _controlfp() concern is another aspect), that is your problem.

DocBinary: corrections noted. Thanks for that.