Thread: Quick question on double precision

  1. #1
    Registered User
    Join Date
    Apr 2004
    Posts
    28

    Quick question on double precision

    Hi,

    I am just messing around in a pre-existing project, and I noticed that if I use something like:

    Code:
    double a = 1.354;
    double b = 5617963.0 + a;
    Stepping through the debugger I'm getting:

    a = 1.3540000000000001
    b = 5617964.5000000000

    As if the addition of two doubles is only using single precision?

    Yet if I create a dummy test project and copy in those exact same lines I get:

    a = 1.3540000000000001
    b = 5617964.3540000003

    Which seems much more like a double should perform. Both projects use Visual Studio 2008, so my question is: why is the former losing so much precision, and is there an option in VS that would cause it to do that?

    Edit:
    Even making sure both numbers are doubles I still get the same results:

    Code:
    double a = 1.354;
    double b = 5617963.0;
    double c = b + a;
    Thanks.
    Last edited by McFury; 02-15-2010 at 07:46 PM.

  2. #2
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Does "_controlfp" appear anywhere in your program, or are you using a third-party library that might call it?
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  3. #3
    Registered User
    Join Date
    Apr 2004
    Posts
    28
    Thanks for the reply, iMalc.

    It is a fairly large project and it uses lots of third-party libs I barely know about. I can't find '_controlfp' anywhere in the code, though, and I don't know what else could cause it.

  4. #4
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    You need to find a basic text on numerical analysis.

    The addition of two double variables does not use single precision. What you are describing is a fundamental property of all floating point representations. Precision is finite, so floating point types only represent a discrete set of values.

    In base 10, for example, it is not possible to represent the value 1/3 in a finite number of decimal places (the decimal representation is 0.3333 <recurring forever>). If you are required to represent that fraction in a finite number of decimal places, you will truncate/round at some point.

    The same phenomenon occurs with all numeric bases. The catch is that the values with an infinite representation depend on the base. Floating point generally works with a base 2 (binary) representation of the mantissa. The value 0.1 (decimal) has an infinitely recurring representation in base 2.

    When you add two values of reasonably different magnitude, some truncation also occurs. Again, to use an example in decimal, imagine you are limited to 4 significant figures. The value 123.0 + 0.456 is actually 123.456 but, with truncation to 4 significant figures, you will get the value 123.5 (with rounding up). Again, the same sort of effect occurs with any base.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  5. #5
    Registered User
    Join Date
    Jul 2009
    Posts
    36
    Quote Originally Posted by McFury View Post
    Code:
    double a = 1.354;
    double b = 5617963.0 + a;
    Stepping through the debugger I'm getting:

    a = 1.3540000000000001
    b = 5617964.5000000000

    As if the addition of two doubles is only using single precision?
    The value for b you say you see in the debugger makes no sense. Even if it were rounded to one decimal place, it still wouldn't be 5617964.5.

    In any case, it's hard to judge what precision you are getting based on printed output -- in general you don't know how many digits are being printed (see my article Print Precision of Dyadic Fractions Varies by Language - Exploring Binary for details).

    Quote Originally Posted by McFury View Post
    Code:
    double a = 1.354;
    double b = 5617963.0;
    double c = b + a;
    I put these values in a Python 3.1 program to see their exact double values (Python 3.1 is one of those languages capable of printing all digits):

    The value of a is 1.35400000000000009237055564881302416324615478515625, and the value of c is 5617964.35400000028312206268310546875. You can see both match their "true" values up to about 16 or 17 digits, which is about the precision of a double.

  6. #6
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by DoctorBinary View Post
    The value for b you say you see in the debugger makes no sense. Even if it were rounded to one decimal place, it still wouldn't be 5617964.5.
    Except that floating point values are not rounded in decimal terms. The rounding is of binary digits. Things are a little further complicated because floating point types store a mantissa and an exponent (the value is the mantissa times 2 to the power of the exponent) - but that's another story.

    The net effect, however, is that floating point types support a set of discrete values. Let's say x1 and x2 can both be represented exactly in your floating point type, with no other representable value lying between them (that's what discrete means). If you try to enter a value between x1 and x2, the result is either x1 or x2 (the choice depends on how rounding is implemented).

    There is no guarantee that the difference between x1 and x2 is less than 0.1. Practically, for "large" values, it can be greater than 0.1.

    As to why things are different in the debugger versus the executable .... lots of reasons. Debuggers can use different floating point representations internally (eg a software emulation of greater precision). There is also the question of how the values are printed. Which brings us to your next comments.
    Quote Originally Posted by DoctorBinary View Post
    In any case, it's hard to judge what precision you are getting based on printed output -- in general you don't know how many digits are being printed (see my article Print Precision of Dyadic Fractions Varies by Language - Exploring Binary for details).
    Nice story, but it's not quite true.

    The number of digits output is typically independent of the precision of the underlying data type.

    Quote Originally Posted by DoctorBinary View Post
    I put these values in a Python 3.1 program to see their exact double values (Python 3.1 is one of those languages capable of printing all digits):

    The value of a is 1.35400000000000009237055564881302416324615478515625, and the value of c is 5617964.35400000028312206268310546875. You can see both match their "true" values up to about 16 or 17 digits, which is about the precision of a double.
    Sorry, but this is incorrect.

    That long string of trailing digits means that (1) Python is using an extended precision type and/or (2) digits have been output until some stopping criterion is met (i.e. the I/O function gives up at some point, to avoid getting into an infinite loop).

    Describing that output as a "true" value is a fallacy.

    Computer programming languages do things differently (eg use of floating point versus extended precision types) but they can't bypass basic physical or mathematical constraints.

    The exact value of many decimal fractions (0.1, 0.2, 0.354) still has an infinite representation in binary (values of 0.5 and powers of it are exceptions to that). It is also a mathematical fact that some values that can be represented to a finite number of binary places (eg some of the values that may be stored in a floating point variable) have an infinite representation in decimal. If the value output by Python was a "true" value, it would be looping forever on some values.
    Last edited by grumpy; 02-16-2010 at 04:12 PM.

  7. #7
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    There is no guarantee that the difference between x1 and x2 is less than 0.1. Practically, for "large" values, it can be greater than .1.
    This 'fudge factor' for floats and doubles is defined as FLT_EPSILON and DBL_EPSILON respectively in MSVC.

    So a float equality could be coded as:

    Code:
    bool FloatEqual(float value1,float value2)
    {
       bool result = false;
       float diff = value2 - value1;
       
       if (diff > -FLT_EPSILON && diff < FLT_EPSILON)
       {
            result = true;
       }
     
       return result;
    }
    Using a fixed epsilon like 0.001f is common, but it is not correct in all cases. One could use fabs() in the code and eliminate one condition, but I opted to leave it out.
    Last edited by VirtualAce; 02-16-2010 at 05:30 PM.

  8. #8
    Registered User
    Join Date
    Jul 2009
    Posts
    36
    Quote Originally Posted by grumpy View Post
    Except that floating point values are not rounded in decimal terms. The rounding is of binary digits.
    Rounding internally is to binary places; printf rounds to decimal places. Try printing 5617964.35400000028312206268310546875 with a format specifier of "%.1f" (answer: 5617964.4).

    Quote Originally Posted by grumpy View Post
    That long string of trailing digits means that (1) python is using an extended precision type and/or (2) that digits have been output until some stopping criterion is met (i.e. the I/O function gives up at some point, to avoid getting into an infinite loop).
    Those values are correct. They are the exact decimal representations of the binary values in their respective doubles. There is no chance of an infinite loop (see my next response).

    Quote Originally Posted by grumpy View Post
    It is also a mathematical fact that some values that can be represented to a finite number of binary places (eg some of the values that may be stored in a floating point variable) have an infinite representation in decimal. If the value output by Python was a "true" value, it would be looping forever on some values.
    Not true. Every finite binary fraction terminates in decimal: since 1/2^n = 5^n/10^n, any value with n binary places can be written exactly in n decimal places.

    Quote Originally Posted by grumpy View Post
    Describing that output as a "true" value is a fallacy.
    I didn't say the output values (the double values) are the "true" values. This is what I said:

    Quote Originally Posted by DoctorBinary View Post
    You can see both match their "true" values up to about 16 or 17 digits....
    The "true" values being the decimal values: 1.354 and 5617964.354
    Last edited by DoctorBinary; 02-17-2010 at 09:53 AM. Reason: Added one more point, about the "true" values

  9. #9
    Registered User
    Join Date
    Apr 2004
    Posts
    28
    Hi,

    Thanks for the discussion, guys. Basically, my question was why executing the same code, with the same compiler, but in different projects produced completely different results.

    I don't really think comments like this:

    [grumpy]
    "You need to find a basic text on numerical analysis."
    are really needed. Thanks for your input, though.

    [DoctorBinary]
    "The value for b you say you see in the debugger makes no sense."
    This is what I thought too, but that is the value I get stepping through the debugger. I just wanted to try and find out why.

    [DoctorBinary]
    "in general you don't know how many digits are being printed "
    I was just looking at what the debugger has listed in the 'locals' tab.

    Thanks for your feedback.

  10. #10
    Registered User
    Join Date
    Jul 2009
    Posts
    36
    Quote Originally Posted by McFury View Post
    Stepping through the debugger I'm getting:

    a = 1.3540000000000001
    b = 5617964.5000000000
    Can you look at the contents of memory for these variables -- b in particular? If so, can you post it? (Just post the 8 byte hex value and we can decode it.) That is the surefire way to see what's in your variable vs. what is being displayed.

  11. #11
    Registered User
    Join Date
    Apr 2004
    Posts
    28
    Hi again,

    Turns out iMalc was correct... Somewhere in the third-party libs something MUST be calling '_controlfp', although I cannot find it...

    Turns out that when I checked, the precision had been set to '_PC_24' (24 bits). So I just put in a call to set the precision back to 53 bits so that doubles use their normal precision.

    Code:
    unsigned int current;
    _controlfp_s(&current, _PC_53, _MCW_PC);
    Once I set this it seems to stay set, and the code works fine now, so whatever was calling it must call it only once at startup. I can understand wanting certain sections of code to run faster, but doesn't it seem a little silly to silently set this and never set it back after the 'faster' code has completed?
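    A fuller sketch of that fix, assuming MSVC's _controlfp_s API (it controls the x87 FPU only, and has no effect on SSE2 or x64 code), which also saves the previous setting so it can be restored, the way a well-behaved library should:

```cpp
#include <float.h>   // _controlfp_s, _PC_24, _PC_53, _MCW_PC (MSVC-specific)
#include <stdio.h>

// MSVC-only sketch: read the current x87 precision-control setting,
// force 53-bit (double) precision, and restore the old setting when done.
int main(void)
{
    unsigned int old_state = 0;
    _controlfp_s(&old_state, 0, 0);            /* mask of 0 = just read state */

    if ((old_state & _MCW_PC) == _PC_24)
        printf("FPU is in 24-bit (float) precision mode\n");

    unsigned int current = 0;
    _controlfp_s(&current, _PC_53, _MCW_PC);   /* force double precision */

    /* ... code that relies on full double precision ... */

    _controlfp_s(&current, old_state & _MCW_PC, _MCW_PC);   /* restore */
    return 0;
}
```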

    Thanks for all of your posts!

  12. #12
    Registered User
    Join Date
    Jul 2009
    Posts
    36
    Interesting...so it's being treated like a float under the covers!

    Now the rounding you see makes sense. 5617964.354 in pure binary is 10101011011100100101100.01011... This has 23 bits before the radix point, leaving only one bit after the radix point in a 24-bit significand. The fractional '.0' (binary) is rounded up to '.1' -- that is, 0.5 in decimal -- since bits 25 and beyond total more than 1/4 (more than half the 0.5 spacing).

    Thanks for updating us!

  13. #13
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    It's not unheard of for some libraries to change it, ignore anything else that might be affected, and even fail to change it back when deinitialised.
    Heck, I do it myself in my software 3D renderer project. Not that it is packaged as any kind of reusable library at the moment. (If it were, I would certainly make users aware of it if I left it in that state.)

    Glad to hear you got it sorted!

  14. #14
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by Bubba View Post
    This 'fudge factor' for floats and doubles is defined as FLT_EPSILON and DBL_EPSILON respectively in MSVC.
    FLT_EPSILON and DBL_EPSILON are defined by the C standard as "the difference between 1 and the least value greater than 1 that is representable in the given floating point type".

    The specification of those values does not mean that all consecutive representable values of a floating point type differ by those _EPSILON values.


    Quote Originally Posted by McFury View Post
    I don't really think comments like this:
    [grumpy]
    "You need to find a basic text on numerical analysis."
    Are really needed? Thanks for your input though.
    The comment was a simple statement of fact. All basic texts on numerical analysis cover a number of the phenomena you described, and the reasons for them. In fact, some of those phenomena are the reason numerical analysis exists. If you take offense at being pointed to detailed discussions relevant to your question (or parts of it: the _controlfp() concern is another aspect), that is your problem.

    DocBinary: corrections noted. Thanks for that.
