Thread: Head Banging Floating Point Conversions

  1. #1
    Code Monkey Davros's Avatar
    Join Date
    Jun 2002
    Posts
    812

    Head Banging Floating Point Conversions

    Hi there,

    Can I assume that a floating point (long double) will be encoded the same way on every 32 bit (INTEL) computer? Can I assume that on a 32 bit INTEL, a long double is able to store 19 decimal digits reliably? I'm beginning to think not.

    I'm trying to convert a long double to a string, with 18 digits of total precision (or 17 digits after the decimal point in scientific form). In theory, this should be OK as a long double is able to represent 19 digit decimals accurately, or at least that's how it's documented, and I only want to convert to a precision of 18. However, I've found that while this works on most machines, it doesn't work on every one. So I ended up storing and decoding long double BCD values myself to see what I was getting.

    Surprisingly, I found that it can differ, i.e.:

    On my own machine:
    long double val 10066.52L is represented as (sign) 0| (exp*) 0000000D| (sig) 9D4A147AE147AE14

    *the bias has been removed from the exponent value.

    On a problem machine:
    long double val 10066.52L is represented as 0|0000000D|9D4A147AE147B000

    NOTICE the low order bits are different in the significand.

    When I decode the above representations on paper (which is hard work), I get:

    10066.5199999999999|9957367435854393989

    and

    10066.5200000000004|3655745685101

    respectively.

    I've marked, with a '|', where the conversion should truncate and be rounded. So you can see on my machine 10066.52L converts to string "10066.52", while on the other machine it converts to a head bangingly frustrating "10066.5200000000004".
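    The paper decode above can be reproduced mechanically. What follows is a minimal sketch, assuming the layout shown in the dumps (a 64-bit significand with an explicit integer bit, plus an unbiased exponent). `decode_to_decimal` is a hypothetical helper name, and it only handles exponents from 3 to 62 so that everything fits in 64-bit integer arithmetic.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Exact decimal expansion (truncated, not rounded) of sig * 2^(exp - 63),
// where sig is the 64-bit significand and exp the unbiased exponent.
// Assumption: 3 <= exp <= 62, so the fraction times 10 fits in 64 bits.
std::string decode_to_decimal(std::uint64_t sig, int exp, int frac_digits) {
    if (exp < 3 || exp > 62) {
        throw std::invalid_argument("exponent outside the range this sketch handles");
    }
    const int frac_bits = 63 - exp;  // bits to the right of the binary point
    const std::uint64_t mask = (std::uint64_t{1} << frac_bits) - 1;

    std::string out = std::to_string(sig >> frac_bits) + ".";
    std::uint64_t frac = sig & mask;
    for (int i = 0; i < frac_digits; ++i) {
        frac *= 10;                                   // shift one decimal digit left
        out += static_cast<char>('0' + (frac >> frac_bits));
        frac &= mask;                                 // keep only the fractional part
    }
    return out;
}
```

    Called with exponent 13 (0x0000000D) and 20 fractional digits, the two significands above give 10066.51999999999999957367 and 10066.52000000000043655745, in line with the hand-decoded values.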

    In the second example, it appears that the full BCD precision available is simply not being used by the CPU.

    Is this a flaw in that CPU? Or is this normal behaviour?

    Both CPUs are INTEL, the problem CPU is a slightly later model than my own.

    What's the solution? I'm thinking about lowering the precision I expect from string conversions - what should I lower it to so that I can be sure that it will work - always?

    Any help or comments appreciated.

    Thanks

    Andy
    OS: Windows XP
    Compilers: MinGW (Code::Blocks), BCB 5

    BigAngryDog.com

  2. #2
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    All Intel 32-bit CPUs should be the same, but the compilers and compiler settings may not be the same.

    Sometimes long double is 64 bit, sometimes 80, but it might be something else too.


    You could try comparing the members of std::numeric_limits<long double> on the two machines.
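    A minimal sketch of that comparison, plus the usual rule of thumb for how many decimal digits a binary significand of a given width always preserves. `guaranteed_decimal_digits` and `print_limits` are hypothetical helper names, and the formula assumes a radix-2 format.

```cpp
#include <cmath>
#include <iostream>
#include <limits>

// Decimal digits that always survive decimal -> binary -> decimal
// for a radix-2 format with `significand_bits` bits of precision.
int guaranteed_decimal_digits(int significand_bits) {
    return static_cast<int>(std::floor((significand_bits - 1) * std::log10(2.0)));
}

// Print what this compiler reports for a floating point type.
template <typename T>
void print_limits(const char* name) {
    std::cout << name
              << ": significand bits = " << std::numeric_limits<T>::digits
              << ", digits10 = " << std::numeric_limits<T>::digits10 << '\n';
}
```

    Running `print_limits<long double>("long double")` on both machines shows whether the compiler's view of the type is the same. The formula gives 18 for a 64-bit significand and 15 for a 53-bit one.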
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    *******argv[] - hu? darksaidin's Avatar
    Join Date
    Jul 2003
    Posts
    314
    Well, the dilemma is that there is no binary equivalent of 0.52 (the 10066 doesn't matter for now). If you do the math by hand, you'll come up with

    Code:
    0.52 x 2 = 1.04 => 1
    0.04 x 2 = 0.08 => 0
    0.08 x 2 = 0.16 => 0
    0.16 x 2 = 0.32 => 0
    0.32 x 2 = 0.64 => 0
    0.64 x 2 = 1.28 => 1
    0.28 x 2 = 0.56 => 0
    0.56 x 2 = 1.12 => 1
    0.12 x 2 = 0.24 => 0
    0.24 x 2 = 0.48 => 0
    0.48 x 2 = 0.96 => 0
    0.96 x 2 = 1.92 => 1
    0.92 x 2 = 1.84 => 1
    0.84 x 2 = 1.68 => 1
    0.68 x 2 = 1.36 => 1
    0.36 x 2 = 0.72 => 0
    0.72 x 2 = 1.44 => 1
    0.44 x 2 = 0.88 => 0
    0.88 x 2 = 1.76 => 1
    0.76 x 2 = 1.52 => 1
    0.52 !!! see top. endless.
    (I spared myself the exponent/shifting part. It wouldn't change the result.)
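    The doubling table above can be sketched in code. To keep the expansion exact, this version works on the fraction 52/100 with integers rather than on a floating point 0.52. `binary_fraction_digits` and `remainder_after` are hypothetical helper names.

```cpp
#include <string>

// First `count` binary digits of numerator/denominator, for 0 <= value < 1.
// Each step doubles the remainder and emits 1 if it reached the denominator.
std::string binary_fraction_digits(unsigned numerator, unsigned denominator, int count) {
    std::string bits;
    for (int i = 0; i < count; ++i) {
        numerator *= 2;
        if (numerator >= denominator) {
            bits += '1';
            numerator -= denominator;
        } else {
            bits += '0';
        }
    }
    return bits;
}

// Remainder left after `steps` doubling steps; if it equals the starting
// numerator, the digit pattern repeats from that point on.
unsigned remainder_after(unsigned numerator, unsigned denominator, int steps) {
    for (int i = 0; i < steps; ++i) {
        numerator = (numerator * 2) % denominator;
    }
    return numerator;
}
```

    `binary_fraction_digits(52, 100, 20)` yields 10000101000111101011, the same 20 bits as the table, and `remainder_after(52, 100, 20)` is 52 again, which is the "see top" repeat.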

    I can only guess that there is some kind of optimization in place (CPU or C) that breaks out of the endless loop before all 23 bits of your float are calculated. In other words, something is broken.
    [code]

    your code here....

    [/code]

  4. #4
    Code Monkey Davros's Avatar
    Join Date
    Jun 2002
    Posts
    812
    Thanks for the reply.

    This isn't a compiler issue - it is the same exe run on both machines.

    In any case, I'm actually getting the program to list the bytes of a long double, and if one were actually encoded as 64 bits, I would see it.

    From the decodes in my first post, I can see it's an issue with the CPU. Certainly the full BCD precision is not being used in one of the CPUs.

    I am tempted to state that this is a CPU flaw. However, I read (buried in MSDN documentation) that:

    "a=1.345f;
    b=1.123f;
    c=a+b;

    ...The value of c is 2.4679999352 or 2.468000."

    Which implies variation in precision behaviour is 'normal'.
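    The quoted snippet can be tried directly. This is a minimal sketch, assuming the two figures in the quote are one stored float printed at two different precisions (the quote itself does not say so). `format_fixed` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>

// Format a float in fixed notation with the requested number of decimals.
std::string format_fixed(float value, int decimals) {
    char buf[64];
    // The float is promoted to double for printf-style formatting.
    std::snprintf(buf, sizeof buf, "%.*f", decimals, static_cast<double>(value));
    return buf;
}
```

    With `float c = 1.345f + 1.123f;`, `format_fixed(c, 6)` gives 2.468000, while asking for 10 decimals exposes the digits that the shorter format rounds away.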

    Urgh! Not good enough in my book. But is it something I have to live with? Or is the problem I'm having due to a flaw with one CPU model I can regard as an isolated case & ignore?

    What level of precision can I rely upon for 32 bit CPUs?

    Thanks
    Last edited by Davros; 02-22-2004 at 10:11 AM.

  5. #5
    Code Monkey Davros's Avatar
    Join Date
    Jun 2002
    Posts
    812
    >Well, the dilemma is that there is no binary equivalent of 0.52

    Hi darksaidin!

    No, I'm aware that 0.52 is a repeating fraction when represented in binary. However, given a known binary precision, it should be possible to convert to a decimal within a corresponding precision limit.

    In other words,

    10066.52 when encoded and decoded, should become:

    10066.51999999999999957367435854393989

    That's OK cos we truncate and round the last decimal, provided we truncate no more than 19 decimal digits for long doubles.

    The 10066 is important here because it uses up precision, giving a maximum of 14 useful decimal digits after the decimal place.

    On the 'problem' CPU, 10066.52 is BCD encoded differently, i.e. the low order significand bytes are zeroed and the last byte is rounded up.

    Also, I've tested a range of long double values. Some are encoded properly, some are not.
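    The "zeroed and rounded up" pattern can be checked with integer arithmetic. This is a minimal sketch, assuming (nothing in this thread confirms it) that the problem machine's value is the 64-bit significand rounded to 53 bits, the significand width of a double. `round_significand` is a hypothetical helper name.

```cpp
#include <cstdint>

// Round a 64-bit significand to `keep_bits` bits (round half to even),
// leaving the dropped low-order bits as zero.
// Assumption: 1 <= keep_bits <= 63; a carry out of the top bit is ignored.
std::uint64_t round_significand(std::uint64_t sig, int keep_bits) {
    const int drop = 64 - keep_bits;
    const std::uint64_t unit = std::uint64_t{1} << drop;  // weight of the last kept bit
    const std::uint64_t half = unit >> 1;
    const std::uint64_t rest = sig & (unit - 1);          // bits being dropped
    std::uint64_t kept = sig - rest;
    if (rest > half || (rest == half && (kept & unit) != 0)) {
        kept += unit;                                     // round up
    }
    return kept;
}
```

    Applied to 0x9D4A147AE147AE14 with 53 bits kept, it returns 0x9D4A147AE147B000, the significand seen on the problem machine. That is consistent with the value having been rounded to double precision there, though it does not show why.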
    Last edited by Davros; 02-22-2004 at 10:23 AM.

  6. #6
    *******argv[] - hu? darksaidin's Avatar
    Join Date
    Jul 2003
    Posts
    314
    I had something about long double in mind so the 10066 won't really matter as there are sufficient bits left for those 20 repeating bits. But then I also said float and talked about 23 bits so... well, ya

    Anyways, the best idea would probably be to google for some IEEE 754 specs and see what you can make of it. If full precision is not guaranteed, you'll have to live with it - unless speed doesn't matter. In that case you could implement your own float class.

  7. #7
    Code Monkey Davros's Avatar
    Join Date
    Jun 2002
    Posts
    812
    >Anyways, the best idea would probably be to google for some IEEE 754 specs and see what you can make of it.

    Thanks. Will do.

    >In that case you could implement your own float class.

    That's what I'm thinking. But for my purposes, this may be a large undertaking.

  8. #8
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    There are arbitrary precision math libraries available. One of them is even available for free under the GPL.

  9. #9
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    What Every Computer Scientist Should Know About Floating-Point Arithmetic

    GMP may be the library CornedBee was referring to.
    If you want to compile it yourself, you'll need MinGW.
    You can download pre-built binaries for Windows via ftp here: ftp://deltatrinity.dynip.com/gmp-4.1.2_DLL_SharedLibs/.


    gg

  10. #10
    Code Monkey Davros's Avatar
    Join Date
    Jun 2002
    Posts
    812
    Thanks everyone.
