I'm wondering how much more accurate extended precision is compared to double precision?
This is a discussion on extended vs double precision within the C Programming forums, part of the General Programming Boards category.
It depends on the machine you are running on. On some machines extended precision is 80 bits (x86 [x87], for example); on other machines it may be 128 bits, or just 64 bits (the same as double).
If we assume it's x86, an 80-bit extended precision number consists of:
64 bits of mantissa (this format stores the leading 1 explicitly), which gives approximately 19 digits[2].
15 bits of exponent.
1 bit of sign.
In double precision, the number consists of:
52 [1] bits of mantissa, which gives approximately 15-16 digits[2].
11 bits of exponent.
1 bit of sign.
For completeness, single precision float consists of:
23 [1] bits of mantissa, which gives approximately 7 digits[2].
8 bits of exponent.
1 bit of sign.
Note that the above are "best case" figures. If you, for example, subtract numbers that are close to each other, the result will be filled with zeros on the right-hand side. The number of meaningful digits left depends on the number of digits lost in the subtraction, e.g. 123456789.123456 - 123456789.00000 will lose 9 digits in the calculation. The same applies when adding large and small numbers together: 123456789.0000 + 0.123456890123456789 will mean that only some of the latter number is used, because the two numbers are first "normalized", which means that the exponents of both numbers are made equal before the addition. The original input number of course still retains its precision, but some of it is lost during the calculation.
This "(temporary) loss of precision" means that sometimes you need more digits in the middle of a calculation than you do at the end.
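[1] In double and single precision the mantissa has an implicit leading 1, which means that the effective precision is 1 bit more than the stored value (53 and 24 bits), except for zero and denormal numbers. The x87 80-bit format stores the leading bit explicitly, so its 64 bits are the full precision.
[2] The number of decimal digits that can be represented by a binary sequence is "number_of_bits / log2(10)". Since log2(10) is approximately 3.32, 64 bits give about 19 digits, 53 bits about 16 digits and 24 bits about 7 digits. Dividing by 3 instead overestimates the figures by a couple of digits.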
--
Mats
It's interesting to observe the differences between floats and doubles as they go through identical operations. I see that there is a difference depending on the compiler used as well!
Code:
#include <stdio.h>

#define fnum 1.95
#define loopct 100

int main(void)
{
    int i;
    char strnum[10] = "1.95";
    float f = fnum;
    double d = fnum;
    /* fnum is a double constant, so ld starts from the double-rounded value */
    long double ld = fnum;

    printf("\n float accuracy test starting value: %s \n\n", strnum);
    /* sizeof yields a size_t, so cast it to int to match %2d */
    printf(" float f: sizeof: %2d , value: %.20f \n", (int)sizeof(f), f);
    printf(" double d: sizeof: %2d , value: %.20f \n", (int)sizeof(d), d);
    printf("long double ld: sizeof: %2d , value: %.30Lf \n", (int)sizeof(ld), ld);
    printf("\n looping x 2 ,,, x to quit: \n\n");

    for (i = 0; i <= loopct; i++)
    {
        f = f * 2;
        d = d * 2;
        ld = ld * 2;
        printf("%3d f= %30.22f d= %30.22f ", i, f, d);
        getchar();   /* wait for Enter; the character typed is not examined */
    }

    printf(" float f: value: %.20f \n", f);
    printf(" double d: value: %.20f \n", d);
    printf("long double ld: value: %.30Lf \n", ld);
    return 0;
}

First I used the mingw gcc. Output for 1.95:
Code:
float accuracy test starting value: 1.95

float f: sizeof: 4 , value: 1.95000004768371580000
double d: sizeof: 8 , value: 1.95000000000000000000
long double ld: sizeof: 12 , value: -567251933470801750000000000000000000000000

looping x 2 ,,, x to quit:

0 f= 3.9000000953674316000000 d= 3.8999999999999999000000
1 f= 7.8000001907348633000000 d= 7.7999999999999998000000
2 f= 15.6000003814697270000000 d= 15.6000000000000000000000
3 f= 31.2000007629394530000000 d= 31.1999999999999990000000
4 f= 62.4000015258789060000000 d= 62.3999999999999990000000
5 f= 124.8000030517578100000000 d= 124.8000000000000000000000
6 f= 249.6000061035156200000000 d= 249.5999999999999900000000
7 f= 499.2000122070312500000000 d= 499.1999999999999900000000
8 f= 998.4000244140625000000000 d= 998.3999999999999800000000
9 f= 1996.8000488281250000000000 d= 1996.8000000000000000000000
10 f= 3993.6000976562500000000000 d= 3993.5999999999999000000000
11 f= 7987.2001953125000000000000 d= 7987.1999999999998000000000
12 f= 15974.4003906250000000000000 d= 15974.4000000000000000000000
13 f= 31948.8007812500000000000000 d= 31948.7999999999990000000000
14 f= 63897.6015625000000000000000 d= 63897.5999999999990000000000
15 f= 127795.2031250000000000000000 d= 127795.2000000000000000000000
16 f= 255590.4062500000000000000000 d= 255590.3999999999900000000000
17 f= 511180.8125000000000000000000 d= 511180.7999999999900000000000
18 f= 1022361.6250000000000000000000 d= 1022361.6000000000000000000000
19 f= 2044723.2500000000000000000000 d= 2044723.2000000000000000000000

I note that %Lf is not working right here.
And this is using the Digital Mars dmc. Output for 1.95:
Code:
float accuracy test starting value: 1.95

float f: sizeof: 4 , value: 1.95000004768371582030
double d: sizeof: 8 , value: 1.94999999999999995550
long double ld: sizeof: 10 , value: 1.949999999999999955500000000000

looping x 2 ,,, x to quit:

0 f= 3.9000000953674316406000 d= 3.8999999999999999111000
1 f= 7.8000001907348632812000 d= 7.7999999999999998223000
2 f= 15.6000003814697265620000 d= 15.5999999999999996440000
3 f= 31.2000007629394531250000 d= 31.1999999999999992890000
4 f= 62.4000015258789062500000 d= 62.3999999999999985790000
5 f= 124.8000030517578124900000 d= 124.7999999999999971500000
6 f= 249.6000061035156249900000 d= 249.5999999999999943100000
7 f= 499.2000122070312499900000 d= 499.1999999999999886200000
8 f= 998.4000244140624999900000 d= 998.3999999999999772400000
9 f= 1996.8000488281250000000000 d= 1996.7999999999999544000000
10 f= 3993.6000976562500001000000 d= 3993.5999999999999090000000
11 f= 7987.2001953125000002000000 d= 7987.1999999999998181000000
12 f= 15974.4003906250000000000000 d= 15974.3999999999996360000000
13 f= 31948.8007812500000010000000 d= 31948.7999999999992710000000
14 f= 63897.6015625000000020000000 d= 63897.5999999999985430000000
15 f= 127795.2031250000000000000000 d= 127795.1999999999970900000000
16 f= 255590.4062500000000100000000 d= 255590.3999999999941800000000
17 f= 511180.8125000000000300000000 d= 511180.7999999999883600000000
18 f= 1022361.6249999999999000000000 d= 1022361.5999999999767000000000
19 f= 2044723.2499999999999000000000 d= 2044723.1999999999534000000000
20 f= 4089446.4999999999999000000000 d= 4089446.3999999999068000000000

My calculator gets 2044723.20 at iteration 19... I 'guess' that's correct...
Last edited by HowardL; 09-19-2007 at 02:07 PM.