# Thread: extended vs double precision

1. ## extended vs double precision

I'm wondering how much greater extended precision's accuracy is compared to double precision?

2. Depends on the machine you are running on: on some machines extended precision is 80 bits (x86 [x87], for example); on others it may be 128 bits, or just 64 bits.

If we assume it's x86, then the 80 bits break down as:
64 [1] bits of mantissa, which gives approximately 19 digits[2].
15 bits of exponent.
1 bit of sign.

In double precision, the number consists of:
52 [1] bits of mantissa, which gives approximately 15-16 digits[2].
11 bits of exponent.
1 bit of sign.

For completeness, single precision float consists of:
23 [1] bits of mantissa, which gives approximately 7 digits[2].
8 bits of exponent.
1 bit of sign.

Note that the above are "best case" figures. If you, for example, subtract numbers that are close to each other, the resulting number will be filled with zeros on the right-hand side; the number of meaningful digits remaining depends on how many digits cancel in the subtraction, e.g. 123456789.123456 - 123456789.000000 loses 9 digits in the calculation. The same applies when adding large and small numbers together: in 123456789.0000 + 0.123456890123456789, only some of the latter number is used, because the two numbers are first "normalized" so that the exponent is equal on both operands of the addition. The original input number of course still retains its precision, but some of it is lost temporarily during the calculation.

This "(temporary) loss of precision" means that sometimes, you need more digits during the middle of the calculation than you do at the end.

[1] The mantissa has an implicit leading 1, which means the precision is actually 1 bit more than the stored width [except for the value zero]. The x87 extended format is the odd one out: it stores its leading bit explicitly, so its 64 mantissa bits are the full precision.

[2] The number of decimal digits that can be represented by a binary sequence is "number_of_bits / log2(10)". Since log2(10) is approximately 3.32, that gives us the above figures.

--
Mats

3. It's interesting to observe the differences between floats and doubles as they go through identical operations. I see that there is a difference depending on the compiler used as well!
Code:
```#include <stdio.h>
#define fnum   1.95
#define loopct 100

int main(void)
{
    int i;
    char strnum[10] = "1.95";
    float f = fnum;
    double d = fnum;
    long double ld = fnum;   /* note: fnum is a double constant, so ld
                                starts out with only double precision */

    printf("\n            float accuracy test starting value: %s \n\n", strnum);
    printf("       float f:  sizeof: %2d , value: %.20f \n", (int)sizeof(f), f);
    printf("      double d:  sizeof: %2d , value: %.20f \n", (int)sizeof(d), d);
    printf("long double ld:  sizeof: %2d , value: %.30Lf \n", (int)sizeof(ld), ld);

    printf("\n      looping x 2 ,,, x to quit: \n\n");
    for (i = 0; i <= loopct; i++) {
        f  =  f * 2;
        d  =  d * 2;
        ld = ld * 2;
        printf("%3d  f= %30.22f d= %30.22f ", i, f, d);
        getchar();
    }

    printf("       float f: value: %.20f \n", f);
    printf("      double d: value: %.20f \n", d);
    printf("long double ld: value: %.30Lf \n", ld);

    return 0;
}```
First I used the mingw gcc. Output for 1.95:
Code:
```            float accuracy test starting value: 1.95

float f:  sizeof:  4 , value: 1.95000004768371580000
double d:  sizeof:  8 , value: 1.95000000000000000000
long double ld:  sizeof: 12 , value: -567251933470801750000000000000000000000000

looping x 2 ,,, x to quit:

0  f=       3.9000000953674316000000 d=       3.8999999999999999000000
1  f=       7.8000001907348633000000 d=       7.7999999999999998000000
2  f=      15.6000003814697270000000 d=      15.6000000000000000000000
3  f=      31.2000007629394530000000 d=      31.1999999999999990000000
4  f=      62.4000015258789060000000 d=      62.3999999999999990000000
5  f=     124.8000030517578100000000 d=     124.8000000000000000000000
6  f=     249.6000061035156200000000 d=     249.5999999999999900000000
7  f=     499.2000122070312500000000 d=     499.1999999999999900000000
8  f=     998.4000244140625000000000 d=     998.3999999999999800000000
9  f=    1996.8000488281250000000000 d=    1996.8000000000000000000000
10  f=    3993.6000976562500000000000 d=    3993.5999999999999000000000
11  f=    7987.2001953125000000000000 d=    7987.1999999999998000000000
12  f=   15974.4003906250000000000000 d=   15974.4000000000000000000000
13  f=   31948.8007812500000000000000 d=   31948.7999999999990000000000
14  f=   63897.6015625000000000000000 d=   63897.5999999999990000000000
15  f=  127795.2031250000000000000000 d=  127795.2000000000000000000000
16  f=  255590.4062500000000000000000 d=  255590.3999999999900000000000
17  f=  511180.8125000000000000000000 d=  511180.7999999999900000000000
18  f= 1022361.6250000000000000000000 d= 1022361.6000000000000000000000
19  f= 2044723.2500000000000000000000 d= 2044723.2000000000000000000000```
I note that %Lf is not working here: mingw's printf comes from the Microsoft C runtime, which expects a 64-bit long double, so gcc's 80-bit value gets misread.

And this is using the Digital Mars dmc. Output for 1.95:
Code:
```            float accuracy test starting value: 1.95

float f:  sizeof:  4 , value: 1.95000004768371582030
double d:  sizeof:  8 , value: 1.94999999999999995550
long double ld:  sizeof: 10 , value: 1.949999999999999955500000000000

looping x 2 ,,, x to quit:

0  f=       3.9000000953674316406000 d=       3.8999999999999999111000
1  f=       7.8000001907348632812000 d=       7.7999999999999998223000
2  f=      15.6000003814697265620000 d=      15.5999999999999996440000
3  f=      31.2000007629394531250000 d=      31.1999999999999992890000
4  f=      62.4000015258789062500000 d=      62.3999999999999985790000
5  f=     124.8000030517578124900000 d=     124.7999999999999971500000
6  f=     249.6000061035156249900000 d=     249.5999999999999943100000
7  f=     499.2000122070312499900000 d=     499.1999999999999886200000
8  f=     998.4000244140624999900000 d=     998.3999999999999772400000
9  f=    1996.8000488281250000000000 d=    1996.7999999999999544000000
10  f=    3993.6000976562500001000000 d=    3993.5999999999999090000000
11  f=    7987.2001953125000002000000 d=    7987.1999999999998181000000
12  f=   15974.4003906250000000000000 d=   15974.3999999999996360000000
13  f=   31948.8007812500000010000000 d=   31948.7999999999992710000000
14  f=   63897.6015625000000020000000 d=   63897.5999999999985430000000
15  f=  127795.2031250000000000000000 d=  127795.1999999999970900000000
16  f=  255590.4062500000000100000000 d=  255590.3999999999941800000000
17  f=  511180.8125000000000300000000 d=  511180.7999999999883600000000
18  f= 1022361.6249999999999000000000 d= 1022361.5999999999767000000000
19  f= 2044723.2499999999999000000000 d= 2044723.1999999999534000000000
20  f= 4089446.4999999999999000000000 d= 4089446.3999999999068000000000```
My calculator gets 2044723.20 at iteration 19... I 'guess' that's correct...