I can't figure out why my code is rounding to a wrong number.

**flp1969** · 11-18-2020

Here's a brief explanation of 'floating point', based on Salem's correct comments on the subject above.

"Precision" is the number of significant digits (in this case, BITS) in a representation. 'double' has 53 bits of precision and 'float' has 24. And, yes, 'int' usually has 31 bits of precision (plus a sign bit) on 32- and 64-bit systems. Floating point is a structure for storing RATIONAL values (a fraction A/B, where B is fixed). Any floating point value following the IEEE 754 standard (which is, today, the de facto standard) encodes a "normalized" fractional value using the formula:

v=(-1)^S * ( 1 + F/(2^(P-1)) ) * 2^(E-bias)

Where S, F and E are non-negative integer values. Using 'double precision', P=53, so F has 52 bits, E has 11 bits and S only 1, and the bias is 1023.

The value 1.8 is encoded as:

1.8 approx (-1)^0*(1 + 3602879701896397/4503599627370496)*2^(1023-1023). Here S=0, F=3602879701896397 and E=1023. This gives us 1.8000000000000000444089209850062616169452667236328125. Notice that if F were 3602879701896396 instead, the final value would be 1.79999999999999982236431605997495353221893310546875, which is off by a greater error.

So a 'floating point' value is ALWAYS a fraction of 2 integers, and for a decimal value like 1.8 that fraction is only an approximation.

As Salem explained, if you divide 180 by this representation of 1.8, you'll get:

Code:

$ bc
scale=50
180/((-1)^0*(1 + 3602879701896397/(2^52))*2^(1023-1023))
99.99999999999999753283772305520774881638218981451873

Truncated to `int`, this gives you 99, not 100.

My advice: Avoid using floating point at all costs!

PS: How did I get those values? This way:

Code:

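#include <stdio.h>
#include <string.h>

// IEEE 754 double-precision structure (assumes a little-endian target
// and the GCC/Clang bit-field layout, lowest bits first).
struct dbl_s
{
  unsigned long long f: 52;
  unsigned long long e: 11;
  unsigned long long s: 1;
} __attribute__ ( ( packed ) );

int main ( void )
{
  double d = 1.8;
  struct dbl_s p;

  // Copy the bytes rather than casting the pointer, which would
  // violate the strict aliasing rule.
  memcpy ( &p, &d, sizeof p );

  printf ( "%.52f = (-1)^%u*(1 + %llu/%llu)*2^(%u-1023)\n",
           d,
           ( unsigned int ) p.s,
           ( unsigned long long ) p.f,
           1ULL << 52,
           ( unsigned int ) p.e );

  return 0;
}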
