Thread: I can't figure out why my code is rounding to a wrong number.

  1. #1
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Here's a brief explanation about 'floating point' based on Salem's correct comments on the subject above.

    "Precision" is the number of significant digits (in this case, BITS) of a representation. 'double' has 53 bits of precision and 'float' has 24. And yes, 'int' usually has 31 bits of precision (plus a sign bit) on both 32-bit and 64-bit systems. Floating point is a structure for storing RATIONAL values (a fraction A/B, where B is fixed). Any floating-point value following the IEEE 754 standard (today the de facto standard) encodes a "normalized" fractional value using the formula:

    v=(-1)^S * ( 1 + F/(2^(P-1)) ) * 2^(E-bias)

    Where S, F and E are non-negative (unsigned) integer values and P is the precision in bits. Using 'double precision', P=53, so F has 52 bits, E has 11 bits, S has only 1, and the bias is 1023.

    The value 1.8 is encoded as:

    1.8 ≈ (-1)^0*(1 + 3602879701896397/4503599627370496)*2^(1023-1023). Here S=0, F=3602879701896397 and E=1023. This gives us 1.8000000000000000444089209850062616169452667236328125. Notice that if F were 3602879701896396 instead, the final value would be 1.79999999999999982236431605997495353221893310546875, which is off by a greater error, so the encoding picks the closer of the two neighbours.

    So, 'floating point' is ALWAYS an approximation based on a fraction of 2 integers.

    As Salem explained, if you divide 180 by this representation of 1.8, you'll get:

    Code:
    $ bc
    scale=50
    180/((-1)^0*(1 + 3602879701896397/(2^52))*2^(1023-1023))
    99.99999999999999753283772305520774881638218981451873
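    When that quotient is truncated to `int`, you get 99, not 100. Whether a compiled program actually ends up below 100 depends on the precision the division is carried out in (plain double or extended precision), so don't rely on either outcome.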

    My advice: Avoid using floating point at all costs!

    PS: How did I get those values? This way:

    Code:
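    #include <stdio.h>
    #include <string.h>
    
    // ieee-754 double precision structure.
    // Assumes a little-endian target and the GCC/Clang bit-field layout
    // (first member in the lowest bits); __attribute__ is a GCC extension.
    struct dbl_s
    {
      unsigned long long f: 52;   // fraction
      unsigned long long e: 11;   // biased exponent
      unsigned long long s: 1;    // sign
    } __attribute__ ( ( packed ) );
    
    int main ( void )
    {
      double d = 1.8;
      struct dbl_s p;
    
      // Copy the bytes instead of casting the pointer: reading a double
      // through a struct pointer violates the strict aliasing rule.
      memcpy ( &p, &d, sizeof p );
    
      printf ( "%.52f = (-1)^%u*(1 + %llu/%llu)*2^(%u-1023)\n",
               d,
               ( unsigned int ) p.s, 
               ( unsigned long long ) p.f, 
               1ULL << 52, 
               ( unsigned int ) p.e );
    
      return 0;
    }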
    Last edited by flp1969; 11-18-2020 at 11:58 AM.
