Thread: FPN math

  1. #1
    awsdert (Registered User)
    Join Date
    Jan 2015
    Posts
    1,735
    No, I'm not trying to convert a simple integer; I'm trying to construct an fpn (floating-point number) from 2 separate integers, one for the whole-number part and another for the decimal part. I'm doing it that way because the function in mitsy will work that way: it starts with a normal integer, and when it hits '.' it moves the value into another variable, resets the one it is currently working with and continues normally. After the loop ends it checks that other variable and enters fpn mode if it's not 0. Now that I think of it, that will exclude 0.N, so I'll have to rework it, but that doesn't change the fact that I need 2 variables to keep track of.

    The reason I don't just use a native float is that I need to support compiling to non-native systems. The simplest way to do that is to construct the binary directly. The binary can then be passed into instructions or, in preprocessor mode, be converted into a native float via functions and then used in the expression given to the preprocessor.
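A minimal C sketch of the two-variable scan described in the post, assuming a plain ASCII literal with at most 9 digits on each side of the '.'. The names `dec_literal` and `scan_decimal` are hypothetical, not mitsy's actual code. It records a `has_point` flag instead of testing the whole part for 0, which keeps 0.N working, and it counts the digits after the point so that leading zeros (3.05) are not lost.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of scanning a literal such as "3.14" (not mitsy's real API). */
typedef struct {
    uint32_t whole;       /* digits before the '.' */
    uint32_t frac;        /* digits after the '.', read as an integer */
    unsigned frac_digits; /* digit count after '.', keeps the leading zero in "3.05" */
    bool has_point;       /* set when '.' is seen, so "0.5" still enters fpn mode */
} dec_literal;

/* Scans digits with an optional single '.', returns the number of chars consumed.
   No overflow checking: assumes at most 9 digits on each side. */
static size_t scan_decimal(const char *s, dec_literal *out)
{
    const char *p = s;
    uint32_t cur = 0;     /* the variable currently being worked with */
    unsigned digits = 0;

    out->whole = 0;
    out->frac = 0;
    out->frac_digits = 0;
    out->has_point = false;

    for (;; ++p) {
        if (isdigit((unsigned char)*p)) {
            cur = cur * 10u + (uint32_t)(*p - '0');
            ++digits;
        } else if (*p == '.' && !out->has_point) {
            out->whole = cur;     /* stash the whole-number part */
            out->has_point = true;
            cur = 0;              /* reset and keep going */
            digits = 0;
        } else {
            break;
        }
    }

    if (out->has_point) {
        out->frac = cur;
        out->frac_digits = digits;
    } else {
        out->whole = cur;
    }
    return (size_t)(p - s);
}
```

Scanning "3.14+" stops at the '+' after 4 characters with whole = 3, frac = 14 and frac_digits = 2; those last two values are what the fractional conversion needs.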

  2. #2
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Quote Originally Posted by awsdert
    No, not trying to convert a simple integer, trying to construct an fpn from 2 separate integers, one for the whole number part and another for the decimal part
    Fixed point to floating point, then? If you already have the bits, isn't it just a matter of shifting them into the correct position?

    Of course, you have to recalculate the fractional part. If you are treating the integral and fractional parts as 32-bit values, then 2^32 in the fractional part is the same as 1.0, so:

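    n = floor((2^32*f) / 10^d), where d is the number of decimal digits in f, i.e. d = floor(log10(f)) + 1 (or, better, the count of digits written after the '.', so that leading zeros such as in 3.05 are not lost).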
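A small C sketch of that formula, assuming d <= 9 so that f < 10^9 and 2^32*f stays below 2^62, which fits in a uint64_t. The function name `frac_to_q32` is hypothetical. The digit count d is passed in explicitly rather than derived from log10(f), for the leading-zero reason above.

```c
#include <stdint.h>

/* Converts the digits after the '.' into a 32-bit binary fraction:
   n = floor(2^32 * f / 10^d). Assumes f < 10^d and d <= 9. */
static uint32_t frac_to_q32(uint32_t f, unsigned d)
{
    uint64_t pow10 = 1;
    for (unsigned i = 0; i < d; ++i)
        pow10 *= 10u;
    /* integer division truncates, i.e. floor for non-negative values */
    return (uint32_t)(((uint64_t)f << 32) / pow10);
}
```

For 3.14 this is frac_to_q32(14, 2), and for 3.05 it is frac_to_q32(5, 2), which is not the same as the 0.5 you would get from frac_to_q32(5, 1).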

    Taking 3.14 as your example, f = 14 can be encoded as (2^32*14)/(10^2) = 601295421.44, truncated to 601295421 (0b00100011110101110000101000111101 as a 32-bit binary value; of course, this calculation must be done with enough precision to avoid overflow). So 3.14 can be encoded as 0b11.[00100011110101110000101000111101]. Shifting the binary point 1 position to the left we get 0b1.100100011110101110000101000111101 and e = 1. Now we have our "implicit" one, plus a fractional part that fits the single-precision float format once restricted to 23 bits: [0b1.100_1000_1111_0101_1100_0010]_1000111101.
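The normalization step can be sketched in C by treating whole.frac as one 64-bit Q32.32 value, where bit 32 is the units bit. The struct `split_fpn` and the function `split_q32_32` are hypothetical names. This only truncates to 23 bits; the discarded bits are returned separately so they can be used for rounding.

```c
#include <stdint.h>

/* Pieces of a normalized value: 1.m23 * 2^e, plus the bits that fell off. */
typedef struct {
    int e;              /* unbiased exponent */
    uint32_t m23;       /* 23 bits after the implicit leading 1 */
    uint64_t rest;      /* discarded low bits (used later for rounding) */
    unsigned rest_bits; /* how many bits were discarded */
} split_fpn;

/* Returns 0 for a zero value (no leading 1 to normalize), 1 otherwise. */
static int split_q32_32(uint32_t whole, uint32_t frac_q32, split_fpn *out)
{
    uint64_t x = ((uint64_t)whole << 32) | frac_q32;
    if (x == 0)
        return 0;

    unsigned p = 63;                 /* position of the leading 1 */
    while (((x >> p) & 1u) == 0)
        --p;
    out->e = (int)p - 32;            /* bit 32 is the units bit */

    if (p >= 23) {
        out->rest_bits = p - 23;
        out->rest = x & ((UINT64_C(1) << out->rest_bits) - 1);
        out->m23 = (uint32_t)((x >> out->rest_bits) & 0x7FFFFF);
    } else {
        /* tiny value: fewer than 23 significant bits available */
        out->rest_bits = 0;
        out->rest = 0;
        out->m23 = (uint32_t)((x << (23 - p)) & 0x7FFFFF);
    }
    return 1;
}
```

With whole = 3 and frac_q32 = 601295421 this gives e = 1, m23 = 0x48f5c2 and the 10 discarded bits 0b1000111101, matching the bracketed split above.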

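    So M = 0b10010001111010111000010 (0x48f5c2, 23 bits), E = 128 (0x80, since E = e + 127) and S = 0. That is almost exactly what is expected for a float value; the only difference is rounding. Look at the discarded bits, _1000111101: if their msb is 1 we need to add 1 to M (round to nearest; an exact tie, 1000000000, rounds to even instead). Here it is, so M becomes 0x48f5c3, which gives exactly the correctly rounded value, the same bits (0x4048f5c3) a compiler produces for 3.14f: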
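The rounding and packing step as a C sketch, assuming the result is a normal number (-126 <= e <= 127, no subnormal, infinity or NaN handling). The function name `round_and_pack` is hypothetical. It rounds to nearest with ties to even by looking only at the discarded bits it is given; strictly, a tie check should also account for anything already lost by the floor in the 2^32*f/10^d step.

```c
#include <stdint.h>

/* Builds IEEE-754 binary32 bits from sign, unbiased exponent e, the 23 kept
   fraction bits and the discarded bits, rounding to nearest, ties to even. */
static uint32_t round_and_pack(unsigned sign, int e, uint32_t m23,
                               uint64_t rest, unsigned rest_bits)
{
    uint32_t bits = ((uint32_t)(sign & 1u) << 31)
                  | ((uint32_t)(e + 127) << 23)   /* E = e + 127 */
                  | (m23 & 0x7FFFFFu);

    if (rest_bits > 0) {
        uint64_t half = UINT64_C(1) << (rest_bits - 1); /* msb of the discarded part */
        if (rest > half || (rest == half && (m23 & 1u)))
            bits += 1u; /* a carry out of M correctly bumps E */
    }
    return bits;
}
```

Adding 1 to the packed word rather than to M alone means a mantissa of all ones rolls over into the exponent, which is exactly what the format requires.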

    To be sure:

    v = (1 + 0x48f5c3/2^23) * 2^1 = (1 + 4781507/2^23) * 2 = 3.1400001049041748046875
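The same check can be done in C, assuming the host `float` is IEEE-754 binary32 (true on common platforms but not guaranteed by the C standard). The helper names `decode_normal_f32` and `f32_bits` are hypothetical. The decoder evaluates v = (1 + M/2^23) * 2^(E-127) for normal numbers only, scaling by repeated doubling or halving so that no math library is needed; every step is exact in double precision.

```c
#include <stdint.h>
#include <string.h>

/* Decodes binary32 bits by hand: v = (1 + M/2^23) * 2^(E-127). Normal numbers only. */
static double decode_normal_f32(uint32_t bits)
{
    uint32_t m = bits & 0x7FFFFFu;
    int e = (int)((bits >> 23) & 0xFFu) - 127;
    double scale = 1.0;
    for (int i = 0; i < e; ++i)
        scale *= 2.0;
    for (int i = 0; i > e; --i)
        scale /= 2.0;
    double v = (1.0 + (double)m / 8388608.0) * scale;  /* 8388608 = 2^23 */
    return (bits >> 31) ? -v : v;
}

/* Reads the bits the compiler chose for a float value. */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Comparing f32_bits(3.14f) with the hand-built 0x4048f5c3, and decode_normal_f32(0x4048f5c3) with 3.14f, confirms the construction on a host whose float is binary32.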
    Last edited by flp1969; 10-29-2019 at 02:49 PM.
