# Thread: FPN math

1. Had an idea on how to correct the exponent, I'm 1 off but still better than before:
Code:
```LDBL makeFPN( ullong num, ullong fpn, long exp ) {
LDBL dst = {0};
ullong one = 10;
long pos = 0;
long bitc = 1, bits_base_needs = 4;
if ( num ) {
dst.man = num;
for ( pos = 1; pos < FPNT_MAN_BIT; ++pos ) {
num /= 10;
one *= 10;
if ( !num ) break;
}
dst.exp = --pos;
}
if ( dst.man || fpn ) {
dst.exp += FPNT_EXP_MAX;
for ( ; pos < FPNT_MAN_BIT; ++pos, ++bitc ) {
fpn *= 2;
dst.man <<= 1;
if ( fpn >= one ) {
dst.man |= 1u;
fpn -= one;
if ( bitc == bits_base_needs ) dst.exp--;
}
if ( bitc == bits_base_needs ) bitc = 1;
}
}
pos = 0;
for ( ; pos < exp; ++pos, dst.man *= 2 );
for ( ; pos > exp; --pos, dst.man *= 2 );
if ( dst.exp ) dst.exp--;
return dst;
}```
Now I REALLY gotta go to work, laters.

2. Originally Posted by awsdert
Also now that I've looked at it a bit more you're version relies on integers bigger than a float, as you may have guessed I will be attempting long doubles later so I'd like to use a version that does not rely on larger integers because that opens up the problem of what to do when there is no such integers, while I can program array type integers I still struggle with the division part which would be essential at the top of your function, I would also need to program a log10 etc variant for those array integers, all that still excludes the possibility of not enough memory anyway.
Take a look at strtod() implementation from glibc... it uses multiple precision (MP) arithmetic. So you need integers with precision greater than the targeted floating point type. You only need integers to hold 2 times the precision of integer type (which is always bigger then the precision of the floating point type). This could be worse for convertions from char to "float", for example, where the expoent is between -126 and 126 (for normalized floats) - perhaps you'll need a MP with 256 bits (to be safe), to do those shifts. But it's not a big deal in terms of memory usage.

As you've seen, log10 and pow10 is not a big deal to implement...

With my test, if you are using GCC or CLANG, you can use __int128 type for long doubles.

3. PS: Maybe you can get everything you need from here. This is one of the best books I've seen on floating point...

4. Thanks but I can't look at GCC code since doing so could result in me accidentally violating the GPL they're probably using, anyway I was thinking about it at work (which I'm still at) and realised my math is only missing 1 set of looping on mantissa to be correct, I'll need to figure that out then I should be able to test against more advanced floats, to be precise another C would've been in there if the final 8 wasn't cut off, that C is what I need to generate and the exponent will fix itself in the process

5. Occured to me I hadn't checked whole numbers greater than 1 so stuck in a test for 101.0, after some tinkering I got that to produce correctly and now I'm back to trying to fix the mantissa & exponent on the 0.1 example:
Code:
```gcc -Wall -o "test_fpn" "test_fpn.c" && "./test_fpn"
test.exp 00 mine.exp 00 test.man 000000 mine.man 000000 test.hex 00 00 00 00 mine.hex 00 00 00 00 test.fpn 0.000000e+00 mine.fpn 0.000000e+00
test.exp 7F mine.exp 7F test.man 000000 mine.man 000000 test.hex 00 00 80 3F mine.hex 00 00 80 3F test.fpn 1.000000e+00 mine.fpn 1.000000e+00
test.exp 85 mine.exp 85 test.man 4A0000 mine.man 4A0000 test.hex 00 00 CA 42 mine.hex 00 00 CA 42 test.fpn 1.010000e+02 mine.fpn 1.010000e+02
test.exp 7B mine.exp 96 test.man 4CCCCD mine.man 000000 test.hex CD CC CC 3D mine.hex 00 00 00 4B test.fpn 1.000000e-01 mine.fpn 8.388608e+06
Compilation finished successfully.```
Code:
```LDBL makeFPN( ullong num, ullong fpn, long exp ) {
LDBL dst = {0};
long base = 10, pos = 0, bit,
pos_max = FPNT_MAN_BIT,
exp_max = FPNT_EXP_MAX;
ullong one = base, dec = 0;
dst.man = num;
for ( ; pos < pos_max; ++pos ) {
if ( num == 1 ) break;
num >>= 1;
one <<= 1;
}
if ( num || fpn )
dst.exp = exp_max + pos - 1;
if ( fpn ) dec = 1;
else fpn = one / base;
while ( pos < pos_max ) {
for ( bit = 1; bit < 4 && pos < pos_max; ++bit, ++pos ) {
fpn *= 2;
dst.man <<= 1;
if ( fpn >= one ) {
dst.man |= dec;
fpn -= one;
}
dst.exp -= dec;
}
}
if ( dst.man ) {
pos = 0;
for ( ; pos < exp; ++pos, dst.man *= 2 );
for ( ; pos > exp; --pos, dst.man *= 2 );
}
return dst;
}
typedef struct test_val {
ullong num;
ullong fpn;
long exp;
} test_val_t;

test_val_t values[] = {
{0}, {1,0}, {101,0}, {0,1,0}, {0,101,0}, {101,101,0},
{0,0,10}, {1,1,10}, {0,1,10}, {1,1,10},
{3,14,0}, {3,14,10} };
float floats[] = {
0, 1, 101, 0.1, 0.101, 1.1,
0.0e+10, 1.0e+10, 101.0e+10, 0.1e+10, 0.101e+10, 1.1e+10,
3.14, 3.14e+10 };```

6. Some more tinkering got me back my 'C's
Code:
```gcc -Wall -o "test_fpn" "test_fpn.c" && "./test_fpn"
test.exp 00 mine.exp 00 test.man 000000 mine.man 000000 test.hex 00 00 00 00 mine.hex 00 00 00 00 test.fpn 0.000000e+00 mine.fpn 0.000000e+00
test.exp 7F mine.exp 7F test.man 000000 mine.man 000000 test.hex 00 00 80 3F mine.hex 00 00 80 3F test.fpn 1.000000e+00 mine.fpn 1.000000e+00
test.exp 85 mine.exp 85 test.man 4A0000 mine.man 4A0000 test.hex 00 00 CA 42 mine.hex 00 00 CA 42 test.fpn 1.010000e+02 mine.fpn 1.010000e+02
test.exp 7B mine.exp 77 test.man 4CCCCD mine.man 0CCCCC test.hex CD CC CC 3D mine.hex CC CC 8C 3B test.fpn 1.000000e-01 mine.fpn 4.296875e-03
Compilation finished successfully.```
Code:
```LDBL makeFPN( ullong num, ullong fpn, long exp ) {
LDBL dst = {0};
long base = 10, pos = 0, bit,
pos_max = FPNT_MAN_BIT,
exp_max = FPNT_EXP_MAX;
ullong one = base, dec = 0;
dst.man = num;
if ( num ) {
for ( ; pos < pos_max; ++pos ) {
if ( num == 1 ) break;
num >>= 1;
one <<= 1;
}
}
if ( num || fpn )
dst.exp = exp_max + pos - 1;
if ( fpn ) dec = 1;
else fpn = one / base;
while ( pos < pos_max ) {
for ( bit = 1; bit < 4 && pos < pos_max; ++bit, ++pos ) {
fpn *= 2;
dst.man <<= 1;
if ( fpn >= one ) {
dst.man |= dec;
fpn -= one;
}
}
dst.exp -= dec;
}
if ( dst.man ) {
pos = 0;
for ( ; pos < exp; ++pos, dst.man *= 2 );
for ( ; pos > exp; --pos, dst.man *= 2 );
}
return dst;
}```

7. At long last I got it, for that test anyway, the next test seems to skip the mantissa altogether:
Code:
```gcc -Wall -o "test_fpn" "test_fpn.c" && "./test_fpn"
test.exp 00 mine.exp 00 test.man 000000 mine.man 000000 test.hex 00 00 00 00 mine.hex 00 00 00 00 test.fpn 0.000000e+00 mine.fpn 0.000000e+00
test.exp 7F mine.exp 7F test.man 000000 mine.man 000000 test.hex 00 00 80 3F mine.hex 00 00 80 3F test.fpn 1.000000e+00 mine.fpn 1.000000e+00
test.exp 85 mine.exp 85 test.man 4A0000 mine.man 4A0000 test.hex 00 00 CA 42 mine.hex 00 00 CA 42 test.fpn 1.010000e+02 mine.fpn 1.010000e+02
test.exp 7B mine.exp 7B test.man 4CCCCD mine.man 4CCCCD test.hex CD CC CC 3D mine.hex CD CC CC 3D test.fpn 1.000000e-01 mine.fpn 1.000000e-01
test.exp 7B mine.exp 7B test.man 4ED917 mine.man 000000 test.hex 17 D9 CE 3D mine.hex 00 00 80 3D test.fpn 1.010000e-01 mine.fpn 6.250000e-02
Compilation finished successfully.```
Code:
```LDBL makeFPN( ullong num, ullong fpn, long exp ) {
LDBL dst = {0};
long base = 10, pos = 0, bit,
pos_max = FPNT_MAN_BIT,
exp_max = FPNT_EXP_MAX;
ullong one = base, dec = 0;
dst.man = num;
if ( num ) {
for ( ; pos < pos_max; ++pos ) {
if ( num == 1 ) break;
num >>= 1;
one <<= 1;
}
}
else pos -= 4;
if ( num || fpn )
dst.exp = exp_max + pos - 1;
if ( fpn ) dec = 1;
else fpn = one / base;
while ( pos < pos_max ) {
for ( bit = 1; bit < 4 && pos < pos_max; ++bit, ++pos ) {
fpn *= 2;
dst.man <<= 1;
if ( fpn >= one ) {
dst.man |= dec;
fpn -= one;
}
}
//dst.exp -= dec;
}
if ( fpn && dec ) dst.man++;
if ( dst.man ) {
pos = 0;
for ( ; pos < exp; ++pos, dst.man *= 2 );
for ( ; pos > exp; --pos, dst.man *= 2 );
}
return dst;
}```

8. Gotta go to work now, I'm confident the mantissa is calculated correctly now but the exponent needs work:
Code:
```gcc -Wall -o "test_fpn" "test_fpn.c" && "./test_fpn"
test.exp 00 mine.exp 00 test.man 000000 mine.man 000000 test.hex 00 00 00 00 mine.hex 00 00 00 00 test.fpn 0.000000e+00 mine.fpn 0.000000e+00
test.exp 7F mine.exp 7F test.man 000000 mine.man 000000 test.hex 00 00 80 3F mine.hex 00 00 80 3F test.fpn 1.000000e+00 mine.fpn 1.000000e+00
test.exp 85 mine.exp 85 test.man 4A0000 mine.man 4A0000 test.hex 00 00 CA 42 mine.hex 00 00 CA 42 test.fpn 1.010000e+02 mine.fpn 1.010000e+02
test.exp 7B mine.exp 7F test.man 4CCCCD mine.man 4CCCCD test.hex CD CC CC 3D mine.hex CD CC CC 3F test.fpn 1.000000e-01 mine.fpn 1.600000e+00
test.exp 7B mine.exp 7D test.man 4ED917 mine.man 4ED917 test.hex 17 D9 CE 3D mine.hex 17 D9 CE 3E test.fpn 1.010000e-01 mine.fpn 4.040000e-01
Compilation finished successfully.```
Code:
```LDBL makeFPN( ullong num, ullong fpn, long exp ) {
LDBL dst = {0};
long base = 10, pos = 0, bit,
pos_max = FPNT_MAN_BIT,
exp_max = FPNT_EXP_MAX;
ullong one = 1, dec = 0;
dst.man = num;
if ( num ) {
for ( ; pos < pos_max; ++pos ) {
if ( num == 1 ) break;
num >>= 1;
one <<= 1;
}
while ( one <= fpn ) one *= base;
dst.exp = exp_max + pos - 1;
}
else if ( fpn ) {
while ( pos > -pos_max ) {
pos -= 1;
one *= base;
if (one > fpn) break;
}
dst.exp = exp_max + pos;
pos = -4;
}
if ( fpn ) dec = 1;
else fpn = one / base;
while ( pos < pos_max ) {
for ( bit = 1; bit < 4 && pos < pos_max; ++bit, ++pos ) {
fpn *= 2;
dst.man <<= 1;
if ( fpn >= one ) {
dst.man |= dec;
fpn -= one;
}
}
}
if ( fpn && dec ) dst.man++;
if ( dst.man ) {
pos = 0;
for ( ; pos < exp; ++pos, dst.man *= 2 );
for ( ; pos > exp; --pos, dst.man *= 2 );
}
return dst;
}```

9. Well I managed to get the exponent of 0.101 correct but I'm sure that is just a coincidence as 0.1's exponent comes out incorrect, either that or I'm missing a piece of logic:
Code:
```gcc -Wall -o "test_fpn" "test_fpn.c" && "./test_fpn"
test.exp 00 mine.exp 00 test.man 000000 mine.man 000000 test.hex 00 00 00 00 mine.hex 00 00 00 00 test.fpn 0.000000e+00 mine.fpn 0.000000e+00
test.exp 7F mine.exp 7F test.man 000000 mine.man 000000 test.hex 00 00 80 3F mine.hex 00 00 80 3F test.fpn 1.000000e+00 mine.fpn 1.000000e+00
test.exp 85 mine.exp 85 test.man 4A0000 mine.man 4A0000 test.hex 00 00 CA 42 mine.hex 00 00 CA 42 test.fpn 1.010000e+02 mine.fpn 1.010000e+02
test.exp 7B mine.exp 7D test.man 4CCCCD mine.man 4CCCCD test.hex CD CC CC 3D mine.hex CD CC CC 3E test.fpn 1.000000e-01 mine.fpn 4.000000e-01
test.exp 7B mine.exp 7B test.man 4ED917 mine.man 4ED917 test.hex 17 D9 CE 3D mine.hex 17 D9 CE 3D test.fpn 1.010000e-01 mine.fpn 1.010000e-01
Compilation finished successfully.```
Code:
```LDBL makeFPN( ullong num, ullong fpn, long exp ) {
LDBL dst = {0};
long base = 10, pos = 0, bit,
pos_max = FPNT_MAN_BIT,
exp_max = FPNT_EXP_MAX;
ullong one, dec = 0;
dst.man = num;
if ( num ) {
for ( ; pos < pos_max; ++pos ) {
if ( num == 1 ) break;
num >>= 1;
one <<= 1;
}
dst.exp = exp_max + pos - 1;
}
else if ( fpn ) {
for (
one = fpn, pos = -1;
one && pos > -pos_max;
--pos, one /= base
);
dst.exp = exp_max + pos - 1;
pos = -4;
}
for ( one = 1; one <= fpn; one *= base );
if ( fpn ) dec = 1;
else fpn = one / base;
while ( pos < pos_max ) {
for ( bit = 1; bit < 4 && pos < pos_max; ++bit, ++pos ) {
fpn *= 2;
dst.man <<= 1;
if ( fpn >= one ) {
dst.man |= dec;
fpn -= one;
}
}
}
if ( fpn && dec ) dst.man++;
if ( dst.man ) {
pos = 0;
for ( ; pos < exp; ++pos, dst.man *= 2 );
for ( ; pos > exp; --pos, dst.man *= 2 );
}
return dst;
}```

10. Some more changes and I got closer but not quite to desired exponent (later tests proved the method wrong but for now will do until fix slight issue in mantissa (which I'm assuming is a data issue):
Code:
```gcc -Wall -o "test_fpn" "test_fpn.c" && "./test_fpn"
test.exp 00 mine.exp 00 test.man 000000 mine.man 000000 test.fpn 0.000000e+00 mine.fpn 0.000000e+00 text '0'
test.exp 7F mine.exp 7F test.man 000000 mine.man 000000 test.fpn 1.000000e+00 mine.fpn 1.000000e+00 text '1'
test.exp 85 mine.exp 85 test.man 4A0000 mine.man 4A0000 test.fpn 1.010000e+02 mine.fpn 1.010000e+02 text '101'
test.exp 7B mine.exp 7B test.man 4CCCCD mine.man 4CCCCD test.fpn 1.000000e-01 mine.fpn 1.000000e-01 text '0.1'
test.exp 7B mine.exp 7B test.man 4ED917 mine.man 4ED917 test.fpn 1.010000e-01 mine.fpn 1.010000e-01 text '0.101'
test.exp 85 mine.exp 85 test.man 4A33B6 mine.man 4A33B7 test.fpn 1.011010e+02 mine.fpn 1.011010e+02 text '101.101'
test.exp 78 mine.exp 78 test.man 23D70A mine.man 4CCCCD test.fpn 1.000000e-02 mine.fpn 1.250000e-02 text '0.01'
Compilation finished successfully.```
Code:
```LDBL makeFPN( ullong num, ullong fpn, long exp, long zeros ) {
LDBL dst = {0};
long base = 10, pos = 0, bit,
pos_max = FPNT_MAN_BIT,
exp_max = FPNT_EXP_MAX;
ullong one = 1, dec = 0;
dst.man = num;
for ( one = 1; one <= fpn; one *= base );
if ( num ) {
for ( ; pos < pos_max; ++pos ) {
if ( num == 1 ) break;
num >>= 1;
}
dst.exp = exp_max + pos - 1;
}
else if ( fpn ) {
for ( pos = -4; zeros < 0 && pos > -pos_max; pos -= 3, ++zeros );
dst.exp = exp_max + pos - 1;
pos = -4;
}
if ( fpn ) dec = 1;
else fpn = one / base;
while ( pos < pos_max ) {
for ( bit = 1; bit < 4 && pos < pos_max; ++bit, ++pos ) {
fpn *= 2;
dst.man <<= 1;
if ( fpn >= one ) {
dst.man |= dec;
fpn -= one;
}
}
}
if ( fpn && dec ) dst.man++;
if ( dst.man ) {
pos = 0;
for ( ; pos < exp; ++pos, dst.man *= 2 );
for ( ; pos > exp; --pos, dst.man *= 2 );
}
return dst;
}
typedef struct test_val {
ullong num;
ullong fpn;
long exp;
long zeros;
} test_val_t;

test_val_t values[] = {
{0}, {1,0}, {101,0}, {0,1,0}, {0,101,0}, {101,101,0}, {0,1,0,-1},
{1,1,10,0}, {0,1,10,0}, {1,1,10,0}, {0,101,10,0}, {1,1,10,0},
{3,14,0}, {3,14,10} };
float floats[] = {
0, 1, 101, 0.1, 0.101, 101.101, 0.01,
1.0e+10, 101.0e+10, 0.1e+10, 0.101e+10, 1.1e+10,
3.14, 3.14e+10 };
char *text[] = {
"0", "1", "101", "0.1", "0.101", "101.101", "0.01",
"1.0e+10", "101.0e+10", "0.1e+10", "0.101e+10", "1.1e+10",
"3.14", "3.14e+10"
};
int main() {
int i;
LDBL test = {0}, mine = {0};
test_val_t *val;
for ( i = 0; i < 7; ++i ) {
test.fpn = floats[i];
val = &(values[i]);
mine = makeFPN( val->num, val->fpn, val->exp, val->zeros );
//printf( "test.sig %u mine.sig %u ", test.sig, mine.sig );
printf( "test.exp %02X mine.exp %02X ", test.exp, mine.exp );
printf( "test.man %06X mine.man %06X ", test.man, mine.man );
#if 0
print( "test.hex", test );
putchar(' ');
print( "mine.hex", mine );
putchar(' ');
#endif
printf("test.fpn %e mine.fpn %e text '%s'\n", test.fpn, mine.fpn, text[i] );
}
return 0;
}```

11. I've no further ideas, this is as close as I've gotten, I'll probably leave it until late 2mw night or monday morning, until then I welcome new ideas:
Code:
```gcc -Wall -o "test_fpn" "test_fpn.c" && "./test_fpn"
test.exp 00 mine.exp 00 test.man 000000 mine.man 000000 test.fpn 0.000000e+00 mine.fpn 0.000000e+00 text '0'
test.exp 7F mine.exp 7F test.man 000000 mine.man 000000 test.fpn 1.000000e+00 mine.fpn 1.000000e+00 text '1'
test.exp 85 mine.exp 85 test.man 4A0000 mine.man 4A0000 test.fpn 1.010000e+02 mine.fpn 1.010000e+02 text '101'
test.exp 7B mine.exp 7B test.man 4CCCCD mine.man 4CCCCD test.fpn 1.000000e-01 mine.fpn 1.000000e-01 text '0.1'
test.exp 7B mine.exp 7B test.man 4ED917 mine.man 4ED917 test.fpn 1.010000e-01 mine.fpn 1.010000e-01 text '0.101'
test.exp 85 mine.exp 85 test.man 4A33B6 mine.man 4A33B6 test.fpn 1.011010e+02 mine.fpn 1.011010e+02 text '101.101'
test.exp 78 mine.exp 78 test.man 23D70A mine.man 19999A test.fpn 1.000000e-02 mine.fpn 9.375000e-03 text '0.01'
Compilation finished successfully.```
Code:
```LDBL makeFPN( ullong num, ullong fpn, long exp, long zeros ) {
LDBL dst = {0};
long base = 10, pos = 0, bit,
pos_max = FPNT_MAN_BIT,
exp_max = FPNT_EXP_MAX;
ullong one, dec = 0;
dst.man = num;
for ( one = 1; one <= fpn; one *= base );
if ( num ) {
for ( ; pos < pos_max; ++pos ) {
if ( num == 1 ) break;
num >>= 1;
}
num = dst.man;
dst.exp = exp_max + pos - 1;
}
else if ( fpn ) {
bit = zeros;
for ( pos = -4; zeros < 0 && pos > -pos_max; pos -= 3, ++zeros );
zeros = bit;
dst.exp = exp_max + pos - 1;
pos = -4;
}
if ( fpn ) dec = 1;
else fpn = 1;
while ( pos < pos_max ) {
for ( bit = 1; bit < 4 && pos < pos_max; ++bit, ++pos ) {
fpn *= 2;
dst.man <<= 1;
if ( fpn >= one ) {
dst.man |= dec;
fpn -= one;
}
}
}
if ( dst.man ) {
if ( dst.exp < (exp_max-1) ) {
dst.man++;
for ( ; zeros < 0; dst.man <<= 1, zeros++ );
}
pos = 0;
for ( ; pos < exp; ++pos, dst.man *= 2 );
for ( ; pos > exp; --pos, dst.man *= 2 );
}
return dst;
}```

12. I don't have time to fix it now but I've learned the issue with the last one is that it continues the loop 1 time too many so anyone trying to fix it themselves keep that in mind.

13. It shows the site cannot be reached on my end.... What's the books name?

14. Originally Posted by Zeus_
It shows the site cannot be reached on my end.... What's the books name?
Are you refering to post #18? It so: "Handbook of Floating-Point Arithmetic" (Jean-Michael Muller, Nicolas Brisebarre, Florent De Dinechin, Claude-Pierre Jeannerod, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, Damien Stehlé and Serge Torres), ISBN 978-0-8176-4704-9, DOI 10.1007/978-0-8176-4705-6, e-ISBN 978-0-8176-4705-6, Birkhäuser Boston (Springer).

15. Yes, #18, sorry I forgot mentioning that. I use the Hybrid view so all replies look nested in my screen but for those who may not be viewing it in this way will find this post towards the end of the thread making it completely unrelatable to what I'm referring to. Anyways thanks!

[EDIT]
Btw, I've been reading that Hacker's Delight book too which you shared on @Asymptotic's thread on how to get better with bitwise stuff. It's truly insane and I've learnt so much bitwise math I didn't know existed. Thanks for that too. Pretty sure this'd make a good read.
[/EDIT]

Popular pages Recent additions