Originally Posted by hamster_nz
^^^ Are you trying to lose friends? Cause comments like that won't make any ^^^
The reasons I haven't had any more input are:
a) A Harry Potter movie isn't on, so I'm busy with other stuff. I'm building an Audio Distortion Meter as time allows.
b) I know enough about floating point to know that there is a lot to know. Fixing the odd 'doing something dumb in C' bug is relatively easy, but I don't want to spend hours reading papers to fully inform myself just to help you with something where I don't have the depth of experience.
c) I'm more than happy to help somebody once or twice, but I'm nobody's code monkey forever.
d) I don't really agree with how you are going about implementing it. It seems a silly way to do it, but that is because I don't appreciate your aims.
If I were implementing FP addition/subtraction in software, here's my outline of the implementation:
1. Split the FP registers into parts (sign, exponent, mantissa), maybe adding guard bits too.
2. Handle special cases (zeros, +INF/-INF, NaN).
3. Fix up the mantissas of both operands (restore the implicit leading 1) and deal with 'denormalized' cases (very small numbers).
4. Work out how far to right-shift the mantissa of the smaller FP number so that it has the same exponent as the other one.
5. Merge the sign into the mantissa (convert it from sign+magnitude to two's complement).
6. Finally, do the addition: just simple binary addition.
7. Handle the exponent having increased or decreased, i.e. 're-normalize'.
8. Convert back to sign+magnitude.
9. Process rounding.
10. Convert back to IEEE format and pack into the register.
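For illustration, here is a minimal C sketch of that outline for IEEE-754 single precision. It assumes round-to-nearest-even only, raises no exception flags, and returns one canonical quiet NaN for every NaN result. The names `soft_add`, `soft_sub`, `f2u`, `u2f` and `shr_sticky` are hypothetical, not from anyone's actual code. There is one deliberate deviation from the outline: it converts back to sign+magnitude *before* re-normalizing, because shifting a positive magnitude is simpler than shifting a two's complement value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a uint32_t and back (memcpy avoids aliasing problems). */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Right-shift by n, ORing every bit shifted out into bit 0 (the "sticky" bit). */
static uint64_t shr_sticky(uint64_t m, int n) {
    if (n <= 0) return m;
    if (n >= 63) return m != 0;
    return (m >> n) | ((m & ((1ULL << n) - 1)) != 0);
}

float soft_add(float fa, float fb) {
    uint32_t a = f2u(fa), b = f2u(fb);
    const uint32_t QNAN = 0x7FC00000u;

    /* Split into sign, biased exponent and 23-bit fraction. */
    uint32_t sa = a >> 31, sb = b >> 31;
    int ea = (int)((a >> 23) & 0xFF), eb = (int)((b >> 23) & 0xFF);
    uint64_t ma = a & 0x7FFFFFu, mb = b & 0x7FFFFFu;

    /* Special cases: NaN, infinities, zeros. */
    if ((ea == 0xFF && ma) || (eb == 0xFF && mb)) return u2f(QNAN);
    if (ea == 0xFF) return (eb == 0xFF && sa != sb) ? u2f(QNAN) : fa; /* inf - inf is NaN */
    if (eb == 0xFF) return fb;
    int za = (a << 1) == 0, zb = (b << 1) == 0;
    if (za && zb) return u2f((sa & sb) << 31); /* -0 only if both are -0 */
    if (za) return fb;
    if (zb) return fa;

    /* Restore the hidden leading 1; denormals have none and use exponent 1.
       Then append 3 extra low bits: guard, round, sticky. */
    if (ea) ma |= 1u << 23; else ea = 1;
    if (eb) mb |= 1u << 23; else eb = 1;
    ma <<= 3;
    mb <<= 3;

    /* Right-shift the operand with the smaller exponent up to the common exponent. */
    int e = ea > eb ? ea : eb;
    ma = shr_sticky(ma, e - ea);
    mb = shr_sticky(mb, e - eb);

    /* Sign+magnitude to two's complement, then a plain binary add. */
    int64_t sum = (sa ? -(int64_t)ma : (int64_t)ma) + (sb ? -(int64_t)mb : (int64_t)mb);

    /* Back to sign+magnitude (done before re-normalizing here). */
    uint32_t s = sum < 0;
    uint64_t m = sum < 0 ? (uint64_t)(-sum) : (uint64_t)sum;
    if (m == 0) return u2f(0); /* exact cancellation gives +0 */

    /* Re-normalize: a carry shifts right, cancellation shifts left. */
    if (m >> 27) { m = shr_sticky(m, 1); e++; }
    while (!(m >> 26) && e > 1) { m <<= 1; e--; }
    assert((m >> 26) == 1 || e == 1); /* normalized, or a denormal result */

    /* Round to nearest, ties to even, using the guard/round/sticky bits. */
    uint32_t grs = (uint32_t)(m & 7);
    m >>= 3;
    if (grs > 4 || (grs == 4 && (m & 1))) m++;
    if (m >> 24) { m >>= 1; e++; } /* rounding carried out of the mantissa */

    /* Repack: overflow becomes infinity, no hidden bit means a denormal. */
    if (e >= 0xFF) return u2f((s << 31) | 0x7F800000u);
    uint32_t exp_field = (m >> 23) ? (uint32_t)e : 0;
    return u2f((s << 31) | (exp_field << 23) | ((uint32_t)m & 0x7FFFFFu));
}

/* Subtraction is addition with the second operand's sign bit flipped. */
float soft_sub(float fa, float fb) { return soft_add(fa, u2f(f2u(fb) ^ 0x80000000u)); }
```

On a platform with IEEE-754 `float` in the default rounding mode, `soft_add(x, y)` should produce the same bits as the hardware `x + y` for non-NaN inputs, which makes the hardware a convenient reference when testing a sketch like this. Three extra bits are enough because any exponent difference of 2 or more can only cause a single-bit left shift during re-normalization.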
If I were implementing it in hardware (e.g. an FPGA), I would do it very differently.