I wonder if anyone can help me with two questions about floating point variables and modular arithmetic.
First, how are floating-point numbers (float or double) added together? I understand that each value is stored as one bit for the sign, a fixed number of bits for the exponent, and the remaining bits for the closest binary approximation to the number (the significand). When two numbers are added, how is the result actually calculated? For example, is the number with the smaller exponent adjusted so that both exponents match, and are the two approximations then added together? I'm interested in the errors that floating-point arithmetic generates. Does anyone know of a good online reference for this?
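To make the first question concrete, here is a rough Python sketch of the "match the exponents, then add" idea as I imagine it. It is only a model under my own assumptions, not a description of what any particular processor does. The names decompose and add_by_alignment are made up for illustration. It assumes finite inputs whose sum does not overflow, and it scales the operand with the larger exponent up (rather than shifting the smaller one down) so that no bits are dropped before the single final rounding.

```python
import math
from fractions import Fraction

MANT_BITS = 53  # significand precision of an IEEE 754 double


def decompose(x):
    """Return integers (m, e) such that x == m * 2**e exactly (x must be finite)."""
    frac, exp = math.frexp(x)  # x == frac * 2**exp, with 0.5 <= |frac| < 1 (or frac == 0)
    # frac carries at most 53 significant bits, so scaling by 2**53 gives an exact integer.
    return int(math.ldexp(frac, MANT_BITS)), exp - MANT_BITS


def add_by_alignment(a, b):
    """Add two doubles by bringing both to a common exponent (illustrative model only)."""
    ma, ea = decompose(a)
    mb, eb = decompose(b)
    e = min(ea, eb)  # common exponent: the smaller of the two
    # Rescale both significands to exponent e; the integer sum is exact.
    total = (ma << (ea - e)) + (mb << (eb - e))
    # Converting the exact value total * 2**e to a float rounds once, to the nearest double.
    return float(Fraction(total) * Fraction(2) ** e)


print(add_by_alignment(0.1, 0.2), 0.1 + 0.2)    # the two values should agree
print(add_by_alignment(1e16, 1.0), 1e16 + 1.0)  # the 1.0 is too small to survive rounding
```

If this model is right, the error in a single addition comes entirely from that last rounding step: the exact sum usually needs more than 53 bits, and the extra low-order bits are rounded away.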
Secondly, how does the % (modulus) operator calculate its answer? For example, to work out 10000 % 10 you could repeatedly subtract 10 from 10000 until the result is less than 10. That would obviously be labour intensive, so I'm interested in how it actually works. I ask because I'm thinking about writing a function to compute a % b for very large numbers.
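To illustrate the second question, here is a small Python sketch comparing the repeated-subtraction idea with two alternatives I can think of. The function names are hypothetical, and all three assume a non-negative a and a positive b. The second version relies on the identity a % b == a - b * (a // b), which is my guess at how the operator relates to integer division. The third assumes the very large number arrives as a string of decimal digits and reduces it one digit at a time, so the running value never grows beyond about 10 * b.

```python
def mod_by_subtraction(a, b):
    """Repeated subtraction: correct, but takes about a // b iterations."""
    while a >= b:
        a -= b
    return a


def mod_by_division(a, b):
    """One integer division, then subtract the whole multiples of b."""
    return a - b * (a // b)


def mod_of_decimal_string(digits, b):
    """Remainder of a number given as a decimal string, processed left to right."""
    r = 0
    for ch in digits:
        # Appending a digit multiplies the number so far by 10 and adds the digit;
        # reducing modulo b at every step keeps r smaller than b.
        r = (r * 10 + int(ch)) % b
    return r


print(mod_by_subtraction(10000, 10))
print(mod_by_division(10000, 10))
print(mod_of_decimal_string("123456789012345678901234567890", 97))
```

The digit-by-digit version is the one I would probably start from for very large numbers, since it needs only as many steps as there are digits.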
Thanks in advance for any answers.