Bumping against the limits of precision

**KBriggs** · 04-28-2011

I have a complex little formula in my code in which variables are subtracted that are very close together. Their difference is many orders of magnitude smaller than the numbers being subtracted. C is evaluating the expression incorrectly.

The formula is as follows (all variables are doubles)

Code:

operand = a*a*(108.0*a*c*c*c + 108.0*b*b*b*d - 27.0*b*b*c*c - 486.0*a*b*c*d + 729.0*a*a*d*d);

The variables have the following values as calculated by the code

Code:

a: 2.0287864134649038e+07
b: 1.9026877897384940e+04
c: 4.4610669723446748e+00
d: -3.7933051584501112e-29

108.0*a*c*c*c: 1.9452539813523886e+11
108.0*b*b*b*d: -2.8219163158723021e-14
27.0*b*b*c*c: 1.9452539813523883e+11
486.0*a*b*c*d: -3.1746558553563405e-14
729.0*a*a*d*d: 4.3175338098787823e-40

compared to composite values calculated by hand using a calculator with better precision:

Code:

108.0*a*c*c*c: 1.94525398135238912527e11
108.0*b*b*b*d: -2.82191631587230194283e-14
27.0*b*b*c*c: 1.94525398135238861191e11
486.0*a*b*c*d: -3.17465585535634052349e-14
729.0*a*a*d*d: 4.31753380987878243389e-40

You can see that there are subtle differences between the values in the code and there, and that the first and third term almost cancel each other out, except for in the last few decimal places, which are not especially reliable.

The values calculated by the code give a value for operand of 1.2560956774113516e+10

The correct answer is 2.11297657006282327491e10.

So: I am bumping up against the limits of hardware precision here, I think. I need a way to reliable subtract numbers that only differ in maybe the 14th decimal place. Is such a things possible?

I have been playing around with order of operations in the formula, factoring out various terms. The value of operand can be changed dramatically just by changing the order in which things are done.

I would like to avoid using a higher precision library if at all possible - this will be executed several billion times in the course of a run and using software precision is going to slow things down a lot. Do I have alternatives?

Thanks

**anduril462** · 04-28-2011

What variable type are you using? If you're using float, you can upgrade to double. It still suffers similar precision issues, but with 64-bits, you get a lot better resolution. If doubles don't work, then you'll have to either use a 3rd party library, like GMP, or roll your own. Note that libraries like GMP are usually heavily optimized, so the performance hit might not be as bad as you expect.

**KBriggs** · 04-28-2011

I am using doubles, so the next step would be a 3rd party library I suppose. What a headache floating point subtraction is ^_^

**Salem** · 04-28-2011

Perhaps you could consider rearranging the expression into some equivalent form which avoids the issue?

**nonoob** · 04-28-2011

Instead of absolute comparison between A and B, you need to consider a comparison tolerance or epsilon.
The error should be some amount less than the magnitude of the numbers.
If using 'double' don't expect a precision greater than 15.9 decimal places. Due to round-off, shave off a few from that.
So perhaps 14 place accuracy.
Pseudo code:
if min(log10(A), log10(B)) - log10(abs(A - B)) >= 14 then they are "equal"

**KBriggs** · 04-28-2011

@Salem: I have been trying. As I said, the value of operand can be drastically changed just by reordering the equation and factoring things out. But so far, nothing is particularly reliable.

@nonoob: That is precisely the problem. Your test will show that the numbers are equal, but they are not, and that tiny (relative to the numbers themselves) difference is all-important to my calculation.

**iMalc** · 04-29-2011

The problem you are having is with Numerical Stability.

In short, adding or subtracting values of sufficiently differing magnitude loses precision.
It can be solved by putting all the values to add up into an array, sorting the values in the array ascending by their absolute value, and then adding them up starting from the smallest.

In C++ you could so it like so:

Code:

double vals[] = {108.0*a*c*c*c,  108.0*b*b*b*d, -27.0*b*b*c*c, -486.0*a*b*c*d, 729.0*a*a*d*d};
std::sort(vals, vals+sizeof(vals)/sizeof(vals[0]), myMagnitudeComparator()); //"batteries not included", so to speak
operand = a*a*std::accumulate(vals, vals+sizeof(vals)/sizeof(vals[0]), 0.0);

(Note that I'm assuming that std::accumulate adds the values up from first to last)
If you have trouble translating that to C, let me know.

**KBriggs** · 04-29-2011

Thanks iMalc, but that is not quite the problem I am having. My problem is that I have two numbers which are almost equal and differ only in the 14th (say) decimal place, and I need to extract that difference.

eg:

a = A
b = A - e

where e/A ~1e-14

what is a-b? C says 0, or possible something along the lines of e rounded to one decimal place.

**Salem** · 04-29-2011

Well there in lies the problem.
If the two numbers are identical in 14 decimal digits, then all you're going to get is 1 or 2 digits of precision when you subtract them. It can't "magic" another 14 bits of precision out of nothing, so you just get a whole load of noise bits instead.

Similarly, if one number is 15 decimal digits larger than another (say subtracting 1 from 1E15, then nothing will happen there either).

This long read explains what you're up against.
What Every Computer Scientist Should Know About Floating-Point Arithmetic

**KBriggs** · 04-29-2011

Thanks for the link.

I understand the problem, I was just hoping that this was a problem that had a standard approach to solving. I'll comb through the article and see what I can come up with.

**itCbitC** · 04-30-2011

Originally Posted by KBriggs

Code:

operand = a*a*(108.0*a*c*c*c + 108.0*b*b*b*d - 27.0*b*b*c*c - 486.0*a*b*c*d + 729.0*a*a*d*d);

The variables have the following values as calculated by the code

Code:

a: 2.0287864134649038e+07
b: 1.9026877897384940e+04
c: 4.4610669723446748e+00
d: -3.7933051584501112e-29

108.0*a*c*c*c: 1.9452539813523886e+11
108.0*b*b*b*d: -2.8219163158723021e-14
27.0*b*b*c*c: 1.9452539813523883e+11
486.0*a*b*c*d: -3.1746558553563405e-14
729.0*a*a*d*d: 4.3175338098787823e-40

compared to composite values calculated by hand using a calculator with better precision:

Code:

108.0*a*c*c*c: 1.94525398135238912527e11
108.0*b*b*b*d: -2.82191631587230194283e-14
27.0*b*b*c*c: 1.94525398135238861191e11
486.0*a*b*c*d: -3.17465585535634052349e-14
729.0*a*a*d*d: 4.31753380987878243389e-40

You can see that there are subtle differences between the values in the code and there, and that the first and third term almost cancel each other out, except for in the last few decimal places, which are not especially reliable.

The values calculated by the code give a value for operand of 1.2560956774113516e+10

The correct answer is 2.11297657006282327491e10.

Not sure if above answer is correct. Because putting those numbers and equations into bc (the *nix calculator with 99 digits of precision) I get a different answer. Or maybe I made an error somewhere. Lemme know if you have access to bc, so I can give you the bc script to figure out if the output you're getting with the high precision calculator is correct or not.

**KBriggs** · 05-01-2011

Actually I think I made a mistake. Redoing it I get operand = 9.6474321900824630266e9. Is that what you got?

The code is still wrong, though ^_^

I am using speedcunch

Thread: Bumping against the limits of precision

Thread Tools

Search Thread

Display

Bumping against the limits of precision

Similar Threads

bumping threads

limits.h