Thread: Mathematical Expression Produces Incorrect Result

  1. #1
    Registered Superuser nul's Avatar
    Join Date
    Nov 2014
    Location
    Earth
    Posts
    53

    Mathematical Expression Produces Incorrect Result

    I've implemented the Euler method (Euler method - Wikipedia, the free encyclopedia), but what I'm actually having a problem with is evaluating the exact solution.

    The following is the mathematical expression I'm trying to compute:

    (t^6 - 37)^(1/3).


    Here is the C function:

    Code:
    double exact_solution( double t ){ //exact solution function
       return pow(pow(t, 6) - 37, (1.0 / 3));
    }
    With an input of 3.0, the C function produces a result of: 8.783047
    However a calculator produces: 8.845085422

    What do you guys think?

    EDIT: Okay. After testing just the expression, its not the problem. Here's the whole program.

    Code:
    #include <stdio.h>
    #include <math.h>
    
    
    double F( double t, double x ){ //slope function
       return pow(t,5) * 2 / (x * x);
    }
    
    
    double exact_solution( double t ){ //exact solution function
       return pow(pow(t, 6) - 37, (1.0 / 3));
    }
    
    
    int main(int argc, char *argv[]){
    
    
       double t = 2;           //independent var
       double delta_t = .01;   //step size
       double x_n1 = 3;        //x(0) = x_n1 ( initial value )
       double x_n2;
       double interval_size = 1;
       double exact = exact_solution(t);
    
    
       printf("step size -> %f\n\n", delta_t  );
    
    
       for(int i = 0;i < interval_size/delta_t; i++){
          x_n2 = x_n1 + delta_t * F(t, x_n1);
          x_n1 = x_n2;
          exact = exact_solution(t);
          t += delta_t;
          printf("%f %f %f %f\n", t, x_n2, exact, fabs(exact - x_n2) / exact * 100);
       }
    
    
       return 0;
    }
    Last edited by nul; 03-01-2015 at 11:19 PM.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,662
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered Superuser nul's Avatar
    Join Date
    Nov 2014
    Location
    Earth
    Posts
    53
    Salem, thank you very much.

    What has me scratching my head is that the approximation comes out exactly as printed in the back of the book. So I use the double variable "t" as the parameter for both functions; one produces the expected result, while the other doesn't.

    My TA provided a spreadsheet to do the problems, but suffering from the not-invented-here syndrome, I really wanted to write my own C version.

  4. #4
    Registered Superuser nul's Avatar
    Join Date
    Nov 2014
    Location
    Earth
    Posts
    53
    So I haven't looked at this since my last post. Does anybody have a possible solution I could implement?

    I've thought of just calculating the exact value at pre-determined points using hard-coded constants to avoid the floating-point rounding error. I'd like to hear what you smart folks have to say.

  5. #5
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    nul, your issue is about understanding the mathematical method, not so much a programming issue.

    That said, perhaps the following variant of the same code helps you undestand better:
    Code:
    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>
    #include <math.h>
    
    /* Slope function f(t, x) = y'(t) where x = y(t) */
    static double f(const double t, const double x)
    {
        const double tt = t * t;
        return 2.0 * t * tt * (tt / (x * x)); /* 2 * t^5 / x^2 */
    }
    
    /* Known exact function y(t) */
    static double known_y(const double t)
    {
        const double t2 = t * t;
        const double t4 = t2 * t2;
        return cbrt(t2 * t4 - 37.0);
    }
    
    int main(int argc, char *argv[])
    {
        const double t0 = 2.0;  /* Starting point */
        const double y0 = 3.0;  /* Function at starting point */
        const double t1 = 3.0;  /* Ending point */
        double       h, y;
        long steps, i;
        char dummy;
    
        if (argc != 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
            fprintf(stderr, "\n");
            fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
            fprintf(stderr, "       %s STEPS\n", argv[0]);
            fprintf(stderr, "\n");
            return EXIT_FAILURE;
        }
    
        if (sscanf(argv[1], " %ld %c", &steps, &dummy) != 1 || steps < 1L) {
            fprintf(stderr, "%s: Invalid number of steps.\n", argv[1]);
            return EXIT_FAILURE;
        }
    
        h = (t1 - t0) / (double)steps;
        y = y0;
    
        printf("# t f(t,y) y y(t) h\n%.9f %.9f %.9f %.9f %.9f\n", t0, f(t0, y0), y0, known_y(t0), h);
        for (i = 1L; i <= steps; i++) {
            const double phase = (double)i / (double)steps;
            const double t = (1.0 - phase) * t0 + phase * t1;
            const double fvalue = f(t, y);
            y += h * fvalue;
            printf("%.9f %.9f %.9f %.9f\n", t, fvalue, y, known_y(t));
        }
    
        fflush(stdout);
        fprintf(stderr, "Expected  y = %.16f\n", known_y(t1));
        fprintf(stderr, "Evaluated y = %.16f\n", y);
    
        return EXIT_SUCCESS;
    }
    The Wikipedia Euler method article has a pretty good description. Let me re-describe it, with variable names matching the above program and the article.

    The problem at hand is an initial value problem. We know that
    dy(t) / dt = f(t, y(t))
    f(t, x) = 2 t5 / x2
    y(2) = 3.0
    and we want to find out
    y(3) = ?

    In other words, we don't know the function y(t), but we know its slope is f(t,x) where x=y(t). One way to approximate this is to apply the Euler method, where t=t0+nh, and h is the number of steps between t0=2 (the starting point) and t1=3 (the ending point), and we know the initial value y(t0)=y0=3.

    The above program has hardcoded the starting point (t0=2.0, t1=3.0, and y0=3.0), but the number of steps taken is supplied on the command line. The step size, h, is determined from those three (h = (t1 - t0) / steps).

    Because we have an integer number of steps (iterations), I'm using a stable numeric trick to compute t at each step. The number of iterations (starting at 1) divided by the total number of steps gives us phase, a value beween 0 (exclusive) and 1 (inclusive), at very high precision. We can then calculate t = (1.0-phase)*t0 + phase*t1 to calculate t without cumulative errors. (Here, the difference to just adding h to t at each step would not make much of a difference, but if the start and end point were dozens of orders of magnitude different, then this method shines: it will yield reliable values even then.)

    During each iteration, in the loop body, we precalculate t and fvalue = f(t, y), then update y=y+h*fvalue. This is equivalent to the Wikipedia article yN+1=yN+h×f(tN,yN). We use separate variables, so we can print the actual values used, as well as yN+1, the point of the iteration.

    After the loop is done, we have the approximation of y(t1) in y. To make it easier to see the effect the number of steps has on the approximation, the summary is printed to standard error (while all the intermediate values are printed to standard output). Note that the loop will run much faster, if only the end result were to be printed; printing floating-point values is kind-of slow, when you have lots of iterations.

    I compiled and ran the above as euler, with different numbers of steps. The results, summarized, are as follows:
    Code:
    #  Steps        Approximation    h
           1  57.0000000000000000    1
           2  15.1173635453141522    0.5
           3  11.2860983089115550    0.3333333
           5  10.0388388035749934    0.2
          10   9.4014859985922055    0.1
          20   9.1159639917882256    0.05
          50   8.9518255220655405    0.02
         100   8.8981953539855372    0.01
         500   8.8556663104274111    0.002
        1000   8.8503733129637165    0.001
       10000   8.8456139814468280    0.0001
      100000   8.8451382755005721    0.00001
     1000000   8.8450907071763734    0.000001
    10000000   8.8450859503674462    0.0000001
    
    #   Exact  8.8450854218326640
    nul, in your code delta_t = .01 = h, which translates to just 100 steps. As the Wikipedia article suggests, the error is roughly proportional to the step size, so you only get about two digits of precision. Using more steps (more values of t and smaller h (your delta_t) gives you more precision.

    Questions?

  6. #6
    Registered Superuser nul's Avatar
    Join Date
    Nov 2014
    Location
    Earth
    Posts
    53
    Thank you for the very detailed write-up/explanation along with code.

    I do have a question: would you mind explaining in a little more detail why/how this method of calculating t is so accurate? Maybe a link to a resource covering it?

  7. #7
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by nul View Post
    would you mind explaining in a little more detail why/how this method of calculating t is so accurate?
    Well, it boils down to floating-point precision, and errors in floating-point operations, especially cumulative errors and domain cancellation errors. It is a method you can show has no cumulative error, and very small errors for any possible values. I claim it is precise, because I know it has very small (sources of) errors.

    The numerical analysis guides and books available out there (a commonly recommended one is Numerical Recipes) explain the details. There is lots of interesting stuff there, starting from things like Kahan summation algorithm.

    First, the phase math should be obvious:
    0 ≤ phase ≤ 1
    t(0) = t0
    t(1) = t1
    t(phase) = t0 (1 - phase) + t1 phase

    Although mathematically
    t(phase) = t0 + phase (t1 - t0)
    we don't want to use that, because the subtraction may lead to large domain cancellation error.

    (Domain cancellation error occurs when summing or subtracting two floating-point values, with one having an exponent so much larger that the smaller one does not affect the result at all.)

    Given 1 ≤ iN, where i and N are of int or long type, (double)i / (double)N yields a double-precision floating point within 0 and 1, inclusive, with minimum rounding error. Substracting that value from 1 is precise, in the sense that the two floating-point values do sum exactly to 1. Multiplying any double-precision number by a double-precision number between 0 and 1 yields a result with minimum rounding error.

    The only source of possibly significant error is the addition: If the exponents in the two (multiplied) floating-point values differ enough, you get domain cancellation error: their sum is just the larger one (in magnitude). However, this is okay, as the earlier zero-to-one multiplier still gets us to the correct ballpark -- as close to the correct result as possible! Also, if the two exponents are similar, the error in the addition is neglible.

    Simply put, this method or expression just minimises the sources of floating-point errors. It is precise, because there are no sources of cumulative error, and all other error sources are kept to a minimum for all possible inputs.

    Sometimes the nearly-equivalent expression
    Code:
    const double t = ((double)(N - i) * t0 + (double)i * t1) / (double)N;
    is used. I *think* it is not as precise as the phase method above, when the range is very wide (or one endpoint close to zero, and another large in magnitude), due to the (largeish) divisor. However, it does not have any sources for cumulative error, so even this one should be much more precise than using a fixed step size, and adding it to t each step, even if not as precise as the above phase method.

    Now, let's compare the phase method to the fixed-step-size, additive method:

    When you use a fixed-size step, any rounding error in the step size will accumulate in t instead. If you have a million steps, the error in the sum is million times the rounding error in the step size!

    A practical example is always warranted.

    Consider the case where the starting point is very large negative value, say t0 = -1e30 (-1,000,000,000,000,000,000,000,000,000,000,000,000, 000,000,000,000,000,000,000,000,000,000,000,000,00 0,000,000,000,000,000), and the end point is at zero, t1 = 0.0. Let's say we use a million steps. The step size is then (0--1e30)/1000000 = 1e30/1e6 = 1e24. However, that is not exactly representable by a double-precision floating-point number; the closest representable value is 999999999999999983222784. So, if you start at t0 and just add the step size a million times, your final result is 14176573061112791040 instead of zero!

    Of course, with the "phase" method, the final result in this case is zero, 0*t0 + 1*t1 = 0*(-1e30)+1*0 = 0.

    Here are the numerical values comparing the two methods for the above range, using 20 steps only:
    Code:
    Using fixed step size            Using zero-to-one phase method
    -1000000000000000019884624838656 -1000000000000000019884624838656
     -950000000000000032964142432256  -950000000000000032964142432256
     -900000000000000046043660025856  -900000000000000046043660025856
     -850000000000000059123177619456  -850000000000000059123177619456
     -800000000000000072202695213056  -800000000000000072202695213056
     -750000000000000085282212806656  -749999999999999944544724451328
     -700000000000000098361730400256  -700000000000000098361730400256
     -650000000000000111441247993856  -649999999999999970703759638528
     -600000000000000124520765587456  -600000000000000054152021409792
     -550000000000000137600283181056  -549999999999999996862794825728
     -500000000000000150679800774656  -500000000000000009942312419328
     -450000000000000163759318368256  -450000000000000023021830012928
     -400000000000000176838835961856  -400000000000000036101347606528
     -350000000000000189918353555456  -350000000000000049180865200128
     -300000000000000202997871149056  -300000000000000027076010704896
     -250000000000000216077388742656  -250000000000000004971156209664
     -200000000000000229156906336256  -200000000000000018050673803264
     -150000000000000224644237885440  -150000000000000013538005352448
     -100000000000000220131569434624  -100000000000000009025336901632
      -50000000000000215618900983808   -50000000000000004512668450816
                    -211106232532992                                0
    The differences tend to show up near the end of the interval. The differences also increase as the number of steps increases. The second-to-last one has already two digits less precision the phase method yields, and obviously the final values differ significantly.

    Many people do not care, because this only occurs if your ranges are "difficult"; one endpoint close to zero, another very large in magnitude. They say that such ranges are "pathological": exactly the kind that shows this problem; and the fix is to "not use such ranges".

    Indeed, if you compare the 20-step ranges from 0 to 12345678987654321, which is not pathological or difficult at all, just a normal range, you get
    Code:
    Fixed step size            Phase method
                      0                   0
        617283949382716     617283949382716
       1234567898765432    1234567898765432
       1851851848148148    1851851848148148
       2469135797530864    2469135797530864
       3086419746913580    3086419746913580
       3703703696296296    3703703696296296
       4320987645679012    4320987645679012
       4938271595061728    4938271595061728
       5555555544444444    5555555544444444
       6172839493827160    6172839493827160
       6790123443209876    6790123443209876
       7407407392592592    7407407392592592
       8024691341975308    8024691341975309  <- the only difference!
       8641975291358024    8641975291358024
       9259259240740740    9259259240740740
       9876543190123456    9876543190123456
      10493827139506172   10493827139506172
      11111111088888888   11111111088888888
      11728395038271604   11728395038271604
      12345678987654320   12345678987654320
    with only one difference: the exact correct value is 8024691341975308.65 (verified using Maple), which is represented in double precision by 8024691341975309.

  8. #8
    Registered Superuser nul's Avatar
    Join Date
    Nov 2014
    Location
    Earth
    Posts
    53
    Extremely informative. Thank you so much!!

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Incorrect result in Bitwise operations
    By hjazz in forum C Programming
    Replies: 1
    Last Post: 10-30-2014, 12:27 AM
  2. incorrect result in program
    By johnmerlino in forum C Programming
    Replies: 2
    Last Post: 02-15-2014, 08:46 PM
  3. Evaluate mathematical expression
    By Nyah Check in forum C Programming
    Replies: 12
    Last Post: 06-20-2012, 04:29 PM
  4. Incorrect result with my program
    By JoshR in forum C++ Programming
    Replies: 4
    Last Post: 04-27-2005, 03:46 PM
  5. Replies: 4
    Last Post: 03-25-2003, 01:23 AM