Thread: Optimizing my code

  1. #1
    Registered User
    Join Date
    Sep 2006
    Posts
    18

    Optimizing my code

    Hi,
    i have to write a poly2d function that runs fast.
    The following that i have is not fast, it is 7.06 ACPE, i know that Horner’s
    rule would make it much faster but i don't know how to modify what i have
    according to that. Please help me in optimizing my code. It has to be <3.00 ACPE

    Code:
    /* Data structures defined in another file */
    extern val_t C[M][N];
    extern val_t *x_p;
    extern val_t *y_p;
    
    val_t poly2d_precomp(val_t C[M][N]) {
    
        int i, j;
        val_t sol = 0.0;
       val_t xpwr = 1.0;
        val_t ypwr[N];
        val_t x = *x_p;
        val_t y = *y_p;
    
        ypwr[0] = 1.0;
        for (j = 1; j < N; j++) {
    	ypwr[j] = ypwr[j-1]*y;
        }
    
        for (i = 0; i < M; i++) {
            for (j = 0; j < N; j++) {
                sol += C[i][j] * (xpwr * ypwr[j]);
            }
    	xpwr *= x;
        }
        return sol;
    }

  2. #2
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    How about looking up Horner's rule through google ?
    Teacher: "You connect with Internet Explorer, but what is your browser? You know, Yahoo, Webcrawler...?" It's great to see the educational system moving in the right direction

  3. #3
    The C eater *munch*
    Join Date
    Oct 2006
    Posts
    101
    open www.google.com

    on the search bar, type "Horner's rule", that should help you


  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Or Homer's rule:
    "The circumference of a donut is in direct proportion to the time it takes to eat it"
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Registered User OnionKnight's Avatar
    Join Date
    Jan 2005
    Posts
    555
    Not sure what needs to be changed in your function since I don't know how it works but you could apply Horner's rule by having an array that hold coefficients and a variable that holds the result.
    Loop through the array and set the and multiply the result by the function variable and add the next coefficient in the array to it and store this in the result variable.
    Code:
    /* Computes f(x) = 4x^2 + 3x + 5 */
    
    int f (int x)
    {
    	int coeff[3] = {4, 3, 5};
    	int y = 0;
    	int i;
    
    	for (i = 0; i < 3; i++)
    		y = y*x + coeff[i];
    
    	return y;
    }
    
    int main (void)
    {
    	int x;
    
    	printf("x = ");
    	scanf("%d", &x);
    	printf("y = %d", f(x));
    
    	return 0;
    }
    Unrolling that loop gives
    y = (((0*x) + 4)*x + 3)*x + 5
    y = (4*x + 3)*x + 5
    y = 4*x*x + 3*x + 5

    Can be further optimized by setting y = coeff[0] right off the bat and also setting i = 1 at the start of the loop.

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    18
    the flollowing is a simple way of implementing the function, then it's optimized to the one i posted earlier. Now, i want to optimize it more. So i thought that implemting Horner's rule would make it run faster.


    Code:
    /* 
     * poly2d_base - The simple base version of poly2d
     */
    char poly2d_base_descr[] = "poly2d_baseline: Baseline implementation";
    val_t poly2d_base(val_t C[M][N]) {
    
        int i, j;
        val_t sol = 0.0;
        val_t xpwr = 1.0;
        val_t ypwr = 1.0;
        val_t x = *x_p;
        val_t y = *y_p;
    
        for (i = 0; i < M; i++) {
    	ypwr = 1.0;
            for (j = 0; j < N; j++) {
                sol += C[i][j] * (xpwr * ypwr);
    	    ypwr *= y;
            }
    	xpwr *= x;
        }
        return sol;
    }
    this is the algorithm of Horner's rule
    Definition: A polynomial A(x) = a0 + a1x + a2x&#178; + a3x&#179; + ... may be written as A(x) = a0 + x(a1 + x(a2 + x(a3 + ...))).

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    18
    Hi Onion Knigt,

    my function has to looke like following:

    Code:
    val_t poly2d_anyName(val_t C[M][N]) {
    /* play with what is inside the function to make it run faster*/
    }

  8. #8
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Code:
    sol += C[i][j] * (xpwr * ypwr);
    Investigate whether or not the resulting assembly code generates successive [re]calculations of C[i][j] within the loop. If so, your code may benefit from using pointers calculated once at the start of the loop.

    [edit]But mostly, continue to look for ways to reduce the looping (algorithm stuff).
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  9. #9
    Registered User
    Join Date
    Sep 2006
    Posts
    18
    how can the pointers be caluclated once at the start of the loop?

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Lina
    Hi,
    The following that i have is not fast, it is 7.06 ACPE, i know that Horner’s
    rule would make it much faster but i don't know how to modify what i have
    according to that. Please help me in optimizing my code. It has to be <3.00 ACPE
    OK, so wth is ACPE?

  11. #11
    Registered User
    Join Date
    Sep 2006
    Posts
    18
    amortized cycles per matrix element (ACPE), where ACPE is defined as the total number of CPU cycles to perform the polynomial calculation divided by M &#215; N, the number of elements in the matrix C.

  12. #12
    Registered User
    Join Date
    Sep 2006
    Posts
    18
    we have to be optimizing the performance of a function f that computes the value of a bivariate polynomial of order M + N over variables x and y:
    where C[i][j] is the coefficient of the x^i y^j term.Thus, we have to make this function run as fast as possible. Moreover, The body of the function should not call any other functions
    Please Help!

  13. #13
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > Definition: A polynomial A(x) = a0 + a1x + a2x&#178; + a3x&#179; + ... may be written as A(x) = a0 + x(a1 + x(a2 + x(a3 + ...))).
    So why don't you do this - or at least attempt it.

    Hint - run the loop backwards so you sum the 'an' term first and finish up by just adding the 'a0' term.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  14. #14
    Registered User
    Join Date
    Sep 2006
    Posts
    18
    This following is what i got but it is really slow ACPE=11.8 and i have to get <3
    please help. I'm really desperate

    Code:
    val_t poly2d_h(val_t C[M][N]) {
    
    
        val_t sol = (val_t)0;
        val_t x = *x_p;
        val_t y = *y_p;
        int i, j;
    
    
        for (i = M-1; i >=0; --i) {
          val_t temp=C[i][N-1];
            for (j = N-2; j>=0; --j) {
                temp=C[i][j]+y *temp;
            }
            sol=temp+x*sol;
        }
        return sol;
    }

  15. #15
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by Dave_Sinkula
    Code:
    sol += C[i][j] * (xpwr * ypwr);
    Investigate whether or not the resulting assembly code generates successive [re]calculations of C[i][j] within the loop. If so, your code may benefit from using pointers calculated once at the start of the loop.
    First, this advice sucks, so I'm glad you didn't pursue it. I did investigate, and found that it can make things worse. And such micro-optimization should generally be avoided.

    Your attempts so far seem very reasonable and you seem to know quite well what it is you are doing, so my further advice on your problem would be questionable.

    I am not familiar with ACPE, but I've tried to look into it a bit. I was wondering if you've got some tool that you use that might be helpful to share with the rest of us?

    Also, such as in the case of your precomputed version, is the amortization over a single call to the function, or is it amortized over all calls? The point being, can you do something like this?
    Code:
    val_t poly2d_dave(val_t C[M][N])
    {
       static val_t coeffs[M][N] = {0};
       static int init = 0;
       int i, j;
       val_t sol = 0.0;
    
       if ( init == 0 )
       {
          val_t xpwr = 1.0;
          val_t ypwr = 1.0;
          val_t x = *x_p;
          val_t y = *y_p;
          for ( i = 0; i < M; i++ )
          {
             ypwr = 1.0;
             for ( j = 0; j < N; j++ )
             {
                coeffs[i][j] = xpwr * ypwr;
                ypwr *= y;
             }
             xpwr *= x;
          }
          init = 1;
       }
    
       for ( i = 0; i < M; i++ )
       {
          for ( j = 0; j < N; j++ )
          {
             sol += C[i][j] * coeffs[i][j];
          }
       }
       return sol;
    }
    Again, my advice on this stinks. I come from the land of micro-optimization and I know those techniques better. Perhaps some of them can combine with better algorithms, but I'm venturing outside of my normal sphere when I look at improving the algorithm.

    [edit]I shouldn't. But oh, well.
    Code:
    val_t poly2d_dave(val_t C[M][N])
    {
       static val_t coeffs[M][N] = {0};
       static int init = 0;
       val_t *mptr = &C[0][0], *cptr = &coeffs[0][0];
       int i, j;
       val_t sol = 0.0;
    
       if ( init == 0 )
       {
          val_t xpwr = 1.0;
          val_t ypwr = 1.0;
          val_t x = *x_p;
          val_t y = *y_p;
          for ( i = 0; i < M; i++ )
          {
             ypwr = 1.0;
             for ( j = 0; j < N; j++ )
             {
                coeffs[i][j] = xpwr * ypwr;
                ypwr *= y;
             }
             xpwr *= x;
          }
          init = 1;
       }
    
       for ( i = 0; i < M * N; i++ )
       {
          sol += *mptr++ * *cptr++;
       }
       return sol;
    }
    After the first time, the loop should be tighter.

    But, of course, that depends on the amortization question I asked. And just ignore the stupid micro-optimization attempts here please.
    Last edited by Dave_Sinkula; 10-21-2006 at 07:49 PM. Reason: Pointless comments continued.
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Proposal: Code colouring
    By Perspective in forum A Brief History of Cprogramming.com
    Replies: 28
    Last Post: 05-14-2007, 07:23 AM
  2. Values changing without reason?
    By subtled in forum C Programming
    Replies: 2
    Last Post: 04-19-2007, 10:20 AM
  3. optimizing my quaternion code
    By confuted in forum Game Programming
    Replies: 9
    Last Post: 08-16-2003, 08:50 PM
  4. Interface Question
    By smog890 in forum C Programming
    Replies: 11
    Last Post: 06-03-2002, 05:06 PM
  5. Replies: 0
    Last Post: 02-21-2002, 06:05 PM