Optimizing my code

**Lina** · 10-15-2006

Hi,
i have to write a poly2d function that runs fast.
The following that i have is not fast, it is 7.06 ACPE, i know that Horner’s
rule would make it much faster but i don't know how to modify what i have
according to that. Please help me in optimizing my code. It has to be <3.00 ACPE

Code:

/* Data structures defined in another file */
extern val_t C[M][N];
extern val_t *x_p;
extern val_t *y_p;

val_t poly2d_precomp(val_t C[M][N]) {

    int i, j;
    val_t sol = 0.0;
   val_t xpwr = 1.0;
    val_t ypwr[N];
    val_t x = *x_p;
    val_t y = *y_p;

    ypwr[0] = 1.0;
    for (j = 1; j < N; j++) {
	ypwr[j] = ypwr[j-1]*y;
    }

    for (i = 0; i < M; i++) {
        for (j = 0; j < N; j++) {
            sol += C[i][j] * (xpwr * ypwr[j]);
        }
	xpwr *= x;
    }
    return sol;
}

**Happy_Reaper** · 10-15-2006

How about looking up Horner's rule through google ?

**yxunomei** · 10-15-2006

open www.google.com

on the search bar, type "Horner's rule", that should help you

**Salem** · 10-15-2006

Or Homer's rule:
"The circumference of a donut is in direct proportion to the time it takes to eat it"

**OnionKnight** · 10-15-2006

Not sure what needs to be changed in your function since I don't know how it works but you could apply Horner's rule by having an array that hold coefficients and a variable that holds the result.
Loop through the array and set the and multiply the result by the function variable and add the next coefficient in the array to it and store this in the result variable.

Code:

/* Computes f(x) = 4x^2 + 3x + 5 */

int f (int x)
{
	int coeff[3] = {4, 3, 5};
	int y = 0;
	int i;

	for (i = 0; i < 3; i++)
		y = y*x + coeff[i];

	return y;
}

int main (void)
{
	int x;

	printf("x = ");
	scanf("%d", &x);
	printf("y = %d", f(x));

	return 0;
}

Unrolling that loop gives
y = (((0*x) + 4)*x + 3)*x + 5
y = (4*x + 3)*x + 5
y = 4*x*x + 3*x + 5

Can be further optimized by setting y = coeff[0] right off the bat and also setting i = 1 at the start of the loop.

**Lina** · 10-15-2006

the flollowing is a simple way of implementing the function, then it's optimized to the one i posted earlier. Now, i want to optimize it more. So i thought that implemting Horner's rule would make it run faster.

Code:

/* 
 * poly2d_base - The simple base version of poly2d
 */
char poly2d_base_descr[] = "poly2d_baseline: Baseline implementation";
val_t poly2d_base(val_t C[M][N]) {

    int i, j;
    val_t sol = 0.0;
    val_t xpwr = 1.0;
    val_t ypwr = 1.0;
    val_t x = *x_p;
    val_t y = *y_p;

    for (i = 0; i < M; i++) {
	ypwr = 1.0;
        for (j = 0; j < N; j++) {
            sol += C[i][j] * (xpwr * ypwr);
	    ypwr *= y;
        }
	xpwr *= x;
    }
    return sol;
}

this is the algorithm of Horner's rule
Definition: A polynomial A(x) = a0 + a1x + a2x² + a3x³ + ... may be written as A(x) = a0 + x(a1 + x(a2 + x(a3 + ...))).

**Lina** · 10-15-2006

Hi Onion Knigt,

my function has to looke like following:

Code:

val_t poly2d_anyName(val_t C[M][N]) {
/* play with what is inside the function to make it run faster*/
}

**Dave_Sinkula** · 10-15-2006

Code:

sol += C[i][j] * (xpwr * ypwr);

Investigate whether or not the resulting assembly code generates successive [re]calculations of C[i][j] within the loop. If so, your code may benefit from using pointers calculated once at the start of the loop.

[edit]But mostly, continue to look for ways to reduce the looping (algorithm stuff).

**Lina** · 10-16-2006

how can the pointers be caluclated once at the start of the loop?

**iMalc** · 10-17-2006

Originally Posted by Lina

Hi,
The following that i have is not fast, it is 7.06 ACPE, i know that Horner’s
rule would make it much faster but i don't know how to modify what i have
according to that. Please help me in optimizing my code. It has to be <3.00 ACPE

OK, so wth is ACPE?

**Lina** · 10-17-2006

amortized cycles per matrix element (ACPE), where ACPE is defined as the total number of CPU cycles to perform the polynomial calculation divided by M × N, the number of elements in the matrix C.

**Lina** · 10-20-2006

we have to be optimizing the performance of a function f that computes the value of a bivariate polynomial of order M + N over variables x and y:
where C[i][j] is the coefficient of the x^i y^j term.Thus, we have to make this function run as fast as possible. Moreover, The body of the function should not call any other functions
Please Help!

**Salem** · 10-20-2006

> Definition: A polynomial A(x) = a0 + a1x + a2x² + a3x³ + ... may be written as A(x) = a0 + x(a1 + x(a2 + x(a3 + ...))).
So why don't you do this - or at least attempt it.

Hint - run the loop backwards so you sum the 'an' term first and finish up by just adding the 'a0' term.

**Lina** · 10-21-2006

This following is what i got but it is really slow ACPE=11.8 and i have to get <3
please help. I'm really desperate

Code:

val_t poly2d_h(val_t C[M][N]) {


    val_t sol = (val_t)0;
    val_t x = *x_p;
    val_t y = *y_p;
    int i, j;


    for (i = M-1; i >=0; --i) {
      val_t temp=C[i][N-1];
        for (j = N-2; j>=0; --j) {
            temp=C[i][j]+y *temp;
        }
        sol=temp+x*sol;
    }
    return sol;
}

**Dave_Sinkula** · 10-21-2006

Originally Posted by Dave_Sinkula

Code:

sol += C[i][j] * (xpwr * ypwr);

Investigate whether or not the resulting assembly code generates successive [re]calculations of C[i][j] within the loop. If so, your code may benefit from using pointers calculated once at the start of the loop.

First, this advice sucks, so I'm glad you didn't pursue it. I did investigate, and found that it can make things worse. And such micro-optimization should generally be avoided.

Your attempts so far seem very reasonable and you seem to know quite well what it is you are doing, so my further advice on your problem would be questionable.

I am not familiar with ACPE, but I've tried to look into it a bit. I was wondering if you've got some tool that you use that might be helpful to share with the rest of us?

Also, such as in the case of your precomputed version, is the amortization over a single call to the function, or is it amortized over all calls? The point being, can you do something like this?

Code:

val_t poly2d_dave(val_t C[M][N])
{
   static val_t coeffs[M][N] = {0};
   static int init = 0;
   int i, j;
   val_t sol = 0.0;

   if ( init == 0 )
   {
      val_t xpwr = 1.0;
      val_t ypwr = 1.0;
      val_t x = *x_p;
      val_t y = *y_p;
      for ( i = 0; i < M; i++ )
      {
         ypwr = 1.0;
         for ( j = 0; j < N; j++ )
         {
            coeffs[i][j] = xpwr * ypwr;
            ypwr *= y;
         }
         xpwr *= x;
      }
      init = 1;
   }

   for ( i = 0; i < M; i++ )
   {
      for ( j = 0; j < N; j++ )
      {
         sol += C[i][j] * coeffs[i][j];
      }
   }
   return sol;
}

Again, my advice on this stinks. I come from the land of micro-optimization and I know those techniques better. Perhaps some of them can combine with better algorithms, but I'm venturing outside of my normal sphere when I look at improving the algorithm.

[edit]I shouldn't. But oh, well.

Code:

val_t poly2d_dave(val_t C[M][N])
{
   static val_t coeffs[M][N] = {0};
   static int init = 0;
   val_t *mptr = &C[0][0], *cptr = &coeffs[0][0];
   int i, j;
   val_t sol = 0.0;

   if ( init == 0 )
   {
      val_t xpwr = 1.0;
      val_t ypwr = 1.0;
      val_t x = *x_p;
      val_t y = *y_p;
      for ( i = 0; i < M; i++ )
      {
         ypwr = 1.0;
         for ( j = 0; j < N; j++ )
         {
            coeffs[i][j] = xpwr * ypwr;
            ypwr *= y;
         }
         xpwr *= x;
      }
      init = 1;
   }

   for ( i = 0; i < M * N; i++ )
   {
      sol += *mptr++ * *cptr++;
   }
   return sol;
}

After the first time, the loop should be tighter.

But, of course, that depends on the amortization question I asked. And just ignore the stupid micro-optimization attempts here please.

Thread: Optimizing my code

Thread Tools

Search Thread

Display

Optimizing my code

Similar Threads

Proposal: Code colouring

Values changing without reason?

optimizing my quaternion code

Interface Question

Who will map the scan code (inserted by VKD_Force_keys) to virtual key code?