# Thread: Need help optimizing loops....

1. ## Need help optimizing loops....

Hi all...I've written a program which successfully does very basic image manipulation utilizing a square image input source.

I'm just curious if there's anything I can do to optimize the following code to make more efficient use of memory (loop unrolling, etc), especially when dealing with large images where the number of iterations per each loop can be overwhelming. I've achieved what I feel is a very basic performance baseline and I definitely think there's room for improvement...how much?..i'm not sure.

Each image is a 2 dimensional matrix M ij .

image_rotate first transposes each pair of pixels M ij to M ji , then exchanges row i with row N - 1 - i.

Code:
```void image_rotate(int dim, pixel *src, pixel *dst)
{
int i, j;

for (i = 0; i < dim; i++)
for (j = 0; j < dim; j++)
dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
}```

image_smooth replaces every pixel value with the average of all the pixels around it ( in a maximum 3 x3 window centered at that pixel) where

#define RIDX(i, j, n) ((i) * (n) + (j))

Code:
```void image_smooth(int dim, pixel *src, pixel *dst)
{
int i, j;

for (i = 0; i < dim; i++)
for (j = 0; j < dim; j++)
dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
}```
The function avg returns the average value of all the pixels around the (i, j)th pixel

Code:
```static pixel avg(int dim, int i, int j, pixel *src)
{
int ii, jj;
pixel_sum sum;
pixel current_pixel;

initialize_pixel_sum(&sum);
for(ii = max(i-1, 0); ii <= min(i+1, dim-1); ii++)
for(jj = max(j-1, 0); jj <= min(j+1, dim-1); jj++)
accumulate_sum(&sum, src[RIDX(ii, jj, dim)]);

assign_sum_to_pixel(&current_pixel, sum);
return current_pixel;
}```

Is there anything I can do to optimize these three blocks of code in any way? My main concerns are rotate and smooth. Thanks in advance.

2. Before looking at the code, have you turned on your compiler's optimisation? The compiler's optimizer can often yield a surprising performance boost for mathematically and loop intensive code.

How is pixel defined? For example, declaring pixel as an unsigned long instead of a structure may give a performance improvement.

An optimisation you could make for both functions is to only calculate the row multiplication once for each iteration of i rather than for each iteration of j, although the compiler may already be doing this.

In the avg function, you could remove the min call from the loop conditions and the max call from the inner loop initialiser. The function calls involved in image_smooth are a little troubling.

For optimization of this sort, it is worthwhile checking the assembler output of your compiler.

3. thanks much for your reply...I was messing around with loop unrolling (Which I hear is much more efficient) but the resultant image is not the same as the original code when I use the following to replace the original rotate function:

Code:
```void rotate(int dim, pixel *src, pixel *dst)
{
int i, j;

for (i = 0; i < dim; i+=4)
for (j = 0; j < dim; j+=4)
dst[RIDX(dim-1-j,i, dim)] = src[RIDX(i,j,dim)];
dst[RIDX(dim-j,i+1, dim)] = src[RIDX(i+1,j+1,dim)];
dst[RIDX(dim-j+1,i+2, dim)] = src[RIDX(i+2,j+2,dim)];
dst[RIDX(dim-j+2,i+3, dim)] = src[RIDX(i+3,j+3,dim)];

return;```
anyone know why? thanks

4. Well, first of all, you need some brackets with your for loop. However, it's also unrolled incorrectly. Take a look at this and spot the differences:
Code:
```void image_rotate(int dim, pixel *src, pixel *dst)
{
int i, j;

for (i = 0; i < dim; i++)
{
for (j = 0; j < dim; j+=4)
{
dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
dst[RIDX(dim-j, i, dim)] = src[RIDX(i, j+1, dim)];
dst[RIDX(dim-j+1, i, dim)] = src[RIDX(i, j+2, dim)];
dst[RIDX(dim-j+2, i, dim)] = src[RIDX(i, j+3, dim)];
}
}
}```

5. This is really not my arena, but let's see if this makes any sense...

I generally distrust macros, so I like to run the preprocessor occasionally. Let's take pianorain's code above and modify it slightly.
Code:
```void image_rotate(int dim, pixel *src, pixel *dst)
{
int i, j;

for ( i = 0; i < dim; i++ )
{
for ( j = 0; j < dim; j+=4 )
{
dst[RIDX(dim-j-1, i, dim)] = src[RIDX(i, j+0, dim)];
dst[RIDX(dim-j+0, i, dim)] = src[RIDX(i, j+1, dim)];
dst[RIDX(dim-j+1, i, dim)] = src[RIDX(i, j+2, dim)];
dst[RIDX(dim-j+2, i, dim)] = src[RIDX(i, j+3, dim)];
}
}
}```
Then run it through the preprocessor.
Code:
```void image_rotate(int dim, pixel *src, pixel *dst)
{
int i, j;

for ( i = 0; i < dim; i++ )
{
for ( j = 0; j < dim; j+=4 )
{
dst[((dim-j-1) * (dim) + (i))] = src[((i) * (dim) + (j+0))];
dst[((dim-j+0) * (dim) + (i))] = src[((i) * (dim) + (j+1))];
dst[((dim-j+1) * (dim) + (i))] = src[((i) * (dim) + (j+2))];
dst[((dim-j+2) * (dim) + (i))] = src[((i) * (dim) + (j+3))];
}
}
}```
To me it looks like the dst array index merely increases by dim each access; and the src array index merely increments by 1. So if you threw in a couple of temporaries and did this same thing,
Code:
```void image_rotate0(int dim, pixel *src, pixel *dst)
{
int i, j;

for ( i = 0; i < dim; i++ )
{
int a = dim * (dim - 1) + i;
int b = dim * i;
for ( j = 0; j < dim; j+=4 )
{
dst[a += dim] = src[b++];
dst[a += dim] = src[b++];
dst[a += dim] = src[b++];
dst[a += dim] = src[b++];
}
}
}```
How would that compare? Or did I just break the code? If I didn't break it already, could you then take that a step further with pointers?
Code:
```void image_rotate1(int dim, pixel *src, pixel *dst)
{
int i, j;

for ( i = 0; i < dim; i++ )
{
int a = dim * (dim - 1) + i;
int b = dim * i;
pixel *x = &dst [ a ];
pixel *y = &src [ b ];
for ( j = 0; j < dim; j+=4 )
{
x = y++; x += dim;
x = y++; x += dim;
x = y++; x += dim;
x = y++; x += dim;
}
}
}```
Well, something like that anyways.

[EDIT=1]I'm pretty sure I broke the code by not accounting for j.

[EDIT=2]What I was going for was something like this.
Code:
```#include <stdio.h>

typedef int type;

void print(const type *array, int rows, int cols)
{
int i;
for ( i = 0; i < rows * cols; ++i )
{
printf("%3d%c", array[i], i % cols == cols - 1 ? '\n' : ' ');
}
putchar('\n');
fflush(stdout);
}

void rotate(type *dst, const type *src, int rows, int cols)
{
int r, c;
for ( r = 0; r < rows; ++r )
{
type *cell = dst + rows - 1 - r;
for ( c = 0; c < cols; ++c )
{
*cell = *src++;
cell += rows;
}
}
}

type image[] =
{
1,  2,  3,  4,  5,  6,  7,  8,  9,  10,
11, 12, 13, 14, 15, 16, 17, 18, 19,  20,
21, 22, 23, 24, 25, 26, 27, 28, 29,  30,
31, 32, 33, 34, 35, 36, 37, 38, 39,  40,
41, 42, 43, 44, 45, 46, 47, 48, 49,  50,
51, 52, 53, 54, 55, 56, 57, 58, 59,  60,
61, 62, 63, 64, 65, 66, 67, 68, 69,  70,
71, 72, 73, 74, 75, 76, 77, 78, 79,  80,
81, 82, 83, 84, 85, 86, 87, 88, 89,  90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100,
};

type result [ sizeof image / sizeof *image ];

int main()
{
print(image, 10, 10);
rotate(result, image, 10, 10);
print(result, 10, 10);

print(image, 5, 20);
rotate(result, image, 5, 20);
print(result, 20, 5);

return 0;
}

/* my output
1   2   3   4   5   6   7   8   9  10
11  12  13  14  15  16  17  18  19  20
21  22  23  24  25  26  27  28  29  30
31  32  33  34  35  36  37  38  39  40
41  42  43  44  45  46  47  48  49  50
51  52  53  54  55  56  57  58  59  60
61  62  63  64  65  66  67  68  69  70
71  72  73  74  75  76  77  78  79  80
81  82  83  84  85  86  87  88  89  90
91  92  93  94  95  96  97  98  99 100

91  81  71  61  51  41  31  21  11   1
92  82  72  62  52  42  32  22  12   2
93  83  73  63  53  43  33  23  13   3
94  84  74  64  54  44  34  24  14   4
95  85  75  65  55  45  35  25  15   5
96  86  76  66  56  46  36  26  16   6
97  87  77  67  57  47  37  27  17   7
98  88  78  68  58  48  38  28  18   8
99  89  79  69  59  49  39  29  19   9
100  90  80  70  60  50  40  30  20  10

1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100

81  61  41  21   1
82  62  42  22   2
83  63  43  23   3
84  64  44  24   4
85  65  45  25   5
86  66  46  26   6
87  67  47  27   7
88  68  48  28   8
89  69  49  29   9
90  70  50  30  10
91  71  51  31  11
92  72  52  32  12
93  73  53  33  13
94  74  54  34  14
95  75  55  35  15
96  76  56  36  16
97  77  57  37  17
98  78  58  38  18
99  79  59  39  19
100  80  60  40  20
*/```
[EDIT=6]One final thought: more loop invariant code (and a useful return value) might be handled like this.
Code:
```type *rotate(type *dst, const type *src, int rows, int cols)
{
int r, c;
type *const col = dst + rows - 1;
for ( r = 0; r < rows; ++r )
{
type *cell = col - r;
for ( c = 0; c < cols; ++c )
{
*cell = *src++;
cell += rows;
}
}
return dst;
}

int main()
{
print(image, 10, 10);
print(rotate(result, image, 10, 10), 10, 10);

print(image, 5, 20);
print(rotate(result, image, 5, 20), 20, 5);

return 0;
}```