Thread: Need help optimizing loops....

  1. #1
    Registered User
    Join Date
    Feb 2005
    Posts
    14

    Need help optimizing loops....

    Hi all...I've written a program which successfully does very basic image manipulation utilizing a square image input source.

    I'm just curious if there's anything I can do to optimize the following code to make more efficient use of memory (loop unrolling, etc), especially when dealing with large images where the number of iterations per each loop can be overwhelming. I've achieved what I feel is a very basic performance baseline and I definitely think there's room for improvement...how much?..i'm not sure.


    Each image is a 2 dimensional matrix M ij .

    image_rotate first transposes each pair of pixels M ij to M ji , then exchanges row i with row N - 1 - i.


    Code:
    void image_rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
    
        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }

    image_smooth replaces every pixel value with the average of all the pixels around it ( in a maximum 3 x3 window centered at that pixel) where

    #define RIDX(i, j, n) ((i) * (n) + (j))

    Code:
    void image_smooth(int dim, pixel *src, pixel *dst)
    {
        int i, j;
    
        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
    }
    The function avg returns the average value of all the pixels around the (i, j)th pixel

    Code:
    static pixel avg(int dim, int i, int j, pixel *src)
    {
        int ii, jj;
        pixel_sum sum;
        pixel current_pixel;
    
        initialize_pixel_sum(&sum);
        for(ii = max(i-1, 0); ii <= min(i+1, dim-1); ii++)
            for(jj = max(j-1, 0); jj <= min(j+1, dim-1); jj++)
                accumulate_sum(&sum, src[RIDX(ii, jj, dim)]);
    
        assign_sum_to_pixel(&current_pixel, sum);
        return current_pixel;
    }


    Is there anything I can do to optimize these three blocks of code in any way? My main concerns are rotate and smooth. Thanks in advance.

  2. #2
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    Before looking at the code, have you turned on your compiler's optimisation? The compiler's optimizer can often yield a surprising performance boost for mathematically and loop intensive code.

    How is pixel defined? For example, declaring pixel as an unsigned long instead of a structure may give a performance improvement.

    An optimisation you could make for both functions is to only calculate the row multiplication once for each iteration of i rather than for each iteration of j, although the compiler may already be doing this.

    In the avg function, you could remove the min call from the loop conditions and the max call from the inner loop initialiser. The function calls involved in image_smooth are a little troubling.

    For optimization of this sort, it is worthwhile checking the assembler output of your compiler.

  3. #3
    Registered User
    Join Date
    Feb 2005
    Posts
    14
    thanks much for your reply...I was messing around with loop unrolling (Which I hear is much more efficient) but the resultant image is not the same as the original code when I use the following to replace the original rotate function:


    Code:
    void rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
    
        for (i = 0; i < dim; i+=4)
         for (j = 0; j < dim; j+=4)
          dst[RIDX(dim-1-j,i, dim)] = src[RIDX(i,j,dim)];
          dst[RIDX(dim-j,i+1, dim)] = src[RIDX(i+1,j+1,dim)];
          dst[RIDX(dim-j+1,i+2, dim)] = src[RIDX(i+2,j+2,dim)];
          dst[RIDX(dim-j+2,i+3, dim)] = src[RIDX(i+3,j+3,dim)];
    
       return;
    anyone know why? thanks

  4. #4
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    Well, first of all, you need some brackets with your for loop. However, it's also unrolled incorrectly. Take a look at this and spot the differences:
    Code:
    void image_rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
    
        for (i = 0; i < dim; i++)
        {
            for (j = 0; j < dim; j+=4)
            {
                dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
                dst[RIDX(dim-j, i, dim)] = src[RIDX(i, j+1, dim)];
                dst[RIDX(dim-j+1, i, dim)] = src[RIDX(i, j+2, dim)];
                dst[RIDX(dim-j+2, i, dim)] = src[RIDX(i, j+3, dim)];
            }
        }
    }
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  5. #5
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    This is really not my arena, but let's see if this makes any sense...

    I generally distrust macros, so I like to run the preprocessor occasionally. Let's take pianorain's code above and modify it slightly.
    Code:
    void image_rotate(int dim, pixel *src, pixel *dst)
    {
       int i, j;
    
       for ( i = 0; i < dim; i++ )
       {
          for ( j = 0; j < dim; j+=4 )
          {
             dst[RIDX(dim-j-1, i, dim)] = src[RIDX(i, j+0, dim)];
             dst[RIDX(dim-j+0, i, dim)] = src[RIDX(i, j+1, dim)];
             dst[RIDX(dim-j+1, i, dim)] = src[RIDX(i, j+2, dim)];
             dst[RIDX(dim-j+2, i, dim)] = src[RIDX(i, j+3, dim)];
          }
       }
    }
    Then run it through the preprocessor.
    Code:
    void image_rotate(int dim, pixel *src, pixel *dst)
    {
    int i, j;
    
    for ( i = 0; i < dim; i++ )
    {
    for ( j = 0; j < dim; j+=4 )
    {
    dst[((dim-j-1) * (dim) + (i))] = src[((i) * (dim) + (j+0))];
    dst[((dim-j+0) * (dim) + (i))] = src[((i) * (dim) + (j+1))];
    dst[((dim-j+1) * (dim) + (i))] = src[((i) * (dim) + (j+2))];
    dst[((dim-j+2) * (dim) + (i))] = src[((i) * (dim) + (j+3))];
    }
    }
    }
    To me it looks like the dst array index merely increases by dim each access; and the src array index merely increments by 1. So if you threw in a couple of temporaries and did this same thing,
    Code:
    void image_rotate0(int dim, pixel *src, pixel *dst)
    {
       int i, j;
    
       for ( i = 0; i < dim; i++ )
       {
          int a = dim * (dim - 1) + i;
          int b = dim * i;
          for ( j = 0; j < dim; j+=4 )
          {
             dst[a += dim] = src[b++];
             dst[a += dim] = src[b++];
             dst[a += dim] = src[b++];
             dst[a += dim] = src[b++];
          }
       }
    }
    How would that compare? Or did I just break the code? If I didn't break it already, could you then take that a step further with pointers?
    Code:
    void image_rotate1(int dim, pixel *src, pixel *dst)
    {
       int i, j;
    
       for ( i = 0; i < dim; i++ )
       {
          int a = dim * (dim - 1) + i;
          int b = dim * i;
          pixel *x = &dst [ a ];
          pixel *y = &src [ b ];
          for ( j = 0; j < dim; j+=4 )
          {
             x = y++; x += dim;
             x = y++; x += dim;
             x = y++; x += dim;
             x = y++; x += dim;
          }
       }
    }
    Well, something like that anyways.

    [EDIT=1]I'm pretty sure I broke the code by not accounting for j.

    [EDIT=2]What I was going for was something like this.
    Code:
    #include <stdio.h>
    
    typedef int type;
    
    void print(const type *array, int rows, int cols)
    {
       int i;
       for ( i = 0; i < rows * cols; ++i )
       {
          printf("%3d%c", array[i], i % cols == cols - 1 ? '\n' : ' ');
       }
       putchar('\n');
       fflush(stdout);
    }
    
    void rotate(type *dst, const type *src, int rows, int cols)
    {
       int r, c;
       for ( r = 0; r < rows; ++r )
       {
          type *cell = dst + rows - 1 - r;
          for ( c = 0; c < cols; ++c )
          {
             *cell = *src++;
             cell += rows;
          }
       }
    }
    
    type image[] =
    {
        1,  2,  3,  4,  5,  6,  7,  8,  9,  10,
       11, 12, 13, 14, 15, 16, 17, 18, 19,  20,
       21, 22, 23, 24, 25, 26, 27, 28, 29,  30,
       31, 32, 33, 34, 35, 36, 37, 38, 39,  40,
       41, 42, 43, 44, 45, 46, 47, 48, 49,  50,
       51, 52, 53, 54, 55, 56, 57, 58, 59,  60,
       61, 62, 63, 64, 65, 66, 67, 68, 69,  70,
       71, 72, 73, 74, 75, 76, 77, 78, 79,  80,
       81, 82, 83, 84, 85, 86, 87, 88, 89,  90,
       91, 92, 93, 94, 95, 96, 97, 98, 99, 100,
    };
    
    type result [ sizeof image / sizeof *image ];
    
    int main()
    {
       print(image, 10, 10);
       rotate(result, image, 10, 10);
       print(result, 10, 10);
    
       print(image, 5, 20);
       rotate(result, image, 5, 20);
       print(result, 20, 5);
    
       return 0;
    }
    
    /* my output
      1   2   3   4   5   6   7   8   9  10
     11  12  13  14  15  16  17  18  19  20
     21  22  23  24  25  26  27  28  29  30
     31  32  33  34  35  36  37  38  39  40
     41  42  43  44  45  46  47  48  49  50
     51  52  53  54  55  56  57  58  59  60
     61  62  63  64  65  66  67  68  69  70
     71  72  73  74  75  76  77  78  79  80
     81  82  83  84  85  86  87  88  89  90
     91  92  93  94  95  96  97  98  99 100
    
     91  81  71  61  51  41  31  21  11   1
     92  82  72  62  52  42  32  22  12   2
     93  83  73  63  53  43  33  23  13   3
     94  84  74  64  54  44  34  24  14   4
     95  85  75  65  55  45  35  25  15   5
     96  86  76  66  56  46  36  26  16   6
     97  87  77  67  57  47  37  27  17   7
     98  88  78  68  58  48  38  28  18   8
     99  89  79  69  59  49  39  29  19   9
    100  90  80  70  60  50  40  30  20  10
    
      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
     21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
     41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
     61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
     81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
    
     81  61  41  21   1
     82  62  42  22   2
     83  63  43  23   3
     84  64  44  24   4
     85  65  45  25   5
     86  66  46  26   6
     87  67  47  27   7
     88  68  48  28   8
     89  69  49  29   9
     90  70  50  30  10
     91  71  51  31  11
     92  72  52  32  12
     93  73  53  33  13
     94  74  54  34  14
     95  75  55  35  15
     96  76  56  36  16
     97  77  57  37  17
     98  78  58  38  18
     99  79  59  39  19
    100  80  60  40  20
    */
    [EDIT=6]One final thought: more loop invariant code (and a useful return value) might be handled like this.
    Code:
    type *rotate(type *dst, const type *src, int rows, int cols)
    {
       int r, c;
       type *const col = dst + rows - 1;
       for ( r = 0; r < rows; ++r )
       {
          type *cell = col - r;
          for ( c = 0; c < cols; ++c )
          {
             *cell = *src++;
             cell += rows;
          }
       }
       return dst;
    }
    
    int main()
    {
       print(image, 10, 10);
       print(rotate(result, image, 10, 10), 10, 10);
    
       print(image, 5, 20);
       print(rotate(result, image, 5, 20), 20, 5);
    
       return 0;
    }
    Last edited by Dave_Sinkula; 02-24-2005 at 09:59 PM. Reason: [1] Noted error. [2] Tried to explain. Shoulda just left this one alone. [3] Changed 'transpose' to 'rotate'. [4] Changed 'x' to 'cell'. [5] More spaces in array intialization. (I really gotta stop.)
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Problem building Quake source
    By Silvercord in forum Game Programming
    Replies: 16
    Last Post: 07-11-2010, 09:13 AM
  2. Multiple thread for loops
    By lehe in forum C++ Programming
    Replies: 12
    Last Post: 03-29-2009, 12:01 PM
  3. Too many loops D:
    By F5 Tornado in forum C++ Programming
    Replies: 6
    Last Post: 12-03-2007, 01:18 AM
  4. Evaluation of nested loops
    By Mister C in forum C Programming
    Replies: 2
    Last Post: 08-13-2004, 01:47 PM
  5. help with arrays and loops
    By jdiazj1 in forum C Programming
    Replies: 4
    Last Post: 11-24-2001, 04:28 PM