Thread: optimizing repetitive code

  1. #1
    Registered User
    Join Date
    Feb 2003
    Posts
    596

    optimizing repetitive code

    Suppose a very simple snippet of code like this
    Code:
    for( counter = 0; counter < 3; ++counter ) {
        array[counter] = *(sourcePointer+counter);
    }
    is part of a loop that runs many thousands of times during a function's execution.
    int counter;
    int array[3];
    int* sourcePointer;
    are declared elsewhere, and sourcePointer is a different address on each iteration.

    Would this run faster if written as
    Code:
    array[0] = *sourcePointer;
    array[1] = *(sourcePointer+1);
    array[2] = *(sourcePointer+2);
    eliminating incrementing and testing the counter variable?
    Or do compilers do this anyway?

    (For real-time image analysis on low-powered equipment, so I'm curious about even very tiny savings.)

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    That would very much depend on the compiler. gcc 3.4 doesn't figure out that it's possible to replace the loop with three separate statements. Microsoft visual studio .Net does.

    Edit: Why do you copy it into an array in the first place? Is there any reason you can't just use sourcePointer[0], sourcePointer[1] and sourcePointer[2] directly in your code?
    --
    Mats
    Last edited by matsp; 08-23-2008 at 09:39 AM.
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I thought loop unrolling has been in gcc for a while? (-O3)

  4. #4
    Registered User
    Join Date
    Feb 2003
    Posts
    596
    Quote Originally Posted by matsp View Post
    Why do you copy it into an array in the first place? Is there any reason you can't just use sourcePointer[0], sourcePointer[1] and sourcePointer[2] directly in your code?
    --
    Mats
    A couple of reasons (which I hope are valid). Firstly, sourcePointer is one of the function's arguments, pointing to a camera image, so each use of sourcePointer[x] will involve a memory call. The array is local to this function and so (I hope) likely to be cached.

    Also, the arraysize is actually 9, and all of its elements are called many times on each iteration so if I wrote it with 9 individual variables I'm afraid the code would be completely unreadable even to me.

    (To alleviate the mystery, the code I'm working on is a convolution.)

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by cyberfish View Post
    I thought loop unrolling has been in gcc for a while? (-O3)
    Quite possibly - but -O3 or -funroll-loops doesn't make it do what I expected. <... time passes ...> -funroll-all-loops does, however.

    With -O3 -funroll-all-loops -fomit-frame-pointer, gcc and MS VC++ generates identical code (aside from exactly which register is being used).

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by R.Stiltskin View Post
    A couple of reasons (which I hope are valid). Firstly, sourcePointer is one of the function's arguments, pointing to a camera image, so each use of sourcePointer[x] will involve a memory call. The array is local to this function and so (I hope) likely to be cached.

    Also, the arraysize is actually 9, and all of its elements are called many times on each iteration so if I wrote it with 9 individual variables I'm afraid the code would be completely unreadable even to me.

    (To alleviate the mystery, the code I'm working on is a convolution.)
    All processors I'm aware of [which are quite a few] that have any form of cache, doesn't care whether the data is local or global. So as long as your camera image is stored in cacheable memory in itself [modern OS's usually have a way that you can allocate memory with caching attributes - since sometimes non-cacheable data is better for performance, since flushing out the contents of cacheable memory to a device (or in a camera case, invalidating the CPU caches for that area of memory when a new image has been read in) can have a large impact on overall performance]. Obviously, if the camera driver allocates non-cacheable memory, then making a copy will definitely benefit your application, as a local variable will be on the stack, and only complete idiot system developers would allocate the stack in non-cacheable memory.

    There are other factors of course - making a local array means that it can be held in local registers over function calls that doesn't hold the address to that data, for example.

    Edit: Be aware of premature optimization. Get the overall code working first. Tricks to "be more clever than the compiler" sometimes backfires, and can depend very much on the compiler - and the last thing you want, I suspect, is code that works REALLY GREAT on one compiler, and then compile it with another and it gives you much worse code than the non-optimal case.

    --
    Mats
    Last edited by matsp; 08-23-2008 at 10:09 AM.
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  7. #7
    Registered User
    Join Date
    Feb 2003
    Posts
    596
    Thanks, I'll have to chew on that for a while. I think I'll just have to experiment with a few approaches to see if any of them yields a time savings.

    I assume that the "-O3 -funroll-all-loops -fomit-frame-pointer" mentioned above are compiler flags, correct? Do your comments apply to gcc-3.3 as well? What's the effect of -fomit-frame-pointer? When time permits I can experiment with those as well. I may not be able to use those flags, though -- I'm interfacing with a lot of code that's essentially a black box to me & setting these flags may cause problems.

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    I'm not sure what version of gcc has which flags, but I'm pretty sure -funroll-loops and -fomit-frame-pointer is available in 3.3.

    -funroll[-all]-loops should definitely not cause any interface problems. -fomit-frame-pointer can cause difficulties to debug the generated code, but otherwise shouldn't cause any compatibility.

    The "frame-pointer" is a register that holds "the stack pointer as we entered this function", which makes it easier for a debugger to find the local variables, and easier to track back to the previous function. Using "-fno-frame-pointer" gives two benefits:
    1. the compiler doesn't store/restore the frame pointer in each function.
    2. the compiler has one more register to use for variables.

    Like most optimizatins, the compiler is free to "not do it" if it finds suitable reasons.

    Note that -funrol-alll-loops is not always beneficial to whole programs - it will very agressively unroll all sorts of loops, even when there is no actual benefit [this is a case of "you asked for it, so that's what you get" optimization]. This is why this option is not on by default even in -O3.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Here's a quick refresher on what the [] operator does:
    a[b] is the same as *(a + b).
    Therefore you could more tidily write your code snippet as this:
    Code:
    for( counter = 0; counter < 3; ++counter ) {
        array[counter] = sourcePointer[counter];
    }
    Or perhaps you could simply use memcpy:
    Code:
    memcpy(array, sourcePointer, 3 * sizeof(array[0]));
    Of course if these were objects you were copying, you'd have to use std::copy instead. In fact it's not a bad idea anyway:
    Code:
    std::copy(array, &sourcePointer[0], &sourcePointer[3]);
    As often happens, I wont yet be giving you any furthur optimisation advice here because you aren't yet giving us the big picture of what you're doing. You'd have to post more than your one little loop. The whole function it's in would be a good start.
    Last edited by iMalc; 08-23-2008 at 01:58 PM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Chinese pâté foxman's Avatar
    Join Date
    Jul 2007
    Location
    Canada
    Posts
    404
    I didn't read the whole thread, but before someone say so, you are not calling std::copy (ok, std::copy is a template, not a function, but I guess most people here understand what I mean) the correct way, iMalc. This should be

    Code:
    std::copy(&sourcePointer[0], &sourcePointer[3], array);
    I hate real numbers.

  11. #11
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by foxman View Post
    Code:
    std::copy(&sourcePointer[0], &sourcePointer[3], array);
    Since, in the original post, sourcePointer is a raw pointer to int, and array is a C-style array of int, this can be simplified to;
    Code:
    std::copy(sourcePointer, sourcePointer + 3, array);

  12. #12
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Yeah I'm prone to getting things front to back.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  13. #13
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    Most modern compilers automaticalyl unroll loops, btu in general this is limited to loops with 4 or fewer iterations and some other conditions such as no way to potentially modify the loop variable inside the loop. Manually unrolling the loop will not hurt performance however, so if it might help go ahead. Unrolling extremely long loops however can hurt performance if the unrolled code does not fit entirely into the cache. It is common in these cases to use partial loop unrolling.

    e.g.
    Code:
    for( counter = 0; counter < 4; ++counter ){
        array[counter] = *(sourcePointer+counter);
        }
    becomes
    Code:
    for( counter = 0; counter < 4; counter+=2 ) {
        array[counter] = *(sourcePointer+counter);
        array[counter+1] = *(sourcePointer+counter+1);
        }
    of course you wouldnt do a partial unroll for such a short loop, I just used that as an example. Generally this is useful where the number of iterations is evenly divisible by some whole number and the fully unrolled loop would exceed about 1KB or has >64 iterations.
    Last edited by abachler; 08-24-2008 at 11:13 AM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Proposal: Code colouring
    By Perspective in forum A Brief History of Cprogramming.com
    Replies: 28
    Last Post: 05-14-2007, 07:23 AM
  2. Values changing without reason?
    By subtled in forum C Programming
    Replies: 2
    Last Post: 04-19-2007, 10:20 AM
  3. Optimizing my code
    By Lina in forum C Programming
    Replies: 30
    Last Post: 10-22-2006, 01:31 PM
  4. Obfuscated Code Contest
    By Stack Overflow in forum Contests Board
    Replies: 51
    Last Post: 01-21-2005, 04:17 PM
  5. Interface Question
    By smog890 in forum C Programming
    Replies: 11
    Last Post: 06-03-2002, 05:06 PM