Thread: execution speed of seemingly eqivalent loops

  1. #1
    Registered User
    Join Date
    Jul 2010
    Posts
    22

    execution speed of seemingly eqivalent loops

    Consider the following 2 looping statements:

    a) while(counter <=MAX_COUNT) counter++;

    b) while(counter++ <= MAX_COUNT);

    When I set counter to 1 and MAX_COUNT to 4 billion (counter and MAX_COUNT is type unsigned long), it takes a full second longer on my machine for statement b to complete than statement a. Anybody know why? I thought the code has the same effect, but it's obviously different machine language. The reason I'm asking is that I'm writing a program with heavy iteration and I need to minimize running time. Thanks.

  2. #2
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    It's due to the way you're compiler optimized this. For Visual Studio, if there was nothing in the loop, it wouldn't loop and just set counter to MAX_COUNT+1, which could be zero or if counter wasn't used anywhere else, it could just ignore the loop completely (with VS, declaring counter to be volatile would prevent the compiler from ignoring updates to counter, but it would also force it to be stored and updated in memory instead of a register).

    For the loop, this may be a bit quicker:

    while(++counter <= (MAX_COUNT+1);
    Last edited by rcgldr; 08-09-2013 at 07:24 PM.

  3. #3
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Did you test it many times to see if the result holds up?

    The only difference I can see in the two snippets is that the post-increment in (a) is identical to a pre-increment, whereas the post-increment in (b) is not.

    I.e., in (b) counter is incremented and then the old value is used for the test. That takes a little more code.

    EDIT: Here's the gcc assembly code for your loops. The second loop has two extra instructions.
    Code:
    #include <stdio.h>
    
    #define MAX_COUNT (unsigned)-1
    
    int main(void) {
      unsigned counter = 0;
      while(counter < MAX_COUNT)
        counter++;
      while(counter++ < MAX_COUNT)
        ;
      return 0;
    }
    
    /*
    	jmp	L2
    L3:
    	incl	12(%esp)
    L2:
    	cmpl	$-1, 12(%esp)
    	jne	L3
    
    L4:
    	cmpl	$-1, 12(%esp)
    	setne	%al
    	incl	12(%esp)
    	testb	%al, %al
    	jne	L4
    */
    Last edited by oogabooga; 08-09-2013 at 06:48 PM.

  4. #4
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Same effect does not mean the compiler emits the same instructions.

    The end result is the same, but the sequence of events described by the standard is not. So, depending on how the compiler goes about things (and how aggressive it is with optimisation) it will organise the sequence of events differently.

    The first form can be trivially optimised to this
    Code:
        while (counter <= MAX_COUNT) ++counter;
    which avoids any usage of a temporary to increment counter. It doesn't take much smarts for a compiler to do this, since the temporary implicitly created by counter++ (the original value of counter is stored temporaily, then counter is incremented) is obviously never used.

    The second form, however, logically uses the temporary (since it is the temporary that is compared with MAX_COUNT). This means the compiler needs to perform more significant analysis to realise it can reorganise the code to eliminate the temporary.

    The more "significant analysis" a compiler has to do, the less likely it is to do so by default. At low compiler optimisation levels (which is the default for virtually every available compiler or IDE), it is therefore more probable that it won't be done.

    You'll probably find that, if you turn up compiler optimisation levels (the techniques to do this are compiler/IDE specific) enough, then the two samples will more likely to exhibit equivalent performance characteristics.

    Note that I refer to likelihoods, not to certainties. Reorganising code like you expect (or that I have described) is not required at all by the standard. It is a quality of implementation concern for the compiler vendor. Some vendors will consider the type of optimisation you want important, others will not. You will therefore find differences between compilers, compiler versions, and optimisation settings.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  5. #5
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote Originally Posted by rcgldr View Post
    For the loop, this may be a bit quicker:

    while(++counter <= (MAX_COUNT+1);
    If MAX_COUNT+1 is zero, then the loop would need to be:

    while(++counter);
    or
    while(++counter != 0);

    old compiler, Visual C/C++ 4.0:

    Code:
    int main(void)
    {
        unsigned counter = 0;
        while(counter <= 0xffffffffUL)
            counter++;
        return(0);
    }

    Code:
    _main   PROC NEAR
            xor     eax, eax
    $L78:
            inc     eax
            cmp     eax, -1
            jbe     SHORT $L78
            xor     eax, eax
            ret     0
    _main   ENDP
    Code:
    int main(void)
    {
        unsigned counter = 0;
        while(counter++ <= 0xffffffffUL);
        return(0);
    }
    Code:
    _main   PROC NEAR
            xor     eax, eax
    $L78:
            mov     ecx, eax
            inc     eax
            cmp     ecx, -1
            jbe     SHORT $L78
            xor     eax, eax
            ret     0
    _main   ENDP
    Code:
    int main(void)
    {
        unsigned counter = 0;
        while(++counter != 0UL);
        return(0);
    }
    Code:
    _main   PROC NEAR
            xor     eax, eax
    $L78:
            inc     eax
            jne     SHORT $L78
            xor     eax, eax
            ret     0
    _main   ENDP


    Visual Studio 2005 (counter is ignored):
    Code:
    _main   PROC NEAR
            xor     eax, eax
            ret     0
    Last edited by rcgldr; 08-09-2013 at 07:32 PM.

  6. #6
    Registered User
    Join Date
    Jul 2010
    Posts
    22
    Wow, thanks for the great replies. I'll just have to experiment around.

    Quote Originally Posted by oogabooga View Post
    Did you test it many times to see if the result holds up?
    I did. It varied somewhat but no more than .1 of a second. I guess with multitasking and all, the OS comes into play too, right?

  7. #7
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote Originally Posted by dnj23 View Post
    Consider the following 2 looping statements:
    a) while(counter <=MAX_COUNT) counter++;
    b) while(counter++ <= MAX_COUNT);
    I forgot to mention the key difference: For case a) the counter is only incremented if the compare condition is true, but for case b), the counter is incremented regardless if the compare condition is true or false, so the logic is different. In the case of the microsoft compilers, if a register is used for counter, then for case b) a second register is used, and there is one more instruction (a move to copy register to register) per loop. For both cases with the microsoft compilers, if counter is in memory, then the number of instructions is the same, using an add to memory instruction either before or after loading the memory location used for counter into a register.

  8. #8
    Registered User
    Join Date
    Mar 2011
    Posts
    596
    Quote Originally Posted by rcgldr View Post
    It's due to the way you're compiler optimized this. For Visual Studio, if there was nothing in the loop, it wouldn't loop and just set counter to MAX_COUNT+1, which could be zero or if counter wasn't used anywhere else, it could just ignore the loop completely...
    What is the purpose of optimizing empty loops like that?
    If you wanted the loop in there, eg as a crude delay or timer, you wouldn't want it optimized.
    And if you did want it optimized like that, why put the loop in there to begin with?

    I guess I am wondering what type of programming problem it is intended to solve.

  9. #9
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote Originally Posted by megafiddle View Post
    What is the purpose of optimizing empty loops like that?
    Apparently microsoft doesn't consider loops like that should be used for delays, and in general, it eliminates any code that only changes local variables never used outside a function. If there was a call to another function in that loop, the loop would not have been eliminated. I'm not sure if the warning level is set to maximum that a warning that unused code is being deleted will occur.

  10. #10
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by rcgldr View Post
    Apparently microsoft doesn't consider loops like that should be used for delays
    From a broader perspective, they're right. If you want a delay, better to use an API function (for example, provided by a host OS) that implements a delay of the required duration and resolution. For such loops to make any sense, it is necessary to know details of the host instruction set architecture, as well as the mapping of C statements to the host instruction set (which is compiler specific, and subject to change as the compiler evolves). And one of the major goals of higher level languages and of (modern) operating systems is make code independent of such low level details (for anyone except implementers of compilers and libraries).
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  11. #11
    Registered User
    Join Date
    Mar 2011
    Posts
    596
    Delay was a bad example. But I used to use them for trouble shooting. Certainly you don't want to use it as final solution
    (and why I called it crude).

    Another example might be using an empty loop when measuring execution time of a very fast statement. You place
    the statement to be measured inside a long loop, and measure the total execution time. Then you subtract off the
    execution time of a similar but empty loop.

    Anyway, is that pretty much the only reason, to basically "clean up" a program?

    If all the compiler was going to is remove it or set the counter to it's final value, wouldn't it be better to simply report
    a warning? Surely the programmer could handle things from there.

  12. #12
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by megafiddle View Post
    If all the compiler was going to is remove it or set the counter to it's final value, wouldn't it be better to simply report
    a warning? Surely the programmer could handle things from there.
    If you simplified something so the same required outcome was produced a little bit more efficiently, would you bother to warn anyone about it? Or would you just make the change?

    There are a few things that compiler vendors compete on. One is feature set (support of the latest standard, extensions). Another is maximising performance of poorly written code without offending programmers (for example, by issuing a diagnostic that says - in effect - "you wrote crap code").

    You also need to remember that compiler vendors - because of pressure from lazy programmers - minimise the things they warn about by default. That it is why it is necessary to tweak a compiler if you want more comprehensive diagnostics about your code.

    If you want to measure timing of something, there are ways and means other than putting it in a loop that the compiler can detect has no usable effect.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  13. #13
    Registered User
    Join Date
    Mar 2011
    Posts
    596
    But it does have a usable effect, since a time value would will have been recordered immediately before
    and immediately after the empty loop as well as the loop containing the statement under test. Acually, it
    can work very well.

    Anyway though, that does answer my question, as to why that type of optimization is done.

  14. #14
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by megafiddle View Post
    Another example might be using an empty loop when measuring execution time of a very fast statement. You place
    the statement to be measured inside a long loop, and measure the total execution time. Then you subtract off the
    execution time of a similar but empty loop.
    No, don't do this.
    And to the OP, does it really matter how long these loops take if the moment you put something useful inside the loop everything changes? The faster one might even become the slower one!

    But it does have a usable effect, since a time value would will have been recordered immediately before
    and immediately after the empty loop as well as the loop containing the statement under test
    You would much prefer every compiler out there to do no optimisation whatsoever? Because that's where such thinking is leading to.

    Two lessons you need to learn:
    1. Don't profile expecting that something which could be optimised away wont be.
    2. Profile the real code, not a stripped down example which is highly likely to behave totally differently.

    You'll get burnt by such things soon enough if you ignore them, so you'd best learn them now.
    Last edited by iMalc; 08-12-2013 at 03:22 AM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  15. #15
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by megafiddle View Post
    But it does have a usable effect, since a time value would will have been recordered immediately before
    and immediately after the empty loop as well as the loop containing the statement under test. Acually, it
    can work very well.
    Elapsed time is not a "usable effect", since (among other things) any means of measuring time produce implementation defined results.

    It might "work very well" for you in some specialised circumstances. But it is not guaranteed to always work.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 3
    Last Post: 11-15-2012, 02:50 AM
  2. Execution speed slow initially, then speeds up
    By megafiddle in forum Windows Programming
    Replies: 22
    Last Post: 12-08-2011, 11:16 PM
  3. Execution speed test in c++
    By nick2price in forum C++ Programming
    Replies: 1
    Last Post: 03-12-2009, 04:23 PM
  4. Speed of Loops.
    By SavesTheDay in forum C Programming
    Replies: 1
    Last Post: 02-19-2002, 09:33 AM
  5. Questions on Speed of Execution
    By wavering in forum C Programming
    Replies: 22
    Last Post: 01-20-2002, 02:04 PM