execution speed of seemingly eqivalent loops

**dnj23** · 08-09-2013

Consider the following 2 looping statements:

a) while(counter <=MAX_COUNT) counter++;

b) while(counter++ <= MAX_COUNT);

When I set counter to 1 and MAX_COUNT to 4 billion (counter and MAX_COUNT is type unsigned long), it takes a full second longer on my machine for statement b to complete than statement a. Anybody know why? I thought the code has the same effect, but it's obviously different machine language. The reason I'm asking is that I'm writing a program with heavy iteration and I need to minimize running time. Thanks.

**rcgldr** · 08-09-2013

It's due to the way you're compiler optimized this. For Visual Studio, if there was nothing in the loop, it wouldn't loop and just set counter to MAX_COUNT+1, which could be zero or if counter wasn't used anywhere else, it could just ignore the loop completely (with VS, declaring counter to be volatile would prevent the compiler from ignoring updates to counter, but it would also force it to be stored and updated in memory instead of a register).

For the loop, this may be a bit quicker:

while(++counter <= (MAX_COUNT+1);

**oogabooga** · 08-09-2013

Did you test it many times to see if the result holds up?

The only difference I can see in the two snippets is that the post-increment in (a) is identical to a pre-increment, whereas the post-increment in (b) is not.

I.e., in (b) counter is incremented and then the old value is used for the test. That takes a little more code.

EDIT: Here's the gcc assembly code for your loops. The second loop has two extra instructions.

Code:

#include <stdio.h>

#define MAX_COUNT (unsigned)-1

int main(void) {
  unsigned counter = 0;
  while(counter < MAX_COUNT)
    counter++;
  while(counter++ < MAX_COUNT)
    ;
  return 0;
}

/*
	jmp	L2
L3:
	incl	12(%esp)
L2:
	cmpl	$-1, 12(%esp)
	jne	L3

L4:
	cmpl	$-1, 12(%esp)
	setne	%al
	incl	12(%esp)
	testb	%al, %al
	jne	L4
*/

**grumpy** · 08-09-2013

Same effect does not mean the compiler emits the same instructions.

The end result is the same, but the sequence of events described by the standard is not. So, depending on how the compiler goes about things (and how aggressive it is with optimisation) it will organise the sequence of events differently.

The first form can be trivially optimised to this

Code:

    while (counter <= MAX_COUNT) ++counter;

which avoids any usage of a temporary to increment counter. It doesn't take much smarts for a compiler to do this, since the temporary implicitly created by counter++ (the original value of counter is stored temporaily, then counter is incremented) is obviously never used.

The second form, however, logically uses the temporary (since it is the temporary that is compared with MAX_COUNT). This means the compiler needs to perform more significant analysis to realise it can reorganise the code to eliminate the temporary.

The more "significant analysis" a compiler has to do, the less likely it is to do so by default. At low compiler optimisation levels (which is the default for virtually every available compiler or IDE), it is therefore more probable that it won't be done.

You'll probably find that, if you turn up compiler optimisation levels (the techniques to do this are compiler/IDE specific) enough, then the two samples will more likely to exhibit equivalent performance characteristics.

Note that I refer to likelihoods, not to certainties. Reorganising code like you expect (or that I have described) is not required at all by the standard. It is a quality of implementation concern for the compiler vendor. Some vendors will consider the type of optimisation you want important, others will not. You will therefore find differences between compilers, compiler versions, and optimisation settings.

**rcgldr** · 08-09-2013

Originally Posted by rcgldr

For the loop, this may be a bit quicker:

while(++counter <= (MAX_COUNT+1);

If MAX_COUNT+1 is zero, then the loop would need to be:

while(++counter);
or
while(++counter != 0);

old compiler, Visual C/C++ 4.0:

Code:

int main(void)
{
    unsigned counter = 0;
    while(counter <= 0xffffffffUL)
        counter++;
    return(0);
}

Code:

_main   PROC NEAR
        xor     eax, eax
$L78:
        inc     eax
        cmp     eax, -1
        jbe     SHORT $L78
        xor     eax, eax
        ret     0
_main   ENDP

Code:

int main(void)
{
    unsigned counter = 0;
    while(counter++ <= 0xffffffffUL);
    return(0);
}

Code:

_main   PROC NEAR
        xor     eax, eax
$L78:
        mov     ecx, eax
        inc     eax
        cmp     ecx, -1
        jbe     SHORT $L78
        xor     eax, eax
        ret     0
_main   ENDP

Code:

int main(void)
{
    unsigned counter = 0;
    while(++counter != 0UL);
    return(0);
}

Code:

_main   PROC NEAR
        xor     eax, eax
$L78:
        inc     eax
        jne     SHORT $L78
        xor     eax, eax
        ret     0
_main   ENDP

Visual Studio 2005 (counter is ignored):

Code:

_main   PROC NEAR
        xor     eax, eax
        ret     0

**dnj23** · 08-09-2013

Wow, thanks for the great replies. I'll just have to experiment around.

Originally Posted by oogabooga

Did you test it many times to see if the result holds up?

I did. It varied somewhat but no more than .1 of a second. I guess with multitasking and all, the OS comes into play too, right?

**rcgldr** · 08-10-2013

Originally Posted by dnj23

Consider the following 2 looping statements:
a) while(counter <=MAX_COUNT) counter++;
b) while(counter++ <= MAX_COUNT);

I forgot to mention the key difference: For case a) the counter is only incremented if the compare condition is true, but for case b), the counter is incremented regardless if the compare condition is true or false, so the logic is different. In the case of the microsoft compilers, if a register is used for counter, then for case b) a second register is used, and there is one more instruction (a move to copy register to register) per loop. For both cases with the microsoft compilers, if counter is in memory, then the number of instructions is the same, using an add to memory instruction either before or after loading the memory location used for counter into a register.

**megafiddle** · 08-10-2013

Originally Posted by rcgldr

It's due to the way you're compiler optimized this. For Visual Studio, if there was nothing in the loop, it wouldn't loop and just set counter to MAX_COUNT+1, which could be zero or if counter wasn't used anywhere else, it could just ignore the loop completely...

What is the purpose of optimizing empty loops like that?
If you wanted the loop in there, eg as a crude delay or timer, you wouldn't want it optimized.
And if you did want it optimized like that, why put the loop in there to begin with?

I guess I am wondering what type of programming problem it is intended to solve.

**rcgldr** · 08-10-2013

Originally Posted by megafiddle

What is the purpose of optimizing empty loops like that?

Apparently microsoft doesn't consider loops like that should be used for delays, and in general, it eliminates any code that only changes local variables never used outside a function. If there was a call to another function in that loop, the loop would not have been eliminated. I'm not sure if the warning level is set to maximum that a warning that unused code is being deleted will occur.

**grumpy** · 08-10-2013

Originally Posted by rcgldr

Apparently microsoft doesn't consider loops like that should be used for delays

From a broader perspective, they're right. If you want a delay, better to use an API function (for example, provided by a host OS) that implements a delay of the required duration and resolution. For such loops to make any sense, it is necessary to know details of the host instruction set architecture, as well as the mapping of C statements to the host instruction set (which is compiler specific, and subject to change as the compiler evolves). And one of the major goals of higher level languages and of (modern) operating systems is make code independent of such low level details (for anyone except implementers of compilers and libraries).

**megafiddle** · 08-10-2013

Delay was a bad example. But I used to use them for trouble shooting. Certainly you don't want to use it as final solution
(and why I called it crude).

Another example might be using an empty loop when measuring execution time of a very fast statement. You place
the statement to be measured inside a long loop, and measure the total execution time. Then you subtract off the
execution time of a similar but empty loop.

Anyway, is that pretty much the only reason, to basically "clean up" a program?

If all the compiler was going to is remove it or set the counter to it's final value, wouldn't it be better to simply report
a warning? Surely the programmer could handle things from there.

**grumpy** · 08-10-2013

Originally Posted by megafiddle

If all the compiler was going to is remove it or set the counter to it's final value, wouldn't it be better to simply report
a warning? Surely the programmer could handle things from there.

If you simplified something so the same required outcome was produced a little bit more efficiently, would you bother to warn anyone about it? Or would you just make the change?

There are a few things that compiler vendors compete on. One is feature set (support of the latest standard, extensions). Another is maximising performance of poorly written code without offending programmers (for example, by issuing a diagnostic that says - in effect - "you wrote crap code").

You also need to remember that compiler vendors - because of pressure from lazy programmers - minimise the things they warn about by default. That it is why it is necessary to tweak a compiler if you want more comprehensive diagnostics about your code.

If you want to measure timing of something, there are ways and means other than putting it in a loop that the compiler can detect has no usable effect.

**megafiddle** · 08-11-2013

But it does have a usable effect, since a time value would will have been recordered immediately before
and immediately after the empty loop as well as the loop containing the statement under test. Acually, it
can work very well.

Anyway though, that does answer my question, as to why that type of optimization is done.

**iMalc** · 08-12-2013

Originally Posted by megafiddle

Another example might be using an empty loop when measuring execution time of a very fast statement. You place
the statement to be measured inside a long loop, and measure the total execution time. Then you subtract off the
execution time of a similar but empty loop.

No, don't do this.
And to the OP, does it really matter how long these loops take if the moment you put something useful inside the loop everything changes? The faster one might even become the slower one!

But it does have a usable effect, since a time value would will have been recordered immediately before
and immediately after the empty loop as well as the loop containing the statement under test

You would much prefer every compiler out there to do no optimisation whatsoever? Because that's where such thinking is leading to.

Two lessons you need to learn:
1. Don't profile expecting that something which could be optimised away wont be.
2. Profile the real code, not a stripped down example which is highly likely to behave totally differently.

You'll get burnt by such things soon enough if you ignore them, so you'd best learn them now.

**grumpy** · 08-12-2013

Originally Posted by megafiddle

But it does have a usable effect, since a time value would will have been recordered immediately before
and immediately after the empty loop as well as the loop containing the statement under test. Acually, it
can work very well.

Elapsed time is not a "usable effect", since (among other things) any means of measuring time produce implementation defined results.

It might "work very well" for you in some specialised circumstances. But it is not guaranteed to always work.

Thread: execution speed of seemingly eqivalent loops

Thread Tools

Search Thread

Display

execution speed of seemingly eqivalent loops

Similar Threads

LD_PRELOAD reduces the speed of execution. why it is like that??

Execution speed slow initially, then speeds up

Execution speed test in c++

Speed of Loops.

Questions on Speed of Execution