Originally Posted by
quzah
There's an example in the C99 draft that shows where a floating point number example, quite similar actually to this one, isn't optimized out because it could raise a floating point exception. It has to do with nested loops and adding and the fact that the loop may or may not execute, or some such.
It indeed does appear to have something to do with whether the variable in question is an int or a float.
Code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    int i, j, k;
    int T;
    int size = atoi(argv[1]);
    for (j = 1; j <= size; j++) {
        for (i = 1; i <= j-1; i++) {
            T = 0;
            for (k = 1; k <= i-1; k++)
                T += 2 * 3;
        }
    }
    printf("T=%d\n", T);
    return 0;
}
can be optimized by lifting the invariant "T += 2 * 3" out of the innermost loop, so that it becomes
Code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    int i, j, k;
    int T;
    int size = atoi(argv[1]);
    for (j = 1; j <= size; j++) {
        for (i = 1; i <= j-1; i++) {
            T = 0;
            T += 2 * 3 * (i-1);
            for (k = 1; k <= i-1; k++) {}
        }
    }
    printf("T=%d\n", T);
    return 0;
}
Actually, both of the above programs take about 6 seconds to run on my computer when I compile them with gcc -O3 and set size=3120.
However, if I declare T as a float, then the "unoptimized" program
Code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    int i, j, k;
    float T;
    int size = atoi(argv[1]);
    for (j = 1; j <= size; j++) {
        for (i = 1; i <= j-1; i++) {
            T = 0.0;
            for (k = 1; k <= i-1; k++)
                T += 2.0 * 3.0;
        }
    }
    printf("T=%f\n", T);
    return 0;
}
takes about 29 seconds, whereas the "optimized" program
Code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    int i, j, k;
    float T;
    int size = atoi(argv[1]);
    for (j = 1; j <= size; j++) {
        for (i = 1; i <= j-1; i++) {
            T = 0.0;
            T += 2.0 * 3.0 * (i-1);
            for (k = 1; k <= i-1; k++) {}
        }
    }
    printf("T=%f\n", T);
    return 0;
}
takes about 6 seconds, with size=3120.
I read the gcc man page, and there is a "-ffast-math" optimization flag that seems to optimize floating-point operations at the expense of strict correctness. However, even with this flag, the "unoptimized" program still takes about 29 seconds to run with size=3120.
When I remove the printf() statement at the end, the unoptimized program runs in about 6 seconds. With printf() removed, the optimized program still runs in about 6 seconds.
The conclusion thus seems to be: for some reason, gcc is not able to hoist the invariant computation if the variable in question is a float. But if it is an int, or if the invariant computation is lifted out of the innermost loop by hand, then the running time does not even depend on the "visibility" of the variable (i.e., whether it is printed at the end). This can be illustrated by the following program with an empty loop body:
Code:
#include <stdlib.h>
int main(int argc, char *argv[])
{
    int i, j, k;
    int size = atoi(argv[1]);
    for (j = 1; j <= size; j++) {
        for (i = 1; i <= j-1; i++) {
            for (k = 1; k <= i-1; k++) {}
        }
    }
    return 0;
}
It still takes about 6 seconds.
And interestingly, gcc does not optimize away the entire loop nest (the running time increases as size increases). I think that's kind of dumb. Some might have used such a dumb loop to delay the running of a program, as one poster suggested; but with the -O3 flag, I think gcc should optimize it away.
Unfortunately, I am not able to decipher the assembly code. I cannot even understand how the assembly code of a simple program works:
Code:
int main(int argc, char *argv[])
{
    float T;
    T = 2.0 * 3.0;
    return 0;
}
Compiling with gcc -O0 -S 2x3.c -o 2x3.s generates the following assembly program (on an IA-32 machine):
Code:
.file "2x3.c"
.version "01.01"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
subl %eax, %esp
movl $0x40c00000, %eax
movl %eax, -4(%ebp)
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 3.4.1"
It really beats me why 15 is added to the register eax twice and then the value is shifted right by 4 and left by 4. I guessed that these four arithmetic ops are what T = 2.0*3.0 is translated into. But when I write a T = 2.0*5.0 program, everything is unchanged except that $0x40c00000 becomes $0x41200000. So it seems "movl $0x40c00000, %eax" is really the computation. But how come $0x40c00000 is 6.0?! I am totally confused.
Originally Posted by
quzah
Anyway, just because it should, doesn't mean it will. The standard doesn't say it must, so if it isn't, that's why. That was kinda my whole point.
I think it does not have much to do with the C Standard. A language standard typically does not say whether certain things should or must be optimized; that is up to the compiler. Therefore, if gcc's documentation says something will be optimized in a certain way when the -O3 flag, or a specific optimization flag such as "-ffast-math", is used, then it should be; if it is not, that violates the gcc documentation (not the C Standard). Unfortunately, there is not much detailed documentation about gcc optimization. For example, the gcc manual does not say what optimization flag, if any, should be specified when lifting an invariant computation out of a loop is desired. Nor does it say what flag should be specified in order to optimize away the computation of an "invisible" variable. I think these optimization techniques are quite basic and any reasonable compiler should have them; but again, very unfortunately, gcc simply fails to document them or include them in the -O3 option.