Code:
for (j=1; j<=3120; j++) {
    for (i=1; i<=j-1; i++) {
        T = 0.0;
        for (k=1; k<=i-1; k++)
            T += 0.0;
    }
}
printf("T=%f\n", T);

I use gcc 3.4.1 (with -O3) on a dual-core, dual-processor AMD Opteron machine running Solaris 10 to compile and run the above code. It takes about 30 seconds. But if I replace
T += 0.0 with either
T *= 1.0 or
T += 12.345 * 54.321 or
T = 0.0
it takes about 6 seconds.
1. What prevents gcc from optimizing the code with T += 0.0 so that the program also runs in about 6 seconds?
2. What prevents gcc from optimizing the code such that it would take much less than 6 seconds? For example, with T=0.0 in the k loop, one should know T is zero without doing any computation. Why can't gcc detect it?
3. If I remove the last printf(), the timings do not change. I expected that without printing T, the computation of T would become irrelevant, because T's value is never used. Shouldn't gcc then optimize away the entire computation?