Compiler: VS2010
I'm having an interesting performance problem. I have a class which provides a ForEach method that takes a lambda as its parameter. I'm trying to use the lambda to count the number of iterations. Boiled down, the case looks like this:
Code:
class Foo
{
public:
    template <typename F>
    void ForEach(F func)
    {
        // ... calls func() a number of times
    }
};

// Invocation site:
Foo foo;
int counter = 0;
foo.ForEach([&]() { ++counter; });
This works correctly, of course, but the compiler generates less-than-optimal assembly code. The anonymous class the compiler creates to implement the lambda contains a pointer to the counter variable. On each iteration of ForEach(), the generated code obtains this pointer from memory and then dereferences it to increment the counter. This is doubly non-optimal, as the compiler could have simply kept count in a register and written to memory at the end of the loop, but instead it performs TWO memory accesses (one to get the pointer from the lambda, then another to get the variable itself by dereferencing the pointer) for each iteration.
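For anyone who wants to reproduce this without a lambda, here is a rough hand-written equivalent of the anonymous class described above. It is only a sketch: the real closure type's layout is up to the compiler, and the names CounterClosure and CountWithFunctor, along with the fixed loop of 1000 calls inside ForEach, are my own placeholders rather than anything from the actual code.

```cpp
// Hypothetical stand-in for the closure type of [&]() { ++counter; }.
// Assumption: the by-reference capture is stored as a plain pointer.
struct CounterClosure
{
    int* counter;  // address of the captured local

    explicit CounterClosure(int& c) : counter(&c) {}

    // Each call loads the pointer, then increments through it.
    void operator()() const { ++*counter; }
};

class Foo
{
public:
    template <typename F>
    void ForEach(F func)
    {
        // Placeholder loop; the real ForEach iterates over something else.
        const int kIterations = 1000;
        for (int i = 0; i < kIterations; ++i)
            func();
    }
};

// Same shape as the invocation site, with the functor in place of the lambda.
int CountWithFunctor()
{
    Foo foo;
    int counter = 0;
    foo.ForEach(CounterClosure(counter));
    return counter;
}
```

Comparing the assembly for this version against the lambda version should show whether the extra loads come from the lambda machinery or from the pointer member itself.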
I suspected that declaring the counter variable as static would improve matters. Indeed it does, with the generated code simply keeping count inside a register as expected. The resulting code is 25% faster!
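The static variant I mean looks roughly like the sketch below. The function name CountWithStatic and the 1000-call loop in ForEach are placeholders for illustration. Note that a variable with static storage duration is not captured at all: the lambda body refers to it directly, so the closure has no pointer member to reload.

```cpp
class Foo
{
public:
    template <typename F>
    void ForEach(F func)
    {
        // Placeholder loop; the real ForEach iterates over something else.
        const int kIterations = 1000;
        for (int i = 0; i < kIterations; ++i)
            func();
    }
};

int CountWithStatic()
{
    // The hack: static storage means the lambda does not capture counter.
    static int counter = 0;
    counter = 0;  // reset so repeated calls start from zero

    Foo foo;
    foo.ForEach([&]() { ++counter; });
    return counter;
}
```

Besides being ugly, this makes the function non-reentrant and not thread-safe, which is another reason I'd rather not keep it.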
It is almost as if the compiler is treating not only the counter variable, but also the pointer to it, as volatile! This seems extremely pessimistic. I'm bummed, because making it static is a hack and I don't want to do it.
Does anybody have a clue why the compiler is being so pessimistic? Something in the C++11 standard maybe?
EDIT: I forgot to mention that the compiler is inlining everything together -- it generates a specialized ForEach() which does the loop and increments the variable inline, with no function call to dispatch to the lambda. So the compiler should in theory be perfectly capable of realizing that the pointer never changes, and thus it COULD cache the counter in a register... It just doesn't.