Lambda captures are inefficient?


  1. #1
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,249

    Lambda captures are inefficient?

    Compiler: VS2010

    I'm having an interesting performance problem. I have a class which provides a ForEach method that takes a lambda as its parameter. I'm trying to use the lambda to count the number of iterations. Boiled down, the case looks like this:

    Code:
    class Foo
    {
    public:
        template <typename F>
        void ForEach(F func)
        {
            // ... calls func() a number of times
        }
    };
    
    // Invocation site:
    Foo foo;
    int counter = 0;
    foo.ForEach([&]()
    {
        ++counter;
    });
    This works correctly, of course, but the compiler generates less-than-optimal assembly code. The anonymous class the compiler creates to implement the lambda contains a pointer to the counter variable. On each iteration of ForEach(), the generated code obtains this pointer from memory and then dereferences it to increment the counter. This is doubly non-optimal, as the compiler could have simply kept count in a register and written to memory at the end of the loop, but instead it performs TWO memory accesses (one to get the pointer from the lambda, then another to get the variable itself by dereferencing the pointer) for each iteration.
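    To make the two memory accesses concrete, a by-reference capture can be modelled by hand as a small function object holding a pointer. This is only a sketch of what a compiler might generate, not VS2010's actual output. The names CounterClosure, ForEachN and count_with_closure are hypothetical, and ForEachN stands in for the elided Foo::ForEach.

```cpp
// Hypothetical stand-in for Foo::ForEach: calls func() n times.
template <typename F>
void ForEachN(int n, F func)
{
    for (int i = 0; i < n; ++i)
        func();
}

// Hand-written model of the closure for [&]() { ++counter; }.
// Assumption: the by-reference capture is stored as a pointer member.
struct CounterClosure
{
    int *counter;

    void operator()() const
    {
        // Without further optimization this is two accesses per call:
        // read this->counter, then read/modify/write *counter.
        ++*counter;
    }
};

// Counts n iterations through the hand-written closure.
int count_with_closure(int n)
{
    int counter = 0;
    CounterClosure closure = { &counter };
    ForEachN(n, closure);
    return counter;
}
```

    Calling count_with_closure(n) returns n, the same result as the lambda version; only the generated code is in question.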

    I suspected that declaring the counter variable as static would improve matters. Indeed it does, with the generated code simply keeping count inside a register as expected. The resulting code is 25% faster!
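    For reference, the static workaround might look roughly like the sketch below. It assumes the counter is a function-local static, which a lambda may name without capturing it, so the closure carries no pointer at all. ForEachN and count_with_static are hypothetical names; ForEachN stands in for Foo::ForEach.

```cpp
// Hypothetical stand-in for Foo::ForEach: calls func() n times.
template <typename F>
void ForEachN(int n, F func)
{
    for (int i = 0; i < n; ++i)
        func();
}

// Counts n iterations using a static counter instead of a capture.
int count_with_static(int n)
{
    // Static storage duration: the lambda refers to the variable directly,
    // so the capture list can stay empty.
    static int counter;
    counter = 0;  // reset on every call, since the static outlives the call
    ForEachN(n, []() { ++counter; });
    return counter;
}
```

    The shared static makes the function non-reentrant and unsafe to call from several threads at once, which is part of why it feels like a hack.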

    It is almost as if the compiler is treating, not only the counter variable but the pointer to it, as volatile! This seems extremely pessimistic. I'm bummed, because making it static is a hack and I don't want to do it.

    Does anybody have a clue why the compiler is being so pessimistic? Something in the C++11 standard maybe?

    EDIT: I forgot to mention that the compiler is inlining everything together -- it generates a specialized ForEach() which does the loop and increments the variable in line, with no function call to dispatch to the lambda. So the compiler should in theory be perfectly capable of realizing that the pointer never changes and thus it COULD cache the counter in a register... It just doesn't.
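    Written out by hand, the transformation hoped for looks something like the sketch below. It assumes the loop body is nothing but the increment and that nothing else can observe the counter while the loop runs. The names add_iterations_hoisted and count_hoisted are hypothetical.

```cpp
// Adds n to counter the way an optimizer could after inlining:
// one load before the loop, one store after it.
void add_iterations_hoisted(int &counter, int n)
{
    int local = counter;          // single read of the captured variable
    for (int i = 0; i < n; ++i)
        ++local;                  // local can be kept in a register
    counter = local;              // single write back at the end
}

// Counts n iterations using the hoisted form.
int count_hoisted(int n)
{
    int counter = 0;
    add_iterations_hoisted(counter, n);
    return counter;
}
```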
    Last edited by brewbuck; 09-19-2012 at 09:53 PM.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  2. #2
    Registered User manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    Kolkata@India
    Posts
    2,498
    Does the same happen for the following code?
    Code:
    #include<functional>
    #include<iostream>
    #include<vector>
    class Foo
    {
    public:
        Foo():foo({1,2,3,4,5}){}
        template <typename F>
        void ForEach(F func)
        {
            for(auto x:foo)func();
        }
        std::vector<int> foo;
        
    };
    int main()
    {
        Foo foo;
        int counter = 0;
        foo.ForEach
        (
            std::bind
            (
                [](int& i){++i;}
                ,std::ref(counter)
            )
        );
        std::cout<<counter;
    }
    If so, it *may* just be a quirk of the compiler. (You'll probably see the same in all 'call by reference' cases.)

    If not, it could be due to how lambdas must treat objects captured by reference...(I have no idea.)
    Last edited by manasij7479; 09-19-2012 at 10:33 PM.
    Manasij Mukherjee | gcc-4.8.2 @Arch Linux
    Slow and Steady wins the race... if and only if :
    1.None of the other participants are fast and steady.
    2.The fast and unsteady suddenly falls asleep while running !



  3. #3
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    You didn't say whether you were building a Release or Debug build or what optimization level you set.

    Try a release build with maximum optimization.
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  4. #4
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,249
    Quote Originally Posted by manasij7479 View Post
    Does the same happen for the following code?
    VS2010 can't compile that.

    I'll see if I can do something similar, though.

    I'm using the standard Release build settings, which are not the highest optimization level. Turning optimization up appears to make no difference.
    Last edited by brewbuck; 09-19-2012 at 11:00 PM.

  5. #5
    Registered User
    Join Date
    Oct 2006
    Posts
    2,503
    Quote Originally Posted by brewbuck View Post
    It is almost as if the compiler is treating, not only the counter variable but the pointer to it, as volatile!
    I can certainly see the counter variable being treated as volatile, as the developers at Microsoft know that a lambda may be used as a thread procedure; however, treating the pointer as if it were volatile too does seem a bit pessimistic, since we all know that references cannot change once they are bound.

    I suspect that this first Microsoft compiler release supporting C++11 features simply doesn't optimize the new features as well as it does those of the previous standard. The standard hasn't been around long enough for implementations to mature, and since VS2010 was released well before the standard was finalized, I'm not the slightest bit surprised that its C++11 support is neither complete nor efficient. In contrast, G++ 4.7, being a very recent release, has much more complete support, and G++ is usually regarded as the better code generator of the two, although I have not profiled any C++11 features in G++.

  6. #6
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    4,337
    I'm having an interesting performance problem.
    O_o

    Would you do me the favor (I don't have 2010.) of profiling explicitly using a pointer and an address?

    I know it is less than convenient, but could you also profile explicitly "wrangling" a pointer to a reference within the lambda?

    I can certainly see the counter variable being treated as volatile, as the developers at Microsoft know that a lambda may be used as a thread procedure; however, treating the pointer as if it were volatile too does seem a bit pessimistic, since we all know that references cannot change once they are bound.
    Well yeah, but every function can potentially be used in a threaded context; if they made the same pessimistic assumption everywhere the compiler would be virtually unusable.

    Soma

  7. #7
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,249
    Quote Originally Posted by phantomotap View Post
    O_o

    Would you do me the favor (I don't have 2010.) of profiling explicitly using a pointer and an address?

    I know it is less than convenient, but could you also profile explicitly "wrangling" a pointer to a reference within the lambda.
    You mean like this?

    Code:
    class Foo
    {
    public:
        template <typename F>
        void ForEach(F func)
        {
            // ... calls func() a number of times
        }
    };
     
    // Invocation site:
    Foo foo;
    int counter = 0;
    int *pCounter = &counter;
    foo.ForEach([&]()
    {
        ++*pCounter;
    });
    If so, I already tried that and found that the compiler generates identical code to the first case.
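    One detail about the test above: with [&], pCounter is itself captured by reference, so the closure still reaches the counter through an extra level of indirection. A variant that might be worth measuring captures the pointer by value instead. This is only a sketch and makes no claim about what VS2010 generates for it. ForEachN and count_with_pointer_by_value are hypothetical names; ForEachN stands in for Foo::ForEach.

```cpp
// Hypothetical stand-in for Foo::ForEach: calls func() n times.
template <typename F>
void ForEachN(int n, F func)
{
    for (int i = 0; i < n; ++i)
        func();
}

// Counts n iterations through a pointer that is captured by value.
int count_with_pointer_by_value(int n)
{
    int counter = 0;
    int *pCounter = &counter;
    // [pCounter] copies the pointer into the closure object;
    // [&] would instead refer back to the local variable pCounter.
    ForEachN(n, [pCounter]() { ++*pCounter; });
    return counter;
}
```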
