Thread: Inline functions

  1. #1
    Registered User Sharke's Avatar
    Join Date
    Jun 2008
    Location
    NYC
    Posts
    303

    Inline functions

    When I run the following code on my system, the inline function virtually always runs slower than the non-inline function. Why is this?

    Code:
    #include <iostream>
    #include <ctime>
    #include <cstdlib> // for system()
    using namespace std;
    
    inline void f1()
    {
    	int n = 0;
    	for (int i = 0; i < 100; i++)
    		n += 10;
    }
    
    void f2()
    {
    	int n = 0;
    	for (int i = 0; i < 100; i++)
    		n += 10;
    }
    
    int main()
    {
    
    	clock_t start;
    	clock_t end;
    	clock_t duration;
    
    	cout << "Running inline function f1()..." << endl;
    	start = clock();
    	for (int i = 0; i < 1500000; i++)
    		f1();
    	end = clock();
    	duration = end - start;
    	cout << "Time elapsed: " << duration << " ticks." << endl;
    
    	cout << "Running function f2()..." << endl;
    	start = clock();
    	for (int i = 0; i < 1500000; i++)
    		f2();
    	end = clock();
    	duration = end - start;
    	cout << "Time elapsed: " << duration << " ticks." << endl;
    
    	system("pause");
    	return 0;
    }

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Sharke
    When I run the following code on my system, the inline function virtually always runs slower than the non-inline function. Why is this?
    Considering that a compiler might not inline despite the suggestion, especially due to the presence of a loop...
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    Maybe it's slow because you called it 1.5 million times.
    Last edited by whiteflags; 05-12-2009 at 11:08 PM. Reason: phantom zero

  4. #4
    Registered User Sharke's Avatar
    Join Date
    Jun 2008
    Location
    NYC
    Posts
    303
    Quote Originally Posted by laserlight
    Considering that a compiler might not inline despite the suggestion, especially due to the presence of a loop...
    Ah, I did not know that inline functions were compiler suggestions. Having said that, I get similar results if I copy and paste the function body in place of the function call. It's not slower than the function call every time, but most times. This surprises me.

    Quote Originally Posted by whiteflags
    Maybe it's slow because you called it 1.5 million times.
    1.5 million, 1.5 thousand......it's the comparison with the non-inline function I was interested in.

  5. #5
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Sharke
    Having said that, I get similar results if I copy and paste the function body in place of the function call. It's not slower than the function call every time, but most times. This surprises me.
    Maybe it is just "luck". You could compare assembly output to see what is happening.

  6. #6
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    I assume you're compiling WITHOUT optimization? Because any decent optimizing compiler will optimize that entire thing to nothing, since the function has no side effects, and returns no value.

    Here's what I get:

    Code:
    scott@scott-intel-mini-ubuntu:/tmp$ g++ -O3 -o timer timer.cpp
    scott@scott-intel-mini-ubuntu:/tmp$ ./timer
    Running inline function f1()...
    Time elapsed: 0 ticks.
    Running function f2()...
    Time elapsed: 0 ticks.
    sh: pause: not found
    As opposed to without optimization:

    Code:
    scott@scott-intel-mini-ubuntu:/tmp$ g++ -O0 -o timer timer.cpp
    scott@scott-intel-mini-ubuntu:/tmp$ ./timer
    Running inline function f1()...
    Time elapsed: 620000 ticks.
    Running function f2()...
    Time elapsed: 560000 ticks.
    sh: pause: not found
    Looking at the assembly code generated without optimization, neither call is being inlined, and the assembly code for f1() and f2() is identical. So why the difference? I can't explain it, honestly, but I doubt it has anything to do with inlining, since neither function is actually being inlined.

    Your compiler may differ, of course.

  7. #7
    Registered User
    Join Date
    May 2007
    Posts
    147
    Sharke,

    Pay close attention to brewbuck's point.

    For example, if n were not local (so it's preserved between calls), and then PRINTED at the end, the compiler would probably not be able to optimize the calls into nothing.

    Also, timing differences:

    I've found that the precision of clock() is low compared to the microscopic differences this kind of test is trying to measure. On occasion I've observed that the order in which the calls are made makes the difference (that is, swap f1 and f2 in main and see if that changes your observation).

    Due to the way you've written the code, some compilers might inline f2 even though you didn't ask for it. This is especially true with an inline-expansion setting such as the "any suitable" option (/Ob2 in Visual C++).

    Also, ask yourself: what are you timing here?

    The 1.5 million function calls -
    or the loop of 100 iterations inside each call?

    The 1.5 million iterations of the outer loop -
    or the repetitions of the interior loop of 100?

    (the latter is an inline version)


    Sometimes all you're going to do inside the function is 100 items in a loop - I know.

    But at a ratio of 1.5 million to 100, your timing relates as much to the performance of the CPU's cache as it does to the ability to increment in a loop. That performance will not be stable! It is interrupted by other processes the OS is running, which will destroy any predictability in the timing of your tests.

    So, generalize your test results. Test several runs, in different orders, on a quiet machine (stop services) - if you really want to measure such microscopic differences in performance between a few different approaches.


    A quick example of what irks me about this kind of research.

    In Don Knuth's work, an example loop optimization is discussed that uses "goto". It is often touted as the reason "goto" is valuable (there's a long-standing argument about the 'proper-ness' of goto). It is a simple loop through an array of integers, searching for a match. The "goto" loop is about 6% to 8% faster, but not because of goto. The "goto" fans miss the point of the example (the code in the citation is Pascal-style pseudo-code).

    The point of the loop isn't that goto was used, but that the loop index advances by 2 rather than 1, and two adjacent elements in the array are tested inside the loop instead of one.

    The results are identical if you perform a 'standard' for or while loop in C with the same incremental adjustment, with the sole exception of extremely small datasets.

    Incrementing the loop by 4, and testing four adjacent entries in the array within each iteration, yields another incremental performance increase.

    In my own tests, demonstrating that the "goto" version performed exactly like the "standard loop" versions in C was maddening. Measurements varied, and the order in which the tests ran gave one or the other a 1% advantage. More telling, if I ran the same tests interleaved (that is, test1, test3, test2, test3, test1, test2) I'd get different timings on exactly the same values and conditions.


    This Knuth example relates to your test and inquiry somewhat, BTW.
