Thread: Are virtual calls really slow, or is that just bull........?

  1. #1
    Registered User
    Join Date
    Feb 2009
    Posts
    40

    Are virtual calls really slow, or is that just bull........?

    Hi all,

    Are virtual calls really slow, or is that just bull........? By "slow" I mean slow enough that I would notice it, not just a few extra nanoseconds; I wouldn't really call that slow. Of course they would probably have a negative impact in a performance-critical section or application, but I'm asking about a normal program that is limited not by CPU power but by network, disk I/O, and similar resources. I ask because I have just started to use virtual methods in many classes, and I'm wondering whether I should use them together with multiple inheritance. My application should then be much easier to maintain and extend.

    Thanks in advance!

  2. #2
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Short answer: Total bull.

    Long answer: It depends. Since the standard doesn't dictate the particular format used or techniques employed, compiler vendors are free to implement it however they wish. That said, most implementations do so quite efficiently, so in general it really shouldn't be an issue.

  3. #3
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    It basically has the cost of two extra memory accesses, which isn't a high cost at all. Only one if the implementation chooses to store the v-table in the object itself.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  4. #4
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    I wouldn't use it in the inner loop of a game's physics engine, but it's fine for things that don't get executed very often. I personally never use the virtual keyword; 99.99999% of problems can be solved in a trivial manner using other methods.

  5. #5
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    A normal call is:

    call direct

    A virtual call is:

    load vtable pointer
    load function pointer
    call indirect

    The biggest problem is that it can really mess up the branch predictor.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  6. #6
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by CornedBee View Post
    The biggest problem is that it can really mess up the branch predictor.
    That and the extra memory accesses can cause a cache miss or a page fault.

  7. #7
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    It should have a pretty tiny impact compared to the code inside the function, and that's if it's not already translated to a direct call by the optimiser where possible.
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  8. #8
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    Quote Originally Posted by King Mir View Post
    That and the extra memory accesses can cause a cache miss or a page fault.
    It's highly unlikely that it will cause a page fault. It will most certainly cause cache misses, but only the first time it is called; after that the data will be in the cache. What slows it down is the extra memory accesses, which is why it's very bad to use as part of an inner loop. Inner loops are inherently CPU bound if designed properly. Any extra work will slow the system down.

    Now with THAT said, it's probable that in most cases you will never notice the difference.

  9. #9
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by abachler View Post
    It's highly unlikely that it will cause a page fault. It will most certainly cause cache misses, but only the first time it is called; after that the data will be in the cache. What slows it down is the extra memory accesses, which is why it's very bad to use as part of an inner loop. Inner loops are inherently CPU bound if designed properly. Any extra work will slow the system down.

    Now with THAT said, it's probable that in most cases you will never notice the difference.
    That is true. If you're, say, processing an image and trying to squeeze every last drop of performance out of it, avoiding virtual dispatches can make a noticeable difference. But of course, in that situation even a single unnecessary multiply is going to be a concern, and moreover, writing clear, concise code is quite simply a luxury you can't afford! Desperate times call for desperate measures.

  10. #10
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    Quote Originally Posted by Sebastiani View Post
    That is true. If you're, say, processing an image and trying to squeeze every last drop of performance out of it, avoiding virtual dispatches can make a noticeable difference. But of course, in that situation even a single unnecessary multiply is going to be a concern, and moreover, writing clear, concise code is quite simply a luxury you can't afford! Desperate times call for desperate measures.
    Are you actually claiming that making a call virtual clarifies the code? I'm laughing so hard I think I pooped a little...

    Quote Originally Posted by grumpy View Post
    Assuming virtual function support involves a virtual function table, the same class of overheads occur in C with using a "pointer to function" to call a function.
    Utter horse manure. A pointer to a function has no run-time overhead, that's the whole point of using them.

    Code:
    call foo
    becomes
    Code:
    call [foo]
    which involves only a single extra memory lookup.

    virtual adds more than a single lookup iirc.
    Last edited by abachler; 09-25-2009 at 10:09 PM.

  11. #11
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by abachler View Post
    It's highly unlikely that it will cause a page fault. It will most certainly cause cache misses, but only the first time it is called; after that the data will be in the cache. What slows it down is the extra memory accesses, which is why it's very bad to use as part of an inner loop. Inner loops are inherently CPU bound if designed properly. Any extra work will slow the system down.

    Now with THAT said, it's probable that in most cases you will never notice the difference.
    Yeah, page faults are rare with the amount of memory computers have these days, but I included that for completeness.

    A cache miss is a memory access. Or rather, it's a memory access that actually slows the CPU. An L1 cache hit will make a memory access as fast as a register access.

    Cache misses will likely be more frequent than once per memory location, because multiple memory locations are mapped to the same cache location, and because a modern multi-process environment switches between applications often.

  12. #12
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Assuming virtual function support involves a virtual function table, the same class of overheads occur in C with using a "pointer to function" to call a function - apart from concerns of loading argument lists, it's necessary to retrieve the function pointer, load it, and then do an indirect call.

    An alternative mechanism (eg switch on the type, and call the corresponding function) has the overhead of selecting which function to call. Few compilers do this.

    Either way, as others have said, the overhead is usually insignificant outside of tight inner CPU-bound loops. There is a measurable overhead, as there is with anything, but significance depends on the context in which it is done.
    Right 98% of the time, and don't care about the other 3%.

  13. #13
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    I just wanted to point out that C++ provides another useful (though often unused) facility that can stand in for virtual dispatch and is resolved at compile time, the curiously recurring template pattern:

    Code:
    #include <iostream>

    // CRTP base: forwards act() to the derived class at compile time.
    template < typename Derived >
    class behavior
    {
        public:

        template < typename Type >
        inline void act( Type const& data )
        {
            static_cast< Derived* >( this )->act( data );
        }
    };

    class foo : public behavior< foo >
    {
        public:

        inline void act( int data )
        {
            std::cout << data << std::endl;
        }
    };

    class bar : public behavior< bar >
    {
        public:

        inline void act( double data )
        {
            std::cout << data << std::endl;
        }
    };
    The most interesting points are:
    1) All of the function calls would be expanded inline by a decent compiler (depending on the internal code, naturally, but certainly the base class function would be in all cases).
    2) The base class function can be templated, for flexibility. If so, the derived classes can even define several functions with the same name, making it a much more powerful paradigm altogether, compared with 'ordinary' virtual functions.

  14. #14
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    So I did a test using this code:
    Code:
    // Test.cpp : Defines the entry point for the console application.
    //
    
    #include "stdafx.h"
    #include <windows.h>
    #include <iostream>
    
    void Test();
    
    namespace Constants
    {
    	const int Kilo = 1000;
    	const int Mega = Kilo * 1000;
    }
    
    int _tmain(int argc, _TCHAR* argv[])
    {
    	HANDLE h = OpenThread( THREAD_ALL_ACCESS, FALSE, GetCurrentThreadId() );
    	if (h == NULL)
    	{
    		std::cout << "Failed to open thread!\n";
    		return 1;
    	}
    	if (! SetThreadPriority(h, THREAD_PRIORITY_TIME_CRITICAL) )
    	{
    		std::cout << "Failed to set thread priority!\n";
    		return 1;
    	}
    	CloseHandle(h);
    
    	h = GetCurrentProcess(); //OpenProcess( GetCurrentProcessId(), FALSE, PROCESS_ALL_ACCESS );
    	if (h == NULL)
    	{
    		std::cout << "Failed to open process!\n";
    		return 1;
    	}
    	if (! SetPriorityClass(h, REALTIME_PRIORITY_CLASS) )
    	{
    		std::cout << "Failed to set process priority!\n";
    		return 1;
    	}
    	CloseHandle(h);
    
    	void (*pTest)() = &Test;
    	DWORD dwStart = GetTickCount();
    	for (int i = 0; i < 1000 * Constants::Mega; i++)
    		pTest();
    	std::cout << "Took " << GetTickCount() - dwStart << " ms.\n";
    
    	dwStart = GetTickCount();
    	for (int i = 0; i < 1000 * Constants::Mega; i++)
    		Test();
    	std::cout << "Took " << GetTickCount() - dwStart << " ms.\n";
    
    	return 0;
    }
    Test is just an empty function defined in another source file to prevent the compiler from optimizing away the function call.
    I set the process and thread priority to time critical to avoid as much outside interference as possible.
    I ran the test a total of 37 times, and these are the results I got:

    Function pointer (ms)
    3292
    3307
    3307
    3276
    3307
    3307
    3308
    3292
    3307
    3323
    3307
    3292
    3323
    3323
    3339
    3291
    3323
    3291
    3323
    3323
    3323
    3291
    3323
    3323
    3323
    3307
    3292
    3307
    3308
    3292
    3323
    3307
    3307
    3322
    3338
    3307
    3308

    Direct call (ms)
    3307
    3292
    3308
    3276
    3323
    3276
    3307
    3276
    3308
    3291
    3292
    3276
    3307
    3276
    3307
    3276
    3291
    3276
    3307
    3276
    3307
    3292
    3308
    3276
    3307
    3276
    3291
    3276
    3292
    3276
    3276
    3308
    3276
    3292
    3276
    3307
    3276

    Mean (ms)
    Function pointer: 3310
    Direct call: 3291

    Standard deviation (ms)
    Function pointer: 14.53
    Direct call: 14.75

    Standard uncertainty (ms)
    Function pointer: 0.393
    Direct call: 0.399

    Conclusion: the performance impact is negligible, yet there seems to be an impact on using the function pointer, although more conclusive tests would have to be done to say exactly.

  15. #15
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    Quote Originally Posted by Elysia View Post
    So I did a test using this code:

    Test is just an empty function defined in another source file to prevent the compiler from optimizing away the function call.
    I set the process and thread priority to time critical to avoid as much outside interference as possible.
    I ran the test a total of 37 times, and these are the results I got:


    Conclusion: the performance impact is negligible, yet there seems to be an impact on using the function pointer, although more conclusive tests would have to be done to say exactly.
    Quote Originally Posted by Visual Studio
    1>------ Build started: Project: OpenMP Sandbox, Configuration: No Debug Info Win32 ------
    1>Compiling...
    1>main.cpp
    1>Linking...
    1>LINK : warning LNK4224: /OPT:NOWIN98 is no longer supported; ignored
    1>main.obj : error LNK2001: unresolved external symbol "void __cdecl Test(void)" (?Test@@YAXXZ)
    1>E:\Projects\Double Vision Recorder\No Debug Info\OpenMP Sandbox.exe : fatal error LNK1120: 1 unresolved externals
    1>Build log was saved at "file://e:\Projects\Double Vision Recorder\OpenMP Sandbox\No Debug Info\BuildLog.htm"
    1>OpenMP Sandbox - 2 error(s), 1 warning(s)
    ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
    Odd that it wouldn't even compile. Ah, I see: after I actually defined Test, it failed to open the thread, so perhaps the code is a bit buggy. Change that monstrosity at the start to this -
    Code:
    	if (! SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL) ){
    		std::cout << "Failed to set thread priority!\n";
    		return 1;
    		}
    Here, I cleaned it up and, among other things, removed the part that counts the allocation of the local variable as a penalty against the pointer routine. Probably negligible, but it's bad form.
    Code:
    #include <windows.h>
    #include <iostream>
    
    void Test(){
    	
    	return;
    	}
    
    namespace Constants
    {
    	const int Kilo = 1000;
    	const int Mega = Kilo * 1000;
    }
    
    int main(int argc, char* argv[]){
    	if (! SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL) ){
    		std::cout << "Failed to set thread priority!\n";
    		return 1;
    		}
    	if (! SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS) ){
    		std::cout << "Failed to set process priority!\n";
    		return 1;
    		}
    
    	void (*pTest)() = &Test;
    	DWORD dwStartpFunc , dwStartFunc;
    	DWORD dwStoppFunc , dwStopFunc;
    	dwStartpFunc = GetTickCount();
    	for (int i = 0; i < 1000 * Constants::Mega; i++) pTest();
    	dwStoppFunc = GetTickCount();
    	std::cout << "Took " << dwStoppFunc - dwStartpFunc << " ms.\n";
    
    	dwStartFunc = GetTickCount();
    	for (int i = 0; i < 1000 * Constants::Mega; i++) Test();
    	dwStopFunc = GetTickCount();
    	std::cout << "Took " << dwStopFunc - dwStartFunc << " ms.\n";
    
    	return 0;
    	}
    And apparently I can't get VS2008 to stop optimizing away the function call; either that, or it claims my computer is performing one billion increments in less than 15 ms.

    Ok, the problem came down to forcing a rebuild. And the results I got were below the timer resolution.

    Here is the final code -

    Test.cpp
    Code:
    extern unsigned long Count;
    
    void Test(unsigned long* Junk){
    	
    	*Junk+=2;
    	
    	return;
    	}
    main.cpp
    Code:
    #include <windows.h>
    #include <iostream>
    
    extern void Test(unsigned long*);
    unsigned long Junk = 0;
    unsigned long Count = 1000000000;
    
    namespace Constants
    {
    	const int Kilo = 1000;
    	const int Mega = Kilo * 1000;
    }
    
    int main(int argc, char* argv[]){
    	if (! SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL) ){
    		std::cout << "Failed to set thread priority!\n";
    		return 1;
    		}
    	if (! SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS) ){
    		std::cout << "Failed to set process priority!\n";
    		return 1;
    		}
    	
    	for(int y = 0;y<10;y++){
    	// first we make sure the code for Test() is in the cache, so the pointer routine doesn't get penalized
    		for(int x = 0;x<1000;x++) Test(&Junk);
    
    		void (*pTest)(unsigned long*) = &Test;
    		DWORD dwStartpFunc , dwStartFunc;
    		DWORD dwStoppFunc , dwStopFunc;
    			
    		dwStartpFunc = GetTickCount();
    		for (int i = 0; i < Count; i++) pTest(&Junk);
    		dwStoppFunc = GetTickCount();
    		std::cout << "pFunc Took " << dwStoppFunc - dwStartpFunc << " ms.\n";
    
    		dwStartFunc = GetTickCount();
    		for (int i = 0; i < Count; i++) Test(&Junk);
    		dwStopFunc = GetTickCount();
    		std::cout << "Func Took " << dwStopFunc - dwStartFunc << " ms.\n";
    
    		std::cout << "\n";
    		}
    
    	return 0;
    	}
    Results indicate that there is zero, or very little, difference in speed between the two methods.
    Last edited by abachler; 09-26-2009 at 09:41 AM.

    Last Post: 10-20-2001, 01:26 PM