Thread: Condition Performance Impact.

  1. #1
    System Novice siavoshkc's Avatar
    Join Date
    Jan 2006
    Location
    Tehran
    Posts
    1,246

    Exclamation Condition Performance Impact.

    I designed a test program to show the speed reduction caused by conditional statements.

    Here is the code:
    Code:
    #include <iostream>
    #define WINDOWS_LEAN_AND_MEAN
    #include <windows.h>
    using namespace std;
    int TestCondition(int val)
    {
    	if(val < 23)
    		val = val * 20;
    	val -= 322;
    	return val;
    }
    
    int TestNoCondition(int val)
    {
    	val = val * 20;
    	val -= 322;
    	return val;
    
    }
    
    int main()
    {
    	LARGE_INTEGER temp, tc, tnc;
    	int val = -100;
    	for (int j = 190 ; j < 200 ; ++j){
    		QueryPerformanceCounter(&temp);
    		for(int i = 0; i <100000*j ;i++)
    		{
    			val += TestCondition(i*j);
    		}
    		QueryPerformanceCounter(&tc);
    		tc.LowPart -= temp.LowPart;
    		cout << val << "  "; 
    		val = -100;
    		QueryPerformanceCounter(&temp);
    		for(int i = 0; i <100000*j ;i++)
    		{
    			val += TestNoCondition(i*j);
    		}
    		QueryPerformanceCounter(&tnc);
    		tnc.LowPart -= temp.LowPart;
    		cout << val << endl; 
    		val = 0;
    		cout << "With condition: " << tc.LowPart << endl;
    		cout << "Without condition: " << tnc.LowPart << endl;
    	}
    	return 0;
    }
    It needs Windows because of its precise performance counter.

    This is assembly code of two functions:
    This one has a condition in it. So val = val * 20 will be executed conditionaly.
    Code:
      push        ebp  
      mov         ebp,esp 
      sub         esp,40h 
      push        ebx  
      push        esi  
      push        edi  
    	if(val < 23)
      cmp         dword ptr [ebp+8],17h 
      jge         0041F4A8 
    	  val = val * 20;
      mov         eax,dword ptr [ebp+8] 
      imul        eax,eax,14h 
      mov         dword ptr [ebp+8],eax 
    	val -= 322;
      mov         eax,dword ptr [ebp+8] 
      sub         eax,142h 
      mov         dword ptr [ebp+8],eax 
    	return val;
      mov         eax,dword ptr [ebp+8]
    This one has no condition:
    Code:
      push        ebp  
      mov         ebp,esp 
      sub         esp,40h 
      push        ebx  
      push        esi  
      push        edi  
    	val = val * 20;
      mov         eax,dword ptr [ebp+8] 
      imul        eax,eax,14h 
      mov         dword ptr [ebp+8],eax 
    	val -= 322;
      mov         eax,dword ptr [ebp+8] 
      sub         eax,142h 
      mov         dword ptr [ebp+8],eax 
    	return val;
      mov         eax,dword ptr [ebp+8]
    In my PC tnc is half of tc.
    Learn C++ (C++ Books, C Books, FAQ, Forum Search)
    Code painter latest version on sourceforge DOWNLOAD NOW!
    Download FSB Data Integrity Tester.
    Siavosh K C

  2. #2
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Speed is irrelevant when you have to sacrifice correctness for it. If this is about your linked list, then you should still have the checks, simply because if you don't, the checks will just have to be outside the functions. You haven't gained speed, but you made your life a lot harder.
    At the very least, you must have these checks in debug mode. assert() is the way to go.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    You can't just pretend that a LARGE_INTEGER is the same size as its LowPart.
    If you're going to subtract them try casting them to __int64 instead.

    I'm very pleased that you're forcing the compiler to compile the code as you intend by actually using the return results of everything etc, and even printing out the result. You've already done MUCH better than most amatuer performance benchmarks.

    However something is still very wrong with it...
    The reason is that a for-loop itself involves a conditional branch as well. Now if it were as straightforward as every conditional branch taking X extra time, then you're comparing something that does two compares plus extra stuff (mult add etc), to something that does one compare plus the same extra stuff. So how can it be that one of the compares accounts for half of the time, and the rest accounts for zero time?
    The simple answer is that it doesn't. The for loop obviously plays better with branch prediction in this case, and is much faster than the unpredictible branch that sets the functions apart.

    Now that you've essentially proven that corrrectly predicted branches are much faster, apply that to your linked-list case where most of the time the user isn't trying to pop a node off an empty list. Hmm, since that should be by far the most likely case, the branch will also be predictable and hence as fast as the one in the for-loop in your test.

    One lesson here is that you must take ALL of the executed code in the test program into account in a benchmark. You also haven't considered switching the order of the test have you? Putting the no branch one first might give you different results again, but for a different reason.
    Last edited by iMalc; 07-19-2007 at 02:56 AM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    System Novice siavoshkc's Avatar
    Join Date
    Jan 2006
    Location
    Tehran
    Posts
    1,246
    You also haven't considered switching the order of the test have you? Putting the no branch one first might give you different results again, but for a different reason.
    Yes, it changes the results. Why?
    Learn C++ (C++ Books, C Books, FAQ, Forum Search)
    Code painter latest version on sourceforge DOWNLOAD NOW!
    Download FSB Data Integrity Tester.
    Siavosh K C

  5. #5
    System Novice siavoshkc's Avatar
    Join Date
    Jan 2006
    Location
    Tehran
    Posts
    1,246
    EDIT: I changed the program [AGAIN]:
    Code:
    #define WINDOWS_LEAN_AND_MEAN
    #include <iostream>
    #include <windows.h>
    using namespace std;
    int TestCondition(int val)
    {
    	int *ptr = new int;
    	if(val < 23 || val > 60)
    		val = *ptr ;
    	else
    		val*=2;
    	
    	val -= 322;
    	val = val >> 1 ;
    	val++;
    	delete ptr;
    	return val;
    }
    
    int TestNoCondition(int val)
    {
    	int *ptr = new int;
    	val = *ptr ;
    	val -= 322;
    	val = val >> 1 ;
    	val++;
    	delete ptr;
    	return val;
    
    }
    
    void PCounterReadinessTest()
    {
    	LARGE_INTEGER lpF;
    	if(!QueryPerformanceFrequency(&lpF)){
    		cout << "No hardware performance counter detected by Windows." << endl ;
    		return;
    	}
    	cout <<"Freq: " << lpF.QuadPart << "/ sec." << endl;
    }
    
    int main()
    {
    	PCounterReadinessTest();
    	
    	SetPriorityClass(GetCurrentProcess(),HIGH_PRIORITY_CLASS);
    	for(int n=0 ; n<10; ++n){
    		LARGE_INTEGER t1, t2;
    		__int64 sumNC = 0, sumC = 0, val = 0;
    		for(int i = 0; i <1000;i++)
    		{
    			QueryPerformanceCounter(&t1);
    			val += TestNoCondition(i);
    			QueryPerformanceCounter(&t2);
    			
    			sumNC += static_cast<__int64>(t2.QuadPart - t1.QuadPart);
    
    			QueryPerformanceCounter(&t1);
    			val += TestCondition(i);
    			QueryPerformanceCounter(&t2);
    
    			sumC += static_cast<__int64>(t2.QuadPart - t1.QuadPart);			
    		}	
    	
    		cout << val << " "; 
    		val = 0;
    
    		for(int i = 0; i <1000 ;i++)
    		{
    			QueryPerformanceCounter(&t1);
    			val += TestCondition(i);
    			QueryPerformanceCounter(&t2);
    			
    			sumC += static_cast<__int64>(t2.QuadPart - t1.QuadPart);
    
    			QueryPerformanceCounter(&t1);
    			val += TestNoCondition(i);
    			QueryPerformanceCounter(&t2);
    			
    			sumNC += static_cast<__int64>(t2.QuadPart - t1.QuadPart);
    		}
    		
    		cout <<"  "<< val << " " << endl;
    	
    		cout << "C - NC: " << sumC - sumNC << endl;
    	}
    
    	SetPriorityClass(GetCurrentProcess(),NORMAL_PRIORITY_CLASS);
    	return 0;
    }
    Last edited by siavoshkc; 07-20-2007 at 04:02 AM.
    Learn C++ (C++ Books, C Books, FAQ, Forum Search)
    Code painter latest version on sourceforge DOWNLOAD NOW!
    Download FSB Data Integrity Tester.
    Siavosh K C

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Will too much IF STATEMENT decrease performance
    By beyonddc in forum C++ Programming
    Replies: 10
    Last Post: 04-18-2011, 02:07 PM
  2. Performance and footprint of virtual function
    By George2 in forum C++ Programming
    Replies: 8
    Last Post: 01-31-2008, 07:34 PM
  3. File map performance
    By George2 in forum C++ Programming
    Replies: 8
    Last Post: 01-04-2008, 04:18 AM
  4. Race condition
    By Roaring_Tiger in forum C Programming
    Replies: 5
    Last Post: 10-24-2004, 09:42 PM
  5. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM