Thread: Need optimization for faster performance

  1. #1
    Registered User
    Join Date
    Nov 2011
    Posts
    83

    Need optimization for faster performance

    Which one is faster for floating point operation?
    x = a / 2.0f;

    or

    x = a * 0.5f;

    I have seen some code use the second form, but I think that hurts readability. And even so, will the compiler (MSVC) automatically optimize operations like these for me?

  2. #2
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Your question is pointless. The fact you are even asking it shows you are indulging in premature optimisation: attempting to squeeze performance out of low-level operations, when there is rarely any practical benefit in doing so.

    Which is faster depends on how your hardware and system supports floating point. It also depends on what types x and a are, because various type conversions may also occur.

    Many many moons ago, it could often be assumed that floating point multiplication would be (very very slightly) faster than an equivalent division. However, there have always been a few (admittedly rare) real-world exceptions to that statement, depending on the quality of implementation of floating point operations in hardware. With the complexity of modern processors and operating systems you would need to do measurements to be 100% certain.

    Most good quality compilers will optimise operations involving compile-time constants quite happily. So the difference should be moot.
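
    For instance, a quick sketch (compiler output will of course vary, but because 1/2 is exactly representable, mainstream compilers turn both forms into the same multiply):
    Code:
    // Division by a power of two is exact, so an optimising compiler
    // (MSVC, GCC, Clang) typically emits the same multiply for both of these.
    float half_by_div(float a) { return a / 2.0f; }
    float half_by_mul(float a) { return a * 0.5f; }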

    Practically, it is usually better to write the code in a form that is readable and maintainable. It is a VERY rare program that will suffer any serious performance problems because you made the wrong choice between dividing by 2 and multiplying by 0.5. Hence my opening comment about premature optimisation: you are wasting your time worrying about such things.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  3. #3
    Registered User
    Join Date
    Nov 2011
    Posts
    83
    I am rewriting some functions of a game. The thing is that there are now many more nodes to be searched than before. You can see that I have already removed the sqrt operations in the hope of speeding this function up, since it is time-critical. Is there any way it can be optimized further?
    Code:
    int CPathFind::FindNodeClosestToCoors(float fX, float fY, float fZ, unsigned char iPathDataFor, float fRangeCoefficient, bool bCheckIgnored, bool bCheckRestrictedAccess, bool bCheckUnkFlagFor2, bool bIsVehicleBoat){
    	int iStartNodeIndex, iEndNodeIndex;
    	
    	switch (iPathDataFor)
    	{
    		case PATHDATAFOR_CAR:
    			iStartNodeIndex = 0;
    			iEndNodeIndex = m_nCarAttachedNodes;
    			break;
    		case PATHDATAFOR_PED:
    			iStartNodeIndex = m_nCarAttachedNodes;
    			iEndNodeIndex = m_nAttachedNodes;
    			break;
    		default:
    			return -1; // unknown iPathDataFor value; avoids using uninitialised indices
    	}
    	
    	float fPrevFoundRangeCoeff = 10000.0f;
    	int iPrevFoundRangedNode = 0;
    	CPathNode* pNode = m_AttachedPaths[iStartNodeIndex];
    	for (int i = iStartNodeIndex; i < iEndNodeIndex; i++){
    		if ((bCheckIgnored == false || !(pNode->bitIgnoredNode)) &&
    		   (bCheckRestrictedAccess == false || !(pNode->bitRestrictedAccess)) &&
    		   (bCheckUnkFlagFor2 == false || !(pNode->bitUnkFlagFor2)) &&
    		   (bIsVehicleBoat == pNode->bitIsVehicleBoat))
    		{
    			// node coordinates are stored as shorts, scaled by 8
    			float fXDiff = utl::abs<float>(((float)pNode->wX / 8.0f) - fX);
    			float fYDiff = utl::abs<float>(((float)pNode->wY / 8.0f) - fY);
    			float fZDiff = utl::abs<float>(((float)pNode->wZ / 8.0f) - fZ);
    			
    			float fCurrentCoeff = fXDiff + fYDiff + fZDiff * 3.0f;
    			if ( fCurrentCoeff < fPrevFoundRangeCoeff){
    				fPrevFoundRangeCoeff = fCurrentCoeff;
    				iPrevFoundRangedNode = i;
    			}
    		}
    		pNode++;
    	}
    	if ( fPrevFoundRangeCoeff < fRangeCoefficient)
    		return iPrevFoundRangedNode;
    	else 
    		return -1;
    }

  4. #4
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    I would suggest looking at the structure of the program overall, and using a profiler to determine IF your function actually is time-critical, rather than just optimising it on the assumption that it is.

    I'll ignore the fact that you have left out some parts of your real code, so what you have given will not even compile.

    But, since you have the bit in your teeth, and are insisting - incorrectly - on optimising the code .....

    You can almost certainly restructure that if() statement in the loop, and hoist some checks out of the loop, given that some values (bCheckIgnored, bCheckRestrictedAccess, etc) are not changed inside the loop.

    You could also do the trick (using the fact that fX, fY, and fZ are passed by value, so any changes to their values are invisible to the caller) of multiplying each of them by 8.0 BEFORE the loop, then computing fXDiff as utl::abs<float>(pNode->wX - fX) (and similarly for fYDiff and fZDiff). Having done that, compute "fCurrentCoeff = (fXDiff + fYDiff + fZDiff*3)/8.0". That reduces the total number of divisions your code does, particularly for larger values of iEndNodeIndex.
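
    Something along these lines (a sketch only, with a hypothetical Node standing in for CPathNode, whose definition we can't see; the arithmetic is equivalent because (wX/8 - fX) is just (wX - 8*fX)/8):
    Code:
    #include <cmath>
    
    // Hypothetical stand-in for CPathNode's packed coordinates.
    struct Node { short wX, wY, wZ; };
    
    // Per-node cost as originally written: three divisions per node.
    float CoeffOriginal(const Node& n, float fX, float fY, float fZ)
    {
    	float fXDiff = std::fabs(n.wX / 8.0f - fX);
    	float fYDiff = std::fabs(n.wY / 8.0f - fY);
    	float fZDiff = std::fabs(n.wZ / 8.0f - fZ);
    	return fXDiff + fYDiff + fZDiff * 3.0f;
    }
    
    // With the scaling hoisted: the caller computes fX8 = fX * 8.0f (and Y, Z)
    // once before the loop, so each node needs three subtractions and one division.
    float CoeffHoisted(const Node& n, float fX8, float fY8, float fZ8)
    {
    	float fXDiff = std::fabs(n.wX - fX8);
    	float fYDiff = std::fabs(n.wY - fY8);
    	float fZDiff = std::fabs(n.wZ - fZ8);
    	return (fXDiff + fYDiff + fZDiff * 3.0f) / 8.0f;
    }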

    Given that you are explicitly converting pNode->wX (and other members) to float, it is a fair bet you can do several operations on integers, rather than on floats. Apart from the fact that some of the function arguments are of type float, there is little in that code which inherently requires floating point operations. It is somewhat uncommon (except on the odd superscalar machine) that floating point operations are more efficient than operations on integers.
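
    For instance (again just a sketch, reusing the hypothetical Node from above and assuming the stored shorts really are in 1/8-unit fixed point, as the divide-by-8 suggests):
    Code:
    #include <cstdlib>   // std::abs(int)
    #include <cmath>     // std::lround
    
    struct Node { short wX, wY, wZ; };   // hypothetical stand-in, as above
    
    // Per-node cost done entirely in integers, in the node's own 1/8-unit scale.
    int CoeffInteger(const Node& n, int iX8, int iY8, int iZ8)
    {
    	int iXDiff = std::abs(n.wX - iX8);
    	int iYDiff = std::abs(n.wY - iY8);
    	int iZDiff = std::abs(n.wZ - iZ8);
    	return iXDiff + iYDiff + iZDiff * 3;
    }
    
    // Before the loop, convert the target and the threshold once:
    //   int iX8    = (int)std::lround(fX * 8.0f);               // and Y, Z likewise
    //   int iLimit = (int)std::lround(fRangeCoefficient * 8.0f);
    // then compare the best integer coefficient against iLimit at the end.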

    Assuming utl::abs<float> is simply computing an absolute value, you could replace it with std::abs(). It is a fair bet that std::abs() will be implemented a bit more efficiently than a comparable function in a third-party library.

    There are little tricks like using pre-increment rather than post-increment, and using a pointer as the control variable for a loop (rather than counting with an int, and incrementing a pointer each time through the loop). However, those probably won't gain you much, unless your compiler is rather weak with optimisation.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  5. #5
    Registered User
    Join Date
    Dec 2011
    Posts
    795
    If you really want to divide an integer by two in the fastest possible way, use a bitwise shift:

    Code:
    x = a >> 1;
    An optimizing compiler will make that substitution for "x = a/2" by itself (with a small fix-up when a is signed), so there is no need to write it by hand - and note that it only applies to integers, not to the floating point case you asked about.
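
    For what it's worth, here's a quick illustration of why the signed case needs that fix-up (assuming the usual two's-complement arithmetic shift):
    Code:
    #include <cstdio>
    
    int main()
    {
    	int a = -7;
    	// For negative signed values the two differ:
    	// a / 2 truncates toward zero, a >> 1 rounds toward negative infinity.
    	std::printf("-7 / 2  = %d\n", a / 2);    // prints -3
    	std::printf("-7 >> 1 = %d\n", a >> 1);   // prints -4
    	return 0;
    }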


    Edit: but to restate what everyone else said, this kind of optimization is pointless. Let's say you're using the common 60 hertz refresh rate on your display: it is refreshed roughly every 16.7 milliseconds. And let's say your processor is running at 2 GHz; one cycle is then approximately 0.5 nanoseconds.

    Assuming that a bitwise shift takes approximately one cycle, you could perform on the order of thirty million of them before the user can even see the change.
    Last edited by memcpy; 06-03-2012 at 07:11 AM.

  6. #6
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Swoorup View Post
    Which one is faster for floating point operation?
    Since you're interested in stuff like that, maybe the question you should be asking is "how can I benchmark small sections of code"?
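
    Something as simple as this will get you started (a rough sketch using <chrono>; microbenchmarks of single operations are notoriously unreliable, so build with optimization on and treat the numbers with suspicion):
    Code:
    #include <chrono>
    #include <cstdio>
    
    int main()
    {
    	const long N = 100000000L;
    	volatile float a = 1.2345f;   // volatile: the operation can't be hoisted out of the loop
    	volatile float x = 0.0f;      // volatile: the result can't be thrown away
    
    	auto start = std::chrono::steady_clock::now();
    	for (long i = 0; i < N; ++i)
    		x = a * 0.5f;             // swap in  a / 2.0f  and compare the timings
    	auto stop = std::chrono::steady_clock::now();
    
    	std::chrono::duration<double> elapsed = stop - start;
    	std::printf("%ld iterations in %f seconds\n", N, elapsed.count());
    	return 0;
    }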

    WRT this particular issue though: unless you first measure some very significant difference, then ask for an explanation and it turns out there is some universal, assembly-level factor at play, it is not worth thinking about.

    I disagree about the readability thing too. Personally, I'm inclined to use the first version when I halve something, but I wouldn't claim it is any more or less clear than the second. Something tells me there will not come a time when you look back at this and go "a * 0.5 ?? WTF was I thinking??"
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  7. #7
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Awesome, grumpy covered all of the things I thought of, especially the multiply by 8 beforehand and std::abs().
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  8. #8
    Registered User
    Join Date
    Nov 2011
    Posts
    83
    Thank you guys,
    I had already implemented the abs function as an inline function:
    Code:
    template <typename T> inline T abs(const T& x){
    	return ( x < 0 ) ? -x : x;
    }
    Thank you grumpy. Working directly on integers (shorts) is a lot faster.

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Swoorup View Post
    Working directly on integers (shorts) is a lot faster.
    What do you mean by that?

    Have you done all the optimisations grumpy suggested?
    Would you perhaps like to post the updated code so that we can check that you've both got it right and to confirm that no further improvements can be made?
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Pretty sure an optimizing compiler should be able to inline std::abs as well...
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.
