Thread: Optimization question

  1. #1
    Registered User
    Join Date
    Nov 2008
    Posts
    41

    Optimization question

    Which of the following is faster?:
    A :
    Code:
    float f1; float f2;
    //... f1 and f2 are modified
    f1 *= x / y;
    f2 *= x / y;
    OR
    B:
    Code:
    float f1; float f2;
    //... f1 and f2 are modified
    float temp = x / y;
    f1 *= temp;
    f2 *= temp;
    So is it quicker to make an extra float and only do the multiplication once? This is something I need to do probably 500 times per second.



    I am also wondering whether there are any unmentioned costs to creating an object. I have a Vector3d class which has 3 float members and a heap of methods. If I create one of those is that going to take as long as creating a struct with 3 floats? Also, does it take up the same amount of memory as a struct with 3 floats?

  2. #2
    Registered User Kudose's Avatar
    Join Date
    Jun 2006
    Posts
    92
    What has your testing yielded?

  3. #3
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Leave it to the compiler/optimizer. Modern optimizers are really smart, and will do this kind of simple optimizations for you if it thinks it will help.

    Unless you are extremely familiar with computer architecture, chances are you won't beat the optimizer at dirty optimization.

    Just write your code for clarity, and leave optimizations to the compiler.

    Most of the time programmers only need to worry about optimizations on the algorithmic level.

    As for the second question, I MAY BE WRONG, but I think the cost is the same as making a struct, assuming an empty constructor. Member functions are essentially functions that take a hidden "this" pointer, and they don't actually go with each instance of the object in memory.

  4. #4
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    It is quite likely that a compiler would perform common subexpression elimination in release build.

    Option B is definitely better though because it almost certainly increases the speed of your debug builds, which can be a godsend. As far as release builds are concerned, division is slow enough that I would not leave the common subexpression elimination to chance. When performed frequently division makes a huge performance impact!
    B is also clearer anyway, as it shows that you intended for both to be multiplied by the same value.

    As for whether your 'temp' variable takes up any extra time: It almost certainly wont even exist as a variable on the stack in a release build. It'll be optimised away by the compiler. Even if it wasn't, the performance impact would be minimal, compared to having to perform an extra division. Also, for common sub-expression elimination to be performed, the compiler needs to generate code to do the equivalent of your 'temp' variable anyway.

    Interestingly, this code:
    Code:
    float inv = 1/z;
    float a = x*inv;
    float b = y*inv;
    is faster than this code:
    Code:
    float a = x/z;
    float b = y/z;
    even though the first one has three operations instead of two. Floating point division really is that slow compared to multiplication. (Assuming your average modern PC here of course)
    Last edited by iMalc; 02-13-2009 at 12:07 AM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by iMalc View Post
    Interestingly, this code:
    Code:
    float inv = 1/z;
    float a = x*inv;
    float b = y*inv;
    is faster than this code:
    Code:
    float a = x/z;
    float b = y/z;
    even though the first one has three operations instead of two. Floating point division really is that slow compared to multiplication. (Assuming your average modern PC here of course)
    Yes, if you frequently divide by the same value, then it is highly recommended to calculate the inverse of the number and use multiplication. (And when it's feasible in integer arithmetics, replacing divide by multiply is recommended here too - but it's often much harder to achieve that).

    I also agree that I'd expect the CSE (Common-sub-expression) optimizer to spot this type of thing if it's not too far apart in the code - but if it's reasonable, then using a temporary variable to optimize it yourself is unlikely to make it worse.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Out of curiosity, I know division is much slower than multiplication, but by how many times approximately? 5x? 10x? 20x? (for both integers and floating points).

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Reading the optimization guide for AMD Family 10h processors, FDIV is about 16-24 cycles (time depends on size of the operand - 16-cycles for 32-bit (float) and 24 cycles for 80-bit (long double) result), and FMUL is 4 cycles - so we're looking at 4-6x longer for a divide than a multiply. So you need two divides to make it faster to use 1/x and multiply.

    SSE instructions are about the same - 16-20 cycles for all types of DIV instructions and 4 (6 if one operand is in memory) cycles for MUL.

    I'd expect Intels timings to be similar if not the same.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  8. #8
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Ah I see, thanks.

  9. #9
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    the second one is faster, although as stated, the compiler may optize the first one anyway, and produce teh same output as the second one. But I wouldnt bet a plug nickel that teh compiler will properly optimize anything. half the time I can do way better, and the rest of the time I do just as well.

  10. #10
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by abachler View Post
    half the time I can do way better, and the rest of the time I do just as well.
    Ah, yes. That's consistent. 90% of all programmers think that are in the top 10% too.

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Whilst computing the inverse just once and multiplying is correct from a mathematical point of view, you could end up with a different computational result.

    For large z, your inverse might be approximately zero, which would give a very wrong answer in the following multiplication.

    Eg.
    x, y, z are in the range of say 10,000,000 to 20,000,000
    By being >6 decimal digits greater than 1, the division is in a whole heap-o-trouble with respect to rounding errors.
    But 20,000,000 / 10,000,000 is nice and easy because the magnitudes are comparable.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #12
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by Salem View Post
    Whilst computing the inverse just once and multiplying is correct from a mathematical point of view, you could end up with a different computational result.
    That's quite true. I had an experience, several years ago, with an early optimizing compiler (Fortran 77, not C++) which did exactly that. It caused an intermittent instability of an otherwise stable numerical algorithm - and it was a .......... to track down because the behaviour was different when the code was compiled without optimisation for debugging.

    One of the most fruitful sources of bugs in numerical code is the programmer who assumes that a series of mathematically equivalent operations on real numbers are also computationally equivalent with floating point. Whereas, the reality is that floating point variables are only an approximate representation of real variables

  13. #13
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by cyberfish View Post
    Leave it to the compiler/optimizer. Modern optimizers are really smart, and will do this kind of simple optimizations for you if it thinks it will help.

    Unless you are extremely familiar with computer architecture, chances are you won't beat the optimizer at dirty optimization.

    Just write your code for clarity, and leave optimizations to the compiler.
    I would dispute the clarity of writing the same expression twice. If both values are always supposed to be multiplied by the same quantity, then this quantity should be precomputed, not for optimization purposes, but to ensure that in fact the same value is used for both multiplications. If this expression was more complex, and needed to be modified, then it would have to be modified in two places, creating a opening to make a mistake.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  14. #14
    Registered User
    Join Date
    Nov 2008
    Posts
    41
    Thanks for the informative responses everyone!

    Is cyberfish the only one venturing an answer to my second question?

  15. #15
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    cyberfish's answer to your second question mostly covered it.

    Technically, a class and a struct are the same thing in C++. The only difference is that class attributes and members are private by default, and struct attributes and members are public. Those affect whether the compiler allows operations but, if an operation is allowed, the runtime performance is usually the same: the access checks are not usually repeated at run time.

    Any difference in performance or memory usage with using one approach or the other depends on the compiler. For example, an alterative to a getter/setter pair is simply making the corresponding data member public. Taken literally, that can suggest a difference in performance but compilers are also free to optimise (eg inlining the getter/setter functions, and therefore eliminate the associated function-call overhead).

    That's why it's generally considered better to make code readable and understandable. And only worry about performance if it is really necessary - which means bringing out profilers to find the real hot-spots. Totally at odds with what most learning programmers do (premature optimisation) but that's life.

    Quote Originally Posted by Noise View Post
    I am also wondering whether there are any unmentioned costs to creating an object. I have a Vector3d class which has 3 float members and a heap of methods. If I create one of those is that going to take as long as creating a struct with 3 floats? Also, does it take up the same amount of memory as a struct with 3 floats?

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Turn Off Optimization?
    By danlee58 in forum C Programming
    Replies: 6
    Last Post: 12-10-2008, 03:52 AM
  2. need reading material for c++ database optimization
    By elninio in forum C++ Programming
    Replies: 0
    Last Post: 07-24-2008, 11:32 PM
  3. Optimization settings
    By Roaring_Tiger in forum C Programming
    Replies: 4
    Last Post: 02-23-2005, 02:53 AM
  4. Optimization Question
    By saxman in forum C Programming
    Replies: 7
    Last Post: 06-30-2004, 10:49 PM
  5. Question...
    By TechWins in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 07-28-2003, 09:47 PM