Optimization question

**Noise** · 02-12-2009

Which of the following is faster?:
A :

Code:

float f1; float f2;
//... f1 and f2 are modified
f1 *= x / y;
f2 *= x / y;

OR
B:

Code:

float f1; float f2;
//... f1 and f2 are modified
float temp = x / y;
f1 *= temp;
f2 *= temp;

So is it quicker to make an extra float and only do the multiplication once? This is something I need to do probably 500 times per second.

I am also wondering whether there are any unmentioned costs to creating an object. I have a Vector3d class which has 3 float members and a heap of methods. If I create one of those is that going to take as long as creating a struct with 3 floats? Also, does it take up the same amount of memory as a struct with 3 floats?

**Kudose** · 02-12-2009

What has your testing yielded?

**cyberfish** · 02-12-2009

Leave it to the compiler/optimizer. Modern optimizers are really smart, and will do this kind of simple optimizations for you if it thinks it will help.

Unless you are extremely familiar with computer architecture, chances are you won't beat the optimizer at dirty optimization.

Just write your code for clarity, and leave optimizations to the compiler.

Most of the time programmers only need to worry about optimizations on the algorithmic level.

As for the second question, I MAY BE WRONG, but I think the cost is the same as making a struct, assuming an empty constructor. Member functions are essentially functions that take a hidden "this" pointer, and they don't actually go with each instance of the object in memory.

**iMalc** · 02-13-2009

It is quite likely that a compiler would perform common subexpression elimination in release build.

Option B is definitely better though because it almost certainly increases the speed of your debug builds, which can be a godsend. As far as release builds are concerned, division is slow enough that I would not leave the common subexpression elimination to chance. When performed frequently division makes a huge performance impact!
B is also clearer anyway, as it shows that you intended for both to be multiplied by the same value.

As for whether your 'temp' variable takes up any extra time: It almost certainly wont even exist as a variable on the stack in a release build. It'll be optimised away by the compiler. Even if it wasn't, the performance impact would be minimal, compared to having to perform an extra division. Also, for common sub-expression elimination to be performed, the compiler needs to generate code to do the equivalent of your 'temp' variable anyway.

Interestingly, this code:

Code:

float inv = 1/z;
float a = x*inv;
float b = y*inv;

is faster than this code:

Code:

float a = x/z;
float b = y/z;

even though the first one has three operations instead of two. Floating point division really is that slow compared to multiplication. (Assuming your average modern PC here of course)

**matsp** · 02-13-2009

Originally Posted by iMalc

Interestingly, this code:

Code:

float inv = 1/z;
float a = x*inv;
float b = y*inv;

is faster than this code:

Code:

float a = x/z;
float b = y/z;

even though the first one has three operations instead of two. Floating point division really is that slow compared to multiplication. (Assuming your average modern PC here of course)

Yes, if you frequently divide by the same value, then it is highly recommended to calculate the inverse of the number and use multiplication. (And when it's feasible in integer arithmetics, replacing divide by multiply is recommended here too - but it's often much harder to achieve that).

I also agree that I'd expect the CSE (Common-sub-expression) optimizer to spot this type of thing if it's not too far apart in the code - but if it's reasonable, then using a temporary variable to optimize it yourself is unlikely to make it worse.

--
Mats

**cyberfish** · 02-13-2009

Out of curiosity, I know division is much slower than multiplication, but by how many times approximately? 5x? 10x? 20x? (for both integers and floating points).

**matsp** · 02-13-2009

Reading the optimization guide for AMD Family 10h processors, FDIV is about 16-24 cycles (time depends on size of the operand - 16-cycles for 32-bit (float) and 24 cycles for 80-bit (long double) result), and FMUL is 4 cycles - so we're looking at 4-6x longer for a divide than a multiply. So you need two divides to make it faster to use 1/x and multiply.

SSE instructions are about the same - 16-20 cycles for all types of DIV instructions and 4 (6 if one operand is in memory) cycles for MUL.

I'd expect Intels timings to be similar if not the same.

--
Mats

**cyberfish** · 02-13-2009

Ah I see, thanks.

**abachler** · 02-13-2009

the second one is faster, although as stated, the compiler may optize the first one anyway, and produce teh same output as the second one. But I wouldnt bet a plug nickel that teh compiler will properly optimize anything. half the time I can do way better, and the rest of the time I do just as well.

**grumpy** · 02-13-2009

Originally Posted by abachler

half the time I can do way better, and the rest of the time I do just as well.

Ah, yes. That's consistent. 90% of all programmers think that are in the top 10% too.

**Salem** · 02-13-2009

Whilst computing the inverse just once and multiplying is correct from a mathematical point of view, you could end up with a different computational result.

For large z, your inverse might be approximately zero, which would give a very wrong answer in the following multiplication.

Eg.
x, y, z are in the range of say 10,000,000 to 20,000,000
By being >6 decimal digits greater than 1, the division is in a whole heap-o-trouble with respect to rounding errors.
But 20,000,000 / 10,000,000 is nice and easy because the magnitudes are comparable.

**grumpy** · 02-13-2009

Originally Posted by Salem

Whilst computing the inverse just once and multiplying is correct from a mathematical point of view, you could end up with a different computational result.

That's quite true. I had an experience, several years ago, with an early optimizing compiler (Fortran 77, not C++) which did exactly that. It caused an intermittent instability of an otherwise stable numerical algorithm - and it was a .......... to track down because the behaviour was different when the code was compiled without optimisation for debugging.

One of the most fruitful sources of bugs in numerical code is the programmer who assumes that a series of mathematically equivalent operations on real numbers are also computationally equivalent with floating point. Whereas, the reality is that floating point variables are only an approximate representation of real variables

**brewbuck** · 02-13-2009

Originally Posted by cyberfish

Leave it to the compiler/optimizer. Modern optimizers are really smart, and will do this kind of simple optimizations for you if it thinks it will help.

Unless you are extremely familiar with computer architecture, chances are you won't beat the optimizer at dirty optimization.

Just write your code for clarity, and leave optimizations to the compiler.

I would dispute the clarity of writing the same expression twice. If both values are always supposed to be multiplied by the same quantity, then this quantity should be precomputed, not for optimization purposes, but to ensure that in fact the same value is used for both multiplications. If this expression was more complex, and needed to be modified, then it would have to be modified in two places, creating a opening to make a mistake.

**Noise** · 02-13-2009

Thanks for the informative responses everyone!

Is cyberfish the only one venturing an answer to my second question?

**grumpy** · 02-13-2009

cyberfish's answer to your second question mostly covered it.

Technically, a class and a struct are the same thing in C++. The only difference is that class attributes and members are private by default, and struct attributes and members are public. Those affect whether the compiler allows operations but, if an operation is allowed, the runtime performance is usually the same: the access checks are not usually repeated at run time.

Any difference in performance or memory usage with using one approach or the other depends on the compiler. For example, an alterative to a getter/setter pair is simply making the corresponding data member public. Taken literally, that can suggest a difference in performance but compilers are also free to optimise (eg inlining the getter/setter functions, and therefore eliminate the associated function-call overhead).

That's why it's generally considered better to make code readable and understandable. And only worry about performance if it is really necessary - which means bringing out profilers to find the real hot-spots. Totally at odds with what most learning programmers do (premature optimisation) but that's life.

Originally Posted by Noise

I am also wondering whether there are any unmentioned costs to creating an object. I have a Vector3d class which has 3 float members and a heap of methods. If I create one of those is that going to take as long as creating a struct with 3 floats? Also, does it take up the same amount of memory as a struct with 3 floats?

Thread: Optimization question

Thread Tools

Search Thread

Display

Optimization question

Similar Threads

Turn Off Optimization?

need reading material for c++ database optimization

Optimization settings

Optimization Question

Question...