macro functions/efficiency tricks (old-style?)

**jsaetrum** · 01-04-2012

Hi,

This is taken from the PD code section of the Numerical Recipes book for C (1992), where a macro is defined as follows:

Code:

#define SQR(a) ((sqrarg=(a)) = 0.0 ? 0.0 : sqrarg * sqrarg )

This may be old-style, but is there any benefit today in testing for 0 to avoid multiplying by 0.0 (for efficiency, I suppose)?
And why is the parameter a assigned to sqrarg rather than used directly in the expression?

Thanks!

**brewbuck** · 01-04-2012

It seems it would only be beneficial if a great majority of the values being squared are equal to 0.0 -- if that's the case, surely you know it is, and the optimization should be performed at a higher level in order to make it visible and explicit, not tuck it away hidden inside some SQR() macro.

I judge it to be pointless.

**jsaetrum** · 01-04-2012

This also reminds me of the days when you could suggest to the compiler that key variables be stored in registers (with the register keyword),
though this was only a suggestion and could not be demanded. Today it still seems to be possible to use the keyword, but there would be no point in doing so (in most cases, at least?).
Maybe some other platforms would make use of it?
I am often a little confused about the role of the C programmer today, and about how detail-specific a C programmer should be when dealing with performance issues, that is, when to make which choices. I know that a modern compiler will in most cases out-compete a programmer on small efficiency snippets, and that the focus should be on making the algorithms as a whole as efficient as possible.

**brewbuck** · 01-04-2012

Generally, compilers these days will produce the best possible code that does what you have written. That doesn't mean what you've written is necessarily the best way to do something.

The role of the programmer in optimization will never really go away, because compilers can't design programs or think about things. They can, however, figure out how to squeeze the best performance out of a specific set of statements. Two different things.

Most really serious optimizations are related to algorithm choice and efficient use of CPU cache. A compiler can't help you there.
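To make the cache point concrete, here is a minimal sketch, not a benchmark. The function names and the flat row-major layout are assumptions made for illustration. Both functions perform the same additions, but the second one jumps `n` doubles between consecutive accesses, which tends to use the cache poorly on large matrices.

```c
#include <stddef.h>

/* Hypothetical helpers: m points at an n-by-n matrix stored row by row. */

/* Visits elements in the order they sit in memory. */
double sum_row_major(const double *m, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            sum += m[i * n + j];
    return sum;
}

/* Same arithmetic, but each access is n doubles away from the previous one. */
double sum_col_major(const double *m, size_t n)
{
    double sum = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            sum += m[i * n + j];
    return sum;
}
```

How large the difference is depends on the matrix size, the cache hierarchy, and whether the optimizer happens to reorder the loops, so it is worth timing on the actual target.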

**cyberfish** · 01-04-2012

GCC, for example, completely ignores the register keyword.

**cyberfish** · 01-04-2012

Also, the snippet you posted will be slower in most cases because of the branch. Branching is bad in modern processors with deep pipelines.

It's not so bad if the path taken is predictable, since modern predictors are very good.

However, if your input has a 50% chance of being 0.0 and a 50% chance of being nonzero, it will probably be much slower than just multiplying the value by itself.

It depends on performance of the FPU, too. If the compare and branch instruction takes equal or longer than multiplication, it will obviously be slower.
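For anyone who wants to measure this, a rough sketch of the two variants follows. The function names are made up for illustration, and the zero test is written with `==` on the assumption that a comparison is what the macro intends. Both loops return the same sum; only the work per element differs.

```c
#include <stddef.h>

/* Mirrors the macro's zero test (assumed to be a comparison). */
static double sqr_branch(double a) { return a == 0.0 ? 0.0 : a * a; }

/* Plain multiply, no test. */
static double sqr_plain(double a) { return a * a; }

/* Sum of squares using the branchy version. */
double sum_sq_branch(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += sqr_branch(v[i]);
    return sum;
}

/* Sum of squares using the plain multiply. */
double sum_sq_plain(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += sqr_plain(v[i]);
    return sum;
}
```

Timing these on arrays with different proportions of zeros would show how much the branch costs on a given machine. Note that an optimizing compiler may remove or transform the branch, so it is worth checking the generated assembly before drawing conclusions.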

**iMalc** · 01-04-2012

One thing nobody has mentioned yet is that sqrarg still has to be declared somewhere for this to compile. Doing it this way is likely to encourage the kind of thing that would make it not thread-safe, e.g. sqrarg being a global.
It's probably slower than a plain old multiply, possibly even after the frequency of zeros is taken into account.
It also has a major bug, unless you've written it out incorrectly: it's assigning zero instead of comparing to zero.
The macro does perform common subexpression elimination, but the right way to do that is either for the programmer to use a temp variable on their own or for it to be a function call.

In summary, do not use this macro in modern code. Even 20 years ago it would have been of very dubious benefit.
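As a small sketch of the function-call alternative (assuming C99 or later for `inline`): the names `SQR_NAIVE`, `sqr`, `calls` and `next_value` are invented for this example. It also shows why a macro without a temporary is risky, since the naive form evaluates its argument twice.

```c
/* Naive macro: the argument expression appears, and is evaluated, twice. */
#define SQR_NAIVE(a) ((a) * (a))

/* A real function evaluates its argument exactly once,
   so no shared temporary such as sqrarg is needed. */
static inline double sqr(double x)
{
    return x * x;
}

/* Hypothetical helper with a side effect, used to count evaluations. */
static int calls;
static double next_value(void)
{
    calls++;
    return 3.0;
}
```

With these definitions, `SQR_NAIVE(next_value())` calls `next_value` twice, while `sqr(next_value())` calls it once, and the function version has no global state to make it thread-unsafe.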

**KCfromNC** · 01-05-2012

Agree with everyone else that this is probably a bad idea unless used to solve a really specific problem.

Actually, the right way to do common subexpression elimination is to enable optimization in the compiler.

Aside from the problems with breaking branch prediction in any reasonably modern processor, this also fails because it creates lots of extra work if the argument to square is anything but a double. For CPUs without hardware floating point support, you've just added hundreds of lines of code to handle something which should have been done with one integer multiply instruction.
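One possible way to keep integer arguments as integers, sketched here under the assumption that a C11 compiler is available (`_Generic` did not exist in 1992). The helper names are hypothetical.

```c
/* Hypothetical per-type helpers; each evaluates its argument once. */
static inline int    sqr_i(int x)    { return x * x; }
static inline long   sqr_l(long x)   { return x * x; }
static inline float  sqr_f(float x)  { return x * x; }
static inline double sqr_d(double x) { return x * x; }

/* Picks the helper from the argument's type at compile time.
   The controlling expression of _Generic is not evaluated,
   so x is evaluated only once, in the final call.
   Types not listed (e.g. short) fall back to the double version. */
#define SQR(x) _Generic((x), \
    int:     sqr_i,          \
    long:    sqr_l,          \
    float:   sqr_f,          \
    default: sqr_d)(x)
```

Here `SQR(7)` is an `int` multiply yielding an `int`, and `SQR(1.5)` uses the `double` version, so no floating-point conversion is introduced for integer arguments.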

**Salem** · 01-05-2012

> GCC, for example, completely ignores the register keyword.
Except for the part about "you can't point at a register".
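A quick illustration of that restriction, with a made-up function name. The storage hint may be ignored, but the language rule that forbids taking the address of a `register` object still applies in C, which is why the offending line is left commented out.

```c
/* Hypothetical example: sums the squares of n values. */
double sum_squares(const double *v, int n)
{
    register double acc = 0.0;  /* only a hint; the compiler may ignore it */

    /* double *p = &acc; */     /* not allowed in C: address of a register object */

    for (int i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}
```

Uncommenting the pointer line should make a conforming C compiler reject the function, regardless of whether `acc` actually ends up in a register.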

> It depends on performance of the FPU, too. If the compare and branch instruction takes equal or longer than multiplication...
Worse still, clever instruction scheduling could deliver the FP mult result "for free", if the compiler can make the CPU do something else useful between issuing the mult instruction and needing the result.

Then there's the whole "comparing floats for equality" issue as well.
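To sketch what that means for a zero test, assuming typical IEEE 754 doubles: a value that is "mathematically" zero often is not exactly 0.0 after rounding, so an `== 0.0` check would not catch it. The function names and the tolerance parameter are illustrative only.

```c
/* Computes 0.1 + 0.2 - 0.3; volatile keeps the compiler from folding it. */
double residual(void)
{
    volatile double a = 0.1, b = 0.2, c = 0.3;
    return (a + b) - c;
}

/* Exact comparison, as in the macro's zero test. */
int is_exact_zero(double x)
{
    return x == 0.0;
}

/* Tolerance-based comparison; tol is chosen by the caller. */
int is_near_zero(double x, double tol)
{
    return x < tol && x > -tol;
}
```

On such a platform `is_exact_zero(residual())` is false even though the exact arithmetic result would be zero, so an exact test only fires for values that are stored as precisely 0.0.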
