smart use of std:algorithm

09-17-2008
EOP

Quote:

Originally Posted by master5001

What makes you think your compiler is not aware of every opcode for your CPU and doesn't know how to rearrange data to be optimal for loops?

Maybe looking at the asm code? ;)
09-18-2008
matsp

Quote:

Originally Posted by EOP

What about e.g. using loopnz instead of compiler generated loops?

What processor are we talking about? Your point may be valid for processors up to 386.

Using loopnz on modern x86 is not faster than a well-written loop using separate decrement and jnz instructions. It may not be slower either, but the generic version is still better, since the compiler can build the loop around ANY register rather than being forced to use ECX. Restricting register usage makes register allocation more complicated (and the compiler has to know at the start of the loop that it will use loopnz, so that the right value is in ECX when it gets to the end). In this case, there's no benefit. If something like this made loops MUCH more efficient, compilers would be tuned to do it, but modern x86 processors are very efficient at simple instructions, so in most cases the complex alternative instructions are either slower than, or merely as fast as, the simple variants.

--
Mats
09-18-2008
CornedBee

Perhaps the loops were slower due to a misunderstanding of tabstop's algorithm? In particular, although tabstop says there's nothing wrong with nested loops, his algorithm doesn't actually have any - just a single loop that goes through both containers at the same time, taking advantage of their being sorted and unique.
09-18-2008
iMalc

I myself have previously needed to do this, and not long ago I might add. set_symmetric_difference didn't cut it because it puts both differences in the same set. set_difference works but needs two passes.
The approach I suggest is to go into your algorithm header, find the code for set_symmetric_difference, copy it and modify it to put the A-B and B-A into separate sets.
It logically should be about twice as fast as two set_difference calls. If not then you've probably made a mistake.
