What processor are we talking about? Your point may be valid for processors up to 386.
On modern x86, loopnz is not faster than a well-written loop that uses a separate decrement and jnz. It may not be slower either, but the generic version is still better, because the compiler can build the loop around ANY register instead of being forced to use ECX. Restricting register usage makes register allocation more complicated: the compiler has to decide at the start of the loop that it will use loopnz, so that the right value is in ECX when it reaches the end. In this case there's no benefit. If something like this made loops MUCH more efficient, compilers would be tuned to do it, but modern x86 processors are very efficient at simple instructions, so in most cases the complex alternative instructions are either slower than or equal to the simple variants.
--
Mats
Compilers can produce warnings - make the compiler programmers happy: Use them!
Please don't PM me for help - and no, I don't do help over instant messengers.
Perhaps the loops were slower due to a misunderstanding of tabstop's algorithm? In particular, although tabstop says there's nothing wrong with nested loops, his algorithm doesn't actually have any - just a single loop that goes through both containers at the same time, taking advantage of their being sorted and unique.
All the buzzt!
CornedBee
"There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
- Flon's Law
I myself have previously needed to do this, and not long ago I might add. set_symmetric_difference didn't cut it because it puts both differences in the same set. set_difference works but needs two passes.
The approach I suggest is to go into your algorithm header, find the code for set_symmetric_difference, copy it, and modify it to put A-B and B-A into separate sets.
It logically should be about twice as fast as two set_difference calls. If not then you've probably made a mistake.
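For anyone who wants a starting point, here is a rough sketch of what such a modified copy might look like. It is not taken from any particular library's header; the name split_difference and its signature are made up for illustration. It assumes both ranges are sorted with operator< and contain no duplicates, and the asserts only check the sorted part (in debug builds).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the standard library.
// Walks two sorted, duplicate-free ranges once, writing elements found only
// in the first range to only_a (A-B) and elements found only in the second
// range to only_b (B-A). Elements present in both are skipped.
template <class FwdIt1, class FwdIt2, class OutIt1, class OutIt2>
std::pair<OutIt1, OutIt2> split_difference(FwdIt1 first1, FwdIt1 last1,
                                           FwdIt2 first2, FwdIt2 last2,
                                           OutIt1 only_a, OutIt2 only_b)
{
    // Precondition check (assumption of this sketch): both inputs are sorted.
    assert(std::is_sorted(first1, last1));
    assert(std::is_sorted(first2, last2));

    while (first1 != last1 && first2 != last2) {
        if (*first1 < *first2) {
            *only_a++ = *first1++;   // only in A
        } else if (*first2 < *first1) {
            *only_b++ = *first2++;   // only in B
        } else {
            ++first1;                // in both: drop it
            ++first2;
        }
    }
    // Whatever is left in either range cannot be in the other one.
    only_a = std::copy(first1, last1, only_a);
    only_b = std::copy(first2, last2, only_b);
    return std::make_pair(only_a, only_b);
}
```

You would call it with something like split_difference(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(only_a), std::back_inserter(only_b)). Each element of either input is visited once, which is where the "about twice as fast" expectation comes from.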
My homepage
Advice: Take only as directed - If symptoms persist, please see your debugger
Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"