Faster bitwise operator

This is a discussion on Faster bitwise operator within the C++ Programming forums, part of the General Programming Boards category; Is != or < faster? In the bignum class I made, != is faster. And just from what I can ...

  1. #1
    Unregistered User Yarin's Avatar
    Join Date
    Jul 2007
    Posts
    1,591

    Faster bitwise operator

    Is != or < faster?
    In the bignum class I made, != is faster. And just from what I can best think of, it would be for the processor register comparison too.
    But I just wanted to confirm this with you guys - which is faster?
    A class that doesn't overload all operators just isn't finished yet. -- SmugCeePlusPlusWeenie
    A year spent in artificial intelligence is enough to make one believe in God. -- Alan J. Perlis

  2. #2
    Registered User whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    7,625
    I would wager that both are done in constant time, and whether one is faster than the other depends on processing power.

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    For simple types (e.g. integers and floats), they are identical. I actually fail to see how they would be majorly different in a bignum class or similar either - you compare from the most significant bit and if the current element is different or greater/lesser doesn't make any difference to the processor - it's still a compare operation (or memory read in processors like 68K, PDP-11, VAX, 6502, 6809) followed by a branch/jump instruction.

    So unless there is some major difference in the implementationin your bignum class [e.g. one reads single bytes and the other does larger chunks, 32- or 64-bit words] , I don't understand why one would be better than the other.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    The larch
    Join Date
    May 2006
    Posts
    3,573
    For example a string. If they have a different length, then != is true. On the other hand, strings of unequal length might begin with a very long equal sequence before they differ.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by anon View Post
    For example a string. If they have a different length, then != is true. On the other hand, strings of unequal length might begin with a very long equal sequence before they differ.
    Sure. But I bet most string compares just compare the string itself, without comparing length! Because for many cases, it's either equal length (and thus no benefit) or the beginning is not similar.

    Certainly, MS VS 2008 implementation of basic_string::compare() doesn't compare lengths.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    Registered User
    Join Date
    May 2007
    Posts
    147
    Back to "Is != or < faster?" specifically, let me invoke a hint of history here.

    This is the same in C or C++; they are elementary statements.

    C was originally written to replace an assembler, essentially to be a chip independent assembler or just slightly higher than assembler.

    Many constructs in the elementary syntax of C (and therefore C++) translate into 1 assembler instruction, when a chip supports that as such. These two happen to translate to x86 instructions of identical meaning (!= or <, >, >=, <=, etc).

    As such, the speed of such comparisons, or of any of the elementary constructs like this, will relate closely to the chip on which you're executing. So, from a language viewpoint, it hardly makes sense to think of this in terms of the "language speed", but of the fact that implementations differ, optimizations may significantly affect some of the basic constructs and you generally shouldn't focus on microscopic performance issues too early. There are very rare exceptions to this general rule, and by the time you get to the point that it matters in your work, you'll have some sense of the assembler implications of your C/C++ code anyway, and the question will be moot.

  7. #7
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Does anyone know which (x86) instructions those operators boil down to?
    All you'd need to do is see how many clock cycles each of those instructions takes and compare their speed, right?

    Although, without knowing too much about chip implementations, the only guess I can make as to why one might be faster is that != would just compare the binary representation of the two numbers, whereas < or > would need to check the sign of the variable for a signed int and then compare the numbers (whether the CPU can check the sign and do the compare at the same time, I have no idea).
    "I am probably the laziest programmer on the planet, a fact with which anyone who has ever seen my code will agree." - esbo, 11/15/2008

    "the internet is a scary place to be thats why i dont use it much." - billet, 03/17/2010

  8. #8
    Registered User
    Join Date
    Apr 2006
    Posts
    2,020
    Quote Originally Posted by cpjust View Post
    Does anyone know which (x86) instructions those operators boil down to?
    All you'd need to do is see how many clock cycles each of those instructions takes and compare their speed, right?
    For Integers, It should boil down to CMP R2,R1 (or however registers are names) followed by either JE or JGE. JE and JGE both just check the control flags, and are therefore trivial operations. For floating point it would use FCOM, instead of CMP.

    So for primitives there is no difference.
    Last edited by King Mir; 04-27-2009 at 09:11 PM.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  9. #9
    Registered User
    Join Date
    May 2007
    Posts
    147
    cpjust

    x86 does as King Mir states - cmp is the generic call to perform a comparison - it is very much like a subtraction. In x86, subtractions and cmp set flags in the CPU, indicating things like sign, zero flag, overflow - various technical results. After that, a conditional jump instruction is what makes the decision as to truth or falsehood of a particular comparison like <, !=, etc. All of these have the same cycle count - so the answer you've received is the best you can get - they are basically the same. There is no point in asking which is faster.

    What makes this a little more complicated is the various CPU's ability to predict branches, cache reaction to branches, hyperthreading, speed of RAM, alignment in RAM, whether or not the cmp is applied to a register or memory location (I forget if/when that was or was not possible in x86)....

    In other words, you're focused on such a microscopic level of performance that you probably can't get the results you're hoping for without deciding to code in assembler - and even that isn't likely to be all that productive in this case.

    Each model CPU and combination of hardware could make some difference.
    Last edited by JVene; 04-27-2009 at 09:56 PM.

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,288
    For the processor (all the common ones at least) they are surely absolutely equal.
    For a bignum class the < would be slightly slower because it involves two comparisons in the loop (excluding the loop condition) rather than just one:
    Code:
    {
        if (a.data[i] < b.data[i]) return true;
        if (a.data[i] > b.data[i]) return false;
        // else continue iterating.
    }
    
    // vs
    
    {
        if (a.data[i] != b.data[i]) return false;
    }
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  11. #11
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    iMalc:iMalc - good point, but we could do that in a different way:
    Code:
       int cmp;
       for(...)
       {
          if ((cmp = a.data[i] - b.data[i]));
          {
             return (cmp > 0)?true:false; 
          }
       }
    This assumes that the common case is similar numbers. But I don't think it's much worse for the case with immediately different numbers - and that is the fast case in either variant, anyways.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  12. #12
    pwns nooblars
    Join Date
    Oct 2005
    Location
    Portland, Or
    Posts
    1,094
    I think the question that should actually be asked is have you done the testing and determined this is where there is a bottleneck in your code? Honestly the difference between the two isn't going to show itself very easily in most cases. And even if one is faster than the other, I am pretty sure you could optimize another part of your code and result in a higher net change than you would receive from this comparison check.

    So go ahead and profile your code and see where you are hitting some hiccups. Also you can easily (via polymorphism) replace the function that does the comparison dynamically and do some profiling yourself to determine which is ultimately faster.

  13. #13
    Registered User
    Join Date
    May 2007
    Posts
    147
    I risk promoting unwanted pedantic continuation, but....

    For some reason I kept thinking about this question as I tried to sleep.

    To further illustrate why this particular subject of optimization does not make sense to pursue:

    The speed of the comparison doesn't lie simply in the clock tick count of cmp and a conditional jump (the x86 assembler resulting from an integer comparison of any kind). The actual result - that is, whether the jump is taken or not, which is also saying whether the test is true or not, affects the speed.

    If a jump is taken, we can't easily account for the cost of what happens to the instruction cache, the delay in RAM for selecting the jump destination instruction vs the 'next' instruction should the jump NOT be taken. This means the data feed into a loop that uses such a test could affect the performance.

    If you changed the comparison, perhaps it might affect the predilection of the branch predictor to be correct more often than not, and, here's where it really suggests the futility of the idea - the previous instructions that were in the CPU's pipeline might affect the speed of the transition between cmp and the conditional jump.

    Say there were a multiplication prior to the cmp, and assume that the results of the multiplication are not involved with the operands of the cmp (the instructions are not interdependent).

    Depending on alignment relative to the availability of execution units that the moment the cmp is to be scheduled, the CPU might be performing the previous multiplication instruction at the same time - parallel execution like this is part of modern CPU operation.

    However, the pending conditional jump may not execute until the multiplication is finished, which takes more clock cycles than a cmp. In essense, the cmp may be held up by the multiplication which preceded it, in the logical stream of instructions.

    In other words, the surrounding environment of code would have impact on the performance of a particular fast instruction, like cmp is, relative to a question like this, and indeed, points us back to the observation that we're dealing with assembler level optimizations that must be CPU aware (since different CPU's have different characteristics along the lines of execution units, branch prediction robustness, cache sizes, it goes on....)

    Wraithan and others point out, this can't be a position of bottleneck, and optimization of code for instructions that amount to 1 or 2 clock ticks is a focus in the wrong direction.

    For any improvement one might hope to gain, you'd have much more significant results with an extra 100Mhz, a larger cache or a smarter branch predictor than any microscopic comparison at this level.

    All of this suggests that any answer beyond the fact that the instructions have the same timing is no meaning. That is, if someone conducts a test, and as a result of comparing != to < and others, somehow sees data that suggests != is faster or slower, it can't be taken as a generalized result that applies to all uses of !=. In a different context, != might be slower, and those assuming the previous test of speed were valid would have been mislead.

  14. #14
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Jvene,

    True, the surrounding code, branch prediction, cache content, predictive fetching of cache, and many other things DO indeed affect the execution time.

    Most of the time when discussing optimization at instruction level, we assume that the code is in cache - otherwise, making sure the code is in cache (e.g. making the code smaller) will have a better impact than changing the instructions themselves. Note also that a single branch that goes the other way (to a not-yet-executed path, so therefore not in cache) is very rarely the target of performance optimization, since it only happens once. If two branches are about equally likely, then both would most likely be in cache if the code is executed enough to warrant looking at in the first place.

    Another point I'd like to rectify. If we have something like:
    Code:
    a = b * c;
    if (d > 7)
        ...
    then the processor WILL NOT wait for the multiplication to finish before performing the jump for d > 7 comparison. I'm not saying there are NO processors that do so, but certainly none of the common modern CPU's (e.g. x86, Sparc etc).

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  15. #15
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,288
    Quote Originally Posted by matsp View Post
    iMalc:iMalc - good point, but we could do that in a different way:
    Code:
       int cmp;
       for(...)
       {
          if ((cmp = a.data[i] - b.data[i]));
          {
             return (cmp > 0)?true:false; 
          }
       }
    This assumes that the common case is similar numbers. But I don't think it's much worse for the case with immediately different numbers - and that is the fast case in either variant, anyways.

    --
    Mats
    That at first seemed like a great idea, and would work for people who have implemented something where each decimal digit is stored in an int or something wasteful like that, but it wont work for a "wasteless" implementation where the bigint is an array of unsigned longs. The reason being that the result of the subtraction would have to range from 0xFFFFFFFFU - 0 to 0 - 0xFFFFFFFFU. So that's -0xFFFFFFFF to +0xFFFFFFFF, requring an extra bit, as well as casting to a signed datatype. Of course it will work if you cast both to a larger datatype first, however I imagine the efficiency will then come out about the same as the original way I posted. Well, maybe still a little ahead if you're lucky.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

Page 1 of 2 12 LastLast
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Smart pointer class
    By Elysia in forum C++ Programming
    Replies: 63
    Last Post: 11-03-2007, 07:05 AM
  2. Screwy Linker Error - VC2005
    By Tonto in forum C++ Programming
    Replies: 5
    Last Post: 06-19-2007, 02:39 PM
  3. C bitwise operator exercise - more like newb killer
    By shdwsclan in forum C Programming
    Replies: 3
    Last Post: 02-22-2006, 06:02 AM
  4. operator overloading and dynamic memory program
    By jlmac2001 in forum C++ Programming
    Replies: 3
    Last Post: 04-06-2003, 11:51 PM
  5. Replies: 5
    Last Post: 10-30-2002, 09:23 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21