bitwise hack for swapping

This is a discussion on bitwise hack for swapping within the C++ Programming forums, part of the General Programming Boards category; And at worst, with a very stupid compiler and a very stupid CPU, they are the same speed. Both are ...

  1. #16
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    And at worst, with a very stupid compiler and a very stupid CPU, they are the same speed. Both are 3 instructions.

    Why do you think XOR is quicker?

  2. #17
    30 Helens Agree neandrake's Avatar
    Join Date
    Jan 2002
    Posts
    640
    That's quite helpful BMJ, however it's only a measurement in speed, not in memory. Will the optimization from compiler keep the temp value? If programming for small chips in which copying values from registers to memory is expensive and less-desired, the XOR trick may take the upper-hand.
    Environment: OS X, GCC / G++
    Codes: Java, C#, C/C++
    AOL IM: neandrake, Email: neandrake (at) gmail (dot) com

  3. #18
    BMJ
    BMJ is offline
    Banal internet user BMJ's Avatar
    Join Date
    Aug 2002
    Location
    Chicagoland
    Posts
    1,380
    well obviously there are no algorithmic panaceas, it's all a cost-benefit analysis you have to make depending on the system you're working with

    sometimes xor swap might be useful, but for most applications I'd say it's not

  4. #19
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    That's quite helpful BMJ, however it's only a measurement in speed, not in memory. Will the optimization from compiler keep the temp value? If programming for small chips in which copying values from registers to memory is expensive and less-desired, the XOR trick may take the upper-hand.
    It will just use 1 more register.

    Even on microcontrollers you'll be hard pressed to find a situation where you can't even spare one temporary register.

  5. #20
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,399
    Quote Originally Posted by Aisthesis View Post
    i do think i'll start using xor method as default here. seems a little neater--and why not gain that little bit of extra speed?
    Or you could use std::swap.
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  6. #21
    30 Helens Agree neandrake's Avatar
    Join Date
    Jan 2002
    Posts
    640
    Quote Originally Posted by cyberfish View Post
    It will just use 1 more register.

    Even on microcontrollers you'll be hard pressed to find a situation where you can't even spare one temporary register.
    I meant to emphasize more on the point of speed -- copying from memory to registers would usually be slower than the XOR trick. Granted, there usually isn't a good case of needing this speed, but there certainly could be.

    Plus XORing is leet.
    Environment: OS X, GCC / G++
    Codes: Java, C#, C/C++
    AOL IM: neandrake, Email: neandrake (at) gmail (dot) com

  7. #22
    Registered User
    Join Date
    May 2009
    Posts
    242
    Quote Originally Posted by cyberfish View Post
    And at worst, with a very stupid compiler and a very stupid CPU, they are the same speed. Both are 3 instructions.

    Why do you think XOR is quicker?
    it seemed to me like specifying and defining the contents of the memory for the tmp variable would be slower. i'd be interested actually in hearing more about the processes involved. from what some were saying, it sounds like gcc compiler might do further optimization by using some kind of shortcut when it recognizes the code, or something like that.

    what i was thinking was something like this, comparing line by line:

    int tmp = a; vs. a ^= b;
    creating the tmp takes a little time, and assigning values to each bit in tmp is probably the very close to the same as xor-ing a in place. really the second hypothesis is the critical one, because that's the comparison in all of the next steps. e.g., does
    b = a; take almost identical time to b ^= a; ?
    if the assignment is faster by 1/2 the time difference between int tmp = a; and a ^= b; (assuming a ^= b; is faster when you add the overhead of creating tmp), then the traditional method wins--since that happens twice.

  8. #23
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    If you have a ^= b that means a = a ^ b. Which means
    load a, load b, xor them, write on a (4 things)

    If you have a = b then you have
    load b, write on a (2 things)

    Beyond that it really depends.

    The point is that XOR is logically slower. If you don't have a reason (your system favors it in any way) then it will probably be actually slower using cyberfish's imaginary dumb compiler and CPU.

    So you need a reason to actually say that the XOR will be faster.

    Both I believe will receive optimizations. Using the XOR means that the optimizations used for that are actually more beneficial. Surprisingly more beneficial that is. Which might be the case in some systems. Which you would have to know in advance. Otherwise there are millions of small optimizations that you could try, but without knowing something that tells you that it will work for you system, it is mostly a waste of time.

  9. #24
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    it seemed to me like specifying and defining the contents of the memory for the tmp variable would be slower. i'd be interested actually in hearing more about the processes involved. from what some were saying, it sounds like gcc compiler might do further optimization by using some kind of shortcut when it recognizes the code, or something like that.

    what i was thinking was something like this, comparing line by line:

    int tmp = a; vs. a ^= b;
    creating the tmp takes a little time, and assigning values to each bit in tmp is probably the very close to the same as xor-ing a in place. really the second hypothesis is the critical one, because that's the comparison in all of the next steps. e.g., does
    b = a; take almost identical time to b ^= a; ?
    if the assignment is faster by 1/2 the time difference between int tmp = a; and a ^= b; (assuming a ^= b; is faster when you add the overhead of creating tmp), then the traditional method wins--since that happens twice.
    "Creating" the variable doesn't take any time. It's just telling the compiler you want a variable called tmp.

    The compiler says, "ok, I want you to use R3 (a register) as tmp". Then whenever you write to or read from tmp, and compiler generates code that reads from or write to R3.

    Or, if there is no register available, when you enter the function, the compiler will push back the stack a little more so you can use that space as the variable. Subtracting by 16 doesn't take any longer than subtracting by 12.

    a = b and a ^= b take the same amount of time in virtually all architectures. Just 1 instruction clock.

  10. #25
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    4,356
    a = b and a ^= b take the same amount of time in virtually all architectures.
    I don't know of any such architecture. Or I don't understand what you are saying.

    Do you mean "a = a ^ b" and "a ^= b"?

    O_o

    Did you mean "a = b" and "a ^ b"?

    (That makes perfect sense now that I see that.)

    Soma

  11. #26
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    Usually, instructions store the result in the first operand.

    So "a ^= b" would be one instruction (xor a, b).

    To do something like c = a ^ b you will need to do something like
    Code:
    c = a;
    c ^= b;
    Very few architectures have 3 operand instructions. I've came across one... but I don't remember which off the top of my head. ARM maybe?

  12. #27
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    4,356
    ^_^;

    Wow. I totally misunderstood.

    Now that you've explained what you were saying, it is completely obvious.

    *shrug*

    ARM maybe?
    I doubt it. I guess maybe if they limited the relevant instructions to registers only (requiring a `mov' or similar for addresses/constants). I imagine it was something a little more exotic.

    That said, I've heard some rumors about Intel adding some three operand vector instructions, but I think those are actually targeted at combined operations (add with multiply or something) instead of that sort of thing.

    Soma

  13. #28
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    [QUOTE=cyberfish;969387]They can happen at the same time, using register renaming. While b is being copied to a, the CPU can assign another physical register to be the "new b" from now on, and copy t to the new b at the same time. It's a "fake" dependency.[/QUOTE[

    Compilers do this internally during optimization as well - look up static single assignment for examples. Each assignment is "renamed" to a different variable name so these sorts of false dependencies disappear. In this case, the b in lines 2 and 3 are treated as totally different variables (assuming previous optimizations don't already pick up on the fact you're swapping values).

    In the XOR thing, all dependencies are "real". You can't start the next XOR until you have the result of the previous XOR.
    Yep, among other problems. By the time you've compared a and b to make sure they're not the same value you've killed any chance at being faster than a straightforward swap using a temp variable. Branch mispredicts are expensive.

    But aside from being either wrong or slow, there's no drawbacks to using the XOR method of swapping

    And ARM (both ARM and Thumb-2 code) have 3 operand logical and math instructions. On the down side, those are all registers or immediate values so you'll need a separate load instruction for each somewhere along the line. See RealView Assembler User's Guide: AND, ORR, EOR, BIC, and ORN if you're really bored.

Page 2 of 2 FirstFirst 12
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Modulus to Bitwise
    By blurrymadness in forum C Programming
    Replies: 1
    Last Post: 12-01-2009, 12:49 PM
  2. Bitwise Questions
    By someprogr in forum C Programming
    Replies: 8
    Last Post: 12-14-2008, 06:45 PM
  3. bitwise operations with double
    By henry_kay in forum C Programming
    Replies: 2
    Last Post: 10-03-2007, 05:57 AM
  4. bitwise negation
    By Sathyabodh in forum C Programming
    Replies: 1
    Last Post: 08-12-2003, 10:42 AM
  5. Characters into bitwise ints
    By Code Zer0 in forum C++ Programming
    Replies: 9
    Last Post: 04-24-2003, 09:34 AM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21