And at worst, with a very stupid compiler and a very stupid CPU, they are the same speed. Both are 3 instructions.
Why do you think XOR is quicker?
That's quite helpful BMJ, however it's only a measurement of speed, not of memory. Will the compiler's optimization keep the temp value? If programming for small chips in which copying values from registers to memory is expensive and less desired, the XOR trick may take the upper hand.
Environment: OS X, GCC / G++
Codes: Java, C#, C/C++
AOL IM: neandrake, Email: neandrake (at) gmail (dot) com
well obviously there are no algorithmic panaceas, it's all a cost-benefit analysis you have to make depending on the system you're working with
sometimes xor swap might be useful, but for most applications I'd say it's not
[QUOTE]If programming for small chips in which copying values from registers to memory is expensive and less desired, the XOR trick may take the upper hand.[/QUOTE]
It will just use 1 more register.
Even on microcontrollers you'll be hard pressed to find a situation where you can't even spare one temporary register.
If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein
it seemed to me like specifying and defining the contents of the memory for the tmp variable would be slower. i'd be interested actually in hearing more about the processes involved. from what some were saying, it sounds like the gcc compiler might do further optimization by using some kind of shortcut when it recognizes the code, or something like that.
what i was thinking was something like this, comparing line by line:
int tmp = a; vs. a ^= b;
creating the tmp takes a little time, and assigning values to each bit in tmp is probably very close to the same as xor-ing a in place. really the second hypothesis is the critical one, because that's the comparison in all of the next steps. e.g., does
b = a; take almost identical time to b ^= a; ?
if the assignment is faster by 1/2 the time difference between int tmp = a; and a ^= b; (assuming a ^= b; is faster when you add the overhead of creating tmp), then the traditional method wins--since that happens twice.
If you have a ^= b that means a = a ^ b. Which means
load a, load b, xor them, write on a (4 things)
If you have a = b then you have
load b, write on a (2 things)
Beyond that it really depends.
The point is that XOR is logically slower. If you don't have a reason (your system favors it in some way), then it will probably be actually slower on cyberfish's imaginary dumb compiler and CPU.
So you need a reason to actually say that the XOR will be faster.
Both I believe will receive optimizations. Using the XOR means the optimizations applied to it have to be actually more beneficial. Surprisingly more beneficial, that is. Which might be the case in some systems. Which you would have to know in advance. Otherwise there are millions of small optimizations that you could try, but without knowing something that tells you it will work for your system, it is mostly a waste of time.
[QUOTE]creating the tmp takes a little time[/QUOTE]
"Creating" the variable doesn't take any time. It's just telling the compiler you want a variable called tmp.
The compiler says, "ok, I want you to use R3 (a register) as tmp". Then whenever you write to or read from tmp, the compiler generates code that reads from or writes to R3.
Or, if there is no register available, when you enter the function, the compiler will push back the stack a little more so you can use that space as the variable. Subtracting by 16 doesn't take any longer than subtracting by 12.
a = b and a ^= b take the same amount of time in virtually all architectures. Just 1 instruction clock.
[QUOTE]a = b and a ^= b take the same amount of time in virtually all architectures.[/QUOTE]
I don't know of any such architecture. Or I don't understand what you are saying.
Do you mean "a = a ^ b" and "a ^= b"?
O_o
Did you mean "a = b" and "a ^ b"?
(That makes perfect sense now that I see that.)
Soma
Usually, instructions store the result in the first operand.
So "a ^= b" would be one instruction (xor a, b).
To do something like c = a ^ b you will need to do something like
Code:
c = a;
c ^= b;
Very few architectures have 3 operand instructions. I've come across one... but I don't remember which off the top of my head. ARM maybe?
^_^;
Wow. I totally misunderstood.
Now that you've explained what you were saying, it is completely obvious.
*shrug*
[QUOTE]ARM maybe?[/QUOTE]
I doubt it. I guess maybe if they limited the relevant instructions to registers only (requiring a `mov' or similar for addresses/constants). I imagine it was something a little more exotic.
That said, I've heard some rumors about Intel adding some three operand vector instructions, but I think those are actually targeted at combined operations (add with multiply or something) instead of that sort of thing.
Soma
[QUOTE=cyberfish;969387]They can happen at the same time, using register renaming. While b is being copied to a, the CPU can assign another physical register to be the "new b" from now on, and copy t to the new b at the same time. It's a "fake" dependency.[/QUOTE]
Compilers do this internally during optimization as well - look up static single assignment for examples. Each assignment is "renamed" to a different variable name so these sorts of false dependencies disappear. In this case, the b in lines 2 and 3 are treated as totally different variables (assuming previous optimizations don't already pick up on the fact you're swapping values).
[QUOTE]In the XOR thing, all dependencies are "real". You can't start the next XOR until you have the result of the previous XOR.[/QUOTE]
Yep, among other problems. By the time you've compared a and b to make sure they're not the same value you've killed any chance at being faster than a straightforward swap using a temp variable. Branch mispredicts are expensive.
But aside from being either wrong or slow, there's no drawbacks to using the XOR method of swapping
And ARM (both ARM and Thumb-2 code) has 3 operand logical and math instructions. On the down side, those operands are all registers or immediate values, so you'll need a separate load instruction for each somewhere along the line. See the RealView Assembler User's Guide: AND, ORR, EOR, BIC, and ORN if you're really bored.