# Thread: bitwise hack for swapping

1. bitwise hack for swapping

Looking at how to use the bitwise operators, I ran across the following technique for swapping two values, which I think should work for any primitive data type (as long as a and b have distinct memory locations), and which doesn't require creating an extra variable:

Code:
```
/*
code that determines (primitive) datatypes of a and b and assigns values
*/
a ^= b;  // now a has a bit set only in those spots where exactly one of a and b had a bit set
b ^= a;  // now b is set to the value of the original a
a ^= b;  // now a is set to the value of the original b
```
Isn't this technique invariably faster (in addition to conserving a little memory, which presumably doesn't really matter) than the normal technique of using a temporary variable?

2. The compiler may not need the temporary variable because it can use a register.

On my machine, running 5 billion iterations of each, I get:

XOR method: 1.9 seconds
tmp method: 2.2 seconds
dummy: 1.0 seconds

The 1.0 second is due just to function call overhead, so it's really 0.9 seconds vs 1.2 seconds, which is still not much of a difference.

nice, tx!
i do think i'll start using the xor method as default here. seems a little neater--and why not gain that little bit of extra speed?

4. > i do think i'll start using xor method as default here.
Until it stops working for some reason.

http://cboard.cprogramming.com/brief...operators.html

and

Easy way to reverse a string

> Isn't this technique invariably faster
No.

It's also utterly useless for any data type other than an unsigned integer type.

5. Hmmmm... well, after that I tried it with signed int and long and it worked, but it refused to compile for double using Netbeans (gnu) under Windows.

6. Sorry, I don't see the point.

7. another xor trick is to use it when making a 'doubly' linked list.

instead of having both a next and a prev pointer, you hold onto the XOR of the two,

it saves memory in very large linked lists ...but of course it's only useful in certain circumstances, because it takes longer to traverse the list since you still have to resolve the next address.

with the amount of memory these days, i don't see the point still ... then again i haven't worked with things with very very small memory yet. then again, i ran into a little computer the other day that had 64kb memory (still in use daily, embedded point of sale system)

8. > only useful for certain circumstances
O_o

You said a mouthful.

For example, I've never seen a single valid use.

Soma

9. The XOR method introduces a data dependency at each step. The CPU won't be able to execute them in parallel.

Each instruction requires the result of the previous instruction.

Code:
```t = a;
a = b;
b = t;```
Also 3 instructions, but the only dependency is that the third instruction must occur after the first. The CPU can execute the first 2 instructions in parallel, using register renaming ("ghost" registers to speed things up). With clever renaming, it may even be possible to do it in 1 instruction cycle (renaming both registers and copying from the old registers at the same time), but of that I'm not sure.

When not constrained by the number of available registers, the tmp method will most certainly be faster.

What makes you think the xor method is faster? Because it's more cryptic?

10. > well, after that I tried it with signed int and long and it worked
Well at some point, you should probably read the standard to find out what would work in all circumstances, as opposed to just empirically extrapolating "it works" based on a single test on a single compiler.

Not to mention the train wreck which results when you try to swap a variable with itself.

Which is quicker?
- xor swapping inside bubble sort
- regular temp variable approach inside quick sort

It's a micro-optimisation of the worst kind.
Take brewbuck's example of saving say 0.3 seconds in 5Bn operations.
Now that was just swapping, nothing else.
Now imagine that in some real code - say sorting some large array which needs 5Bn swaps. How long is THAT going to take - 5 minutes? I don't know, but I'm pretty damn sure that the difference between X and X-0.3 seconds will look like a rounding error, not a stunning performance improvement.

11. Salem, assuming a and b are in different memory locations (that's critical), i don't see why the only issue isn't whether xor is defined for the given data type. if it is, then the swap clearly works.

i'm also not seeing the decrease in dependencies for possible parallel processing either. for
t = a;
a = b;
b = t;

clearly step 3 must also occur after step 2, or else a didn't get changed.

i do see the dangers in using it with dereferenced pointers, as in one of the threads Salem mentioned in his first post.

12. I have observed, in the field, that the compiler can usually figure out that you are exchanging two variables and it will try to do it however it thinks is fastest.

In particular, recent versions of gcc are smart enough to recognize the XOR trick and sometimes will generate a single "xchg" instruction to perform the swap.

The compiler thinks it is (and probably actually is) smarter than you, so I wouldn't try to outsmart it.

The XOR trick always has this strange fascination when people first learn about it. I suggest filing it away in your long term memory then writing clearer code.

13. as to the standard for when you can use them, i'm not finding it. but the following links seem to indicate legitimacy for an integer type (but not floating point):
Bitwise Operators in C
Cprogramming.com - Tutorials - Bitwise Operators and Bit Manipulations in C and C++
i do find a cautionary statement in King's C book about using shift operations on signed integers (p. 510), and it's pretty obvious why that's the case specifically for bitwise shifting: you're shifting the sign bit, too, with no protection.
i don't see why that would pose problems with xor, though (?).

14. A modern optimizing compiler will probably do a better job making these kinds of optimization decisions than you can. They're quite good! Let's look at how different compiler optimization flags can affect the outcome of this program:
Code:
```
#include <Windows.h>
#include <iostream>
using namespace std;

int main()
{
    DWORD oldtick;
    unsigned int count, x, y, temp;

    //
    //  XOR swap
    //
    count = 0xFFFFFFFF;
    x = 0x00000000;
    y = 0xFFFFFFFF;
    oldtick = GetTickCount();
    while (count--)
    {
        x ^= y;
        y ^= x;
        x ^= y;
    }
    cout << "XOR swap time: " << (GetTickCount() - oldtick) << " ms" << endl;

    //
    //  Tmp swap
    //
    count = 0xFFFFFFFF;
    x = 0x00000000;
    y = 0xFFFFFFFF;
    oldtick = GetTickCount();
    while (count--)
    {
        temp = x;
        x = y;
        y = temp;
    }
    cout << "Temp swap time: " << (GetTickCount() - oldtick) << " ms" << endl;

    return 0;
}
```
(With MSVC)
Optimizations disabled
Code:
```XOR swap time: 21497 ms
Temp swap time: 11154 ms```
Optimize for speed
Code:
```XOR swap time: 3807 ms
Temp swap time: 2527 ms```
Full optimization
Code:
```XOR swap time: 9906 ms
Temp swap time: 9267 ms```
Optimize for size
Code:
```XOR swap time: 3791 ms
Temp swap time: 2543 ms```
Another XOR trick is to use it to zero-out memory, but guess what? Optimizing compilers use that too!

15. > i'm also not seeing the decrease in dependencies for possible parallel processing either. for
> t = a;
> a = b;
> b = t;
>
> clearly step 3 must also occur after step 2, or else a didn't get changed.
They can happen at the same time, using register renaming. While b is being copied to a, the CPU can assign another physical register to be the "new b" from now on, and copy t to the new b at the same time. It's a "fake" dependency.

In the XOR thing, all dependencies are "real". You can't start the next XOR until you have the result of the previous XOR.