And at worst, with a very stupid compiler and a very stupid CPU, they are the same speed. Both are 3 instructions.
Why do you think XOR is quicker?
That's quite helpful BMJ, however it's only a measurement of speed, not of memory. Will the compiler's optimization keep the temp value? If programming for small chips in which copying values from registers to memory is expensive and less desired, the XOR trick may take the upper hand.
Environment: OS X, GCC / G++
Codes: Java, C#, C/C++
AOL IM: neandrake, Email: neandrake (at) gmail (dot) com
well obviously there are no algorithmic panaceas, it's all a cost-benefit analysis you have to make depending on the system you're working with
sometimes xor swap might be useful, but for most applications I'd say it's not
[QUOTE]If programming for small chips in which copying values from registers to memory is expensive and less desired, the XOR trick may take the upper hand.[/QUOTE]
It will just use 1 more register.
Even on microcontrollers you'll be hard pressed to find a situation where you can't even spare one temporary register.
If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein
it seemed to me like specifying and defining the contents of the memory for the tmp variable would be slower. i'd be interested actually in hearing more about the processes involved. from what some were saying, it sounds like the gcc compiler might do further optimization by using some kind of shortcut when it recognizes the code, or something like that.
what i was thinking was something like this, comparing line by line:
int tmp = a; vs. a ^= b;
creating the tmp takes a little time, and assigning values to each bit in tmp is probably very close to the same as xor-ing a in place. really the second hypothesis is the critical one, because that's the comparison in all of the next steps. e.g., does
b = a; take almost identical time to b ^= a; ?
if the assignment is faster by 1/2 the time difference between int tmp = a; and a ^= b; (assuming a ^= b; is faster when you add the overhead of creating tmp), then the traditional method wins--since that happens twice.
If you have a ^= b that means a = a ^ b. Which means
load a, load b, xor them, write on a (4 things)
If you have a = b then you have
load b, write on a (2 things)
Beyond that it really depends.
The point is that XOR is logically slower. If you don't have a reason (your system favors it in some way), then it will probably be actually slower on cyberfish's imaginary dumb compiler and CPU.
So you need a reason to actually say that the XOR will be faster.
Both I believe will receive optimizations. Using the XOR means the optimizations applied to it have to be actually more beneficial. Surprisingly more beneficial, that is. Which might be the case in some systems. Which you would have to know in advance. Otherwise there are millions of small optimizations that you could try, but without knowing something that tells you it will work for your system, it is mostly a waste of time.
[QUOTE]creating the tmp takes a little time[/QUOTE]
"Creating" the variable doesn't take any time. It's just telling the compiler you want a variable called tmp.
The compiler says, "ok, I want you to use R3 (a register) as tmp". Then whenever you write to or read from tmp, the compiler generates code that reads from or writes to R3.
Or, if there is no register available, when you enter the function, the compiler will push back the stack a little more so you can use that space as the variable. Subtracting by 16 doesn't take any longer than subtracting by 12.
a = b and a ^= b take the same amount of time in virtually all architectures. Just 1 instruction clock.
[QUOTE]a = b and a ^= b take the same amount of time in virtually all architectures.[/QUOTE]
I don't know of any such architecture. Or I don't understand what you are saying.
Do you mean "a = a ^ b" and "a ^= b"?
O_o
Did you mean "a = b" and "a ^ b"?
(That makes perfect sense now that I see that.)
Soma
Usually, instructions store the result in the first operand.
So "a ^= b" would be one instruction (xor a, b).
To do something like c = a ^ b you will need to do something like
Code:
c = a;
c ^= b;
Very few architectures have 3 operand instructions. I've come across one... but I don't remember which off the top of my head. ARM maybe?
^_^;
Wow. I totally misunderstood.
Now that you've explained what you were saying, it is completely obvious.
*shrug*
[QUOTE]ARM maybe?[/QUOTE]
I doubt it. I guess maybe if they limited the relevant instructions to registers only (requiring a `mov' or similar for addresses/constants). I imagine it was something a little more exotic.
That said, I've heard some rumors about Intel adding some three operand vector instructions, but I think those are actually targeted at combined operations (add with multiply or something) instead of that sort of thing.
Soma
[QUOTE=cyberfish;969387]They can happen at the same time, using register renaming. While b is being copied to a, the CPU can assign another physical register to be the "new b" from now on, and copy t to the new b at the same time. It's a "fake" dependency.[/QUOTE]
Compilers do this internally during optimization as well - look up static single assignment for examples. Each assignment is "renamed" to a different variable name so these sorts of false dependencies disappear. In this case, the b in lines 2 and 3 are treated as totally different variables (assuming previous optimizations don't already pick up on the fact you're swapping values).
[QUOTE]In the XOR thing, all dependencies are "real". You can't start the next XOR until you have the result of the previous XOR.[/QUOTE]
Yep, among other problems. By the time you've compared a and b to make sure they're not the same value you've killed any chance at being faster than a straightforward swap using a temp variable. Branch mispredicts are expensive.
But aside from being either wrong or slow, there's no drawbacks to using the XOR method of swapping
And ARM (both ARM and Thumb-2 code) has 3 operand logical and math instructions. On the down side, those operands are all registers or immediate values, so you'll need a separate load instruction for each somewhere along the line. See the RealView Assembler User's Guide: AND, ORR, EOR, BIC, and ORN if you're really bored.