Hey everyone, I'm new to the message boards but not to this site in general; I came here frequently when I was first starting out coding. I've just finished a comparative analysis of swapping functions (I wrote it in C since it's a common language), and I was hoping some people could give input on the results I obtained and the methods I used. My analysis can be found here
I'm glad to see you putting some good thought and research into something like this. Testing different implementations for speed and memory usage to get actual metrics is a good habit to get into. Have you considered the practical side of these different methods? How readable, maintainable and universal are the methods?
If you're going to cite standards, you should refer to C99 (the most recent standard, from 1999) instead of C89/C90, which is a decade older.
Regarding the standards (C90 and C99): they only guarantee that INT_MAX is at least 32767, but it may be bigger. It's entirely possible, though unlikely, that an int is only 16 bits on a conforming C99 implementation (this is more likely in the embedded world, where 8- and 16-bit microcontrollers are still common enough). If you need a wider guarantee, use long instead: LONG_MAX is required to be at least 2147483647 (about 2 billion).
Some of your code is specific to swapping numeric types, which, while common enough, does not give your research very broad application. Here are some notes on individual methods.
- The XOR swap won't work on structs/unions or pointers (XOR is not defined for those types).
- The subtract-add swap has similar limitations. Addition and subtraction are not defined for aggregate types, and pointer arithmetic is scaled by the size of the pointed-to type, i.e. 0x20 - 0x10 will yield different results for char pointers, int pointers, etc.
- The pre-processor swap is poorly named. The swapping does not in any way happen at the pre-processing stage. A better name would be inline swap, since all that really happens is that the code from the #define is inserted inline wherever you call PreSwap, avoiding function-call overhead.
time() is not the best function to use, since it measures "wall clock" time. That means it also counts time when your process was set aside by the scheduler to run some other code. You should look at something platform-specific that can track how much time your code actually spent on the processor. For Linux and the like, look into getrusage (man 2 getrusage), in particular the ru_utime field of the rusage structure, which tells you how much time was actually spent processing. If it were me, I would write 4 "versions" of this, one using each swap method, and measure them with the Linux time program (man 1 time). Unfortunately, I can't help you with the Windows equivalents.
Lastly, most modern compilers are quite good at optimizing. That means the compiler may choose to inline the function for you, which will change all your metrics. You should also test the different methods with your compiler set to different optimization levels, and compare that way.
EDIT: I just skimmed your other blog articles. It looks like you're already aware of the downside of the XOR swap (and presumably the addition-subtraction swap), but I'll leave the notes here for posterity. I will note that in the examples in your XOR swap article, you declare a and b to be int *, but they never actually point at valid integers (i.e. you never declare any ints). They just contain bogus addresses, leaving a potential seg fault, since *a or *b may not be valid locations to read from or write to.
I agree with anduril462. If you turn on compiler optimizations, they will probably all turn into the same thing.
Modern compilers are too smart to miss a simple variable swap. The compiler will recognize the pattern and just use whatever method it knows is fastest for your CPU, for example the xchg instruction on x86.
So, as high-level programmers, our goal is to write the swap in as straightforward a way as possible (for example, the temporary-variable swap), so the compiler can recognize it and apply the optimization. This is true for many other things, too. Many low-level hand optimizations actually make the code slower: not because the hand-optimized code itself is slower, but because it makes the code more complicated and confuses the compiler, which then can't apply as many of its own optimizations.
Oh, and by the way: the reason the XOR and SA swaps are so much slower is probably the equality check (needed because XOR-swapping a variable with itself zeroes it). Modern CPUs hate branching. But even without the check they would probably still be slower, because the code sequence has too many data dependencies (each instruction requires the result of the previous one), which modern CPUs also don't like, because it defeats pipelining.