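Even better is to use a compiler that supports link-time code generation (also known as link-time optimization, or LTO). This lets the optimizer see the "entire" program at link time instead of one translation unit at a time, so it can perform global optimizations such as the above and many more.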
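As a rough sketch of where this matters, consider a small helper that lives in a different source file from the loop that calls it. The file names `scale.c` and `sum.c` and both function names are made up for illustration, and the two "files" are shown in one block only so the snippet compiles as-is. With GCC or Clang, link-time optimization is typically enabled by passing `-flto` together with an optimization level at both the compile and link steps; with MSVC the corresponding options are `/GL` for the compiler and `/LTCG` for the linker.

```c
/* Illustration only: in a real project these would be two separate files. */

/* ---- scale.c (hypothetical file name) ---- */
int scale(int x)
{
    return 3 * x;
}

/* ---- sum.c (hypothetical file name) ---- */
int scale(int x); /* in a separate file, only this declaration is visible */

int sum_scaled(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i) {
        /* Compiled separately, this is an opaque call into another
           translation unit. With link-time optimization the optimizer
           can see the body of scale() and may inline it here. */
        total += scale(values[i]);
    }
    return total;
}
```

Whether the call actually gets inlined is up to the compiler's heuristics; the point is only that without link-time optimization it never gets the chance, because the body of `scale` is not visible while `sum.c` is being compiled.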