You are right, of course. This is just one very specific case, not a general statement.

SSE probably won't help, since I do almost no floating-point computation in the code. It didn't help with GCC either: "-march=native" tells GCC to optimize for the machine it is running on, and for the Core 2 I am using, that should include SSE.