Originally posted by awsdert:
flp1969: Well, yours was most likely slower because of the move instruction in the loop
Nope... As I explained, the problem was that I did not align the instructions properly and did not take advantage of the static branch prediction algorithm. My point: good C compilers, such as GCC and Clang, take advantage of CPU characteristics that a regular programmer can easily overlook. Usually (not always), they generate better code.
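To illustrate the branch layout point, here is a minimal sketch, assuming GCC or Clang (`__builtin_expect` is an extension of those compilers, not standard C). This is not the code from my earlier example; the `UNLIKELY` macro and the function `sum_non_negative` are made up for illustration. The hint does not change the result. It only tells the compiler which path to treat as the common one, so it may place the rare path out of the way of the hot loop.

```c
#include <stddef.h>

/* GCC/Clang extension: says the condition is expected to be false. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum the non-negative entries of v and
   report how many negative entries were skipped. */
long sum_non_negative(const int *v, size_t n, size_t *negatives)
{
    long sum = 0;
    size_t skipped = 0;

    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0)) {   /* hinted as the rare path */
            skipped++;
            continue;
        }
        sum += v[i];                /* hinted as the common path */
    }

    *negatives = skipped;
    return sum;
}
```

Whether the hint makes any measurable difference depends on the compiler, the optimization level, and the CPU, so compare the generated code (for instance with `-O2 -S`) and measure before relying on it.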

I was thinking more along the lines of using the registers in asm and never moving while in the loop, just changing the values. I just figured there must be a shorter way than what the compiler produced, because the compiler might not be smart enough to realise it doesn't need to reset any registers during the loop
I am anxious to see how you intend to get values from a table without dealing with memory access... And, again, a "shorter" way doesn't mean a "faster" way, as I demonstrated with that example...
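A small sketch of why the memory access does not go away, using a hypothetical table (`bits_in_nibble`) and function (`count_bits`) that stand in for whatever table you actually have. The accumulator and the index can stay in registers for the whole loop, but the table entry itself has to be read from memory, so each lookup normally compiles to a load.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: number of set bits in each 4-bit value. */
static const uint8_t bits_in_nibble[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Count the set bits in a buffer, one table lookup per nibble. */
unsigned count_bits(const uint8_t *buf, size_t n)
{
    unsigned total = 0;              /* can live in a register */

    for (size_t i = 0; i < n; i++) {
        total += bits_in_nibble[buf[i] & 0x0F];  /* read from the table */
        total += bits_in_nibble[buf[i] >> 4];    /* read from the table */
    }
    return total;
}
```

However the loop is written in asm, those reads are still there. What can be tuned is everything around them.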

As Salem said: a better algorithm will serve you better. Focusing on assembly probably will not.