Oh, that explains it! I wasn't going to say it, but more than a 2x improvement from loop unrolling alone is next to impossible, so yeah, I figured something was up. It turns out that probably 99% of the speed improvement you got there came from not using the very slow pow function!
(Well, the speed of pow with an integer exponent does vary somewhat among compilers.)
Aside from the obvious bug in that code (a few + where there should be *), you could make it even faster by avoiding a lot of the multiplications, if you do it like this:
Code:
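double eval(double x)
{
    // Nested form: each level multiplies the running result by x
    // and adds the next lower coefficient.
    return x * (x * (x * polynomial<3>::coefficient
                     + polynomial<2>::coefficient)
                + polynomial<1>::coefficient)
         + polynomial<0>::coefficient;
}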
Only 3 multiplies!
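If you want to convince yourself that the nested form (usually called Horner's rule) gives the same answer as the pow version, here is a rough sketch you can compile. I don't know how your polynomial template is actually defined, so the polynomial<N> specializations and the coefficient values 1, 2, 3, 4 below are made up purely for illustration, and the names eval_pow and eval_horner are mine, not from your code.

```cpp
#include <cmath>

// Hypothetical stand-in for the polynomial<N>::coefficient template from the
// earlier post; the real definition and values are not shown in this thread.
template <int N> struct polynomial;
template <> struct polynomial<0> { static constexpr double coefficient = 1.0; };
template <> struct polynomial<1> { static constexpr double coefficient = 2.0; };
template <> struct polynomial<2> { static constexpr double coefficient = 3.0; };
template <> struct polynomial<3> { static constexpr double coefficient = 4.0; };

// Straightforward version: one pow call per non-constant term.
double eval_pow(double x)
{
    return polynomial<3>::coefficient * std::pow(x, 3)
         + polynomial<2>::coefficient * std::pow(x, 2)
         + polynomial<1>::coefficient * std::pow(x, 1)
         + polynomial<0>::coefficient;
}

// Nested (Horner) version: three multiplies and three adds, no pow.
double eval_horner(double x)
{
    return x * (x * (x * polynomial<3>::coefficient
                     + polynomial<2>::coefficient)
                + polynomial<1>::coefficient)
         + polynomial<0>::coefficient;
}
```

With those made-up coefficients the polynomial is 4x^3 + 3x^2 + 2x + 1, so both functions should give 49 at x = 2. Call them both in a loop with a timer around each if you want to see how much of your speedup was really down to pow on your compiler.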
Now that you know the speed improvement had little to do with loop unrolling, perhaps you'll reconsider whether to unroll loops at all in your matrix stuff, huh!