Very interesting. I compiled without -ffast-math and tested with no optimisations, -O2, and -O3. In each case, with 1073741822 iterations, both the time command (time ./a.out) and valgrind showed sqrt() to be significantly faster on my i7-8700K. Just one of those things, I guess.