View Full Version : Fast InvSqrt

12-01-2006, 10:14 PM
Fun article, neat code. I'd like to read the paper that explains the constant.. but I have enough papers to read as is.


12-02-2006, 08:54 AM
Wow...that was very interesting. I'll have to return to that paper a few times; just skimming through it, the author tries to calculate a better constant, finds one that mathematically should work, and yet when it is tested, it performs worse than the original during the first and second iterations of Newton's method. That raises the question: How did the original constant ever get found in the first place?

12-03-2006, 03:16 PM
Interesting history. Just one error in the beginning:

i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast.
Actually, i is set to the bit pattern of the 32-bit IEEE float, using a pointer cast.
The magic is about the representation of the float. The shift doesn't divide the number by two, it does some seriously weird stuff: it cuts off the last bit of the mantissa (which divides it by two), makes the last bit of the exponent the first bit of the mantissa (halfs the exponent and does unpredictable things to the mantissa) and, if the number was negative, makes the first bit of the exponent a 1.
That's where the special makeup of the magic number comes in, but I really don't feel like analyzing it. I guess the linked paper does that.

12-04-2006, 06:37 AM
essentially the same thing, written in asm instead of C

float Fast_Sqrtf(float f)
float result;

mov eax, f
sub eax, 0x3f800000
sar eax, 1
add eax, 0x3f800000
mov result, eax

return result;

1. Although the magic number 0x3f800000 just happens to be the
floating point representation of +1.0, here it is represents the bias
in a 32-bit float. (127)

2. The sar instruction is "shift arithmetic right" which replicates
the top bit but shifts all other bits one to the right. This has the
effect of dividing by two. It could just as well be "shr". The use of
sar means that it returns the square root of a negative number as a
negative number.

The sub instruction effectively converts the exponent to a unsigned
integer from a bias-127 integer.
The SAR instruction then divides both the mantissa and the unsigned
integer by 2. Where we had n= 1.x * 2^y we now have either 1.1x *
2^(y\2) or 1.0x * 2(y\2), depending on whether the low order bit of
the exponent was set. You can work out the algebra to see what
happens when you square these things. (Note that in base 10 these
represent 2.5+x and 2.0+x respectively) You get a very crude
approximation of a square root. It undoubtedly has more accuracy in a
certain limited range.

The final add then converts the exponent back to a bias-127 exponent.

45000000h = 2048

sub eax, 3f800000h ->
0100 0101 0000 0000 0000 0000 0000 0000
-0011 1111 1000 0000 0000 0000 0000 0000

= 0000 0101 1000 0000 0000 0000 0000 0000

sar eax,1 _>
= 0000 0010 1100 0000 0000 0000 0000 0000

add eax, 3f800000h ->
0000 0010 1100 0000 0000 0000 0000 0000
+0011 1111 1000 0000 0000 0000 0000 0000
=0100 0010 0100 0000 0000 0000 0000 0000 = 42400000 = 48

The square root of 2048 is 42.something, and 48 * 48 = 2304.