Perspective

12-01-2006, 10:14 PM

Fun article, neat code. I'd like to read the paper that explains the constant.. but I have enough papers to read as is.

http://www.beyond3d.com/articles/fastinvsqrt/

http://www.beyond3d.com/articles/fastinvsqrt/

View Full Version : Fast InvSqrt

Perspective

12-01-2006, 10:14 PM

Fun article, neat code. I'd like to read the paper that explains the constant.. but I have enough papers to read as is.

http://www.beyond3d.com/articles/fastinvsqrt/

http://www.beyond3d.com/articles/fastinvsqrt/

pianorain

12-02-2006, 08:54 AM

Wow...that was very interesting. I'll have to return to that paper a few times; just skimming through it, the author tries to calculate a better constant, finds one that mathematically should work, and yet when it is tested, it performs worse than the original during the first and second iterations of Newton's method. That raises the question: How did the original constant ever get found in the first place?

CornedBee

12-03-2006, 03:16 PM

Interesting history. Just one error in the beginning:

i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast.

Actually, i is set to the bit pattern of the 32-bit IEEE float, using a pointer cast.

The magic is about the representation of the float. The shift doesn't divide the number by two, it does some seriously weird stuff: it cuts off the last bit of the mantissa (which divides it by two), makes the last bit of the exponent the first bit of the mantissa (halfs the exponent and does unpredictable things to the mantissa) and, if the number was negative, makes the first bit of the exponent a 1.

That's where the special makeup of the magic number comes in, but I really don't feel like analyzing it. I guess the linked paper does that.

i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast.

Actually, i is set to the bit pattern of the 32-bit IEEE float, using a pointer cast.

The magic is about the representation of the float. The shift doesn't divide the number by two, it does some seriously weird stuff: it cuts off the last bit of the mantissa (which divides it by two), makes the last bit of the exponent the first bit of the mantissa (halfs the exponent and does unpredictable things to the mantissa) and, if the number was negative, makes the first bit of the exponent a 1.

That's where the special makeup of the magic number comes in, but I really don't feel like analyzing it. I guess the linked paper does that.

BobMcGee123

12-04-2006, 06:37 AM

essentially the same thing, written in asm instead of C

float Fast_Sqrtf(float f)

{

float result;

_asm

{

mov eax, f

sub eax, 0x3f800000

sar eax, 1

add eax, 0x3f800000

mov result, eax

}

return result;

}

1. Although the magic number 0x3f800000 just happens to be the

floating point representation of +1.0, here it is represents the bias

in a 32-bit float. (127)

2. The sar instruction is "shift arithmetic right" which replicates

the top bit but shifts all other bits one to the right. This has the

effect of dividing by two. It could just as well be "shr". The use of

sar means that it returns the square root of a negative number as a

negative number.

The sub instruction effectively converts the exponent to a unsigned

integer from a bias-127 integer.

The SAR instruction then divides both the mantissa and the unsigned

integer by 2. Where we had n= 1.x * 2^y we now have either 1.1x *

2^(y\2) or 1.0x * 2(y\2), depending on whether the low order bit of

the exponent was set. You can work out the algebra to see what

happens when you square these things. (Note that in base 10 these

represent 2.5+x and 2.0+x respectively) You get a very crude

approximation of a square root. It undoubtedly has more accuracy in a

certain limited range.

The final add then converts the exponent back to a bias-127 exponent.

consider

45000000h = 2048

sub eax, 3f800000h ->

0100 0101 0000 0000 0000 0000 0000 0000

-0011 1111 1000 0000 0000 0000 0000 0000

= 0000 0101 1000 0000 0000 0000 0000 0000

sar eax,1 _>

= 0000 0010 1100 0000 0000 0000 0000 0000

add eax, 3f800000h ->

0000 0010 1100 0000 0000 0000 0000 0000

+0011 1111 1000 0000 0000 0000 0000 0000

=0100 0010 0100 0000 0000 0000 0000 0000 = 42400000 = 48

The square root of 2048 is 42.something, and 48 * 48 = 2304.

float Fast_Sqrtf(float f)

{

float result;

_asm

{

mov eax, f

sub eax, 0x3f800000

sar eax, 1

add eax, 0x3f800000

mov result, eax

}

return result;

}

1. Although the magic number 0x3f800000 just happens to be the

floating point representation of +1.0, here it is represents the bias

in a 32-bit float. (127)

2. The sar instruction is "shift arithmetic right" which replicates

the top bit but shifts all other bits one to the right. This has the

effect of dividing by two. It could just as well be "shr". The use of

sar means that it returns the square root of a negative number as a

negative number.

The sub instruction effectively converts the exponent to a unsigned

integer from a bias-127 integer.

The SAR instruction then divides both the mantissa and the unsigned

integer by 2. Where we had n= 1.x * 2^y we now have either 1.1x *

2^(y\2) or 1.0x * 2(y\2), depending on whether the low order bit of

the exponent was set. You can work out the algebra to see what

happens when you square these things. (Note that in base 10 these

represent 2.5+x and 2.0+x respectively) You get a very crude

approximation of a square root. It undoubtedly has more accuracy in a

certain limited range.

The final add then converts the exponent back to a bias-127 exponent.

consider

45000000h = 2048

sub eax, 3f800000h ->

0100 0101 0000 0000 0000 0000 0000 0000

-0011 1111 1000 0000 0000 0000 0000 0000

= 0000 0101 1000 0000 0000 0000 0000 0000

sar eax,1 _>

= 0000 0010 1100 0000 0000 0000 0000 0000

add eax, 3f800000h ->

0000 0010 1100 0000 0000 0000 0000 0000

+0011 1111 1000 0000 0000 0000 0000 0000

=0100 0010 0100 0000 0000 0000 0000 0000 = 42400000 = 48

The square root of 2048 is 42.something, and 48 * 48 = 2304.

Powered by vBulletin® Version 4.2.5 Copyright © 2019 vBulletin Solutions Inc. All rights reserved.