Fast InvSqrt

This is a discussion on Fast InvSqrt within the A Brief History of Cprogramming.com forums, part of the Community Boards category.

  1. #1
    Perspective (Crazy Fool)
    Join Date: Jan 2003
    Location: Canada
    Posts: 2,640

    Fast InvSqrt

    Fun article, neat code. I'd like to read the paper that explains the constant, but I have enough papers to read as it is.

    http://www.beyond3d.com/articles/fastinvsqrt/

  2. #2
    Anti-Poster
    Join Date: Feb 2002
    Posts: 1,399
    Wow, that was very interesting. I'll have to return to that paper a few times. Just skimming through it, the author tries to calculate a better constant and finds one that mathematically should work, yet when it is tested, it performs worse than the original after the first and second iterations of Newton's method. That raises the question: how was the original constant found in the first place?
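To make that comparison concrete, here is a small C sketch (not the paper's code) that measures the worst relative error of the bit-level guess for a given magic constant after a chosen number of Newton steps. The function names inv_sqrt_estimate, reference_inv_sqrt and max_relative_error are hypothetical, memcpy stands in for the pointer cast, and a 32-bit IEEE-754 float is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Quake-style estimate of 1/sqrt(x): subtract half of the float's bit
   pattern from a magic constant, then refine with Newton's method. */
static float inv_sqrt_estimate(uint32_t magic, float x, int newton_steps)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);               /* IEEE-754 bits of x */
    i = magic - (i >> 1);                   /* the bit-level guess */
    memcpy(&y, &i, sizeof y);
    for (int k = 0; k < newton_steps; k++)
        y = y * (1.5f - 0.5f * x * y * y);  /* one Newton iteration */
    return y;
}

/* High-precision reference for 1/sqrt(x) with x in [1, 4): Newton's
   method in double, started at 0.5 (at or below the root), from where it
   converges upward. Used instead of sqrt() so the sketch needs no libm. */
static double reference_inv_sqrt(double x)
{
    double r = 0.5;
    for (int k = 0; k < 10; k++)
        r = r * (1.5 - 0.5 * x * r * r);
    return r;
}

/* Worst relative error over every float in [1, 4). Scaling x by 4 scales
   the estimate by exactly 1/2, so this interval covers one full period
   of the error pattern for positive normal inputs. */
static double max_relative_error(uint32_t magic, int newton_steps)
{
    double worst = 0.0;
    for (uint32_t bits = 0x3f800000u; bits < 0x40800000u; bits++) {
        float x;
        memcpy(&x, &bits, sizeof x);
        double exact = reference_inv_sqrt(x);
        double diff = (double)inv_sqrt_estimate(magic, x, newton_steps) - exact;
        double err = (diff < 0 ? -diff : diff) / exact;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

One way to reproduce the comparison is to call max_relative_error(0x5f3759dfu, 1) next to max_relative_error(0x5f37642fu, 1). 0x5f3759df is the constant from the Quake III source; 0x5f37642f is, as far as we recall, the analytically derived constant from the paper, so check that value against the paper before relying on it. Each call walks all 2^24 floats in [1, 4), so it takes a moment.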
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  3. #3
    CornedBee (Cat without Hat)
    Join Date: Apr 2003
    Posts: 8,892
    Interesting history. Just one error in the beginning:
    "i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast."
    Actually, i is set to the bit pattern of the 32-bit IEEE float, using a pointer cast.
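The difference between the two readings is easy to see in a short C sketch. The helper names value_conversion and bit_pattern are hypothetical, and memcpy is swapped in for the pointer cast because reading a float through an integer pointer breaks C's strict aliasing rule.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Value conversion, which is what an integer cast does: the fraction
   is discarded and the numeric value is kept. */
static int32_t value_conversion(float x)
{
    return (int32_t)x;
}

/* Bit reinterpretation: the same 32 bits read as an unsigned integer.
   memcpy is the well-defined way to do this in standard C. */
static uint32_t bit_pattern(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

For 2.0f, value_conversion gives 2, while bit_pattern gives 0x40000000 (sign 0, biased exponent 128, fraction 0). Only the second is what the trick operates on.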
    The magic is about the representation of the float. The shift doesn't simply divide the number by two; it does some seriously weird stuff: it cuts off the last bit of the mantissa (which halves the mantissa), makes the last bit of the exponent the first bit of the mantissa (halving the exponent and, when the exponent was odd, adding 0.5 to the halved fraction) and, if the number was negative, makes the first bit of the exponent a 1.
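A minimal C sketch of those field movements: it splits a 32-bit pattern into sign, exponent and mantissa after a one-bit right shift. The struct and function names are hypothetical, and an unsigned (logical) shift is used so the result is well-defined for negative inputs too.

```c
#include <stdint.h>
#include <string.h>

/* The three IEEE-754 single-precision fields of a 32-bit pattern. */
struct float_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t mantissa;  /* 23 fraction bits */
};

static struct float_fields split_bits(uint32_t bits)
{
    struct float_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xffu;
    f.mantissa = bits & 0x7fffffu;
    return f;
}

/* Fields of x's bit pattern after a one-bit logical right shift. */
static struct float_fields fields_after_shift(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return split_bits(bits >> 1);
}
```

For 4.0f (biased exponent 129, odd) the shifted pattern has exponent 64 and mantissa 0x400000: the exponent's low bit became the top fraction bit. For 3.0f (exponent 128, fraction 0x400000) the fraction simply halves to 0x200000. For -1.0f the sign bit lands in the exponent, giving exponent 191, whose top bit is set.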
    That's where the special makeup of the magic number comes in, but I really don't feel like analyzing it. I guess the linked paper does that.
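For the makeup of the magic number, the commonly cited log-based argument goes roughly as follows (a sketch, not necessarily the linked paper's own derivation). Take a positive float with biased exponent E and fraction m in [0, 1), let I_x be its bit pattern read as an integer, and let σ be a small tuning constant in the approximation log2(1 + m) ≈ m + σ:

$$
x = 2^{E-127}(1+m), \qquad I_x = 2^{23}(E + m)
$$

$$
\log_2 x \approx E - 127 + m + \sigma \quad\Longrightarrow\quad I_x \approx 2^{23}\left(\log_2 x + 127 - \sigma\right)
$$

$$
\log_2 \frac{1}{\sqrt{x}} = -\tfrac12 \log_2 x \quad\Longrightarrow\quad I_{1/\sqrt{x}} \approx \underbrace{\tfrac32 \cdot 2^{23}\,(127 - \sigma)}_{R} \;-\; \tfrac12 I_x
$$

The shift i >> 1 supplies ½ I_x (minus its dropped low bit). With σ = 0, R is 0x5f400000; the Quake III constant 0x5f3759df corresponds to σ ≈ 0.0450.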
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  4. #4

    Join Date: May 2005
    Posts: 1,041
    Essentially the same bit trick, written in asm instead of C, though this version approximates the square root rather than the inverse square root:
    Code:
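    /* Crude square root (not inverse square root) of a positive float.
       MSVC-style x86 inline assembly; 64-bit MSVC has no inline asm. */
    float	Fast_Sqrtf(float f)
    {
    	float	result;
    	
    	_asm
    	{
    		mov eax, f            ; raw IEEE-754 bits of f
    		sub eax, 0x3f800000   ; remove the exponent bias (127 << 23)
    		sar eax, 1            ; halve exponent and mantissa together
    		add eax, 0x3f800000   ; restore the bias
    		mov result, eax       ; reinterpret the bits as a float
    	}
    	
    	return result;
    }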
    1. Although the magic number 0x3f800000 happens to be the
    floating-point representation of +1.0, here it represents the
    exponent bias of a 32-bit float (127), shifted into the exponent
    field (127 << 23).

    2. The sar instruction is "shift arithmetic right": it shifts every
    bit one place to the right and copies the sign bit into the vacated
    top bit, so it halves a signed integer (rounding down). It could not
    simply be "shr": for inputs below 1.0 the subtraction yields a
    negative integer, and only sar halves that correctly. Negative inputs
    give meaningless results with either instruction.
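A C sketch of the three asm steps makes the sar/shr difference visible. The helper names are hypothetical; since C leaves >> on a negative signed value implementation-defined, the arithmetic shift is emulated by a floor-halving helper rather than relying on the compiler.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_to_bits(float x) { uint32_t b; memcpy(&b, &x, sizeof b); return b; }
static float bits_to_float(uint32_t b) { float x; memcpy(&x, &b, sizeof x); return x; }

/* Like sar: halve, rounding toward minus infinity and keeping the sign.
   Written without >> on a negative value. */
static int64_t halve_arithmetic(int64_t v)
{
    return v < 0 ? ~(~v >> 1) : v >> 1;
}

/* sub eax, 0x3f800000 / sar eax, 1 / add eax, 0x3f800000 */
static float crude_sqrt_sar(float x)
{
    int64_t unbiased = (int64_t)float_to_bits(x) - 0x3f800000;
    return bits_to_float((uint32_t)(halve_arithmetic(unbiased) + 0x3f800000));
}

/* The same steps with shr: the 32-bit subtraction wraps, and the logical
   shift then treats a negative difference as a huge positive one. */
static float crude_sqrt_shr(float x)
{
    uint32_t unbiased = float_to_bits(x) - 0x3f800000u;
    return bits_to_float((unbiased >> 1) + 0x3f800000u);
}
```

Both versions return 48.0 for 2048.0, but for 0.25 the sar version returns 0.5 while the shr version returns -0.5.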

    The sub instruction removes the bias, turning the biased exponent into
    a signed (two's-complement) integer, which is negative for inputs
    below 1.0. The sar instruction then divides both the mantissa and this
    exponent by 2. Where we had n = 1.x * 2^y (x being the binary fraction
    bits), we now have either 1.1x * 2^(y\2) or 1.0x * 2^(y\2), with y\2
    meaning y/2 rounded down, depending on whether the low-order bit of
    the exponent was set. You can work out the algebra to see what
    happens when you square these things. (Note that in base 10, reading
    x as the value of the fraction, these mantissas are 1.5 + x/2 and
    1.0 + x/2 respectively.) You get a very crude approximation of a
    square root: it is never too small, it is exact for powers of 4, and
    it is at most about 6% too large.
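Working out that algebra, with x now the value of the fraction (0 ≤ x < 1), n = (1 + x) * 2^y, and k = y\2:

$$
y = 2k:\qquad \left(1 + \tfrac{x}{2}\right)^2 2^{2k} \;=\; (1 + x)\,2^{2k} + \tfrac{x^2}{4}\,2^{2k}
$$

$$
y = 2k+1:\qquad \left(\tfrac32 + \tfrac{x}{2}\right)^2 2^{2k} \;=\; (1 + x)\,2^{2k+1} + \tfrac{(1-x)^2}{4}\,2^{2k}
$$

The first term on each right-hand side is n itself, and the leftover terms are never negative. The relative overshoot is largest at x = 0 with odd y, i.e. n = 2 * 4^k, where the ratio to the true root is 3/(2√2) ≈ 1.0607; 2048 = 2 * 4^5 is exactly that case.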

    The final add then converts the exponent back to a bias-127 exponent.

    Consider:
    45000000h = 2048

    sub eax, 3f800000h ->
    0100 0101 0000 0000 0000 0000 0000 0000
    -0011 1111 1000 0000 0000 0000 0000 0000

    = 0000 0101 1000 0000 0000 0000 0000 0000

    sar eax, 1 ->
    = 0000 0010 1100 0000 0000 0000 0000 0000

    add eax, 3f800000h ->
    0000 0010 1100 0000 0000 0000 0000 0000
    +0011 1111 1000 0000 0000 0000 0000 0000
    = 0100 0010 0100 0000 0000 0000 0000 0000 = 42400000h = 48

    The square root of 2048 is about 45.25, and 48 * 48 = 2304.
    Last edited by BobMcGee123; 12-04-2006 at 06:54 AM.
    I'm not immature, I'm refined in the opposite direction.
