Thread: code to find the square root of a number

  1. #31
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    In my experience, SSE intrinsics in Visual Studio produce pretty poor code compared to inline assembler, so I wouldn't necessarily take that as a good indication of "true SSE performance".

    Carmack's approximation seems interesting, as it's basically 2 iterations of the customary loop. I wonder how well it performs over a larger range of numbers. It also "messes up" the floating-point and integer units, because it overlays FPU data with integer data in order to do an integer subtraction on it. That's a bad idea unless absolutely necessary, since it forces the processor to sync the FPU with the integer unit. Normally the integer unit operates independently of the FPU, and both units "prefer" to work independently.

    In general, SIMD operations are only "meaningful" if there is a complete set of data.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  2. #32
    Registered User
    Join Date
    Mar 2005
    Location
    Mountaintop, Pa
    Posts
    1,058
    Quote Originally Posted by matsp View Post
    In my experience, SSE intrinsics in Visual Studio produce pretty poor code compared to inline assembler, so I wouldn't necessarily take that as a good indication of "true SSE performance".

    Carmack's approximation seems interesting, as it's basically 2 iterations of the customary loop. I wonder how well it performs over a larger range of numbers. It also "messes up" the floating-point and integer units, because it overlays FPU data with integer data in order to do an integer subtraction on it. That's a bad idea unless absolutely necessary, since it forces the processor to sync the FPU with the integer unit. Normally the integer unit operates independently of the FPU, and both units "prefer" to work independently.
    Being a high-level code jockey, the above just flew over my head.
    In general, SIMD operations are only "meaningful" if there is a complete set of data.

    --
    Mats
    One thing I did notice about SIMD is that it needs full vectorization (all four inputs) to execute efficiently.

    For example, using
    Code:
     float fInput1[3] = {30.3F, 100.0F, 140.1F};
    will cause the execution time of _mm_load_ps and _mm_store_ps functions to spike, thus increasing the overall time for computation of the square root.

    So, to eliminate this spike, you have to submit the following for square root calculation of three floats:

    Code:
    float fInput2[4]  = {30.3F, 100.0F, 140.1F, 0.0F};

  3. #33
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by BobS0327 View Post
    Being a high-level code jockey, the above just flew over my head.
    Basically: if you take the address of a float, process the data in that float as an integer, and then put it back into a float, the processor will shout "Hey, FPU, hang on a bit, can you STOP after THIS instruction, and don't move a finger until I say so". Or worse, the two processor units doing the integer and float work don't realize they are working on the same data until later on, and one of them has to "back up". That's a bit like a pit stop in Formula One or IndyCar where the driver doesn't stop in time and has to go back a bit, which usually causes a big extra delay.
    Code:
        lTemp  = * ( long * ) &fY;
        lTemp  = 0x5f3759df - ( lTemp >> 1 );
    is the relevant code.

    On older processors, things didn't happen much in parallel, so there was less of a problem with this style of code. Modern processors definitely execute a lot of instructions in parallel, and there's some pretty complicated logic to prevent one or the other unit from getting it wrong when overlapping work between two units - and one possible scenario is "speculative execution and throwing away the results".

    Did I mention that modern processors are quite complicated?
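    To make the two quoted lines concrete, here is a minimal sketch of the whole trick, assuming 32-bit IEEE-754 floats. The names fast_rsqrt and fast_sqrt are made up for illustration. It copies the bits with memcpy instead of casting the pointer, because `long` is 64 bits on some platforms and the cast runs into aliasing rules. The value still moves from the floating-point side to the integer side and back, so the concern described above is unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Approximates 1/sqrt(x) for x > 0 (sketch; assumes 32-bit IEEE-754 floats).
float fast_rsqrt(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    assert(x > 0.0f);

    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // view the float's bit pattern as an integer
    bits = 0x5f3759dfu - (bits >> 1);      // integer subtraction yields a first guess
    float y;
    std::memcpy(&y, &bits, sizeof y);      // move the guess back into a float

    y = y * (1.5f - 0.5f * x * y * y);     // one Newton-Raphson refinement step
    return y;
}

// sqrt(x) = x * (1/sqrt(x)), so the reciprocal estimate also gives a square root.
float fast_sqrt(float x)
{
    return x * fast_rsqrt(x);
}
```

    With a single refinement step the result is only good to a fraction of a percent, so treat it as an estimate and not a replacement for sqrt().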


    One thing I did notice about SIMD is that it needs full vectorization (all four inputs) to execute efficiently.

    For example, using
    Code:
     float fInput1[3] = {30.3F, 100.0F, 140.1F};
    will cause the execution time of _mm_load_ps and _mm_store_ps functions to spike, thus increasing the overall time for computation of the square root.

    So, to eliminate this spike, you have to submit the following for square root calculation of three floats:

    Code:
    float fInput2[4]  = {30.3F, 100.0F, 140.1F, 0.0F};
    There are two potential reasons for this:
    1. The [3] array is misaligned, which causes the processor to use an unaligned version of the "load" instructions.
    2. The [3] array overlaps the result array by 1 element, which means that the processor gets confused as to the content (and must wait for other operations before it can continue, to make sure it doesn't "get it wrong").
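    Both points can be sidestepped by padding to four floats and forcing the alignment explicitly. The sketch below assumes an x86 compiler with SSE and C++11; sqrt4_aligned and sqrt3 are hypothetical helper names. Note that _mm_load_ps always reads four floats from a 16-byte aligned address, so pointing it at a three-element array reads past the end of that array. _mm_loadu_ps is the variant for unaligned addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>   // SSE intrinsics: _mm_load_ps, _mm_sqrt_ps, _mm_store_ps

// Computes four square roots at once. Both pointers must be 16-byte aligned
// and must point to at least four floats.
void sqrt4_aligned(const float* in, float* out)
{
    assert(reinterpret_cast<std::uintptr_t>(in) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);

    __m128 v = _mm_load_ps(in);           // aligned 128-bit load (four floats)
    _mm_store_ps(out, _mm_sqrt_ps(v));    // four square roots, aligned store
}

// Three inputs: copy into a padded, aligned buffer so the load never reads
// past the caller's array and input and output cannot overlap.
void sqrt3(const float in[3], float out[3])
{
    alignas(16) float padded[4] = { in[0], in[1], in[2], 0.0f };
    alignas(16) float result[4];

    sqrt4_aligned(padded, result);
    for (int i = 0; i < 3; ++i) {
        out[i] = result[i];               // drop the padding lane
    }
}
```

    The fourth lane is computed and thrown away, which is the "complete set of data" cost mentioned earlier: the instruction does the same work whether three or four of the results are wanted.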

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #34
    Registered User
    Join Date
    Oct 2010
    Posts
    10
    Does anyone have a solution for this? In a right triangle, Pythagoras' theorem is used to find the length of an unknown side.
    If the base is 4 cm and the height is 7 cm, find the hypotenuse using Pythagoras' theorem:
    (hypotenuse)^2 = (base)^2 + (perpendicular)^2

    I need C++ code for it without the <math.h> header; only the conio.h and iostream.h headers may be used.
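    For reference, the formula above can be evaluated without <math.h> by computing the square root with Newton's method. This is only a sketch under that assumption; newton_sqrt and hypotenuse are made-up names, and it uses the standard <cassert> header, not the outdated conio.h and iostream.h.

```cpp
#include <cassert>

// Square root by Newton's (Babylonian) method, without <math.h>.
double newton_sqrt(double x)
{
    assert(x >= 0.0);
    if (x == 0.0) {
        return 0.0;
    }
    // Start at or above the true root so the guesses only ever decrease.
    double guess = (x > 1.0) ? x : 1.0;
    for (int i = 0; i < 2000; ++i) {       // cap guards against odd inputs such as infinity
        double next = 0.5 * (guess + x / guess);
        if (next >= guess) {
            break;                         // no further decrease: converged
        }
        guess = next;
    }
    return guess;
}

// (hypotenuse)^2 = (base)^2 + (perpendicular)^2
double hypotenuse(double base, double perpendicular)
{
    return newton_sqrt(base * base + perpendicular * perpendicular);
}
```

    With a base of 4 cm and a height of 7 cm, hypotenuse(4.0, 7.0) is the square root of 65, roughly 8.06 cm.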

  5. #35
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Wow!
    Violated guidelines: homework, old posts
    Bad practices: totally outdated headers
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law
