Thread: Optimized clipping of floats/doubles ?

  1. #1
    Registered User
    Join Date
    Apr 2009
    Posts
    8

    Optimized clipping of floats/doubles ?

    Is there a way to do this more efficiently?
    Code:
    inline void Clip( double &x, double a, double b ) // makes sure x is between a and b
    {
     if (x<a) x = a;
     else if (x>b) x = b;
    }
    Note that this involves lotsa fnstsw and conditional jumps, which suck.

    I personally (msvc8@core2duo) gain almost 50% speed increase if I do this:
    Code:
    inline void Clip( double &x, double a, double b ) // makes sure x is between a and b
    {
     x = b-x;
     x = (x+fabs(x))*0.5; // this essentially does "if (x<0) x=0;"
     x = b-x-a;
     x = (x+fabs(x))*0.5;
     x += a;
    }
    No status word crap, and no checks anymore.
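
The trick above rests on the identity max(t, 0) = (t + |t|) / 2, applied twice. A minimal sketch of the same steps in value-returning form, next to the branchy reference, may make that easier to follow. The names `PositivePart`, `ClipBranchy` and `ClipFabs` are made up for illustration, and the sketch assumes a <= b. The two versions can differ by rounding error, since `b - (b - x)` is not always exactly `x` in floating point.

```cpp
#include <cassert>
#include <cmath>

// max(t, 0) without a comparison: t + |t| is 2t for t >= 0 and 0 for t < 0.
inline double PositivePart(double t) { return (t + std::fabs(t)) * 0.5; }

// Reference version: the if/else clip from the first code block.
inline double ClipBranchy(double x, double a, double b) {
    if (x < a) return a;
    if (x > b) return b;
    return x;
}

// Branch-free version, same arithmetic as the fabs() code above.
inline double ClipFabs(double x, double a, double b) {
    assert(a <= b);                    // assumption of this sketch
    double t = PositivePart(b - x);    // max(b - x, 0)
    t = PositivePart(b - t - a);       // max(min(x, b) - a, 0)
    return t + a;                      // max(min(x, b), a)
}
```

Comparing `ClipFabs(x, a, b)` against `ClipBranchy(x, a, b)` with a small tolerance, rather than with `==`, is the safer check for arbitrary bounds.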

    Any other tricks to make this faster?

  2. #2
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Well that's about the smartest thing I've ever seen posted here.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  3. #3
    Registered User
    Join Date
    May 2007
    Posts
    147
    You are kidding, right?

    The second version takes about 6 times longer than the first one.
    Last edited by JVene; 05-13-2009 at 08:11 PM.

  4. #4
    Registered User
    Join Date
    Apr 2009
    Posts
    8
    How do you measure, and what compiler/cpu?
    I ran a loop a few million times, calling this function with several different cases in a row (and adding up the results as dummy output, otherwise the compiler optimizes everything away).

    Note that the fabs() function translates directly to a single FPU instruction, whereas the compare involves storing the FPU status word, plus you have to do a conditional jump.
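
The measurement described here (loop many times, cycle through a few cases, sum the results so the compiler cannot drop the work) could be sketched roughly as below. This is only an assumed harness, not the one anybody in the thread used: `Timing`, `TimeClip`, `ClipIf`, `ClipFabs` and the four test inputs are hypothetical, and the calls go through a function pointer, which may or may not be inlined depending on the compiler.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// The two candidates, with the signatures used in the first post.
inline void ClipIf(double &x, double a, double b) {
    if (x < a) x = a;
    else if (x > b) x = b;
}

inline void ClipFabs(double &x, double a, double b) {
    x = b - x;
    x = (x + std::fabs(x)) * 0.5;
    x = b - x - a;
    x = (x + std::fabs(x)) * 0.5;
    x += a;
}

struct Timing {
    double seconds;   // wall-clock time for the whole loop
    double checksum;  // sum of clipped values, keeps the loop alive
};

template <typename ClipFn>
Timing TimeClip(ClipFn clip, int iterations) {
    assert(iterations > 0);
    // Below range, inside range (twice), above range.
    const double inputs[4] = {-1.5, 0.25, 0.75, 2.5};
    double sum = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        double x = inputs[i & 3];
        clip(x, 0.0, 1.0);
        sum += x;  // dummy output so the calls are not optimized away
    }
    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(stop - start).count(), sum};
}
```

Calling `TimeClip(ClipIf, 10000000)` and `TimeClip(ClipFabs, 10000000)` and printing both `seconds` fields gives a first comparison. The checksums should agree, and the ratio will depend on compiler, flags and CPU, which is exactly what the thread goes on to argue about.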

  5. #5
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Phuzzillogic View Post
    How do you measure, and what compiler/cpu?
    I ran a loop a few million times, calling this function with several different cases in a row (and adding up the results as dummy output, otherwise the compiler optimizes everything away).
    Indeed. After making sure that was not the case, I timed it myself on 10 million iterations and got 0.51 seconds for the if-statement version and 0.32 seconds for the fabs() version.

    I used g++ 4.1.2 with -O3.

  6. #6
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    I'm thinking the second version would be best for general use anyway: it has no branches, so you could make a reasonable guess (or know, for a given processor) how many clocks any invocation would take.

    That said, both versions were within a few clocks of each other on my system, with the first version always winning.

    "The second version takes about 6 times longer than the first one."

    Did you just count the number of instructions you could see? (Maybe on an old Pentium 3?)

    Soma

    Edit: I ran a simple test with std::clock(), passing the variables by reference to a "do nothing" function. (The "do nothing" function was compiled separately, which prevents G++ from removing the test completely.)

  7. #7
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I'm guessing there isn't anything like CMOV for doubles.
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by iMalc View Post
    I'm guessing there isn't anything like CMOV for doubles.
    FCMOV - Wikipedia, the free encyclopedia

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  9. #9
    Registered User Jace's Avatar
    Join Date
    May 2009
    Posts
    8
    Here's a special case to clip between 0 and 1, using IEEE magic:
    Code:
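    #include <cstdint>
    #include <cstring>

    inline void Clip01( const float *src, float *dst ) // *dst = *src clipped between 0 and 1
    {
    	static const std::int32_t ieeeOne = 0x3f800000; // bit pattern of 1.0f
    	std::int32_t x;
    	std::memcpy(&x, src, sizeof x); // copy the bits; *(int*)src would violate strict aliasing
    	x = ieeeOne - (x & ~(x >> 31)); // ieeeOne - max(x, 0)
    	x = ieeeOne - (x & ~(x >> 31)); // min(max(x, 0), ieeeOne)
    	std::memcpy(dst, &x, sizeof x);
    }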
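
The "IEEE magic" in the code above comes down to two facts: for non-negative floats the bit patterns order the same way as the values, and `x & ~(x >> 31)` (the shift count `sizeof(x)*8-1` for a 32-bit int) is an integer max(x, 0). A small sketch that isolates the two pieces, assuming 32-bit IEEE 754 floats and an arithmetic right shift on negative integers (implementation-defined before C++20, but what mainstream compilers do). `MaxZero`, `FloatBits`, `BitsToFloat` and `Clip01Bits` are names invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "sketch assumes 32-bit float");

// Integer max(v, 0): v >> 31 is all ones for negative v and zero otherwise,
// so the mask ~(v >> 31) wipes out negative values and keeps the rest.
inline std::int32_t MaxZero(std::int32_t v) { return v & ~(v >> 31); }

// Bit-level conversions via memcpy, which avoids pointer-cast aliasing problems.
inline std::int32_t FloatBits(float f) {
    std::int32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

inline float BitsToFloat(std::int32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Clamp to [0, 1] using only integer operations on the bit pattern.
inline float Clip01Bits(float f) {
    assert(f == f);                      // NaN is outside the scope of this sketch
    const std::int32_t one = 0x3f800000; // bit pattern of 1.0f
    std::int32_t x = FloatBits(f);
    x = one - MaxZero(x);                // one - max(x, 0)
    x = one - MaxZero(x);                // min(max(x, 0), one)
    return BitsToFloat(x);
}
```

Any float with the sign bit set, including -0.0f, reads as a negative integer and therefore comes out as 0.0f.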

  10. #10
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Casting floating point to integer and back again is sometimes not faster than doing the relevant code in floating point. This is because it disrupts the flow within the processor (the data has to be forwarded from the floating-point unit to the integer unit and back again). This sort of thing is fine if you do it in a loop that ONLY does clipping on a large array, but if you mix it with other math on the input variables, then it can be quite bad for performance.

    It is most likely more efficient if we can produce SSE instructions and use the max/min instructions.
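
A rough sketch of what the min/max approach might look like. `ClipMinMax` uses plain `std::min`/`std::max`, which compilers targeting SSE2 commonly (but not necessarily) turn into `maxsd`/`minsd`. `ClipSse2` spells the same thing out with the SSE2 scalar intrinsics from `<emmintrin.h>`. The function names and the `CLIP_HAVE_SSE2` macro are made up here, both functions assume a <= b, and NaN handling is left out. Whether either beats the versions earlier in the thread would have to be measured.

```cpp
#include <algorithm>
#include <cassert>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define CLIP_HAVE_SSE2 1
#endif

// Portable form: leave instruction selection to the compiler.
inline double ClipMinMax(double x, double a, double b) {
    assert(a <= b);
    return std::min(std::max(x, a), b);
}

#ifdef CLIP_HAVE_SSE2
// Explicit SSE2 form: scalar max against a, then scalar min against b.
inline double ClipSse2(double x, double a, double b) {
    assert(a <= b);
    __m128d v = _mm_set_sd(x);
    v = _mm_max_sd(v, _mm_set_sd(a));  // low lane = max(x, a)
    v = _mm_min_sd(v, _mm_set_sd(b));  // low lane = min(max(x, a), b)
    return _mm_cvtsd_f64(v);
}
#endif

// Picks the intrinsic version when available, the portable one otherwise.
inline double ClipFast(double x, double a, double b) {
#ifdef CLIP_HAVE_SSE2
    return ClipSse2(x, a, b);
#else
    return ClipMinMax(x, a, b);
#endif
}
```

For clipping a whole array, the packed variants (`_mm_min_pd`/`_mm_max_pd`) process two doubles per instruction, which is where this approach would be expected to pay off most.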

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  11. #11
    Registered User
    Join Date
    May 2007
    Posts
    147
    To those inquiring about my timing results:

    Yes, I looped through 5 million calls to either version, with all the obvious precautions (I've been at this 28 years, so I'm familiar with optimization issues on such tests).

    The first version of the inline consistently took 1/6th the time of the second version.

    Visual Studio 2008, unmanaged build, tested on an AMD X2 @ 3 GHz and an Intel Q6600 at stock speed, XP 64, with 32- and 64-bit targets.

    Also, the assembler output, viewed in the disassembly view on the release build, shows considerably more complicated output for version 2. Optimization was set to favor speed.

    I can't see how the first version can be slower based on what I see; how are you getting the results you are?

    Are you expecting the multiplications to evaporate in some optimizations?

    I'm on a cheap laptop right now and can't open the test project until tonight... I'm curious as to how it is even possible that the second version could be faster.

  12. #12
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Quote Originally Posted by someone
    Core uses two floating point calculation units, one dedicated to addition and the other to multiplication and division. Theoretical calculation capacity is 2 x87 instructions per cycle and 2 SSE 128 bit floating point instructions per cycle (that is 8 operations on 32 bit simple precision floating points, or 4 operations for double precision 64 bit floating points). Core is, in theory, two times faster for this type of instruction than Mobile, Netburst and K8. Let's see how it behaves with several SSE2 instructions.
    Intel Core 2 Duo - Test - BeHardware

    Your 'fast' code would seem to be benefiting from the alternate 'add' and 'mul' thing you've got going on in the code.
    Plus, it's all bound to the FPU (unlike your first example).

    It's also highly dependent on having the right hardware to play on. As has already been commented, others will see a much worse performance.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
