Thread: Optimizing code

  1. #31
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    But if sqrt with a double is 27 cycles, then even an L2 lookup is gong to be significantly faster? Am I likely to hit L3 using a 250,000 value lookup table?

    Come to think of it actually, the data will be weighted toward the low end of the data, so maybe only putting a table for the first 2/3 of the values would work. That means ifs though, which would probably slow it down even more. Blarg.
    Last edited by KBriggs; 06-03-2009 at 06:14 PM.

  2. #32
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    If you have no other data to read/write, then L2 cache is probably big enough to hold 250000 entries - but you don't JUST read a value from the L2 cache, you need to do a little bit of math on it as well, I think? And that's where it gets hard to beat the sqrt instruction itself.

    L3 cache is, I guess, somewhere halfway between RAM and L2 cache in access time - if your system HAS a L3 cache, that is.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #33
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    All I need to do after I get the value from the table is add 0.5 and cast it to an int - it then becomes the index for an array. But I had to do that to the sqrt as well, so no, the lookup is the only operation that would be replacing the sqrt.

    There is lots of other data to though - a 10,000-100,000 by 3 aarray of doubles with x,y,z coorrdinates, and two other arrays that are negligible by comparison.
    Last edited by KBriggs; 06-03-2009 at 06:21 PM.

  4. #34
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    So, I would expect that there's little you can do about it.

    You could, I suppose, make a "2 sqrt at once" using SSE, if you don't mind doing some inline assembler - you probably have to do the rest of the math in SSE inline assembler too, but it doesn't sound like you have a HUGE amount of code.

    Seeing as the SQRTPD does it's job in 27 clock-cycles, it's the same amount of time for TWO square roots at once as it is for doing the single square root, and this rule applies to almost all other SSE instructions too, it would probably be possible to do twice as much processing in the same amount of time [subject to running into the memory throughput limit].

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  5. #35
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    Dang, I think that's out of the question for me seeing as how I have no idea what that is - I've only been using C for about a month now at work, so I have a long way to go haha.

    Thanks for the input, though

  6. #36
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by KBriggs View Post
    All I need to do after I get the value from the table is add 0.5 and cast it to an int - it then becomes the index for an array. But I had to do that to the sqrt as well, so no, the lookup is the only operation that would be replacing the sqrt.
    Really? You'd cast the double to an int to call sqrt, which probably takes a double so it gets casted back?
    Do you know that casting from double to int is well, not as slow as division, but not one of the fastest things out there you can do.

  7. #37
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    Code:
    index = (int) (sqrt(y) + 0.5)
    take sqrt of y, round to nearest int

    I guess I could make the lookup table a table of ints, where

    Code:
     table[i] = (int) (sqrt(i) + 0.5)
    And that might save time. Thing is, using the nearest integer as an approximation to the square root only gives 1 decimal place accuracy for relatively large numbers. So I would probably have to make the lookup table go by 0.5's or even 0.1's.

    Then the code would become

    Code:
    index = table[(int) y]
    Last edited by KBriggs; 06-04-2009 at 02:22 PM.

  8. #38
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by KBriggs View Post
    Code:
    index = (int) (sqrt(y) + 0.5)
    take sqrt of y, round to nearest int
    Unless you have some weird sqrt() that returns the negative root, in which case it round the wrong way
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  9. #39
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by KBriggs View Post
    But if sqrt with a double is 27 cycles, then even an L2 lookup is gong to be significantly faster? Am I likely to hit L3 using a 250,000 value lookup table?
    Unless you're targetting a specific machine, I don't think you should design your code with a particular cache size or cycle time in mind. This stuff changes all the time.

    Even if you could fit your table into L2, that would mean that not much else would fit -- so you might end up hurting other code even though you're gaining here.

    I think a simple check like:

    Code:
    if( x <= TABLE_MAX )
        return table_sqrt( x );
    else
        return sqrt( x );
    Probably won't hurt things too much. With execution speculation, the hardware sqrt() probably gets pipelined even as the condition is being checked -- if the condition passes, it simply gets cancelled. Of course, now you're filling the pipeline with bubbles, etc etc... Bust out the profiler.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  10. #40
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    Yeah I'll give it a shot later and post the results of some speed tests. Cheers all.

  11. #41
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Oh... you mean add the 0.5 after the sqrt - that makes more sense!
    Nevermind then.

  12. #42
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    The table didn't add much, but turning on the -O3 flag for compiling actually made a pretty major difference in run time.

  13. #43
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by KBriggs View Post
    The table didn't add much, but turning on the -O3 flag for compiling actually made a pretty major difference in run time.
    Optimization probably makes the difference between calling an actual function for sqrt() and using an intrinsic inline call to the FSQRT instruction.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  14. #44
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    You may want to add -ffast-math as well. It's not a guarantee, but it may help a bit.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Proposal: Code colouring
    By Perspective in forum A Brief History of Cprogramming.com
    Replies: 28
    Last Post: 05-14-2007, 07:23 AM
  2. Values changing without reason?
    By subtled in forum C Programming
    Replies: 2
    Last Post: 04-19-2007, 10:20 AM
  3. Optimizing my code
    By Lina in forum C Programming
    Replies: 30
    Last Post: 10-22-2006, 01:31 PM
  4. Interface Question
    By smog890 in forum C Programming
    Replies: 11
    Last Post: 06-03-2002, 05:06 PM
  5. Replies: 0
    Last Post: 02-21-2002, 06:05 PM