Optimizing code

This is a discussion on Optimizing code within the C Programming forums, part of the General Programming Boards category; I need some advice on speeding up my code: Question: is x = y / foo(bar) significantly slower than having ...

  1. #1
    Registered User
    Join Date
    Jun 2009
    Posts
    452

    Optimizing code

    I need some advice on speeding up my code:

    Question:

    is x = y / foo(bar) significantly slower than having foo return the reciprocal of its usual return value and changing the line to:

    x = y * foo(bar) ?

    Also, is there a significant speed difference between the operation 1 / x as opposed to y / x where y != 1?

    Thanks

  2. #2
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,246
    The answer would be dependent on which CPU you are using. At any rate, even if you got a speed increase, it would be very small, and in my opinion it would not be worth the loss of code readability.

    Also keep in mind, that it's possible that sort of change could effect the precision of the result (I'm assuming you are doing floating point arithmetic here).

  3. #3
    Registered User
    Join Date
    Jun 2009
    Posts
    452
    Yeah, I did some timing and the speed difference is negligible, on the order of 0.1%. Dang..

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    If the hardware is capable of calculating reciprocal faster than usual divide - which there are some hardware that has special microcode/hardware to solve that problem, then yes.

    But normally, the reason for calculating 1/y and x * (1/y) is that y is (relatively speaking) a constant, so many x * (1/y) is faster than x / y if we only need to calculate 1/y once per loop, and then using multiply instead. If y changes every time, then there is little benefit in calculating 1/y and multiplying x * (1/y).

    As an example:
    AMD Family 10 (latest generation processors):
    SSE2:
    DIVPS = 20 clocks.
    MULPS = 4 clocks.


    X87:
    FDIV = 16-24 clocks depending on precision.
    FMUL= 11 clocks.

    So in the above cases, you only need to do about 2 x * (1/y) where (1/y) is the same value [and can be precalculated], to make it faster.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  5. #5
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,248
    On x86, floating point division is much slower than multiplication. But it usually does not matter because the CPU is able to pipeline other operations in parallel with the division operation.

    While the FP divider unit is processing, other instructions that do not require the FPU can execute simultaneously. This can sometimes hide the latency of the division to the point of being negligible.

    Having said that, the multiply-by-reciprocal-instead-of-divide trick is commonly used, and can definitely result in a speedup in some situations, especially if it's easy to compute the reciprocal without actually dividing.

    If you are computing the reciprocal by doing a 1.0 / x, then you're still dividing and not really gaining anything.

    EDIT: There is fast square root algorithm using Newton's method that actually computes 1.0 / sqrt(x) instead of sqrt(x). You must invert the result at the end. But the key to the trick is that in the Newton loop, there are no divisions, unlike in the direct Newton computation
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  6. #6
    Registered User
    Join Date
    Jun 2009
    Posts
    452
    That square root algorithm could help me a lot - do you know where I can find it, or it a library function?

    Actually, a more specific question:

    In my loop, I compute the square root of a double and cast it to an int like

    x = (int) (sqrt(y) + 0.5). Since I cast it like this, I only care about the first decimal point of the square root. Do you have any suggestions for a quick way to get the square root of a double to one decimal place?
    Last edited by KBriggs; 06-03-2009 at 10:26 AM.

  7. #7
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,659
    Understanding Quake’s Fast Inverse Square Root | BetterExplained
    OK, it's 1/sqrt(x), but it might get you close enough in a quicker way than now.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  8. #8
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    Quote Originally Posted by KBriggs View Post
    That square root algorithm could help me a lot - do you know where I can find it, or it a library function?

    Actually, a more specific question:

    In my loop, I compute the square root of a double and cast it to an int like

    x = (int) (sqrt(y) + 0.5). Since I cast it like this, I only care about the first decimal point of the square root. Do you have any suggestions for a quick way to get the square root of a double to one decimal place?
    What's the range of your double? If your y's are reasonably close together you could try a Taylor series expansion, or use the fast 1/ as Salem mentioned.

  9. #9
    Registered User
    Join Date
    Jun 2009
    Posts
    452
    The double could be anything, but reasonably it won't be much more than say, 500.

    How about something like this for a quick hack at getting 1 decimal accuracy using Newton's method. Right now I haven't done I/O stuff, I am just altering numbers as needed.

    Code:
    #include <stdio.h>
    
    int main()
    {
      double k,prevroot,nextroot,precision;
      k = 29.5546743;
      prevroot = 15;
      nextroot = 0;
      precision = 1.0;
      while (precision > 0.05)
      {
        nextroot = (prevroot * prevroot + k) / (2 * prevroot);
        precision = nextroot - prevroot;
        if (precision < 0)
        {
          precision *= -1;
        }
        prevroot = nextroot;
      }
      printf("%f\n",nextroot);
    }
    My question is that since Newton's method depends on the initial guess, what is a good initial guess for sqrt(k)? And would this be any faster than tradition methods?


    Actually, scratch that, this seems to work (I just took it from the link):

    Code:
    #include <stdio.h>
    
    int main()
    {
      float x;
      x = 29.456;
      float xhalf;
      xhalf = 0.5f * x;
      int i;
      i = *(int*)&x; // store floating-point bits in integer
      i = 0x5f3759d5 - (i >> 1); // initial guess for Newton's method
      x = *(float*)&i; // convert new bits into float
      x = x*(1.5f - xhalf*x*x); // One round of Newton's method
      printf("%f\n",1.0 / x);
    }
    I don't even begin to understand what's going on there, but thanks so much

    And one last edit: That fastsqrt actually makes my code run 20% slower than the library sqrt function haha. Any other ideas?
    Last edited by KBriggs; 06-03-2009 at 11:29 AM.

  10. #10
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,659
    > The double could be anything, but reasonably it won't be much more than say, 500.
    That's small enough to consider a lookup table IMO.

    Or at least a lookup table which spans say 95%+ of the expected values (very quick), then use a calculation for the odd-ball exceptions which remain outside the table range.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  11. #11
    Registered User
    Join Date
    Jun 2009
    Posts
    452
    Eh, the program already uses quite a bit of memory, I don't know if a lookup table would help anything.

  12. #12
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,248
    Quote Originally Posted by KBriggs View Post
    And one last edit: That fastsqrt actually makes my code run 20% slower than the library sqrt function haha. Any other ideas?
    The library sqrt() probably calls a native instruction. It used to be (at the time Quake was written) that this was slower than computing it using the trick listed above. Apparently, these days the performance is improved.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  13. #13
    Registered User
    Join Date
    Jun 2009
    Posts
    452
    I would try the taylor expansion method but I don't know how

  14. #14
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,659
    > Eh, the program already uses quite a bit of memory, I don't know if a lookup table would help anything.
    The square root of 500 fits easily into a byte.
    So it's a 500 byte lookup table.

    Are you seriously that short of memory? Are you programming for some embedded device with only a meg (or less) of memory? Or a standard desktop with lots of memory.

    Plus, if this is the only use of sqrt() in your code, you save some by not linking with the library code.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  15. #15
    Registered User
    Join Date
    Jun 2009
    Posts
    452
    I guess I am not really that short haha. I'll look into it. I've never use one before, so I am not really sure how to implement it.

Page 1 of 3 123 LastLast
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Proposal: Code colouring
    By Perspective in forum A Brief History of Cprogramming.com
    Replies: 28
    Last Post: 05-14-2007, 07:23 AM
  2. Values changing without reason?
    By subtled in forum C Programming
    Replies: 2
    Last Post: 04-19-2007, 10:20 AM
  3. Optimizing my code
    By Lina in forum C Programming
    Replies: 30
    Last Post: 10-22-2006, 01:31 PM
  4. Interface Question
    By smog890 in forum C Programming
    Replies: 11
    Last Post: 06-03-2002, 05:06 PM
  5. Replies: 0
    Last Post: 02-21-2002, 05:05 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21