Thread: function-call overhead... real? or not?

  1. #1
    Registered User
    Join Date
    Dec 2004
    Posts
    35

    function-call overhead... real? or not?

    hi there,
    this is a subject that i know almost nothing about and I was hoping to gain enlightenment from some of you who are far more versed in the ways of C.

    I have read references to function-call overhead...
    and suggestions that using macros instead would avoid performance hits...

    is this true/still true?

    I am developing a scientific computing application that does NOT scale well with system size - by which i mean the molecular system (number of atoms and electrons) that the calculations are being done on...

    it's a stoachasitc method called Diffusion Monte Carlo - used to solve intractable multidimensional wave equations by weighted sampling.

    the more particles the longer it takes. So speed is of the essence (even on very fast computers or clusters...)

    Would it be a very bad thing to use function calls? is there a way to optimise this in C? or must I just either explicate all the calculations (in the inner loop) or use macros to get around this?

    how real is this performance hit with function calls?

    many thanks in advance for any responses
    James

    *edit
    by the way - I do of course intend to do some benchmarking to try to get some quantitative information on the alternative techniques... so i am not just expecting answers on a plate about something that for all I know might be a very hard to answer question (ie it might depend greatly on the type of function call etc - and hard to answer in such general terms) but I'd love to hear other peoples experiences/successes/failures etc related to this issue - and what they found worked best for them
    cheers!
    Last edited by eccles; 12-15-2004 at 03:26 AM.
    VIM + gcc + beer... does life get any better?

  2. #2
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    It's very much real. Here's an example benchmarking program:
    Code:
    itsme@dreams:~/C$ cat speeddiff.c
    #include <stdio.h>
    #include <sys/time.h>
    
    #define macro_test(a, b) ((a)+(b))
    
    int func_test(int a, int b)
    {
      return a+b;
    }
    
    void show_time(char *str, struct timeval *tv)
    {
      printf("%s: %d.%06d\n", str, tv->tv_sec, tv->tv_usec);
    }
    
    int main(void)
    {
      struct timeval tv;
      unsigned int i;
      int val;
    
      gettimeofday(&tv, NULL);
      show_time("function time start", &tv);
      for(i = 0;i < 4000000;++i)
        val = func_test(1, 1);
      gettimeofday(&tv, NULL);
      show_time("function time end", &tv);
    
      gettimeofday(&tv, NULL);
      show_time("macro time start", &tv);
      for(i = 0;i < 4000000;++i)
        val = macro_test(1, 1);
      gettimeofday(&tv, NULL);
      show_time("macro time end", &tv);
    
      return 0;
    }
    And a couple of test runs...
    Code:
    itsme@dreams:~/C$ ./speeddiff
    function time start: 1103103678.099678
    function time end: 1103103678.258196
    macro time start: 1103103678.258232
    macro time end: 1103103678.302928
    itsme@dreams:~/C$ ./speeddiff
    function time start: 1103103679.053172
    function time end: 1103103679.211655
    macro time start: 1103103679.211692
    macro time end: 1103103679.264009
    So you can see the difference there.

    EDIT: Keep in mind that the more time your program spends in the function, the less important the time calling the function becomes. The difference will be the most pronounced in a stupid simple function like the one I used here.
    Last edited by itsme86; 12-15-2004 at 03:47 AM.
    If you understand what you're doing, you're not learning anything.

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Given
    Code:
    foo = func( bar, baz);
    pushing each parameter is about the same as an assignment
    The function call itself is about the same as about 3 assignments (call + new base pointer + new stack frame)
    The return result is of course the same as an assignment.

    As itsme86 points out, you only really notice this for very small functions called very many times.

    Before you rush off and macro-ise everything, consider
    1. increased code bloat as your macro is expanded every time. This may mean that your code loop no longer fits in the instruction cache.
    2. macros cannot be profiled.
    3. macros cannot be debugged. Most debuggers work on source lines, and macros get expanded into just one source line.
    4. macros cannot return a result, so you need an extra "parameter" indicating where you want the answer stored.
    5. macros may evaluate their parameters more than once.
    Code:
    #define SQUARE(x)  ((x)*(x))
    int square ( int x ) {
        return x*x;
    }
    ...
    a = SQUARE ( expensive_fn() );  // called twice (ouch!)
    b = square ( expensive_fn() );  // called once
    c = SQUARE ( x++ );  // undefined - x modified twice
    d = square ( x++ );  // just what you would expect
    Personally, I'd leave everything as a function until the program is finished, you've had a chance to profile it with real-world data, and you can prove that changing some bit of code into a macro is both safe and delivers a real performance improvement.

    Since you seem to be using gcc, you might consider using "inline" functions, which are a new feature of C99.

  4. #4
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Function call overhead is a necessary evil, but making all of them macros would be ugly and directly inserting the code instead of calling would make your exe size extremely large.

  5. #5
    .
    Join Date
    Nov 2003
    Posts
    307
    Get it working absolutely correctly first. Then, if it takes too long to be practical, then look at things like loop unrolling or using macros or whatever. Profile the code - find out what really is time-consuming, not what you think is time-consuming. Then make changes.

    Here is a discussion of optimization methods - AFTER profiling only:

    http://vision.eng.shu.ac.uk/bala/c/c...imization.html

  6. #6
    Registered User
    Join Date
    Dec 2004
    Posts
    35
    thanks so much for all your helpful comments and examples (and that excellent link!) and in particular the warnings about macros. I should probably stay away from them at this stage... I am going to see if I can find something to profile my code with and see where it spends the most time.

    thanks!

    If I was able to implement it properly, would a doubly linked list be more efficient than an array of structs?

    the process is basically like this (pseudo-code):
    Code:
    for ( time=0 ; time<totalTime ; time+= 0.01 )
    {
            for ( i=0 ; i< numWalkers ; i++ )
            {
                    for ( j=0 ; j<numElectrons ; j++ )
                    {
                             *** all kinds of calculations etc etc... including removal of walkers 
                            from list, or adding them to list ***
                    }
            }
    }
    So unfortunately several levels of nested loops are totally unavoidable in this situation... it's a markov process (where each 'walker' is a state, which contains information about each electron, and at each timestep the electron is moved in space by a small amount, and then the total energy recalculated to see whether to accept the move) - so unfortunately, every state depents totally on the previous state. And because the position increments are calculated as a pseudorandom gaussian,the numbers will be different each time and can't be predicted.

    so that's the problem basically. every state depends on the last one, and all the energies BETWEEN the particles are recalculated each time, (as are various other total system properties that depend on iterating over the whole system)... so yeah - no way to avoid several levels of nested loops (in fact there might be one or two more than I put in).

    oh - and would it be worth trying to do an integer loop for time, then divide it by a factor (or rather multiply it by a constant valued float) - would that improve things at all? I seem to have gathered that it might.
    Last edited by eccles; 12-18-2004 at 04:16 PM.
    VIM + gcc + beer... does life get any better?

  7. #7
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    Ok no one went into this aspect too much either, but there is also a limited amount of stack space. Machine code takes up space too Functions can also lighten the work load a bit by making one set of commands freely usable across all other functions. Loops are nice to have inlined into your code block, and recursion can be evil, but other than that, don't go overboard on trying to optimize by killing off functions.

  8. #8
    Registered User
    Join Date
    Dec 2004
    Posts
    35
    thanks for mentioning that
    I think I read that function variables are stored on the heap and globals on the stack - is that what you are referring to?

    the annoying thing RE time spent calling vs time spent in function is that some of the functions are mathematical functions... they are very large expressions but mostly nothing that couldn't be represented by one (very long) line of code. like calculating the value of the wavefunction for a particular point (vector)... but the expressions are very long. But I think it's still quite a bottleneck since it prob takes almost as much time to call the function (get in and out again etc) as it does to evaluate the result... bad thing is, some of these functions also call other functions in the expression, which in turn may do similar things... so I think this might be a significant slow-down point...
    and these functions get called in the inner loop.
    VIM + gcc + beer... does life get any better?

  9. #9
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    Actually everything is stored in the stack. Every command in a function is compiled into machine code which takes up space too. You may often times hear someone who uses a lot of assembler comment on how many bytes a certain operation is. Why spout such crazy giberish? Well because you can actually measure how much space a given function will take up in bytes. Since high level languages remove you from the machine code by quite a bit, it would be hard to say "This function is only 48 bytes in size and runs in 26 clock cycles!"

    But like you suggested, variables and globals are also stored in memory. If you are more interested in this topic you could read some of masm's documentation. An even basic grasp of assembler makes you understand what it is that makes your executables tick.

  10. #10
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Um master not everything is stored on the stack. Automatic local variables, parameters, and push & pop are stored on the stack. Static locals, globals, and dynamically allocated memory is stored in the heap. Instructions can be stored in their own area also.

    Function size comes into play when you are dealing with limited resources or when you are trying to optimize a program and you want to make sure that an entire function that is called often is kept in cache for faster execution.

  11. #11
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    Here is where I'm confused. I was under the impression that ebp points to code blocks stores the base for every stack to maintain code execution

  12. #12
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    That's correct that ebp points to the base of the current stack frame, but Thantos is correct in that everything is not stored on the stack. Anytime you use a global or static variable, or anytime you use malloc(), you are using the heap - not the stack.

  13. #13
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    Right I understand memory heaps. Just not the magic if where functions go. I know commands are placed into the cache upon execution, but i'm not clear on where blocks of code are stored leading up to execution.

  14. #14
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    The code is stored in the .code segment of the processes memory. This section of memory is read and execute only. It is not part of the stack or heap space.

  15. #15
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    Ah ok I wasn't completely sure on that one. Thanks.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Undefined Reference Compiling Error
    By AlakaAlaki in forum C++ Programming
    Replies: 1
    Last Post: 06-27-2008, 11:45 AM
  2. temperature sensors
    By danko in forum C Programming
    Replies: 22
    Last Post: 07-10-2007, 07:26 PM
  3. Screwy Linker Error - VC2005
    By Tonto in forum C++ Programming
    Replies: 5
    Last Post: 06-19-2007, 02:39 PM
  4. how to get function call stack
    By George2 in forum C Programming
    Replies: 18
    Last Post: 11-11-2006, 07:51 AM
  5. Game Pointer Trouble?
    By Drahcir in forum C Programming
    Replies: 8
    Last Post: 02-04-2006, 02:53 AM