function-call overhead... real? or not?

**eccles** · 12-15-2004

hi there,
this is a subject that i know almost nothing about and I was hoping to gain enlightenment from some of you who are far more versed in the ways of C.

I have read references to function-call overhead...
and suggestions that using macros instead would avoid performance hits...

is this true/still true?

I am developing a scientific computing application that does NOT scale well with system size - by which i mean the molecular system (number of atoms and electrons) that the calculations are being done on...

it's a stoachasitc method called Diffusion Monte Carlo - used to solve intractable multidimensional wave equations by weighted sampling.

the more particles the longer it takes. So speed is of the essence (even on very fast computers or clusters...)

Would it be a very bad thing to use function calls? is there a way to optimise this in C? or must I just either explicate all the calculations (in the inner loop) or use macros to get around this?

how real is this performance hit with function calls?

many thanks in advance for any responses

James

*edit
by the way - I do of course intend to do some benchmarking to try to get some quantitative information on the alternative techniques... so i am not just expecting answers on a plate about something that for all I know might be a very hard to answer question (ie it might depend greatly on the type of function call etc - and hard to answer in such general terms) but I'd love to hear other peoples experiences/successes/failures etc related to this issue - and what they found worked best for them

cheers!

**itsme86** · 12-15-2004

It's very much real. Here's an example benchmarking program:

Code:

itsme@dreams:~/C$ cat speeddiff.c
#include <stdio.h>
#include <sys/time.h>

#define macro_test(a, b) ((a)+(b))

int func_test(int a, int b)
{
  return a+b;
}

void show_time(char *str, struct timeval *tv)
{
  printf("%s: %d.%06d\n", str, tv->tv_sec, tv->tv_usec);
}

int main(void)
{
  struct timeval tv;
  unsigned int i;
  int val;

  gettimeofday(&tv, NULL);
  show_time("function time start", &tv);
  for(i = 0;i < 4000000;++i)
    val = func_test(1, 1);
  gettimeofday(&tv, NULL);
  show_time("function time end", &tv);

  gettimeofday(&tv, NULL);
  show_time("macro time start", &tv);
  for(i = 0;i < 4000000;++i)
    val = macro_test(1, 1);
  gettimeofday(&tv, NULL);
  show_time("macro time end", &tv);

  return 0;
}

And a couple of test runs...

Code:

itsme@dreams:~/C$ ./speeddiff
function time start: 1103103678.099678
function time end: 1103103678.258196
macro time start: 1103103678.258232
macro time end: 1103103678.302928
itsme@dreams:~/C$ ./speeddiff
function time start: 1103103679.053172
function time end: 1103103679.211655
macro time start: 1103103679.211692
macro time end: 1103103679.264009

So you can see the difference there.

EDIT: Keep in mind that the more time your program spends in the function, the less important the time calling the function becomes. The difference will be the most pronounced in a stupid simple function like the one I used here.

**Salem** · 12-15-2004

Given

Code:

foo = func( bar, baz);

pushing each parameter is about the same as an assignment
The function call itself is about the same as about 3 assignments (call + new base pointer + new stack frame)
The return result is of course the same as an assignment.

As itsme86 points out, you only really notice this for very small functions called very many times.

Before you rush off and macro-ise everything, consider
1. increased code bloat as your macro is expanded every time. This may mean that your code loop no longer fits in the instruction cache.
2. macros cannot be profiled.
3. macros cannot be debugged. Most debuggers work on source lines, and macros get expanded into just one source line.
4. macros cannot return a result, so you need an extra "parameter" indicating where you want the answer stored.
5. macros may evaluate their parameters more than once.

Code:

#define SQUARE(x)  ((x)*(x))
int square ( int x ) {
    return x*x;
}
...
a = SQUARE ( expensive_fn() );  // called twice (ouch!)
b = square ( expensive_fn() );  // called once
c = SQUARE ( x++ );  // undefined - x modified twice
d = square ( x++ );  // just what you would expect

Personally, I'd leave everything as a function until the program is finished, you've had a chance to profile it with real-world data, and you can prove that changing some bit of code into a macro is both safe and delivers a real performance improvement.

Since you seem to be using gcc, you might consider using "inline" functions, which are a new feature of C99.

**VirtualAce** · 12-15-2004

Function call overhead is a necessary evil, but making all of them macros would be ugly and directly inserting the code instead of calling would make your exe size extremely large.

**jim mcnamara** · 12-15-2004

Get it working absolutely correctly first. Then, if it takes too long to be practical, then look at things like loop unrolling or using macros or whatever. Profile the code - find out what really is time-consuming, not what you think is time-consuming. Then make changes.

Here is a discussion of optimization methods - AFTER profiling only:

http://vision.eng.shu.ac.uk/bala/c/c...imization.html

**eccles** · 12-18-2004

thanks so much for all your helpful comments and examples (and that excellent link!) and in particular the warnings about macros. I should probably stay away from them at this stage... I am going to see if I can find something to profile my code with and see where it spends the most time.

thanks!

If I was able to implement it properly, would a doubly linked list be more efficient than an array of structs?

the process is basically like this (pseudo-code):

Code:

for ( time=0 ; time<totalTime ; time+= 0.01 )
{
        for ( i=0 ; i< numWalkers ; i++ )
        {
                for ( j=0 ; j<numElectrons ; j++ )
                {
                         *** all kinds of calculations etc etc... including removal of walkers 
                        from list, or adding them to list ***
                }
        }
}

So unfortunately several levels of nested loops are totally unavoidable in this situation... it's a markov process (where each 'walker' is a state, which contains information about each electron, and at each timestep the electron is moved in space by a small amount, and then the total energy recalculated to see whether to accept the move) - so unfortunately, every state depents totally on the previous state. And because the position increments are calculated as a pseudorandom gaussian,the numbers will be different each time and can't be predicted.

so that's the problem basically. every state depends on the last one, and all the energies BETWEEN the particles are recalculated each time, (as are various other total system properties that depend on iterating over the whole system)... so yeah - no way to avoid several levels of nested loops (in fact there might be one or two more than I put in).

oh - and would it be worth trying to do an integer loop for time, then divide it by a factor (or rather multiply it by a constant valued float) - would that improve things at all? I seem to have gathered that it might.

**~~master5001~~** · 12-19-2004

Ok no one went into this aspect too much either, but there is also a limited amount of stack space. Machine code takes up space too

Functions can also lighten the work load a bit by making one set of commands freely usable across all other functions. Loops are nice to have inlined into your code block, and recursion can be evil, but other than that, don't go overboard on trying to optimize by killing off functions.

**eccles** · 12-19-2004

thanks for mentioning that

I think I read that function variables are stored on the heap and globals on the stack - is that what you are referring to?

the annoying thing RE time spent calling vs time spent in function is that some of the functions are mathematical functions... they are very large expressions but mostly nothing that couldn't be represented by one (very long) line of code. like calculating the value of the wavefunction for a particular point (vector)... but the expressions are very long. But I think it's still quite a bottleneck since it prob takes almost as much time to call the function (get in and out again etc) as it does to evaluate the result... bad thing is, some of these functions also call other functions in the expression, which in turn may do similar things... so I think this might be a significant slow-down point...
and these functions get called in the inner loop.

**~~master5001~~** · 12-19-2004

Actually everything is stored in the stack. Every command in a function is compiled into machine code which takes up space too. You may often times hear someone who uses a lot of assembler comment on how many bytes a certain operation is. Why spout such crazy giberish? Well because you can actually measure how much space a given function will take up in bytes. Since high level languages remove you from the machine code by quite a bit, it would be hard to say "This function is only 48 bytes in size and runs in 26 clock cycles!"

But like you suggested, variables and globals are also stored in memory. If you are more interested in this topic you could read some of masm's documentation. An even basic grasp of assembler makes you understand what it is that makes your executables tick.

**Thantos** · 12-19-2004

Um master not everything is stored on the stack. Automatic local variables, parameters, and push & pop are stored on the stack. Static locals, globals, and dynamically allocated memory is stored in the heap. Instructions can be stored in their own area also.

Function size comes into play when you are dealing with limited resources or when you are trying to optimize a program and you want to make sure that an entire function that is called often is kept in cache for faster execution.

**~~master5001~~** · 12-19-2004

Here is where I'm confused. I was under the impression that ebp points to code blocks stores the base for every stack to maintain code execution

**bithub** · 12-19-2004

That's correct that ebp points to the base of the current stack frame, but Thantos is correct in that everything is not stored on the stack. Anytime you use a global or static variable, or anytime you use malloc(), you are using the heap - not the stack.

**~~master5001~~** · 12-19-2004

Right I understand memory heaps. Just not the magic if where functions go. I know commands are placed into the cache upon execution, but i'm not clear on where blocks of code are stored leading up to execution.

**bithub** · 12-19-2004

The code is stored in the .code segment of the processes memory. This section of memory is read and execute only. It is not part of the stack or heap space.

**~~master5001~~** · 12-19-2004

Ah ok I wasn't completely sure on that one. Thanks.

Thread: function-call overhead... real? or not?

Thread Tools

Search Thread

Display

function-call overhead... real? or not?

Similar Threads

Undefined Reference Compiling Error

temperature sensors

Screwy Linker Error - VC2005

how to get function call stack

Game Pointer Trouble?