stack vs. global vars : efficiency?

**mynickmynick** · 07-31-2009

Considering vars, structs or arrays of medium size which allows to allocate them on the stack (for instance as function local vars) the question is:

If the program performs massive computaions using those vars is the program more fast if those vars are on the stack or if they are declared outside of the function as static? And also in this case may it's better to renounce to program simplicity using longer names so that they don't need to be passed by reference?

That is better example 1 or 2 or 3?

Ex.1:

Code:

static    float EffectiveUniqueInput[100];

MassiveComputation(float EffectiveUniqueInput[100])
{
    float a[100];
    float L[100];
    float l[100][100];


    for(..)
   {
       for(..)
       {
            l[i][j]=L[j]*cos(a[i])+L[j]*sin(a[i]);
        }
    }

}

main()
{

..
MassiveComputation(EffectiveUniqueInput);
..

}

Ex.2:

Code:

static    float EffectiveUniqueInput[100];

static    float MassiveComputation_a[100];
static    float MassiveComputation_L[100];
static    float MassiveComputation_l[100][100];

MassiveComputation(float EffectiveUniqueInput[100],float a[100], float L[100], float l[100][100])
{

    for(..)
   {
       for(..)
       {
            l[i][j]=L[j]*cos(a[i])+L[j]*sin(a[i]);
        }
    }

}


main()
{

..
MassiveComputation(EffectiveUniqueInput, fMassiveComputation_a, MassiveComputation_L, MassiveComputation_l);
..

}

Ex.3:

Code:

static    float EffectiveUniqueInput[100];

static    float MassiveComputation_a[100];
static    float MassiveComputation_L[100];
static    float MassiveComputation_l[100][100];

MassiveComputation(float EffectiveUniqueInput[100])
{

    for(..)
   {
       for(..)
       {
            MassiveComputation_l[i][j]=MassiveComputation_L[j]*cos(MassiveComputation_a[i])+MassiveComputation_L[j]*sin(MassiveComputation_a[i]);
        }
    }

}


main()
{

..
MassiveComputation(EffectiveUniqueInput);
..

}

**tabstop** · 07-31-2009

All we can do is talk about it. You can sit there and try the three different things and see which one does better.

That said, I would be surprised to find that there's much of a difference between the three -- memory is memory and scope resolution doesn't really affect that at all. And arrays are passed by pointer automatically, and I would guess the overhead of three pointers getting copied are going to be drowned by calling sin 10000 times. I would expect that if you have things organized so that cache misses are minimized, then you keep the arrays anywhere you want.

**Adak** · 07-31-2009

Think about your locality of reference. The analogy I'm used to is this:

You want a glass of milk. You can get it from:

1) Your refrigerator : L1 cache
2) Your friendly neighbor : L2 cache
3) The market a block away : L3 cache
4) The big super market a mile away :RAM
5) The big dairy on the outskirts of town : The HD

The trick is to keep as much data as you can, as near to #1 as possible.

Of course, if you're crashing your program because it runs out of stack space, the answer is obvious. Otherwise, any difference is insignificant, compared to

1) Algorithmic efficiency - you'll never get bubble sort to work well on 1,000,000 floats.

and

2) Cache efficiency

**mynickmynick** · 07-31-2009

Originally Posted by tabstop

All we can do is talk about it. You can sit there and try the three different things and see which one does better.

That said, I would be surprised to find that there's much of a difference between the three -- memory is memory and scope resolution doesn't really affect that at all. And arrays are passed by pointer automatically, and I would guess the overhead of three pointers getting copied are going to be drowned by calling sin 10000 times. I would expect that if you have things organized so that cache misses are minimized, then you keep the arrays anywhere you want.

of course cache miss is important but as you say independent from this problem
and in this case the arrays are just used from begin till end

my question is : which of the 3 examples have shorter/longer memory access times?

For instance may be example 3 is faster than ex.2 because the address of array elements
in ex.3 is fixed(cpmpile+linker time)address+i*rowsize+j
while in ex2 is address(maybe stored in a register)+i*rowsize+j

I guess that makes negligible difference but i should try it

And waht about ex1 and 2 compared?

**mynickmynick** · 07-31-2009

Originally Posted by Adak

Think about your locality of reference. The analogy I'm used to is this:

You want a glass of milk. You can get it from:

1) Your refrigerator : L1 cache
2) Your friendly neighbor : L2 cache
3) The market a block away : L3 cache
4) The big super market a mile away :RAM
5) The big dairy on the outskirts of town : The HD

The trick is to keep as much data as you can, as near to #1 as possible.

Of course, if you're crashing your program because it runs out of stack space, the answer is obvious. Otherwise, any difference is insignificant, compared to

1) Algorithmic efficiency - you'll never get bubble sort to work well on 1,000,000 floats.

and

2) Cache efficiency

ok i agree with most of what you say but i still don't get: are you telling me stack data is faster than static data outside the function??

**laserlight** · 07-31-2009

Originally Posted by mynickmynick

my question is : which of the 3 examples have shorter/longer memory access times?

You are asking the wrong question. The first question that you should ask is: is there a need to optimise the code even if it affects readability and increases the cost of maintenance? To answer this, you need to compare the performance goals with the measured performance of the code. (Which in turn means that you need to set performance objectives and then perform at least some simple timing tests.)

If the answer to the first question is yes, then the next question to ask is: where are the bottle necks in my code that is the "low hanging fruit" to be optimised? To determine this you would likely need to profile your code.

**Adak** · 07-31-2009

I'm telling you that it makes no difference, compared to algorithmic efficiency and locality of reference.

**mynickmynick** · 07-31-2009

Originally Posted by laserlight

You are asking the wrong question. The first question that you should ask is: is there a need to optimise the code even if it affects readability and increases the cost of maintenance? To answer this, you need to compare the performance goals with the measured performance of the code. (Which in turn means that you need to set performance objectives and then perform at least some simple timing tests.)

If the answer to the first question is yes, then the next question to ask is: where are the bottle necks in my code that is the "low hanging fruit" to be optimised? To determine this you would likely need to profile your code.

the performance objectives are : "as fast as you manage"
"profile your code" : what you mean?

I have already gone from th 200 msec first version to the 16 msec actual version of a severe image gradient patter matching routine, making several optimizations and tests

But i wanted now to make some theoric investigations, instead of proceeding by attempt-in.practice-time-measures

**laserlight** · 07-31-2009

Originally Posted by mynickmynick

the performance objectives are : "as fast as you manage"

As in this is for a competition?

Originally Posted by mynickmynick

"profile your code" : what you mean?

To analyse the various parts of the code to see how much time they tend to take to execute, both in terms of raw numbers and as a proportion of the total execution time. Along these lines, you would find out which functions are most often called, and thus can benefit most from optimisation. There exist profiling tools like gprof that you might be able to use.

**mynickmynick** · 07-31-2009

Originally Posted by laserlight

As in this is for a competition?

hahahahahaha
yes a competition with myself

The fact is that the faster i go the more I can increase the resolution, so the precision of the detections i am performing

I have already profiled a lot these routines

i was now investigating a priori some aspects

I guess another thing i could do to speed up would be substituing some parameters I am passing to the routines with precompiled defines (once i have found a stable value for them)

**Elysia** · 07-31-2009

An interesting this to ponder on with global variables, just for thought:
If variables are global, it is possible the compiler won't optimize them (ie cache their result in a register) because it doesn't know when/if that information is going to be used by another function!

Don't rely on that, though. Use a profiler to find the real bottlenecks.

**brewbuck** · 07-31-2009

From a code generation standpoint, the difference between access to a static vs. access to a stack variable is whether the address mode is absolute or relative. For instance, on x86, access to a global variable called X might look like:

Code:

mov eax, [X]

Whereas access to a local variable is done via offsetting from the base pointer:

Code:

mov eax, [ebp + 16]

On x86, these two address modes have the same speed. This may not be the case on other architectures. For instance, on an architecture that does not have register-relative addressing, you would have something like:

Code:

mov ebx, ebp
add ebx, 16
mov eax, [ebx]

And that may in fact be slower than access to a static.

As always, the answer to the question is platform-specific. The choice to change a stack variable to a global is a very serious one with a number of architectural consequences and shouldn't be undertaken lightly.

**tabstop** · 07-31-2009

Originally Posted by mynickmynick

of course cache miss is important but as you say independent from this problem

I think you've got what I'm saying backwards -- I'm saying caching is all that matters, and whether the arrays are passed in, static, or located on the moon is independent of your problem.

Thread: stack vs. global vars : efficiency?

Thread Tools

Search Thread

Display

stack vs. global vars : efficiency?

Similar Threads

stack and pointer problem

Question about a stack using array of pointers

error trying to compile stack program

What am I doing wrong, stack?

Stack Program Here