Thread: stack vs. global vars : efficiency?

  1. #1
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251

    stack vs. global vars : efficiency?

    Considering vars, structs or arrays of medium size which allows to allocate them on the stack (for instance as function local vars) the question is:

    If the program performs massive computaions using those vars is the program more fast if those vars are on the stack or if they are declared outside of the function as static? And also in this case may it's better to renounce to program simplicity using longer names so that they don't need to be passed by reference?

    That is better example 1 or 2 or 3?

    Ex.1:
    Code:
    static    float EffectiveUniqueInput[100];
    
    MassiveComputation(float EffectiveUniqueInput[100])
    {
        float a[100];
        float L[100];
        float l[100][100];
    
    
        for(..)
       {
           for(..)
           {
                l[i][j]=L[j]*cos(a[i])+L[j]*sin(a[i]);
            }
        }
    
    }
    
    main()
    {
    
    ..
    MassiveComputation(EffectiveUniqueInput);
    ..
    
    }

    Ex.2:
    Code:
    static    float EffectiveUniqueInput[100];
    
    static    float MassiveComputation_a[100];
    static    float MassiveComputation_L[100];
    static    float MassiveComputation_l[100][100];
    
    MassiveComputation(float EffectiveUniqueInput[100],float a[100], float L[100], float l[100][100])
    {
    
        for(..)
       {
           for(..)
           {
                l[i][j]=L[j]*cos(a[i])+L[j]*sin(a[i]);
            }
        }
    
    }
    
    
    main()
    {
    
    ..
    MassiveComputation(EffectiveUniqueInput, fMassiveComputation_a, MassiveComputation_L, MassiveComputation_l);
    ..
    
    }

    Ex.3:
    Code:
    static    float EffectiveUniqueInput[100];
    
    static    float MassiveComputation_a[100];
    static    float MassiveComputation_L[100];
    static    float MassiveComputation_l[100][100];
    
    MassiveComputation(float EffectiveUniqueInput[100])
    {
    
        for(..)
       {
           for(..)
           {
                MassiveComputation_l[i][j]=MassiveComputation_L[j]*cos(MassiveComputation_a[i])+MassiveComputation_L[j]*sin(MassiveComputation_a[i]);
            }
        }
    
    }
    
    
    main()
    {
    
    ..
    MassiveComputation(EffectiveUniqueInput);
    ..
    
    }

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    All we can do is talk about it. You can sit there and try the three different things and see which one does better.

    That said, I would be surprised to find that there's much of a difference between the three -- memory is memory and scope resolution doesn't really affect that at all. And arrays are passed by pointer automatically, and I would guess the overhead of three pointers getting copied are going to be drowned by calling sin 10000 times. I would expect that if you have things organized so that cache misses are minimized, then you keep the arrays anywhere you want.

  3. #3
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Think about your locality of reference. The analogy I'm used to is this:

    You want a glass of milk. You can get it from:

    1) Your refrigerator : L1 cache
    2) Your friendly neighbor : L2 cache
    3) The market a block away : L3 cache
    4) The big super market a mile away :RAM
    5) The big dairy on the outskirts of town : The HD

    The trick is to keep as much data as you can, as near to #1 as possible.

    Of course, if you're crashing your program because it runs out of stack space, the answer is obvious. Otherwise, any difference is insignificant, compared to

    1) Algorithmic efficiency - you'll never get bubble sort to work well on 1,000,000 floats.

    and

    2) Cache efficiency

  4. #4
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    Quote Originally Posted by tabstop View Post
    All we can do is talk about it. You can sit there and try the three different things and see which one does better.

    That said, I would be surprised to find that there's much of a difference between the three -- memory is memory and scope resolution doesn't really affect that at all. And arrays are passed by pointer automatically, and I would guess the overhead of three pointers getting copied are going to be drowned by calling sin 10000 times. I would expect that if you have things organized so that cache misses are minimized, then you keep the arrays anywhere you want.
    of course cache miss is important but as you say independent from this problem
    and in this case the arrays are just used from begin till end

    my question is : which of the 3 examples have shorter/longer memory access times?

    For instance may be example 3 is faster than ex.2 because the address of array elements
    in ex.3 is fixed(cpmpile+linker time)address+i*rowsize+j
    while in ex2 is address(maybe stored in a register)+i*rowsize+j

    I guess that makes negligible difference but i should try it

    And waht about ex1 and 2 compared?

  5. #5
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    Quote Originally Posted by Adak View Post
    Think about your locality of reference. The analogy I'm used to is this:

    You want a glass of milk. You can get it from:

    1) Your refrigerator : L1 cache
    2) Your friendly neighbor : L2 cache
    3) The market a block away : L3 cache
    4) The big super market a mile away :RAM
    5) The big dairy on the outskirts of town : The HD

    The trick is to keep as much data as you can, as near to #1 as possible.

    Of course, if you're crashing your program because it runs out of stack space, the answer is obvious. Otherwise, any difference is insignificant, compared to

    1) Algorithmic efficiency - you'll never get bubble sort to work well on 1,000,000 floats.

    and

    2) Cache efficiency

    ok i agree with most of what you say but i still don't get: are you telling me stack data is faster than static data outside the function??

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by mynickmynick
    my question is : which of the 3 examples have shorter/longer memory access times?
    You are asking the wrong question. The first question that you should ask is: is there a need to optimise the code even if it affects readability and increases the cost of maintenance? To answer this, you need to compare the performance goals with the measured performance of the code. (Which in turn means that you need to set performance objectives and then perform at least some simple timing tests.)

    If the answer to the first question is yes, then the next question to ask is: where are the bottle necks in my code that is the "low hanging fruit" to be optimised? To determine this you would likely need to profile your code.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I'm telling you that it makes no difference, compared to algorithmic efficiency and locality of reference.

  8. #8
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    Quote Originally Posted by laserlight View Post
    You are asking the wrong question. The first question that you should ask is: is there a need to optimise the code even if it affects readability and increases the cost of maintenance? To answer this, you need to compare the performance goals with the measured performance of the code. (Which in turn means that you need to set performance objectives and then perform at least some simple timing tests.)

    If the answer to the first question is yes, then the next question to ask is: where are the bottle necks in my code that is the "low hanging fruit" to be optimised? To determine this you would likely need to profile your code.
    the performance objectives are : "as fast as you manage"
    "profile your code" : what you mean?

    I have already gone from th 200 msec first version to the 16 msec actual version of a severe image gradient patter matching routine, making several optimizations and tests

    But i wanted now to make some theoric investigations, instead of proceeding by attempt-in.practice-time-measures

  9. #9
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by mynickmynick
    the performance objectives are : "as fast as you manage"
    As in this is for a competition?

    Quote Originally Posted by mynickmynick
    "profile your code" : what you mean?
    To analyse the various parts of the code to see how much time they tend to take to execute, both in terms of raw numbers and as a proportion of the total execution time. Along these lines, you would find out which functions are most often called, and thus can benefit most from optimisation. There exist profiling tools like gprof that you might be able to use.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  10. #10
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    Quote Originally Posted by laserlight View Post
    As in this is for a competition?

    hahahahahaha
    yes a competition with myself

    The fact is that the faster i go the more I can increase the resolution, so the precision of the detections i am performing

    I have already profiled a lot these routines

    i was now investigating a priori some aspects

    I guess another thing i could do to speed up would be substituing some parameters I am passing to the routines with precompiled defines (once i have found a stable value for them)

  11. #11
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    An interesting this to ponder on with global variables, just for thought:
    If variables are global, it is possible the compiler won't optimize them (ie cache their result in a register) because it doesn't know when/if that information is going to be used by another function!

    Don't rely on that, though. Use a profiler to find the real bottlenecks.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  12. #12
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    From a code generation standpoint, the difference between access to a static vs. access to a stack variable is whether the address mode is absolute or relative. For instance, on x86, access to a global variable called X might look like:

    Code:
    mov eax, [X]
    Whereas access to a local variable is done via offsetting from the base pointer:

    Code:
    mov eax, [ebp + 16]
    On x86, these two address modes have the same speed. This may not be the case on other architectures. For instance, on an architecture that does not have register-relative addressing, you would have something like:

    Code:
    mov ebx, ebp
    add ebx, 16
    mov eax, [ebx]
    And that may in fact be slower than access to a static.

    As always, the answer to the question is platform-specific. The choice to change a stack variable to a global is a very serious one with a number of architectural consequences and shouldn't be undertaken lightly.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  13. #13
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by mynickmynick View Post
    of course cache miss is important but as you say independent from this problem
    I think you've got what I'm saying backwards -- I'm saying caching is all that matters, and whether the arrays are passed in, static, or located on the moon is independent of your problem.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. stack and pointer problem
    By ramaadhitia in forum C Programming
    Replies: 2
    Last Post: 09-11-2006, 11:41 PM
  2. Question about a stack using array of pointers
    By Ricochet in forum C++ Programming
    Replies: 6
    Last Post: 11-17-2003, 10:12 PM
  3. error trying to compile stack program
    By KristTlove in forum C++ Programming
    Replies: 2
    Last Post: 11-03-2003, 06:27 PM
  4. What am I doing wrong, stack?
    By TeenyTig in forum C Programming
    Replies: 2
    Last Post: 05-27-2002, 02:12 PM
  5. Stack Program Here
    By Troll_King in forum C Programming
    Replies: 7
    Last Post: 10-15-2001, 05:36 PM