Thread: Is there a logical way of figuring out where variables are stored?

  1. #1
    Registered User
    Join Date
    Sep 2008
    Posts
    47

    Is there a logical way of figuring out where variables are stored?

    Hi guys.

    I'm trying to figure out, for each variable in a program, which of the following areas it is stored in:

    1. the runtime stack, including the stack base and the direction of stack growth;
    2. the dynamic data area (the heap);
    3. the static data area, including the initialized and uninitialized data segments (objects with fixed addresses);
    4. the text area (the program instructions).


    based on the address I get on a UNIX system.

    I realize that every OS will have a different addressing scheme.

    Here is an example, from the following program:
    Code:
    #include <stdio.h>      /* for printf() */
    #include <stdlib.h>     /* for malloc(), putenv(), setenv() */
    #include <string.h>     /* for strlen() */
    #include <math.h>       /* for sqrt() */
    
    #include "pr4.h"        /* for macros show...() */
    
    /*----------------------------------------------------------------------*/
    
    /* These should be at various locations in the address space. */
    
    int global_var_1, global_var_2;
    
    // etc.
    
    /*----------------------------------------------------------------------*/
    
    int main(int argc, char *argv[], char *envp[])
    {
      /* local and global variables */
    
      int local_main_var_1, local_main_var_2;
    
      // etc.
    
      show_int(global_var_1);
      show_int(global_var_2);
    
      show_int(local_main_var_1);
      show_int(local_main_var_2);
    
      // etc.
    
      return 0;
    }
    It will output the following:
    Code:
    
    demo.sun.32.sorted
    00000000ffbffa70 ( 4)         local_main_var_1          0
    00000000ffbffa6c ( 4)         local_main_var_2          -4195516
    0000000000020f84 ( 4)             global_var_2          0
    0000000000020f80 ( 4)             global_var_1          0
    
    
    demo.sun.64.sorted
    ffffffff7ffff930 ( 4)         local_main_var_1          0
    ffffffff7ffff92c ( 4)         local_main_var_2          2147482328
    0000000100100e68 ( 4)             global_var_2          0
    0000000100100e64 ( 4)             global_var_1          0

    This is the output from a 32-bit and a 64-bit UNIX system.

    As you can see, the local variables, which are stored on the stack, all start with:
    ffffffff7ffff on the 64-bit system
    and on the 32-bit system all the local variables start with:
    ffbffa

    I also ran this on a different computer (32 bit) and got the following for local variables:
    Code:
    00000000ffbff9c0 ( 4)         local_main_var_1          0
    00000000ffbff9bc ( 4)         local_main_var_2          -4195680
    So you can see the local variables all start with ff.

    And all the global variables start with digits, such as 10 or 20 (after the leading zeros).

    Can I simply match the address against a pattern? For example, if the address starts with the letter "f", mark the variable as being in the stack area; if it starts with a digit (0 through 9), mark the variable as being in the static area (for globals).

    It must be possible to figure out, I'm just not sure where to even start with this... here is basically what I'm trying to achieve:

    Code:
       
    Runtime stack, bottom (stack grows downward)
    ffffffff7ffff930 ( 4)         local_main_var_1          0
    ffffffff7ffff92c ( 4)         local_main_var_2          2147482328
    Runtime stack, top

    Static data area
    0000000100100e68 ( 4)             global_var_2          0
    0000000100100e64 ( 4)             global_var_1          0
    You can see it's figuring out that the global variables are in the static data area, that the local variables are on the runtime stack, and that the stack is growing downward.

    So just from the given information what would you recommend I analyze?
    Again this is only for a Solaris Unix system, nothing else matters.

    Thanks any help would be great.
    Last edited by mr_coffee; 09-28-2008 at 09:21 PM.

  2. #2
    Registered User
    Join Date
    Sep 2008
    Posts
    47
    I've been thinking: what if I do a ton of declarations like

    //should be on the stack
    int x,y,w,......


    //should be on the heap
    char *p = malloc(10*sizeof(int));
    char *q = malloc(100*sizeof(int));
    ...


    Each of these gives me an address, so I would simply find the min and max of the addresses for each area

    and if a variable's address falls between the min and max of an area, then it must be in that area (data, stack or heap).


    Does this sound good?

  3. #3
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Yes, take a look at the assembly generated by the compiler.

    For most archs, it depends on the scope you declare them in...

    Code:
    char * p = malloc(100 * sizeof(int));
    p would be on the stack (the pointer itself) pointing to memory on the heap.
    A general "guide" for most archs:
    * global variables in the data segment
    * local variables, function arguments etc on the stack
    * dynamic memory on the heap
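The distinction above (the pointer variable lives in one place, the memory it points to lives in another) can be seen by printing both addresses. This is only a small sketch: the function name show_pointer_vs_pointee is made up for illustration, and the exact text printed by %p is implementation-defined.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: prints where the pointer variable itself lives and
 * where the block it points to lives. Returns 1 if the two addresses
 * differ, 0 if they are equal, -1 if malloc failed. */
int show_pointer_vs_pointee(void)
{
    char *p = malloc(100 * sizeof(int));
    int differ;

    if (p == NULL)
        return -1;

    /* &p is the address of the local variable p (typically on the stack). */
    printf("&p = %p\n", (void *)&p);
    /* p is the address malloc returned (typically on the heap). */
    printf(" p = %p\n", (void *)p);

    differ = ((void *)&p != (void *)p);
    free(p);
    return differ;
}
```

On most systems the two printed values should fall in clearly different ranges, which is the same effect the address listings in this thread show.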

  4. #4
    Registered User
    Join Date
    Sep 2008
    Posts
    47
    Thanks for the response. What I am trying to do is compare addresses to get a "range" that I can work with.

    An example:
    Code:
    #include <stdio.h>

    int globalVar;

    int main()
    {
      char *p = "hello";

      int x, y, z;

      /* unsigned int holds a full address only on a 32-bit build */
      unsigned int pValue = (unsigned int) &p;

      unsigned int xValue = (unsigned int) &x;
      unsigned int yValue = (unsigned int) &y;
      unsigned int zValue = (unsigned int) &z;

      unsigned int globalValue = (unsigned int) &globalVar;

      printf("P's address value %u\n",pValue);

      printf("x's address value %u\n",xValue);
      printf("y's address value %u\n",yValue);
      printf("z's address value %u\n",zValue);
      printf("globalVar's address value %u\n",globalValue);

      return 0;
    }
    This will print out the following:
    P's address value 4290771400
    x's address value 4290771396
    y's address value 4290771392
    z's address value 4290771388
    globalVar's address value 134752


    So I can clearly see that the variables on the "stack" are p, x, y and z,
    and the global data is in the static data area, which is in a much lower range of addresses (134752).

    I can also see that this stack is growing from top to bottom, right? The first variable declared is p, and it is given the address 4290771400.
    Next x is declared, and it is given 4290771396.
    ...1400 is greater than ...1396, so the direction of stack growth is down: variables declared later get lower addresses.
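A caveat on the reasoning above: the order of locals inside one function is up to the compiler, so it is not a reliable indicator by itself. A commonly used alternative is to compare a local in a caller with a local in a function it calls. The sketch below assumes a C99 compiler that provides uintptr_t in <stdint.h>; the function names are made up for illustration, and an aggressive optimizer could still defeat it.

```c
#include <stdint.h>

/* Runs in a deeper stack frame and compares its own local with the caller's. */
static int compare_with_caller(volatile char *caller_local)
{
    volatile char callee_local = 0;
    uintptr_t caller = (uintptr_t)caller_local;
    uintptr_t callee = (uintptr_t)&callee_local;

    return (callee < caller) ? -1 : 1;
}

/* Returns -1 if the stack appears to grow toward lower addresses,
 * +1 if it appears to grow toward higher addresses. */
int stack_direction(void)
{
    volatile char caller_local = 0;
    /* Call through a volatile function pointer so the compiler is less
     * likely to inline the callee into this frame. */
    int (*volatile fn)(volatile char *) = compare_with_caller;

    return fn(&caller_local);
}
```

Converting to uintptr_t before comparing also sidesteps the fact that the C standard only defines < and > for pointers into the same object.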

    Is there a way to make these comparisons without actually converting the address to an unsigned int? That is, is there a way to get hold of the address and compare it with other ones without the conversion?

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    There are really no guarantees (in a portable way) that let you determine, based on the address, where a variable belongs. It so happens that Linux uses an address near 3GB for the stack [and thus, I expect you are using Linux for this experiment]. Windows has the stack much closer to zero. And of course, if you use multiple threads, these threads may have their stacks at some completely arbitrary addresses...

    For any given system, you could do something like this (this code assumes a single-threaded process - if it is multithreaded, you'd have to keep track of the stack base for each thread):
    Code:
    // ulong_ptr is a type that is defined to be an unsigned integral type that matches the size of
    // a pointer in this architecture. It is up to the implementor to do this appropriately.
    ulong_ptr stackBase;

    // Returns non-zero if ptr lies between the current top of the stack and stackBase.
    int isStack(void *ptr)
    {
        ulong_ptr aptr = (ulong_ptr)ptr;
        // aptr is a local of the innermost call, so its address is roughly the lowest stack address in use.
        ulong_ptr biggestStack = (ulong_ptr)&aptr;

        return (aptr > biggestStack && aptr <= stackBase);
    }

    int main(int argc, char **argv)
    {
        ...
        stackBase = (ulong_ptr)&argc;
    }
    However, it gets much harder to determine (in even a semi-portable way) whether a non-stack variable is a global variable or a heap variable - there may be some ways to do that in your environment.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    Registered User
    Join Date
    Sep 2008
    Posts
    47
    Thanks for the response.

    Luckily this only has to work on a Solaris UNIX system; the only difference is that it will be tested on both a 32-bit and a 64-bit system.

    But it will always be on a Solaris UNIX system.

    So if that's the case, would my method work?
    My professor mentioned using the min and max math functions, and I'm assuming he meant to apply those to the addresses.

    So I think I'm on the right track if the problem is narrowed down to a Solaris UNIX system and doesn't have to be portable.

    What do you think? If it doesn't have to be portable, is this plausible?


    Basically what I'll be doing is defining a lot of variables in the program, recording their addresses, converting them to unsigned ints, and then building a range for local/global/dynamic etc. based on that.


    Honestly I can't think of any other way to do this. The professor said this isn't a challenging assignment, but I see what you're getting at with how this wouldn't work across Windows/Linux.


    I have a diagram in a Unix book that shows:

    High Address
    Command line arguments and environment variables (at the very top)
    Stack (growing downward)
    Heap (growing upward)
    Uninitialized data (bss)
    Initialized data
    Text
    Low Address
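One way to check a diagram like this against a real process is to take one sample address per area and sort them from high to low. The sketch below is only an illustration under some assumptions: the names (sample, layout_samples, print_samples and the two globals) are made up, it needs a C99 compiler for <inttypes.h>, and nothing guarantees that a given system produces the textbook order.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

struct sample {
    const char *label;
    uintptr_t addr;
};

int initialized_global = 1;   /* expected in initialized data */
int uninitialized_global;     /* expected in uninitialized data (bss) */

static void some_function(void) {}  /* expected in text */

/* qsort comparator: highest address first, like the diagram. */
static int by_addr_desc(const void *a, const void *b)
{
    const struct sample *x = a;
    const struct sample *y = b;
    return (x->addr < y->addr) - (x->addr > y->addr);
}

/* Fills out[0..4] with one sample per area, sorted high to low. Returns 5. */
size_t layout_samples(struct sample out[5])
{
    int local = 0;
    void *block = malloc(16);   /* a NULL result is simply recorded as 0 */

    out[0].label = "local variable";       out[0].addr = (uintptr_t)&local;
    out[1].label = "malloc block";         out[1].addr = (uintptr_t)block;
    out[2].label = "uninitialized global"; out[2].addr = (uintptr_t)&uninitialized_global;
    out[3].label = "initialized global";   out[3].addr = (uintptr_t)&initialized_global;
    out[4].label = "function";             out[4].addr = (uintptr_t)some_function;

    free(block);
    qsort(out, 5, sizeof out[0], by_addr_desc);
    return 5;
}

/* Prints one line per sample: hexadecimal address, then label. */
void print_samples(const struct sample *s, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        printf("%016" PRIxPTR "  %s\n", s[i].addr, s[i].label);
}
```

Calling layout_samples() and then print_samples() from main gives a listing in the same spirit as the sorted output at the start of the thread.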

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Memory maps like that describe the world in a simplified manner. It's like physics: in the first few years of studying it, a Newtonian view is fine, but once you get "deep enough", Newton's principles don't give you completely accurate results, and you need Einstein's principles to get them.

    max/min can be used to determine if something is within a range. The trouble comes with figuring out the max and min valid address for each section of memory. The boundary between global variables and the heap would be the most difficult one - the stack is quite easy [at least for a single-threaded situation].
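The min/max idea can be sketched as a small range tracker: feed it every address known to belong to one area, then ask whether another address falls inside the observed range. The names addr_range, range_init, range_add and range_contains are made up for this sketch, and addresses are held as unsigned long on the assumption that it is wide enough for a pointer on the target system.

```c
#include <limits.h>

struct addr_range {
    unsigned long lo;   /* smallest address seen so far */
    unsigned long hi;   /* largest address seen so far */
};

/* Start with an empty range: the first sample becomes both min and max. */
void range_init(struct addr_range *r)
{
    r->lo = ULONG_MAX;
    r->hi = 0;
}

/* Widen the range so that it covers addr. */
void range_add(struct addr_range *r, unsigned long addr)
{
    if (addr < r->lo) r->lo = addr;
    if (addr > r->hi) r->hi = addr;
}

/* Non-zero if addr lies within the observed range (inclusive). */
int range_contains(const struct addr_range *r, unsigned long addr)
{
    return addr >= r->lo && addr <= r->hi;
}
```

With the stack addresses posted earlier (4290771388 to 4290771400), such a range would contain 4290771396 but not the global at 134752. The weak point is the one described above: the range only covers samples that were actually recorded, so it says little about where an area really ends.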

    The OTHER solution is of course to use the debug information from the compiler, but my guess is that:
    1. You are not supposed to do that.
    2. It would be a WHOLE LOT more difficult to implement without using some existing code that essentially solves the problem for you.


  8. #8
    Registered User
    Join Date
    Sep 2008
    Posts
    47
    Mats,

    Yeah, he doesn't want us to use other programs to help us.

    Well, it will probably be wrong but I'm going to go with it.

    I tested it with:
    argcValue 4290771476
    argvValue 4290771480
    envpValue 4290771484


    And just like that diagram showed, the environment variables and command line arguments come before everything and have the HIGHER addresses: they are the highest out of all my current variables. But I understand what you're saying about the diagram being so simple. I realize this is a much more complex issue and I'm really cutting corners.

    P's address value 4290771400
    x's address value 4290771396
    y's address value 4290771392
    z's address value 4290771388
    globalVar's address value 134752
    So basically I'm going to wing it.

    By the way, why is the stack the easiest? I would think that would be hard because the stack is growing downward and the heap is growing upward, so the heap and stack are running into each other.

    Thanks for the info!

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    What you are showing is actually the decimal address of the locations within the stack where argc, argv and envp themselves are stored - I know that because the values are within 4 bytes of each other, and I'm sure that the data behind argc, argv and envp isn't just 4 bytes each (of course, argc is 4 bytes, but the other two point to arrays of variable length, whose strings have variable size).

    The stack is easy because in any given function, you can take the address of a local variable, and that will be the currently lowest stack address - the stack grows towards zero: each time you call a function, the stack grows a bit, and it shrinks when that function returns. So at any given time, the lowest part of the stack is known. You can also know the highest part of the stack by taking the address of the first argument to main (ok, so the stack is probably not starting just there, because there is code run before main, but you probably won't need to worry about that either).

    The only exception to the above would be if you have a function using invalid (not currently in use) stack locations for some reason [or a variable that is on the stack, but BEFORE the first location on the stack].

    The heap and the stack will never overlap.

    If we know the highest global variable address (and that is relatively easy to determine in a system with only one source file - it gets more complex if there are multiple source files - something I expect your professor doesn't actually want you to solve), and where the stack is, and we further say that all pointers point to one of stack, heap or global, then we could use the process of elimination.
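The process of elimination described above could look roughly like the sketch below. It is not a definitive implementation: the names (region, layout, classify) are made up, the three-way split assumes every address is stack, global or heap, and it assumes globals sit below the heap as in the textbook diagram.

```c
enum region { REGION_STACK, REGION_GLOBAL, REGION_HEAP };

struct layout {
    unsigned long stack_low;    /* address of a local in the innermost call */
    unsigned long stack_base;   /* e.g. address of main's first argument */
    unsigned long global_high;  /* highest known global/static address */
};

enum region classify(const struct layout *m, unsigned long addr)
{
    /* Inside the known stack bounds: stack. */
    if (addr >= m->stack_low && addr <= m->stack_base)
        return REGION_STACK;
    /* At or below the highest known global: global (by assumption). */
    if (addr <= m->global_high)
        return REGION_GLOBAL;
    /* Neither stack nor global: heap, by elimination. */
    return REGION_HEAP;
}
```

With bounds taken from the numbers posted in this thread (stack from 4290771388 up to 4290771484, highest global at 135540), the address 135560 returned by malloc comes out as heap and 135520 as global. A function address such as 68632 would be misfiled as global, because this sketch has no separate category for code.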


  10. #10
    Registered User
    Join Date
    Sep 2008
    Posts
    47
    Ah, I see what you're saying!

    Can I use a different data type, like perhaps ulong_ptr?

    Can printf print out a ulong_ptr value?

  11. #11
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by mr_coffee
    Ah, I see what you're saying!

    Can I use a different data type, like perhaps ulong_ptr?

    Can printf print out a ulong_ptr value?
    ulong_ptr isn't a standard type - it is meant to be defined by your implementation as something that works in your environment. My guess would be that "unsigned long" will work fine, but a typedef makes it easy to change if you need to, rather than trying to find which unsigned longs relate to your pointer addresses and which unsigned longs are other things.


  12. #12
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    printf() can print out whatever you tell it to. The trick is knowing how you need to view your data. If you want to consider a ulong_ptr as a pointer type, perhaps %p would be your best printf() flag bet. Or if you are using ulong_ptr as an unsigned long, you may be better off with %u.

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by master5001
    printf() can print out whatever you tell it to. The trick is knowing how you need to view your data. If you want to consider a ulong_ptr as a pointer type, perhaps %p would be your best printf() flag bet. Or if you are using ulong_ptr as an unsigned long, you may be better off with %u.
    Or perhaps %lu if it's in fact a long.

    (Yes, I know, many people are not used to int and long being different size, but for a 64-bit build on Solaris, I expect that int is 32-bit and long is 64-bit, so there would be a difference).

    A "clever trick" that can be used here is the "use a macro to define how to print something":
    Code:
    #if Solaris64
    typedef unsigned long ulong_ptr;
    #define PR_ULONG_PTR "%lu"
    #else
    typedef unsigned ulong_ptr;
    #define PR_ULONG_PTR "%u"
    #endif
    ...
       ulong_ptr x; 
       ... 
       // calculate x
       ...
       // adjacent string literals are concatenated by the compiler
       printf("The value of x is " PR_ULONG_PTR "\n", x);
    Of course, I don't think this is necessary, since I'm pretty sure ulong_ptr can be defined as unsigned long everywhere, and "%lu" is a perfectly fine format everywhere.
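For what it's worth, C99 offers a standard version of the same trick: <inttypes.h> provides the (optional) integer type uintptr_t together with a matching format macro, PRIuPTR. The sketch below assumes a C99 compiler, which the thread does not confirm for the Solaris setup in question; format_address is a made-up name.

```c
#include <inttypes.h>
#include <stdio.h>

/* Writes the decimal value of addr into buf, using the same
 * "format string in a macro" idea as PR_ULONG_PTR.
 * Returns whatever snprintf returns (the length of the text). */
int format_address(char *buf, size_t size, uintptr_t addr)
{
    return snprintf(buf, size, "%" PRIuPTR, addr);
}
```

For example, formatting the value 134752 yields the six-character string "134752", and in real use the argument would be something like (uintptr_t)&globalVar.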


  14. #14
    Registered User
    Join Date
    Sep 2008
    Posts
    47
    Thanks guys,

    Yeah, I'll just use an unsigned long; it seems to be working fine.

  15. #15
    Registered User
    Join Date
    Sep 2008
    Posts
    47
    Hm... something strange is happening when I try to get the address values for the heap. malloc seems to be returning addresses that look like they are in the static data area (alongside the statics and global variables).

    Take a gander:
    > //FULL PROGRAM OUTPUT
    > PRINTING OUT LOCAL VARIABLE ADDRESSES
    > P's address value 4290771400
    > x's address value 4290771396
    > y's address value 4290771392
    > z's address value 4290771388
    >
    > PRINTING OUT COMMAND LINE ARGUMENTS AND ENVIRONMENT VAR
    > argcValue 4290771476
    > argvValue 4290771480
    > envpValue 4290771484
    >
    > PRINTING OUT GLOBAL VARIABLE ADDRESSES
    > globalVar's address value 135540
    >
    > PRINTING OUT STATIC VARIABLES
    > staticValue 135520
    >
    > PRINTING OUT MALLOC addresses &Array[0]
    > pMallocValue 135560
    > pMalloc2Value 137096
    >
    > PRINTING OUT ADDRESSES OF FUNCTIONS &ADD
    > addFunctionValue 68632
    >



    The actual C code is here:
    Code:
    #include <stdio.h>
    #include <stdlib.h>     /* for malloc() */
    
    
     
    int globalVar;
    static int MAX = 10;
    
    
    int add(int x, int y)
    {
      return x + y;
    }
    
    int main(int argc, char *argv[], char *envp[])
    {
      
      
      char *p = "hello";
     
      int x, y, z;
    
    
      char *pMalloc = malloc(sizeof(char) * 10);
      int  *pMalloc2 = malloc(sizeof(int) *100);
    
    
    
     
      //getting address value from the function itself
      unsigned long addFunctionValue = (unsigned long)&add;
    
      //getting address value from static variables
      unsigned long staticValue = (unsigned long)&MAX;
    
    
    
      //getting address values from the "heap"
      unsigned long pMallocValue = (unsigned long) &pMalloc[0];
      unsigned long pMalloc2Value = (unsigned long) &pMalloc2[0];  
    
      
      
      //getting addresses values from the "stack"
      unsigned long pValue = (unsigned long) &p;
    
      unsigned long xValue = (unsigned long) &x;
      unsigned long yValue = (unsigned long) &y;
      unsigned long zValue = (unsigned long) &z;
    
    
      //getting address values from global variables.
      unsigned long globalValue = (unsigned long) &globalVar;
    
    
      //getting address values from the command line and environment var arg
      unsigned long argcValue = (unsigned long) &argc;
      unsigned long argvValue = (unsigned long) &argv;
      unsigned long envpValue = (unsigned long) &envp;
    
    
    
      //printing information
      printf("PRINTING OUT LOCAL VARIABLE ADDRESSES\n");
      printf("P's address value %lu\n",pValue);
      printf("x's address value %lu\n",xValue);
      printf("y's address value %lu\n",yValue);
      printf("z's address value %lu\n",zValue);
      
      printf("\n\nPRINTING OUT GLOBAL VARIABLE ADDRESSES\n"); 
      printf("globalVar's address value %lu\n",globalValue);
    
      printf("\n\nPRINTING OUT COMMAND LINE ARGUMENTS AND ENVIRONMENT VAR\n");
      printf("argcValue %lu\n",argcValue);
      printf("argvValue %lu\n",argvValue);
      printf("envpValue %lu\n",envpValue);
    
    
      printf("\n\nPRINTING OUT MALLOC addresses &Array[0]\n");
      printf("pMallocValue %lu\n", pMallocValue);
      printf("pMalloc2Value %lu\n",pMalloc2Value);
    
    
      printf("\n\nPRINTING OUT ADDRESSES OF FUNCTIONS &ADD\n");
      printf("addFunctionValue %lu\n", addFunctionValue);
     
      printf("\n\nPRINTING OUT STATIC VARIABLES\n");
      printf("staticValue %lu\n", staticValue);
    
    
    
    
      return 0;
    }

    As you can see, the values are similar for the
    > PRINTING OUT GLOBAL VARIABLE ADDRESSES
    > globalVar's address value 135540
    >
    > PRINTING OUT STATIC VARIABLES
    > staticValue 135520
    >
    > PRINTING OUT MALLOC addresses &Array[0]
    > pMallocValue 135560
    > pMalloc2Value 137096


    But that shouldn't be the case, should it? Am I not retrieving the malloc addresses correctly?
