Thread: Pointers and memory layout of a program on Linux system.

  1. #1
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357

    Assuming that we have the code

    Code:
    int *p;
    
    *p = 2;
    That is wrong and dangerous, of course, because the pointer is uninitialized and does not hold a known address.

    Some books say that the write through p will modify the program's code if p happens to hold an address inside the program code.

    I think this is not true on Linux because of the memory layout of processes, assuming this scheme ->

    Linux Processes

    The TEXT segment, which holds the code of the program, is read-only (like ROM). Hence we can't modify its contents even if p points into this area.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    The fact that Linux makes .text read-only is all there is to it.
    The actual arrangement of code/data/stack doesn't come into it.

  3. #3
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    Quote Originally Posted by Salem View Post
    The fact that Linux makes .text read-only is all there is to it.
    The actual arrangement of code/data/stack doesn't come into it.
    What do you mean by "is all there is to it"? I have a little trouble with the translation...
    Could you explain in simpler words? Is there anything wrong with my reasoning?

  4. #4
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    The references to .text (as well as .data, .rodata and so on) come from the format used for native binary executables in Linux, the ELF format.

    Salem means that in Linux, program code is read into memory that is marked read-only and executable, on all processors that support it. So, something like
    Code:
    #include <string.h>

    int my_function(const int a, const int b)
    {
        return a + b;
    }

    int main(void)
    {
        /* Zero out first ten bytes of my_function */
        memset((void *)&my_function, 0, 10);
        return 0;
    }
    may work on some other platforms, but in Linux it will fail at run time with a Segmentation fault. That basically means the program tried to write to memory it is not allowed to write to (or to read or execute memory it is not allowed to), the hardware noticed it, and the kernel decided to kill the process. (There are some embedded platforms without virtual memory or a memory management unit where the write will actually succeed; they simply cannot protect against code modifications.)

    Ah, forgot about the original question.

    The memory layout does not matter at all, because the process simply cannot modify the memory containing code. In fact, current Linux kernels employ a randomization scheme, to make security exploits harder to write as memory addresses change from one execution to the next. This means that there is no specific layout.

    It also means that catching bugs where you have a pointer with a fixed, wrong value becomes more interesting: it may only fail some of the time. Every now and then the address space randomization may cause the pointer to actually point to valid data.
    Last edited by Nominal Animal; 10-08-2012 at 12:52 PM.

  5. #5
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    "because the process simply cannot modify the memory containing code"

    Because of process isolation?

    Process isolation - Wikipedia, the free encyclopedia

    And what is the use of the memory layout, as I posted before?

    ELF layout != memory layout

    So the conclusion is that it depends on the operating system each time, not on whether the code segment is ROM or something like that...

    P.S. Of course we can have bad luck: our pointer holds an address that is writable, and then we modify something without any way of knowing what it is...
    Last edited by Mr.Lnx; 10-09-2012 at 02:25 PM.

  6. #6
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    I hope this clarifies things (and I hope I have all my facts straight):

    This is not an issue of process isolation. Process isolation is a technique that keeps one process from messing with the memory of another process. What you are talking about (the code sample in your original post) is a process trying to change parts of its own memory that it shouldn't, and isn't allowed to. Process isolation is aided by virtual memory systems, which give every process its own address space and prevent it from accessing other processes' address spaces, except via controlled mechanisms (IPCs) that both processes must use.

    What this really has to do with is Linux memory management. Memory is divided into segments or pages, which have a starting (virtual) address and a size. Each segment is marked with permissions, similar to a file: readable, writable and executable. If you have a segment that is not writable, and a pointer to an address in that segment, then any attempt to write to the address in the pointer will result in a seg fault.

    Note, memory layout is irrelevant. Memory layout (in the context of a user process) simply refers to the relative locations of different segments in the virtual address space. Code is typically at the lowest addresses, and is readable and executable. Above that are global and static local variables that are initialized. Above that is BSS*, which is for uninitialized globals and static locals. Both data and BSS are typically readable and writable, and not executable. Some global data is read-only, for constants. The heap is above the BSS segment, is readable and writable, and grows upward as needed. The stack is at the highest address**, is also readable and writable, and grows downward as needed. Somewhere in between the heap and the stack is where shared libraries and such are mapped into memory. I don't recall how the permissions work on that.

    The ELF format is a whole other thing. It's simply a format for specifying binary (executable) files. It basically contains all the data for the code, global data and BSS sections, plus shared libraries to load. It does mark some of the information as .text (code/instructions), .data (global data), .rodata (read-only global data, like constants) and .bss, plus a few others. When the loader is asked to run a program, it reads the ELF binary file and sets up all the memory segments with correct permissions. Note that Linux variants designed for processors without an MMU ignore the whole concept of permissions on memory segments; it's a free-for-all.

    * BSS stands for Block Started by Symbol. It's an old space-saving technique. The executable file simply stores the starting address and the total number of bytes for all your uninitialized globals and static locals. Since you don't care what their starting value is, the file doesn't have to store an initial value for them, and they are all just made zero. The loader expands that info to enough actual memory for all the variables.

    ** The stack is not at the very top of the memory. That is actually reserved for kernel stuff, but user-mode applications can not read from or write to (or execute, I'm sure) that area.

  7. #7
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    Because the Thing-King won't allow it.
    The Paging Game

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    The virtual address space of the process at startup is dictated by the entries in the ELF program header. It says where the segments are within the binary image, and what their protections are. The code segment (usually, the same as the .text section) is, normally, set to read-only permissions.

  9. #9
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Quote Originally Posted by brewbuck View Post
    The virtual address space of the process at startup is dictated by the entries in the ELF program header. It says where the segments are within the binary image, and what their protections are. The code segment (usually, the same as the .text section) is, normally, set to read-only permissions.
    Isn't there some kind of randomization of the positions for security purposes?

  10. #10
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by oogabooga View Post
    Isn't there some kind of randomization of the positions for security purposes?
    If the OS and program/dynamic library both support it, then yes. It isn't enabled for all executables by default, because a program has to be compiled a certain way to be compatible with it.

  11. #11
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by oogabooga View Post
    Isn't there some kind of randomization of the positions for security purposes?
    Yes, see the section on randomize_va_space in the Linux kernel Documentation/sysctl/kernel.txt file. It describes the three possible values in /proc/sys/kernel/randomize_va_space that affect the layout randomization:
    Code:
    0 - Turn the process address space randomization off.  This is the
        default for architectures that do not support this feature anyways,
        and kernels that are booted with the "norandmaps" parameter.
    
    1 - Make the addresses of mmap base, stack and VDSO page randomized.
        This, among other things, implies that shared libraries will be
        loaded to random addresses.  Also for PIE-linked binaries, the
        location of code start is randomized.  This is the default if the
        CONFIG_COMPAT_BRK option is enabled.
    
    2 - Additionally enable heap randomization.  This is the default if
        CONFIG_COMPAT_BRK is disabled.
    
        There are a few legacy applications out there (such as some ancient
        versions of libc.so.5 from 1996) that assume that brk area starts
        just after the end of the code+bss.  These applications break when
        start of the brk area is randomized.  There are however no known
        non-legacy applications that would be broken this way, so for most
        systems it is safe to choose full randomization.
    
        Systems with ancient and/or broken binaries should be configured
        with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
        address space randomization.
    It turns out I was mistaken in how much randomization there really is. Or actually, how little.

    Consider this (a bit bastard, but illustrative) program:
    Code:
    #include <stdlib.h>
    #include <stdio.h>

    static const int  global_static_const = 1;
    static int        global_static = 2;
    const int         global_const = 3;
    int               global = 4;

    void global_func(void)
    {
        return;
    }

    static void static_func(void)
    {
        return;
    }

    int main(void)
    {
        const int  local_const = 5;
        int        local = 6;
        int       *dynamic;

        /* Nested functions are a GNU C extension (GCC only). */
        void local_func(void)
        {
            return;
        }

        /* Allocate 1000000 bytes; sizeof (1000000) would only be sizeof (int). */
        dynamic = malloc(1000000);

        /* %p expects a void pointer, hence the casts. */
        printf("global static const at %p\n", (void *)&global_static_const);
        printf("global static       at %p\n", (void *)&global_static);
        printf("global const        at %p\n", (void *)&global_const);
        printf("global              at %p\n", (void *)&global);
        printf("local const         at %p\n", (void *)&local_const);
        printf("local               at %p\n", (void *)&local);
        printf("dynamic             at %p\n", (void *)dynamic);
        printf("stdin               at %p\n", (void *)&stdin);
        printf("global function     at %p\n", (void *)&global_func);
        printf("static function     at %p\n", (void *)&static_func);
        printf("local function      at %p\n", (void *)&local_func);
        printf("main()              at %p\n", (void *)&main);
        printf("fclose()            at %p\n", (void *)&fclose);
        printf("\n");

        free(dynamic);
        return 0;
    }
    When compiled normally (gcc source.c -o binary, plus optimization and/or warning flags), only local const, local, and dynamic vary from run to run. Using gcc-4.6.3, recompiling the binary does not affect the addresses. In other words, locations specified in the ELF binary for code (.text) and data (.data, .rodata, .bss, as per objdump -t binary) do not vary; only stack (local variables) and dynamic allocations do.

    However, if you compile with -fPIE, which tells gcc to compile a position independent executable, and/or with -fPIC, which tells gcc to compile to position independent code,
    Code:
    gcc -fPIE -W -Wall -O0 source.c -o binary
    or
    gcc -fPIC -W -Wall -O0 source.c -o binary
    or
    gcc -fPIE -fPIC -W -Wall -O0 source.c -o binary
    then in addition to local const, local, and dynamic, also stdin and fclose vary. This means that local variables, and variables and functions from dynamically linked libraries, will have randomized addresses. I haven't tried any benchmarking to see if and how this affects the generated code.

    I personally did learn something new from this: code and global variable address randomization is currently available easily only for dynamically loaded libraries, and then only if the binaries are compiled with appropriate options (-fPIC and/or -fPIE). I'm a bit embarrassed, to be honest.

  12. #12
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    Well, thank you guys. You gave me important advice.

    My first question comes from this book ->

    Teach Yourself C in 21 Days (Sams Teach Yourself): Peter G. Aitken,Bradley L. Jones,Peter Aitken: 9780672310690: Amazon.com: Books

    On page 191 it discusses the code that I posted before (int *p; ... *p = 2;).

    It says you can even overwrite a location that belongs to the operating system... I found that strange. Is that true?

    P.S. Actually we need the ELF format in order to find out how to load the program... and the layout of a program on Unix is what I posted in the first post.

    Am I right (even in general)?

    Thank you for your time.

    Ah, I forgot to tell you that I found a very good link for a newbie (like me):

    Loader

    :-)
    Last edited by Mr.Lnx; 10-10-2012 at 10:41 AM.

  13. #13
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    Quote Originally Posted by Nominal Animal View Post
    The reference to .text (as well as .data, .rodata and so on) come from format used for native binary executables in Linux, the ELF format.

    Salem means that in Linux, program code is read into memory that is marked read-only and executable, on all processors that support it. So, something like
    Code:
    int my_function(const int a, const int b)
    {
        return a + b;
    }
    
    /* Zero out first ten bytes of my_function */
    memset(&my_function, 0, 10);
    may work on some other platforms, but in Linux it will fail at run time with a Segmentation fault -- which basically means that the program tried to write to memory it is not allowed to write to (or read or execute memory it's not allowed to), the hardware noticed it, and the kernel decided to kill the process. (There are some embedded platforms that don't have virtual memory or a memory management where that will actually succeed; they just cannot protect against code modifications.)

    Ah, forgot about the original question.

    The memory layout does not matter at all, because the process simply cannot modify the memory containing code. In fact, current Linux kernels employ a randomization scheme, to make security exploits harder to write as memory addresses change from one execution to the next. This means that there is no specific layout.

    It also means that catching bugs where you have a pointer with fixed, wrong value, becomes more interesting: it may only fail some of the time. Every now and then the address space randomization may cause the pointer to actually point to valid data.
    The randomness occurs only in the stack, heap, etc.

    Address Space Layout Randomization - HowToHack

    It requires more discussion... and a good book about operating systems.
    The topic is not simple. Finally, thank you for the advice, all of you.
    It was a good introduction for the future.


    P.S. What do you mean by "There is no specific layout"?
    Anyway, maybe I need to do some reading...
    Last edited by Mr.Lnx; 10-11-2012 at 12:07 PM.

  14. #14
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Quote Originally Posted by Mr.Lnx View Post
    Assuming that we have the code
    Code:
    int *p;
    *p = 2;
    That is wrong and dangerous, of course, because the pointer is uninitialized and does not hold a known address.
    I wonder, would it be better to initialise pointers to NULL?
    Code:
    int * p = NULL;
    Now it points nowhere, and if you try to assign anything to it, that will not overwrite any valid address.

  15. #15
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by cfanatic
    Now it points nowhere, and if you try to assign anything to it, that will not overwrite any valid address.
    It is still wrong though, since it is wrong to dereference a null pointer. The good part is that this is more likely to result in a crash than to silently "work" when it doesn't, and perhaps more importantly, you can check for a null pointer, but not for an invalid pointer.
