Thread: difference between a segment and a page

  1. #1
    Registered User
    Join Date
    Jan 2008
    Posts
    69

    difference between a segment and a page

    In this thread, which was unfortunately closed, brewbuck states:

    Speaking of page tables and memory protection, some processors (e.g. x86) also have a memory protection mechanism called "segmentation" which uses segments instead of pages for access control. This shouldn't be confused with the kind of "segments" we are talking about here, although the two things are not entirely unrelated.
    What's the difference between a segment and a page? And what kind of access controls do we have? Read/write? Anything else? How is memory protected?

    Thanks, guys.

  2. #2
    uint64_t...think positive xuftugulus's Avatar
    Join Date
    Feb 2008
    Location
    Pacem
    Posts
    355
    And a question of mine on segments but not related to cs32 is:

    How does one obtain write access to the code segment, so that one may rewrite parts of the generated machine language instructions at run-time? It just popped into my head when I read that the code segment gets protected by the OS...
    Code:
    ...
        goto johny_walker_red_label;
    johny_walker_blue_label: exit(-149$);
    johny_walker_red_label : exit( -22$);
    A typical example of ...cheap programming practices.

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    The x86 architecture supports something called segments. A segment is, in principle, a "chunk of memory". In the early x86, segments were introduced to allow the processor, which was 16-bit only, to access memory beyond 64K (2^16 = 64K). The segment base address gets added to the register value. In the early days, segments were limited only by the 64K architectural limit, and could start on any 16-byte-aligned address in memory. The segment register held exactly the base address shifted 4 bits down (2^4 = 16, hence the 16-byte alignment of segments).
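
    As a rough illustration of the real-mode arithmetic described above, here is a minimal sketch in C. The function name real_mode_linear is made up for this example; only the shift-by-4-and-add rule comes from the text.

```c
#include <stdint.h>

/* Real-mode address: the segment register value shifted left by 4 bits
   (i.e. multiplied by 16), plus the 16-bit offset.
   The function name is hypothetical, chosen for this sketch. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

    For example, segment 0xB800 with offset 0 gives address 0xB8000, and the largest possible combination, 0xFFFF:0xFFFF, lands a little past the 1MB mark.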

    There are three kinds of segments in x86: CS (Code Segment), DS (Data Segment) and SS (Stack Segment). There is also a second data-segment register, ES, that can be used for example when copying from one segment to another.

    To improve protection within the processor and at the same time extend the addressability of the processor, the 80286 introduced "Protected mode segments". In this mode, the segment content isn't the base address of the segment, but an index into a "segment descriptor table". The segment descriptor table contains not only a base address, but also the limit of the segment (how large the segment is). This means that accesses through a segment that is smaller than 64K cannot run over into another segment. The segment base address was now 24 bits, giving a total address range of (a whopping) 16MB, but segments are still at most 64KB each (and registers are still 16 bit).
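
    The descriptor lookup and limit check can be sketched roughly like this. The struct and function names are invented for the example, and the descriptor is stripped down to the two fields mentioned above; a real descriptor also carries access-rights bits, and the low three bits of a real selector hold the table indicator and requested privilege level.

```c
#include <stdint.h>

/* Simplified descriptor: only base and limit, as discussed above.
   All names here are hypothetical. */
struct seg_descriptor {
    uint32_t base;   /* 24-bit base address on the 286 */
    uint16_t limit;  /* highest valid offset within the segment */
};

/* Bits 3..15 of a selector are the index into the descriptor table. */
unsigned selector_index(uint16_t selector)
{
    return selector >> 3;
}

/* Returns the resulting address, or -1 if the offset is past the limit
   (where the real processor would raise a protection fault instead). */
long translate(const struct seg_descriptor *table, uint16_t selector,
               uint16_t offset)
{
    const struct seg_descriptor *d = &table[selector_index(selector)];
    if (offset > d->limit)
        return -1;
    return (long)(d->base + offset);
}
```

    The point of the sketch is the limit comparison: an offset beyond the segment's limit never turns into an address at all.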

    Then came the 386, which extended the processor registers to 32 bits. There's not much point in having 32-bit registers if the processor can't use 32 bits to address memory, so the descriptor entries also got an extension to a 32-bit base address and a 20-bit limit that can be counted either in bytes or in units of 4KB, the latter giving the full 32-bit segment size.
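
    To make the "bytes or 4KB units" limit concrete, here is a small sketch. The function name and parameter names are made up; the scaling rule (shift by 12 and fill the low 12 bits with ones when the granularity flag is set) is how the 386 interprets the 20-bit limit.

```c
#include <stdint.h>

/* Expand a 20-bit descriptor limit to the highest valid offset.
   granularity_4k != 0 means the limit counts 4KB units.
   Names are hypothetical, for illustration only. */
uint32_t effective_limit(uint32_t limit20, int granularity_4k)
{
    limit20 &= 0xFFFFFu;                   /* keep only 20 bits */
    if (granularity_4k)
        return (limit20 << 12) | 0xFFFu;   /* whole 4KB pages */
    return limit20;                        /* plain byte count */
}
```

    With all 20 limit bits set and 4KB granularity, the result is 0xFFFFFFFF, i.e. a segment covering the whole 4GB address space.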

    The 386 also introduces two new "spare" segment registers: FS, GS - they have the same sort of purpose as ES.

    A segment has several attributes, such as "Writable" and "Executable". It can also be "not present", so it's possible to use segments to swap data in and out from disk. There are also bits to indicate if the code is "user-mode" or "kernel mode" [actually, there are 4 levels, but most OS's only use level 3 (user) and level 0 (kernel)]. Thus, each segment can be made accessible to kernel or user-mode.

    I'm sure this is clear as mud. A bit more in a bit.

    Now for paging.
    A page is a 4KB section of memory that is handled by the MMU (Memory Management Unit). It is another approach to protecting various bits of memory from being abused by apps that shouldn't be using that memory. The pages have attributes such as Read/Write, User/Kernel access. A page can be "present" or "not present". If the page is "not present", a "page fault" happens. This is used for two things: When the swapping process puts pages of memory to disk to make space for other apps/data, it marks that particular page as "not present". The other use is to mark memory inaccessible, so that for example NULL-pointers can be caught as "invalid memory access".

    Page-tables are used to keep track of which 4KB section of memory is where, what attributes it has, etc. In the traditional case, a page-table-entry (PTE) is 32-bits, and part of that is the PHYSICAL address in memory, the remaining part is used for attributes.
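
    A rough sketch of picking apart such a 32-bit PTE follows. The bit positions (present = bit 0, read/write = bit 1, user = bit 2, frame address in the top 20 bits) are the classic 32-bit x86 layout; the macro and function names are invented here, and the access check is simplified in that it treats kernel writes to read-only pages the same as user writes.

```c
#include <stdint.h>

#define PTE_PRESENT  0x001u       /* bit 0: page is in memory */
#define PTE_WRITABLE 0x002u       /* bit 1: writes allowed */
#define PTE_USER     0x004u       /* bit 2: user-mode access allowed */
#define PTE_FRAME    0xFFFFF000u  /* bits 12..31: physical address */

/* Physical address of the 4KB frame this entry points at. */
uint32_t pte_frame(uint32_t pte)
{
    return pte & PTE_FRAME;
}

/* Simplified check: returns 1 if the access would be allowed,
   0 if it would page-fault. Hypothetical helper for illustration. */
int pte_allows(uint32_t pte, int is_write, int is_user)
{
    if (!(pte & PTE_PRESENT))
        return 0;
    if (is_write && !(pte & PTE_WRITABLE))
        return 0;
    if (is_user && !(pte & PTE_USER))
        return 0;
    return 1;
}
```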

    In modern OS's page-tables are used in preference over segments. Segments still exist, because they are a basic component of the processor architecture, but the segment descriptors are there only because they must be, and the approach is to set all segment descriptors (CS, DS, SS, etc) to have base address of 0 and limit of 4GB. That gives all segments access to all of the memory, and no difference is made to what memory section is used for what purpose at the segment level.


    As to xuftugulus's question: since the page-tables are what determine where you can write and where you can't, you have to ask the OS (nicely) if it will allow you write access to the code-section.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    In Windows, you can actually ask VirtualAlloc to give you some pages where you can execute code.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  5. #5
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Moved to Tech Board.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Elysia View Post
    In Windows, you can actually ask VirtualAlloc to give you some pages where you can execute code.
    Yes, if you want to generate code at runtime, you can ask for pages that have execute attribute.

    --
    Mats

  7. #7
    Registered User MacNilly's Avatar
    Join Date
    Oct 2005
    Location
    CA, USA
    Posts
    466
    Sorry, but segments and pages are obsolete with the standard flat-memory model used now. I can't think why anyone would wish to learn that malformed and contorted system when flat memory model is so easy?

  8. #8
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Pages very much exist still today.
    Some functions, such as Windows' VirtualAlloc, allocate memory rounded to pages, for example.
    And you only get an access violation when you touch a page you don't have access to. That's why you won't necessarily get an access violation when writing one byte outside of your array.
    This is relative to the x86, of course.
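
    To illustrate why an off-by-one write often goes unnoticed, here is a small sketch that checks whether two addresses fall in the same page. It assumes 4KB pages, and the function names are made up for the example.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumes 4KB pages */

/* Page number an address falls in. Hypothetical helper. */
uintptr_t page_of(uintptr_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* 1 if both addresses are in the same page, so the hardware applies
   the same protection to both; 0 otherwise. */
int same_page(uintptr_t a, uintptr_t b)
{
    return page_of(a) == page_of(b);
}
```

    If the last byte of the array and the byte just after it are in the same page, the MMU cannot tell them apart.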

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by MacNilly View Post
    Sorry, but segments and pages are obsolete with the standard flat-memory model used now. I can't think why anyone would wish to learn that malformed and contorted system when flat memory model is so easy?
    As Elysia says, pages are definitely still in use. That is how your Linux or Windows can run several applications that are all loaded at the same address, but in their own virtual space.

    Yes, flat memory model, using the segments as I described above (base=0, limit=4GB) is the convention used by most OS's.

    --
    Mats

  10. #10
    Registered User MacNilly's Avatar
    Join Date
    Oct 2005
    Location
    CA, USA
    Posts
    466
    Ugh. I shudder to think of pages and segments. Damn that assembly language class. ;p

  11. #11
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Normally, though, you don't really need to mess around with them unless you use specific functions. So it's fine not to learn them, I think.

  12. #12
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by MacNilly View Post
    Ugh. I shudder to think of pages and segments. Damn that assembly language class. ;p
    Just wait 'til you try to understand the system of page-tables when you are using AMD Virtualization, which has nested page-tables: the virtual machine has its own page-table, but to make sure that the VIRTUAL machine doesn't clobber other virtual machines, the Hypervisor/VMM (Virtual Machine Monitor) has ANOTHER set of page-tables, so every page-table access of the virtual machine itself is also redirected through the page-table of the VMM. It gets a bit "interesting" to work out where the actual physical memory is for a particular virtual page that the guest is using.
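
    A toy sketch of the two-stage idea, assuming single-level tables that are just arrays of frame numbers indexed by page number (real tables are multi-level, and the guest's own page-table reads also go through the VMM's tables, which this sketch leaves out). All names are invented for the example.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  0xFFFu

/* One translation stage: look up the page number in a flat table of
   frame numbers and keep the offset within the page. */
uint32_t translate_one(const uint32_t *table, uint32_t addr)
{
    return (table[addr >> PAGE_SHIFT] << PAGE_SHIFT) | (addr & PAGE_MASK);
}

/* Guest virtual -> guest "physical" -> real (host) physical. */
uint32_t nested_translate(const uint32_t *guest_table,
                          const uint32_t *host_table,
                          uint32_t guest_virtual)
{
    uint32_t guest_physical = translate_one(guest_table, guest_virtual);
    return translate_one(host_table, guest_physical);
}
```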

    --
    Mats

  13. #13
    l'Anziano DavidP's Avatar
    Join Date
    Aug 2001
    Location
    Plano, Texas, United States
    Posts
    2,743
    There is something called the "NX" bit or the "XD" bit (depending on if it is AMD or Intel) that forbids executing code in certain segments of memory, but this is not always supported. In Windows it came online with XP SP2, but not all programs know about it or use it. The Linux 2.6 kernel supports it, but lots of times you have to compile a program specifically with the option specified that you want to use that feature.

    Lots of times this is how a buffer overflow attack works. All you have to do is overwrite the return address in your stack frame, and then when the function returns, it will go to the new address that you wrote in memory. Voila! Now whatever is in that return address that it went to will be run as code. Hackers use this technique a lot in order to gain access and execute code on a compromised machine. But we shouldn't be talking about what hackers do....I just mentioned that because that is one way to execute code generated on the fly if you want to do so.

    We have been talking about paging systems, virtual memory, and caches a lot in my operating systems class right now. There are ways of paging that are both software-based and hardware-based. The best option, obviously, is when you have hardware support to do paging.

    Like matsp already mentioned, there is something called the MMU (Memory Management Unit). A virtual address gets passed to the MMU, and the MMU spits out the physical address, which then gets passed to main memory. Each virtual address is split into two portions: a virtual page number and a page offset. So imagine you have some memory address. The high "n" bits represent a virtual page number, while the lower "p" bits represent an offset into that page. When the MMU does address translation, the page offset stays the same. The virtual page number, however, gets translated into a physical page number.

    Address translation in the MMU happens via a page table. The MMU indexes into the page table with the virtual page number. The entry it finds says whether that page is valid, in other words whether it is currently in main memory. A page that is not valid may be stored on disk instead.
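
    The split-and-look-up step can be sketched like this, assuming 12 offset bits (4KB pages) and a flat, single-level table. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_BITS 12  /* the low "p" bits; 4KB pages assumed */

struct pt_entry {
    int      valid;   /* 1 if the page is in main memory */
    uint32_t frame;   /* physical page number */
};

/* Returns the physical address, or -1 where a real MMU would raise
   a page fault because the entry is not valid. */
int64_t mmu_translate(const struct pt_entry *page_table, uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_BITS;               /* high bits */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);  /* unchanged */

    if (!page_table[vpn].valid)
        return -1;
    return ((int64_t)page_table[vpn].frame << PAGE_BITS) | offset;
}
```

    Note that the offset passes straight through; only the page number is replaced.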

    Usually a hardware paging system will have a TLB (Translation Look-aside Buffer) so that it doesn't have to access main memory twice in order to access one thing. The TLB is a small cache on the chip which holds a small portion of the page table (recently accessed pages in memory). By doing this, most virtual addresses can be translated to physical addresses just by looking at the pages contained in the TLB, and we don't have to go to the physical memory to look at the full page table. If the page we want isn't in the TLB, then we have to go to physical memory and find the page in the page table there (that is one memory access), and then go to physical memory again to get the actual data we were looking for (that is a second memory access).
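
    Here is a toy model of that caching behaviour: a tiny direct-mapped TLB in front of a flat page table, counting hits and misses. The size, the direct-mapped layout, and all names are assumptions made for the sketch; real TLBs are organised differently.

```c
#include <stdint.h>

#define TLB_SLOTS 4  /* arbitrary small size for the sketch */

struct tlb_slot {
    int      valid;
    uint32_t vpn;    /* virtual page number cached in this slot */
    uint32_t frame;  /* physical page number it maps to */
};

struct tlb {
    struct tlb_slot slot[TLB_SLOTS];
    unsigned hits, misses;
};

/* Translate vaddr, consulting the TLB first and falling back to the
   page table (an array of frame numbers) on a miss. */
uint32_t tlb_translate(struct tlb *t, const uint32_t *page_table,
                       uint32_t vaddr)
{
    uint32_t vpn = vaddr >> 12;
    uint32_t offset = vaddr & 0xFFFu;
    struct tlb_slot *s = &t->slot[vpn % TLB_SLOTS];

    if (s->valid && s->vpn == vpn) {
        t->hits++;                    /* no page-table access needed */
    } else {
        t->misses++;                  /* extra memory access */
        s->valid = 1;
        s->vpn = vpn;
        s->frame = page_table[vpn];
    }
    return (s->frame << 12) | offset;
}
```

    A second access to the same page is a hit; an access to a different page that happens to share the slot evicts the old entry.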

    Then there are multilevel page tables which have been mentioned...but I won't go into.

    "Circular logic is good because it is."

  14. #14
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by DavidP View Post
    There is something called the "NX" bit or the "XD" bit (depending on if it is AMD or Intel) that forbids executing code in certain segments of memory, but this is not always supported. In Windows it came online with XP SP2, but not all programs know about it or use it. The Linux 2.6 kernel supports it, but lots of times you have to compile a program specifically with the option specified that you want to use that feature.
    Yes, the NX/XD bit is bit 63 in a PAE page-table entry (PAE is an extension to support more than 4GB of RAM in a 32-bit machine). This bit only means "don't execute here" if the NXE enable bit is also set in the EFER register, and of course only on machines that actually have this feature (all 64-bit AMD processors and most of the 64-bit Intel models support this).
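
    As a minimal sketch of that rule, following the description above: bit 63 of the 64-bit entry only acts as "no execute" when the enable bit is on. The macro and function names are made up, and the sketch ignores that with the enable bit clear the hardware treats bit 63 as reserved.

```c
#include <stdint.h>

#define PTE_NX (1ULL << 63)  /* bit 63 of a PAE page-table entry */

/* 1 if instruction fetches from this page are forbidden.
   nx_enabled stands for the enable bit mentioned in the text.
   Hypothetical helper for illustration. */
int execute_forbidden(uint64_t pte, int nx_enabled)
{
    return nx_enabled && (pte & PTE_NX) != 0;
}
```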

    The problem with some applications is that they do what I described earlier: generate code at run-time, which requires writable memory that is also executable. Applications that were developed before the OS support for "No Execute/Execute disable" got in there would of course just use the regular "malloc()" or some such to create memory space for execution, and then fail miserably when they tried to execute the code with the Execute Disable bit set...

    --
    Mats


  15. #15
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by matsp View Post
    The problem with some applications is that they do what I described earlier: generate code at run-time, which requires writable memory that is also executable. Applications that were developed before the OS support for "No Execute/Execute disable" got in there would of course just use the regular "malloc()" or some such to create memory space for execution, and then fail miserably when they tried to execute the code with the Execute Disable bit set...
    With OS support, the processor could set up alternate page tables pointing to the same region of memory, but with different execute/write bits.

    Failing that, on a POSIX-like operating system you can call mprotect() to cause the pages to become writable, write the generated code into them, then call mprotect() again to flip them back to read-only (and executable). That's a lot of thunking in and out of the kernel, though.
