How does a program malloc more than 4GB?

**jabka** · 10-06-2007

Hello,...

I have read mika0x65's latest post on mika0x65.livejournal.com about PAE, and also the related Wikipedia article.

The question is: how do you malloc that memory and use it?

If, for example, I do:
ptr = malloc(sizeof(*ptr) * amount);

could I do ptr++ above the 4GB file?
Since, as written, the memory isn't linear.

If so, how is it done at the OS level?

P.S.

I guess I said something wrong, so please correct me.

**Salem** · 10-06-2007

You wouldn't read the whole file into memory to begin with. Read the file in fixed-size blocks and process it as you go.
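As a rough sketch of the block-at-a-time approach, the C function below reads a file through one fixed-size buffer and folds each block into a running byte sum. The name `sum_file_in_blocks`, the 64KB block size and the byte-sum "processing" step are all assumptions made for illustration; any per-block work could go in the inner loop.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE (64 * 1024)  /* illustrative choice; any fixed size works */

/*
 * Hypothetical helper: reads `path` one block at a time and adds up all
 * byte values, so the buffer stays at BLOCK_SIZE bytes no matter how
 * large the file is. Returns 0 on success, -1 if the file cannot be
 * opened or a read error occurs.
 */
int sum_file_in_blocks(const char *path, uint64_t *total_bytes, uint64_t *byte_sum)
{
    static unsigned char buf[BLOCK_SIZE];
    FILE *fp = fopen(path, "rb");
    size_t got;
    int failed;

    if (fp == NULL)
        return -1;

    *total_bytes = 0;
    *byte_sum = 0;

    /* fread returns the number of bytes read; 0 means end of file or error. */
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < got; i++)
            *byte_sum += buf[i];        /* the "process as you go" step */
        *total_bytes += got;
    }

    failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : 0;
}
```

The point of the design is that the working set is one block, so the size of the file never has to fit in the address space.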

If you did try and fit the large file in memory all at once, most of it would get swapped straight back out to disk in the swap file.

Besides, you need a 64-bit OS and well above 4GB of real RAM to have a chance of mallocing that much in one go.

**jabka** · 10-06-2007

No, I didn't explain myself correctly.
I don't want to read 20GB and process it; I just want to access the 5th gigabyte of RAM.

When working with more than 4GB, PAE comes into the picture, so the memory that you get with malloc isn't linear and you cannot work with it directly.

And since I don't have more than 4GB of RAM (I actually have only one), I can't see what happens when you try to do ptr++ past the 4th gigabyte of RAM.

**Salem** · 10-06-2007

http://www.microsoft.com/whdc/system...AE/pae_os.mspx
In other words, it's the usual Intel hack of introducing segmented memory, except the segments are now 4GB in size rather than 64K.

**matsp** · 10-06-2007

Originally Posted by Salem

http://www.microsoft.com/whdc/system...AE/pae_os.mspx
In other words, it's the usual Intel hack of introducing segmented memory, except the segments are now 4GB in size rather than 64K.

Let's explain what PAE does first, then discuss its hackishness.

For some time, 4GB was "plenty big enough for everyone" - then (some) servers started needing more than that, and now it's not entirely impossible to get even 8GB of RAM in a desktop/workstation type machine.

So, when the 386 was designed with its paged memory management unit (PMMU, or MMU for short), it had two levels of page-table to get to the actual memory region, called a page of memory. Each page is 4KB in the original design.

The page-table starts where the special register CR3 points to - this can be "anywhere" in memory, although its low 12 bits aren't used as address bits, so CR3 will always point to a 4KB-aligned region.

The top 10 bits [bits 31..22] (giving 1024 values) of the address form an index into the table pointed to by CR3. At the location indicated (which is somewhere at CR3 + 0x000..0xFFC), we find a 32-bit value that indicates the next-level page table.

The next 10 bits [bits 21..12] (again, 1024 different values) of the address form an index into the page pointed to by the previous level page-table. Again, it's a 32-bit value. This time it indicates the physical address of the page.

The remaining 12 bits indicate the offset within this page (4KB) that we want to use.
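The 10/10/12 split described above can be sketched in C as plain shifts and masks. This only illustrates the arithmetic; the struct and function names are made up for the sketch, and a real MMU does this in hardware.

```c
#include <stdint.h>

/* Illustrative struct; the field names are made up for this sketch. */
struct va32_parts {
    uint32_t dir_index;    /* bits 31..22: index into the table CR3 points to */
    uint32_t table_index;  /* bits 21..12: index into the second-level table */
    uint32_t offset;       /* bits 11..0:  byte offset inside the 4KB page */
};

/* Split a 32-bit virtual address the way the two-level 386 scheme does. */
struct va32_parts split_va32(uint32_t va)
{
    struct va32_parts p;
    p.dir_index   = (va >> 22) & 0x3FFu;  /* 10 bits, 1024 values */
    p.table_index = (va >> 12) & 0x3FFu;  /* 10 bits, 1024 values */
    p.offset      = va & 0xFFFu;          /* 12 bits, 4096 values */
    return p;
}

/* Byte offset of the first-level entry relative to CR3 (entries are 4 bytes). */
uint32_t first_level_byte_offset(uint32_t va)
{
    return split_va32(va).dir_index * 4u;  /* always within 0x000..0xFFC */
}
```

For example, `split_va32(0xC0801234)` gives a first-level index of 0x302, a second-level index of 1 and a page offset of 0x234, and the first-level entry sits at CR3 + 0xC08.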

Now, the problem with this is that all the page-table entries are 32-bit, and that, of course, limits the amount of memory to 4GB.

So, to change that, we either have to change the meaning of the content in the page-table, or make each entry larger - or of course, "do something completely different altogether" - but that is generally considered a bad thing when "extending the architecture".

So the Intel solution to the problem was to make each page-table entry larger, changing the entries from 4 bytes (32-bit) to 8 bytes (64-bit). This is called Physical Address Extension (PAE). It essentially allows for really huge memory addresses, although the processor itself only had [at the time of introducing PAE] 36 address pins, giving a range of 64GB of addressable memory.

Changing the size of each entry in the page-table gives two choices: either we make each section of page-table 8KB (to hold the same number of entries), or we keep them 4KB but make each table level hold only 512 entries. The latter was chosen. [It does make sense, since the pages themselves are 4KB, and it's kind of useful if you don't have to keep pairs of pages around in various places.]
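The numbers in the last few paragraphs can be checked with a few lines of C. This is only a back-of-the-envelope sketch; the function names are invented for it.

```c
#include <stdint.h>

#define TABLE_BYTES 4096u  /* each table level occupies one 4KB page */

/* Number of entries that fit in one 4KB table for a given entry size. */
uint32_t entries_per_table(uint32_t entry_bytes)
{
    return TABLE_BYTES / entry_bytes;
}

/* Address bits needed to index `entries` entries (entries is a power of two). */
uint32_t index_bits(uint32_t entries)
{
    uint32_t bits = 0;
    while (entries > 1) {
        entries >>= 1;
        bits++;
    }
    return bits;
}

/* Bytes reachable with `address_bits` physical address lines (must be < 64). */
uint64_t addressable_bytes(uint32_t address_bits)
{
    return (uint64_t)1 << address_bits;
}
```

With 4-byte entries a table holds 1024 entries (10 index bits); with 8-byte entries it holds 512 (9 index bits). `addressable_bytes(32)` is 4GB and `addressable_bytes(36)` is 64GB, matching the 36 address pins mentioned above.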

This, of course, means that we pick NINE bits at a time from the 12th bit upwards. As anyone with simple math skills can see, that leaves two bits over at the top (32 - 12 - 9 - 9 = 2). These two bits form the THIRD PAGE-TABLE level. It only has four valid entries, pointed to by CR3. One of these four entries is used when accessing any memory in the machine: the top 2 bits select a page of 512 64-bit entries, which is indexed by the next lower 9 bits. That entry holds a pointer to the next page-table level, which the next lower 9 bits index to find the entry that holds the physical address of the page.
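Here is the same kind of sketch for the PAE layout, assuming 4KB pages: a 2/9/9/12 split of the virtual address. Note that the input is still a 32-bit virtual address; only the entries stored in the tables got wider. The struct and function names are again just for illustration.

```c
#include <stdint.h>

/* Illustrative struct; the field names are made up for this sketch. */
struct pae_parts {
    uint32_t top_index;  /* bits 31..30: one of the 4 top-level entries */
    uint32_t mid_index;  /* bits 29..21: index into a page of 512 64-bit entries */
    uint32_t low_index;  /* bits 20..12: index into the last-level table */
    uint32_t offset;     /* bits 11..0:  byte offset inside the 4KB page */
};

/* Split a 32-bit virtual address the way three-level PAE paging does. */
struct pae_parts split_va_pae(uint32_t va)
{
    struct pae_parts p;
    p.top_index = (va >> 30) & 0x3u;    /* 2 bits, 4 values */
    p.mid_index = (va >> 21) & 0x1FFu;  /* 9 bits, 512 values */
    p.low_index = (va >> 12) & 0x1FFu;  /* 9 bits, 512 values */
    p.offset    = va & 0xFFFu;          /* 12 bits, 4096 values */
    return p;
}
```

For the address 0xC0801234 this yields top-level entry 3, then index 4, then index 1, and the same page offset 0x234 as in the non-PAE case.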

It is worth noting that x86_64 as defined by AMD uses the same principle as PAE, except it adds a further fourth level of page-table above the third level defined by Intel. This, of course, allows for an even greater range of virtual addresses, which is necessary to support 64-bit memory addressing [although, for practical reasons, this is currently artificially limited to 48 bits of virtual address and 40 or 48 bits of physical address depending on which model of processor it is]. 48 bits of address is 65536 times more than 32 bits can address, giving 256 TERABYTES of addressable memory. That's quite a lot.
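A small sketch of the four-level case, assuming 4KB pages and the 48-bit virtual address limit mentioned above: four 9-bit indices plus a 12-bit offset add up to 48 bits. The level numbering (0 for the lowest table, 3 for the added fourth level) and the function names are my own labels for this sketch.

```c
#include <stdint.h>

/* Index into the page table at `level` (0 = lowest table, 3 = fourth level). */
uint32_t table_index_x86_64(uint64_t va, unsigned level)
{
    /* Each level consumes 9 bits, starting just above the 12 offset bits. */
    return (uint32_t)((va >> (12u + 9u * level)) & 0x1FFu);
}

/* Byte offset inside the 4KB page. */
uint32_t page_offset_x86_64(uint64_t va)
{
    return (uint32_t)(va & 0xFFFu);
}

/* Size of an address space with `bits` address bits, in terabytes
 * (2^40 bytes); valid for 40 <= bits < 64. */
uint64_t address_space_terabytes(unsigned bits)
{
    return ((uint64_t)1 << bits) >> 40;
}
```

`address_space_terabytes(48)` comes out as 256, which is the figure quoted above.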

I'm sure this is about as clear as mud, but maybe it gives a flavour of what PAE does.

Now for the "hack" comments: Really, there isn't much you can do to extend the address range of a 32-bit register - other than introduce segments. Intel tried that with their 16-bit processors, and although segments still exist even in the 64-bit x86 architecture, they're not really used to address a further range (even with PAE, the segments still only have a 32-bit base address).

The only solution to that problem is to use 64-bit registers (or 40-bit registers, for example - but 5-byte registers make for very awkward "divide by 5" situations in the processor, so using a power of 2, such as 64, is easier).

I don't know why Intel didn't make the processor 64-bit at that time. Probably because it wasn't "needed".

There are some applications that need more than 4GB of RAM. But in most cases, a heavily loaded server has a situation where there are many medium-sized applications running, each using some megabytes or perhaps a gigabyte, rather than a single application that uses more than 4GB. For this situation, a 32-bit limit for each application is fine.

If you want one application to be able to reach above 4GB, you need "bigger registers", which means moving to 64-bit OS and 64-bit application.
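To tie this back to the original question, here is a minimal sketch of what the program side looks like. The wrapper names are hypothetical; the only real library call is `malloc`. The idea is that a 32-bit build cannot even express a 5GB request as a `size_t`, PAE or not, while a 64-bit build just asks for it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if a request of `bytes` can be expressed as a size_t in this
 * build. In a 32-bit program SIZE_MAX is just under 4GB. */
int fits_in_size_t(uint64_t bytes)
{
    return bytes <= (uint64_t)SIZE_MAX;
}

/* Hypothetical wrapper: refuses sizes that size_t cannot hold, otherwise
 * defers to malloc, which may still return NULL if the OS says no. */
void *alloc_bytes(uint64_t bytes)
{
    if (!fits_in_size_t(bytes))
        return NULL;
    return malloc((size_t)bytes);
}
```

In a 64-bit build, `alloc_bytes(5ULL << 30)` can succeed if the OS is willing to hand out that much, and the returned pointer is an ordinary linear virtual address, so stepping through it with `ptr++` past the 4GB mark needs nothing special. In a 32-bit build the same call returns NULL regardless of how much physical RAM PAE makes available.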

It is also worth noting that a 32-bit OS reserves part of the address space as "kernel-space", which is permanently mapped for all applications (but only accessible in kernel mode, so you can't read/write it from a normal application) - this is so that system calls can be made, interrupts taken, etc. The kernel space in Windows defaults to 2GB. This means that half of the 32-bit address space can't be used by any given application. This can be changed with the /3GB switch in boot.ini, which gives the application, unsurprisingly, 3GB and leaves 1GB for the kernel.

Linux has a similar split; the usual default on 32-bit x86 is 3GB for the application and 1GB for the kernel, and other splits can be selected when the kernel is built.

In 64-bit, the kernel space sits at the top of the address range (up to 128TB, half of the total) - but since there's 256TB to split up, I don't think many people worry about this at present.

--
Mats
