Thread: How exactly does "realloc" expand memory?

  1. #1
    Registered User
    Join Date
    Oct 2021
    Posts
    138

    How exactly does "realloc" expands memory?

    I have heard that when you call "realloc", it will first try to expand the current memory you have, and if it cannot do that, it will allocate a new block of memory and return its address instead. Ok, that's great!

    The thing is, how exactly does it "expand" memory? I checked and I couldn't find a system call that does that. I don't know how memory and virtual memory work in different operating systems, so I just know that when you allocate memory, you mark a specific block of memory as "used" by your process. So if you wanted to "expand" this block, I would think that you have to `free` the previous block and allocate a new, bigger/smaller one. Any ideas?
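    For reference, a minimal sketch of the behaviour in question: realloc() may hand back the same pointer (grown in place) or a different one (a new block, with the old contents copied over), so you always keep the value it returns:

    Code:
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *p = malloc(100 * sizeof *p);
        if (p == NULL)
            return 1;

        /* Ask for a bigger block; realloc may expand in place or move the data. */
        int *q = realloc(p, 1000 * sizeof *q);
        if (q == NULL) {            /* on failure the old block is still valid */
            free(p);
            return 1;
        }

        if (q == p)
            puts("expanded in place");
        else
            puts("moved to a new block (old contents were copied)");

        free(q);
        return 0;
    }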

  2. #2
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Quote Originally Posted by rempas View Post
    I have heard that when you call "realloc", it will first try to expand the current memory you have, and if it cannot do that, it will allocate a new block of memory and return its address instead. Ok, that's great!

    The thing is, how exactly does it "expand" memory? I checked and I couldn't find a system call that does that. I don't know how memory and virtual memory work in different operating systems, so I just know that when you allocate memory, you mark a specific block of memory as "used" by your process. So if you wanted to "expand" this block, I would think that you have to `free` the previous block and allocate a new, bigger/smaller one. Any ideas?
    It calls the sbrk system call:

    sbrk - man pages section 2: System Calls


    In some cases it can use mmap(), at least on Linux, if you are asking for really big chunks of memory.

    Growing existing allocations works as you suspect. If there is free memory after the allocation it might be able to merge the blocks to satisfy the request.

    However, the most likely outcome is that new space will need to be allocated, the existing contents copied to the new allocation, and the old one free()'d.

    Have a look at the realloc() man page...
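    If you want to see this for yourself on Linux/glibc, a quick (and deliberately non-portable) experiment is to print the program break and the addresses malloc() hands back: a small request usually lands near the break (the sbrk heap), while a very large one usually comes from an mmap'd region far away. The 128 KiB threshold mentioned below is the glibc default; other allocators behave differently:

    Code:
    #define _DEFAULT_SOURCE         /* for sbrk() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Where the program break (end of the sbrk heap) currently is. */
        printf("program break : %p\n", sbrk(0));

        /* A small allocation normally comes out of the brk/sbrk heap... */
        void *small = malloc(100);
        printf("small malloc  : %p\n", small);

        /* ...while a big one (above ~128 KiB by default on glibc) is
           typically satisfied with a separate mmap() mapping. */
        void *big = malloc(8 * 1024 * 1024);
        printf("big malloc    : %p\n", big);

        free(small);
        free(big);
        return 0;
    }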

  3. #3
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by hamster_nz View Post
    It calls the sbrk system call:

    sbrk - man pages section 2: System Calls


    In some cases it can use mmap(), at least on Linux, if you are asking for really big chunks of memory.

    Growing existing allocations works as you suspect. If there is free memory after the allocation it might be able to merge the blocks to satisfy the request.

    However, the most likely outcome is that new space will need to be allocated, the existing contents copied to the new allocation, and the old one free()'d.

    Have a look at the realloc() man page...
    Thank you for the info! I know how "mmap" works and it is actually what I'm using for my system allocator. I suppose that it uses "old_ptr + old_length" as the "address" argument of "mmap" rather than "NULL", to check whether the address range right after the current block is free so that it can "expand" the current block.

    However, this raises a question! If we do that, wouldn't the OS have to check the memory byte by byte to verify that the result is one contiguous block? So wouldn't this be slower than if we just allocated a new block somewhere else? Also, in the case where the memory cannot be expanded, we would have to make a second system call. So wouldn't it be a better strategy to just allocate a new block altogether? Especially if, like you said, most likely it will not be able to "expand" it.

    As for brk, I have heard about it but never looked at it in detail. Now that I've read the info in the links you posted, I can see how you can use "ptr + new_size" to increase the block (if I understand it right)! However, the thing about "brk" is that it is limited, and I'm making an allocator that will use "mmap" and will have all the flexibility "mmap" has.
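    Side note (an aside, not something an allocator is required to do): on Linux there is a dedicated system call for resizing an existing mapping, mremap(2), so an mmap-based allocator doesn't have to probe with an address hint at all. The kernel either extends the mapping in place or, with MREMAP_MAYMOVE, relocates the pages for you without a userspace copy. A minimal Linux-only sketch:

    Code:
    #define _GNU_SOURCE             /* for mremap() */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t old_size = 4096, new_size = 4 * 4096;

        void *p = mmap(NULL, old_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;

        /* Try to grow the mapping; MREMAP_MAYMOVE lets the kernel relocate it
           if the address range right after it is already taken. */
        void *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
        if (q == MAP_FAILED) {
            munmap(p, old_size);
            return 1;
        }

        printf("%s\n", q == p ? "grown in place" : "moved by the kernel");
        munmap(q, new_size);
        return 0;
    }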

  4. #4
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Quote Originally Posted by rempas View Post
    Thank you for the info! I know how "mmap" works and it is actually what I'm using for my system allocator. I suppose that it uses "old_ptr + old_length" as the "address" argument of "mmap" rather than "NULL", to check whether the address range right after the current block is free so that it can "expand" the current block.

    However, this raises a question! If we do that, wouldn't the OS have to check the memory byte by byte to verify that the result is one contiguous block? So wouldn't this be slower than if we just allocated a new block somewhere else? Also, in the case where the memory cannot be expanded, we would have to make a second system call. So wouldn't it be a better strategy to just allocate a new block altogether? Especially if, like you said, most likely it will not be able to "expand" it.

    As for brk, I have heard about it but never looked at it in detail. Now that I've read the info in the links you posted, I can see how you can use "ptr + new_size" to increase the block (if I understand it right)! However, the thing about "brk" is that it is limited, and I'm making an allocator that will use "mmap" and will have all the flexibility "mmap" has.
    As an oversimplification, the malloc library maintains an ordered list of memory regions that are either "in use" or "free".

    When you free() a block of memory, it can be consolidated with any free regions before or after it. And when you malloc() a new region, it can split an existing free region (giving the new 'in use' region and a smaller 'free' region).

    When you call realloc(), the lowest-cost answer is to see whether the next region in memory is free and whether merging the two gives enough space to satisfy the request. It's just a couple of small bookkeeping updates if this plan works.

    When malloc() doesn't have enough space anywhere to satisfy a request, it calls sbrk() to get a big region, adds it to the end of the ordered list of memory regions, and issues the requested space from there.

    Writing your own malloc/free implementation is a great exercise in seeing how this all works, as well as being a great exercise in frustration.
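    To make the bookkeeping concrete, here is a toy sketch of that ordered region list (nothing like real malloc internals; names and sizes are invented for illustration): an array of regions over a pretend 1024-byte heap, with splitting on allocation and merging of free neighbours on free:

    Code:
    #include <stdio.h>
    #include <stddef.h>

    struct region { size_t start, size; int in_use; };

    static struct region regions[64] = { { 0, 1024, 0 } };  /* one big free region */
    static int nregions = 1;

    static void remove_region(int i)
    {
        for (int j = i; j < nregions - 1; j++)
            regions[j] = regions[j + 1];
        nregions--;
    }

    /* "malloc": find a free region, split off the unused tail, mark it in use. */
    static int toy_alloc(size_t want)
    {
        for (int i = 0; i < nregions; i++) {
            if (!regions[i].in_use && regions[i].size >= want) {
                if (regions[i].size > want) {
                    for (int j = nregions; j > i + 1; j--)
                        regions[j] = regions[j - 1];
                    regions[i + 1] = (struct region){ regions[i].start + want,
                                                      regions[i].size - want, 0 };
                    nregions++;
                    regions[i].size = want;
                }
                regions[i].in_use = 1;
                return i;               /* an index stands in for a pointer here */
            }
        }
        return -1;
    }

    /* "free": mark the region free and consolidate with free neighbours. */
    static void toy_free(int i)
    {
        regions[i].in_use = 0;
        if (i + 1 < nregions && !regions[i + 1].in_use) {
            regions[i].size += regions[i + 1].size;
            remove_region(i + 1);
        }
        if (i > 0 && !regions[i - 1].in_use) {
            regions[i - 1].size += regions[i].size;
            remove_region(i);
        }
    }

    int main(void)
    {
        int a = toy_alloc(100), b = toy_alloc(200);
        toy_free(b);                    /* merges with the free tail of the heap */
        toy_free(a);                    /* merges again: back to one free region */
        for (int i = 0; i < nregions; i++)
            printf("region %d: start=%zu size=%zu %s\n", i, regions[i].start,
                   regions[i].size, regions[i].in_use ? "in use" : "free");
        return 0;
    }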

  5. #5
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by hamster_nz View Post
    As an oversimplification, the malloc library maintains an ordered list of memory regions that are either "in use" or "free".

    When you free() a block of memory, it can be consolidated with any free regions before or after it. And when you malloc() a new region, it can split an existing free region (giving the new 'in use' region and a smaller 'free' region).

    When you call realloc(), the lowest-cost answer is to see whether the next region in memory is free and whether merging the two gives enough space to satisfy the request. It's just a couple of small bookkeeping updates if this plan works.

    When malloc() doesn't have enough space anywhere to satisfy a request, it calls sbrk() to get a big region, adds it to the end of the ordered list of memory regions, and issues the requested space from there.
    I see! So it is not how I expected it to be. Thank you, my friend!

    Quote Originally Posted by hamster_nz View Post
    Writing your own malloc/free implementation is a great exercise in seeing how this all works, as well as being a great exercise in frustration.
    Actually, I'm writing my own system library for my new programming language, so it is not just an exercise. I will not copy the way `libc` works because I like neither its API nor the way a lot of things work under the hood. I don't think the memory allocator should add extra overhead by allocating more memory than you asked for and keeping a list that it has to check every time `malloc` or `realloc` is called. There is also no algorithm to predict how much memory the user will request next, so most of the time `mmap` or `sbrk` will be called anyway. Allocating a "shared" block of memory and having your structs use it is sometimes more efficient and a great optimization strategy, but sometimes you may just want to allocate a block of memory without extra overhead. So I will personally give the choice to the programmer!
    Last edited by rempas; 05-16-2022 at 11:42 PM. Reason: fixed some typos

  6. #6
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Hope it goes well... there is a big gap between the OS page size and user-friendly allocations.

    There are quite a few challenges in building a high-performance memory allocator. Alignment issues and multithreading come to mind, especially releasing resources in a different thread than the one that requested them.
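    On the alignment point: one common convention (and what malloc itself guarantees) is to hand out blocks aligned for any built-in type, which C11 exposes as max_align_t. A tiny sketch of rounding every request up to that alignment:

    Code:
    #include <stdio.h>
    #include <stddef.h>                         /* size_t, max_align_t */

    /* Round n up to a multiple of align (align must be a power of two). */
    static size_t round_up(size_t n, size_t align)
    {
        return (n + align - 1) & ~(align - 1);
    }

    int main(void)
    {
        size_t align = _Alignof(max_align_t);   /* typically 16 on x86-64 */
        printf("alignment  = %zu\n", align);
        printf("request 1  -> block of %zu bytes\n", round_up(1, align));
        printf("request 40 -> block of %zu bytes\n", round_up(40, align));
        return 0;
    }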

  7. #7
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by hamster_nz View Post
    Hope it goes well... there is a big gap between the OS page size and user-friendly allocations.

    There are quite a few challenges in building a high-performance memory allocator. Alignment issues and multithreading come to mind, especially releasing resources in a different thread than the one that requested them.
    Thank you! Yes, I'm putting myself into deep waters but I hope I'll come out in one piece...

  8. #8
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    948
    Quote Originally Posted by rempas View Post
    I don't think the memory allocator should add extra overhead by allocating more memory than you asked for and keeping a list that it has to check every time `malloc` or `realloc` is called. There is also no algorithm to predict how much memory the user will request next, so most of the time `mmap` or `sbrk` will be called anyway. Allocating a "shared" block of memory and having your structs use it is sometimes more efficient and a great optimization strategy, but sometimes you may just want to allocate a block of memory without extra overhead. So I will personally give the choice to the programmer!
    Keep in mind that, since they're system calls, mmap and sbrk have far more overhead (like 100x) than the little bit of overhead of bookkeeping that libc and the like have to do. The reason that libc allocates more memory than is immediately needed or asked for is to minimize the number of times it has to call mmap or sbrk to get more memory from the OS.

    Also keep in mind that many C standard libraries (especially the likes of GNU's libc) have been developed and fine-tuned for performance over the last few decades, so they're fairly well optimized by this point. You'd have a hard time beating them in the general case (such as in a new general-purpose programming language).
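    To make that trade-off concrete, this is the usual amortization trick: grow capacity geometrically, so N appends hit the allocator only about log2(N) times (and the allocator, in turn, only rarely has to go to mmap/sbrk), instead of once per element. A sketch of a growable buffer along those lines:

    Code:
    #include <stdio.h>
    #include <stdlib.h>

    struct buf { int *data; size_t len, cap; };

    /* Append one value, doubling the capacity when it runs out.  Doubling is
       the "allocate more than was asked for" idea: most appends touch no
       allocator at all. */
    static int buf_push(struct buf *b, int value)
    {
        if (b->len == b->cap) {
            size_t new_cap = b->cap ? b->cap * 2 : 16;
            int *p = realloc(b->data, new_cap * sizeof *p);
            if (p == NULL)
                return -1;
            b->data = p;
            b->cap = new_cap;
        }
        b->data[b->len++] = value;
        return 0;
    }

    int main(void)
    {
        struct buf b = { 0 };
        for (int i = 0; i < 1000000; i++)
            if (buf_push(&b, i) != 0)
                return 1;
        printf("len=%zu cap=%zu\n", b.len, b.cap);   /* cap overshoots len */
        free(b.data);
        return 0;
    }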

  9. #9
    Registered User
    Join Date
    Dec 2017
    Posts
    1,628
    You can of course often beat the standard allocator in particular cases where you know the pattern of allocation/deallocation. In such a case you can build something on top of the standard allocator instead of replacing it.

    For example, I once wrote a program which had to dynamically allocate a bunch of strings. They were allocated one-by-one but deallocated in blocks. So I would malloc a good-sized chunk of memory to hold all the strings of that block until they needed to be deallocated, keep track of the allocation point for parcelling out the space, and then deallocate the entire block at once. Simple and much more efficient than using the allocator directly.

    Since managing these strings was the heart of the program, the speed-up was significant. The only trick was to be able to allocate a second (third, etc.) block if needed. Basically, the last 4 bytes of a block (this was on a 32-bit machine) were used to point to the next block, or were NULL if this was the last block.

    This is the kind of simple allocator overlay that pays dividends. Reimplementing the allocator at the system call level is madness except as an exercise.
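    Something in the spirit of what's described above (a sketch with invented names, not the original code): malloc one big chunk, bump-allocate the strings out of it, chain extra chunks through a pointer stored in the chunk itself, and release everything in one pass:

    Code:
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK_SIZE (64 * 1024)

    /* Each chunk starts with a small header; strings are packed after it.
       (The post stored the next-pointer in the last 4 bytes of the block;
       a header works the same way and is pointer-size clean on 64-bit.) */
    struct chunk {
        struct chunk *next;
        size_t used;
        char data[CHUNK_SIZE];
    };

    struct arena { struct chunk *head; };

    /* Copy a string into the arena; strings longer than a chunk aren't handled. */
    static char *arena_strdup(struct arena *a, const char *s)
    {
        size_t n = strlen(s) + 1;
        if (a->head == NULL || a->head->used + n > CHUNK_SIZE) {
            struct chunk *c = malloc(sizeof *c);    /* one malloc per ~64 KiB */
            if (c == NULL)
                return NULL;
            c->next = a->head;
            c->used = 0;
            a->head = c;
        }
        char *out = a->head->data + a->head->used;
        memcpy(out, s, n);
        a->head->used += n;
        return out;
    }

    /* Deallocate every string at once by walking the chunk list. */
    static void arena_free_all(struct arena *a)
    {
        for (struct chunk *c = a->head, *next; c != NULL; c = next) {
            next = c->next;
            free(c);
        }
        a->head = NULL;
    }

    int main(void)
    {
        struct arena a = { 0 };
        for (int i = 0; i < 100000; i++)
            if (arena_strdup(&a, "some dynamically allocated string") == NULL)
                return 1;
        arena_free_all(&a);
        return 0;
    }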
    A little inaccuracy saves tons of explanation. - H.H. Munro

  10. #10
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by christop View Post
    Keep in mind that, since they're system calls, mmap and sbrk have far more overhead (like 100x) than the little bit of overhead of bookkeeping that libc and the like have to do. The reason that libc allocates more memory than is immediately needed or asked for is to minimize the number of times it has to call mmap or sbrk to get more memory from the OS.

    Also keep in mind that many C standard libraries (especially the likes of GNU's libc) have been developed and fine-tuned for performance over the last few decades, so they're fairly well optimized by this point. You'd have a hard time beating them in the general case (such as in a new general-purpose programming language).
    You are actually making a fair point! I suppose the biggest advantage the library can give you is being able to "expand" the memory; if you get a new location, then you also need to copy the old data from one place to another. At the same time, over-allocating wastes memory that may never be needed in the first place. By the same logic, you can set a minimum for your program and pre-allocate a big chunk of memory (a gigabyte, for example) and then work inside it. This way the library avoids the overhead and you end up doing fewer allocations anyway! So profiling your program and seeing what works best is the best option. Keeping the library code as simple as possible also means fewer chances for bugs and being able to deliver the library faster!
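    On the "wastes memory" worry: on Linux (and most modern OSes) anonymous mmap'd memory is demand-paged, so reserving a big region mostly costs address space until pages are actually touched. A rough sketch of the "pre-allocate a gigabyte and carve pieces out of it" idea, for illustration only:

    Code:
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    #define POOL_SIZE (1UL << 30)       /* 1 GiB of address space */

    static unsigned char *pool;
    static size_t pool_used;

    /* Bump allocator over one big anonymous mapping.  Untouched pages cost
       almost no physical RAM until they are written, thanks to demand paging. */
    static void *pool_alloc(size_t n)
    {
        n = (n + 15) & ~(size_t)15;     /* keep 16-byte alignment */
        if (pool_used + n > POOL_SIZE)
            return NULL;
        void *p = pool + pool_used;
        pool_used += n;
        return p;
    }

    int main(void)
    {
        pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (pool == MAP_FAILED)
            return 1;

        char *msg = pool_alloc(64);
        if (msg == NULL)
            return 1;
        snprintf(msg, 64, "allocated from the 1 GiB pool");
        puts(msg);

        munmap(pool, POOL_SIZE);
        return 0;
    }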

    I really do respect standards and the work other developers have put in over all these decades, but I also love new ideas! They should be taken out of theory and tried, to see how they work! Specifically for `libc`, I have mixed feelings: on the one hand I think that in some cases it adds a lot of abstraction and overhead, which makes it bloated, but at the same time I feel like it is minimal and makes creating a big general-purpose program a pain. The second point may not be so true, however.

    Btw, is the overhead of allocating memory really as big as 100x? Like, what in the world does the OS have to do to give you more memory???

  11. #11
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by john.c View Post
    You can of course often beat the standard allocator in particular cases where you know the pattern of allocation/deallocation. In such a case you can build something on top of the standard allocator instead of replacing it.

    For example, I once wrote a program which had to dynamically allocate a bunch of strings. They were allocated one-by-one but deallocated in blocks. So I would malloc a good-sized chunk of memory to hold all the strings of that block until they needed to be deallocated, keep track of the allocation point for parcelling out the space, and then deallocate the entire block at once. Simple and much more efficient than using the allocator directly.

    Since managing these strings was the heart of the program, te speed-up was significant. The only trick was to be able to allocate a second (third, etc) block if needed. Basically, the last 4 bytes of a block (if it was a 32-bit machine) was used to point to the next block or was NULL if this was the last block.
    Great example! If I understood correctly, this is exactly what I answered to @christop! There is no reason to have the library doing extra bookkeeping, since the users (the programmers using the library) should always profile their program and see what works best for their use case. Allocating a big block of memory is probably the best scenario given how much memory we have these days. Even laptops that are considered old have at least 4GB of RAM. So always sacrifice some memory for performance!

    Quote Originally Posted by john.c View Post
    This is the kind of simple allocator overlay that pays dividends. Reimplementing the allocator at the system call level is madness except as an exercise.
    Yeah, but don't we need a little bit of madness in our world? Also, the way I designed it, the whole memory allocation library (including the code that makes the system calls and constants like 'PROT_READ') is about 110 lines of code! It will probably still change a little as I try new stuff, but you get the idea...

    And like I said, creating a new system library is necessary for a new AOT language. I also thought it was super hard when I started out, and I didn't know or understand a lot of stuff (I asked a question here some time ago about where the return value of a system call is placed, to give you an idea), but the more you do it, the easier it gets. You get more confident, you complain less about bugs and things that don't work the way you expected ("why is my PERFECT code not working???" Sound familiar?), and you instead spend your time and energy fixing them. Work hard, enjoy the results!

  12. #12
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    948
    Quote Originally Posted by rempas View Post
    Btw, is the overhead of allocating memory really as big as 100x? Like, what in the world does the OS have to do to give you more memory???
    Yeah, it really can be as big as 100x or more. Reading/writing data structures that are already within a process's memory is fast (nanoseconds), but to allocate a big block of memory for your process it takes a lot more time: switching to and from kernel mode (maybe tens of nanoseconds), having the kernel modify page tables or whatever (maybe tens to hundreds of nanoseconds), and also clearing the memory.
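    If you want to put a (very rough) number on that yourself, here is a quick-and-dirty timing sketch comparing writes into memory the process already owns against a full mmap/touch/munmap round trip. The absolute numbers depend entirely on the machine and kernel, so treat them as order-of-magnitude only:

    Code:
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <sys/mman.h>

    #define N 10000
    #define SIZE (64 * 1024)

    static double now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e9 + ts.tv_nsec;
    }

    int main(void)
    {
        static unsigned char buf[SIZE];

        /* Baseline: write into memory the process already owns. */
        double t0 = now_ns();
        for (int i = 0; i < N; i++)
            memset(buf, i, SIZE);
        double t1 = now_ns();

        /* Same amount of memory, but obtained from the kernel every time. */
        double t2 = now_ns();
        for (int i = 0; i < N; i++) {
            void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                return 1;
            memset(p, i, SIZE);             /* force the pages to really exist */
            munmap(p, SIZE);
        }
        double t3 = now_ns();

        printf("reuse existing buffer : %.0f ns per iteration\n", (t1 - t0) / N);
        printf("mmap + touch + munmap : %.0f ns per iteration\n", (t3 - t2) / N);
        return 0;
    }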

  13. #13
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by christop View Post
    Yeah, it really can be as big as 100x or more. Reading/writing data structures that are already within a process's memory is fast (nanoseconds), but to allocate a big block of memory for your process it takes a lot more time: switching to and from kernel mode (maybe tens of nanoseconds), having the kernel modify page tables or whatever (maybe tens to hundreds of nanoseconds), and also clearing the memory.
    I see! Then I guess the best strategy is to pre-allocate a big chunk of memory, like I said. Profile your program and act accordingly! Thank you for the information!
