How robust is malloc() in practice?

**Richardcavell** · 04-10-2011

Hi, everyone. I have a Wikipedia bot that can download a number of items from Wikipedia: the HTML version of a page, the wikitext version, XML information about the page's author and timestamp, a past version of the page, and so on.

Up till now I have been estimating an amount of RAM that can hold even the largest page (4 megs), allocating 6 such buffers, and passing the buffer addresses to various routines as part of a handle. However, as the number of functions in my bot increases, so does the number of buffers it might need. At the moment I'm allocating 24 megs, most of which will never be used. My original idea was that doing only one malloc() at the start of the program would cut down the paper-shuffling needed to keep track of the different buffers and reduce the possibility of memory fragmentation.

However, I'd like at least to have the option of dynamic allocation. When downloading a page of any type, the function would malloc(1) (0 doesn't work on some OS's), and then, every time data is added, realloc() and memcpy() until done. Finally, the function must free() the buffer.

My bot will potentially go through this routine 10 times to do the simplest Wikipedia task. Yet I feel that this is the 'right' way to do it. Otherwise I need to keep adding 4-meg buffers to my handle for each possible function.

Realistically, am I creating any headaches by using malloc()/realloc()/memcpy()/free() rather than working with fixed buffers? Could it ever lead to unrecoverable run-time memory fragmentation or memory leaks? How robust is the malloc() mechanism in practice? Note that my code is very portable and designed to work with everything from the Amiga to Windows to Slackware to VxWorks.

**Salem** · 04-10-2011

> the function would malloc(1) (0 doesn't work on some OS's),
Explain please.
malloc(0) should return either a unique pointer or NULL.

Originally Posted by c99

If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.

Either way, if you do
p = malloc(0);
you should NOT do *p at any point.

Further, if you do
p = malloc(0);

then you can (in either case) do
q = realloc( p, newsize );
if ( q != NULL ) p = q; else error();

Or you can simply do
p = NULL;
then
q = realloc( p, newsize );
if ( q != NULL ) p = q; else error();

> Could it ever lead to unrecoverable run-time memory fragmentation or memory leaks?
Well the leaks are your responsibility.
But the amount of fragmentation is down to each specific implementation. If you want a totally robust embedded system, say, then don't use malloc.

> How robust is the malloc() mechanism in practice?
It's robust in the sense that it will conform to its API, so long as you use the memory properly (no overruns for example). Mostly, it's down to what you do when you get a NULL pointer back.

**Richardcavell** · 04-10-2011

Originally Posted by Salem

> But the amount of fragmentation is down to each specific implementation. If you want a totally robust embedded system say, then don't use malloc.

I don't like the sound of that. I would like the bot to run on embedded/small processors. Having said that, in order for the bot to do anything useful, it needs to be able to cope with at least several megabytes of text data and HTTP transactions. So running it on a microwave oven is not realistic.

Richard

**Salem** · 04-10-2011

Think about it like this.
You've got 6MB of memory in the pool, a 1MB buffer allocated, and you want to extend it to 2MB with realloc.

So the memory now looks like
1 (just released) + 2 (buffer) + 3 (free)
or perhaps
1 (just released) + 3 (free) + 2 (buffer) (the 1 and 3 merge together to make 4)

If you then want a buffer of just over 3MB (should be OK, since 2 + 3.x is less than 6), then the first layout is going to fail, because its largest free block is only 3MB even though 4MB is free in total.

This is fairly easy to cope with on a desktop machine with GB of real memory, and virtualised address spaces for each process.

But if you're trying to alloc/realloc 8MB buffers on an embedded machine with anything less than, say, 64MB of memory pool space (not counting program code, OS code, and all other data), then it's likely to get awkward in a hurry.

Is it really necessary to store a page in contiguous memory?
Because if most pages are, say, less than 10K, a few are 100K, and the odd-balls are 1MB, then I would do something like this.

Code:

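#include <stddef.h>   /* size_t */

struct pageFragment {
    struct pageFragment *next;   /* next fragment, or NULL at the end */
    size_t usedSize;             /* bytes of fragment[] in use */
    char fragment[10240];        /* fixed 10K piece of page text */
};
struct page {
    struct pageFragment *head;   /* first fragment (start of the page) */
    struct pageFragment *tail;   /* last fragment, where new data goes */
    size_t totalSize;            /* sum of usedSize over all fragments */
};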

Sure there are more allocations (and more frees later on), but the pool manager is going to have a lot more chance of joining adjacent blocks together into larger free blocks to satisfy future requests.

What you never end up doing is trying to realloc xMB into yMB and finding that it doesn't fit in any free block.

**grumpy** · 04-10-2011

Originally Posted by Richardcavell

I don't like the sound of that. I would like the bot to run on embedded/small processors. Having said that, in order for the bot to do anything useful, it needs to be able to cope with at least several megabytes of text data and HTTP transactions. So running it on a microwave oven is not realistic.

Memory fragmentation is not caused by malloc() as such. It is more usually caused by repeated reallocations and deallocations. If you can ensure any memory block you need is only allocated once (not allocated once, and later resized), and use techniques such as allocating bigger memory blocks first, it is possible to minimise fragmentation. That often comes down to how you design your program (identifying a realistic upper limit on how much memory needs to be allocated, and sticking to it), not to whether you use malloc() or not.

malloc() will always be fundamentally limited by the amount of memory resources on the host. There is little point in trying to allocate a 2GB block, by any means, on a system that only has 16MB of total memory (virtual and RAM).

It is also generally considered poor form for any program to consume most of the available memory resources unless it is specifically designed to be the only application running exclusively on the host machine.

**Richardcavell** · 04-10-2011

Originally Posted by Salem

Sure there are more allocations (and more frees later on), but the pool manager is going to have a lot more chance of joining adjacent blocks together into larger free blocks to satisfy future requests.

What you never end up doing is trying to realloc xMB into yMB and finding that it doesn't fit in any free block.

Salem,

I see what you're proposing, but I worry that it's not in the spirit of C. Surely C allows me to store a string in a flat memory space and operate on it directly using <string.h>. If I use your suggestion, then I'll have to wrap all my string-handling functions.

I figure that any modern operating system that can handle a TCP/IP stack is going to have virtual memory plus other trickery going on. I know that Linux, for example, doesn't truly allocate memory until it's actually used. If you malloc() and then free() memory without touching it, it does the paperwork but never carves off the actual RAM.

When I was a little boy I had an Amiga, and any time memory was allocated using the OS, it would fragment the memory map permanently, until the next reboot. So avoidable allocation/deallocation was a very bad thing, especially when you start off with 512K. I hope that all modern OSes, including AmigaOS, have moved on from those days.

Richard

**~~CommonTater~~** · 04-10-2011

Richard... why not centralize all incoming data? One function does all your "getting"; give it a big buffer, malloc 10MB if needed. Now when something comes in, it goes into 1 (one) place and, guess what, you know the size! So when your getter returns or signals, it can say "327677 bytes ready", and you can then use malloc to create a memory block of the right size and memcpy or memmove to scoop up the data. No more guesswork.

In my experience centralizing stuff like this ultimately leads to simpler, more robust code.

**Richardcavell** · 04-10-2011

Tater,

The HTTP requests return data in pieces, each up to 4 kbytes long. A write callback function is used to put it together. The write callback can do anything it wants - serialize it to disk, print it to screen, checksum it, whatever.

I guess you could have a persistent buffer for the write callback, put the returned data together sequentially, then malloc(strlen()+1) && strcpy(). But it's just as easy to put the received data directly into the new buffer. I figure that you would code it so that the initial malloc() allocates some fixed amount, and when your buffer runs out of memory, you allocate up to the next multiple of that amount, where the fixed amount is something like 4 kbytes. That way you'll realloc() every 4 kbytes, but you won't have to realloc() every 20 bytes if your HTTP data comes in 20-byte chunks.

Richard

**~~CommonTater~~** · 04-10-2011

You can use either scheme, but I would favor a getter function that returns the required buffer size. The caller is responsible for creating its own memory space, fetching the data, doing whatever, then disposing of it when done. The reason I favor that is that you eliminate a lot of scoping problems with your temporary buffer pointers; the data is housed in the function operating on it. In a single-tasking environment, you could even make the getter's buffer global (yes, sometimes it is appropriate) and operate directly on that one buffer, eliminating all other buffers. In a multitasking environment you would use interprocess signalling to stop the getter until you copy the data out...

You can use the realloc trick inside the getter, but with 4 and 8 GB machines being popular these days I would hardly think a fixed buffer size of 10MB is much of an issue. (It's not like you're trying to shoehorn this into some dodgy old 128MB antique, is it?)

**Salem** · 04-10-2011

It seems like a dumb question, but have you read the HTTP RFC?
RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1

For example, responses usually come with a "Content-Length" header.
If you know this arrives in the response header and you can parse it out, then you know immediately (and up front) how much data you're going to get.
Just allocate the buffer, fill it, and you're done, yes?
