Thread: How robust is malloc() in practice?

  1. #1
    Registered User
    Join Date
    Feb 2011
    Posts
    144

    How robust is malloc() in practice?

    Hi, everyone. I have a Wikipedia bot that has the capability to download a number of items from Wikipedia - the HTML version of a page, the wikitext version, XML information about the page's author and timestamp, a past version of the page, and so on.

    What I have been doing up till now is estimating an amount of RAM which can hold even the largest page (4 megs) and allocating 6 such buffers, then passing the buffer addresses to various routines as part of a handle. However, as the number of functions in my bot increases, the number of potential buffers increases. At the moment I'm allocating 24 megs, most of which will never be used. My original idea was that if I only do one malloc() at the start of the program, it will decrease the amount of paper-shuffling needed to keep track of the different buffers, and reduce the possibility of memory fragmentation.

    However, I'd like to at least have the option of dynamic allocation. When downloading a page of any type, the function would malloc(1) (0 doesn't work on some OS's), and then every time stuff is added to it, realloc() and memcpy() until done. Then the function must free() the buffer.

    My bot will potentially go through this routine 10 times to do the simplest Wikipedia task. Yet, I feel as though this is the 'right' way to do it. Otherwise I need to keep on adding 4-meg buffers to my handle for each possible function.

    Realistically, am I creating any headaches by using malloc()/realloc()/memcpy()/free() rather than working with fixed buffers? Could it ever lead to unrecoverable run-time memory fragmentation or memory leaks? How robust is the malloc() mechanism in practice? Note that my code is very portable and designed to work with everything from the Amiga to Windows to Slackware to VxWorks.

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > the function would malloc(1) (0 doesn't work on some OS's),
    Explain please.
    malloc(0) should return either a unique pointer or NULL.
    Quote Originally Posted by c99
    If the size of the space requested is zero, the behavior is
    implementation-defined: either a null pointer is returned, or the behavior is as if the size
    were some nonzero value, except that the returned pointer shall not be used to access an
    object.
    Either way, if you do
    p = malloc(0);
    you should NOT do *p at any point

    Further, if you do
    p = malloc(0);

    Then you can (in either case) do
    q = realloc( p, newsize );
    if ( q != NULL ) p = q; else error();


    Or you can simply do
    p = NULL;
    then
    q = realloc( p, newsize );
    if ( q != NULL ) p = q; else error();
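
    As a rough sketch of how the realloc() pattern above might be wrapped up for repeated appends: the struct buffer type and the buffer_append/buffer_free names are made up for illustration and are not from the original post. The buffer starts out as NULL with length 0, and the old pointer is only overwritten once realloc() is known to have succeeded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; all names here are illustrative. */
struct buffer {
    char  *data;   /* NULL until the first successful append */
    size_t len;    /* bytes currently stored */
};

/* Append n bytes from src. Returns 0 on success, -1 on failure.
 * On failure the existing contents are left intact, because the result
 * of realloc() is assigned to b->data only after the NULL check. */
int buffer_append(struct buffer *b, const void *src, size_t n)
{
    assert(b != NULL);
    if (n == 0)
        return 0;                   /* nothing to do; avoids realloc(p, 0) */
    assert(src != NULL);
    if (n > (size_t)-1 - b->len)
        return -1;                  /* new size would overflow size_t */

    /* realloc(NULL, size) behaves like malloc(size). */
    char *q = realloc(b->data, b->len + n);
    if (q == NULL)
        return -1;

    memcpy(q + b->len, src, n);
    b->data = q;
    b->len += n;
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
void buffer_free(struct buffer *b)
{
    assert(b != NULL);
    free(b->data);                  /* free(NULL) is a no-op */
    b->data = NULL;
    b->len = 0;
}
```

    A caller would start with struct buffer b = { NULL, 0 }; and check the return value of every buffer_append() call. This version reallocates on every append, which is the simplest form of the idea, not the most efficient one.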


    > Could it ever lead to unrecoverable run-time memory fragmentation or memory leaks?
    Well the leaks are your responsibility.
    But the amount of fragmentation is down to each specific implementation. If you want a totally robust embedded system say, then don't use malloc.

    > How robust is the malloc() mechanism in practice?
    It's robust in the sense that it will conform to its API, so long as you use the memory properly (no overruns for example). Mostly, it's down to what you do when you get a NULL pointer back.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Feb 2011
    Posts
    144
    Quote Originally Posted by Salem View Post
    But the amount of fragmentation is down to each specific implementation. If you want a totally robust embedded system say, then don't use malloc.
    I don't like the sound of that. I would like the bot to run on embedded/small processors. Having said that, in order for the bot to do anything useful, it needs to be able to cope with at least several megabytes of text data and HTTP transactions. So running it on a microwave oven is not realistic.

    Richard

  4. #4
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Think about it like this.
    You've got 6MB of memory in the pool, and you have a 1MB buffer allocated, and you want to extend it to 2MB with realloc.

    So the memory now looks like
    1(just released)+2(buffer)+3(free)
    or perhaps
    1(just released)+3(free)+2(buffer) (the 1 and 3 merge together to be 4)

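    If you then want a separate buffer of just over 3MB (which should be OK on paper, since 2 + 3.x is less than 6), then the first layout is going to fail: its largest free block is only 3MB, even though 4MB is free in total.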
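
    To make the arithmetic concrete, here is a toy model of the two layouts above. It is not how any real malloc() is implemented; struct block, largest_free_run and total_free are hypothetical names, and the pool is just an array of blocks in address order with sizes in KB.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a pool: one entry per block, listed in address order. */
struct block {
    size_t size_kb;
    int    is_free;
};

/* Largest contiguous free run, treating adjacent free blocks as merged
 * (which is what a coalescing allocator would do). */
size_t largest_free_run(const struct block *pool, size_t count)
{
    assert(pool != NULL || count == 0);
    size_t best = 0, run = 0;
    for (size_t i = 0; i < count; i++) {
        run = pool[i].is_free ? run + pool[i].size_kb : 0;
        if (run > best)
            best = run;
    }
    return best;
}

/* Total free space, regardless of where it sits in the pool. */
size_t total_free(const struct block *pool, size_t count)
{
    assert(pool != NULL || count == 0);
    size_t sum = 0;
    for (size_t i = 0; i < count; i++)
        if (pool[i].is_free)
            sum += pool[i].size_kb;
    return sum;
}

/* The two layouts from the post, in KB (1 MB = 1024 KB). */
static const struct block layout_a[] = { {1024, 1}, {2048, 0}, {3072, 1} };
static const struct block layout_b[] = { {1024, 1}, {3072, 1}, {2048, 0} };
```

    Both layouts have 4096 KB free in total, but largest_free_run() gives 3072 KB for layout_a and 4096 KB for layout_b, so a fresh request of, say, 3200 KB can only be satisfied in the second one.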

    This is fairly easy to cope with on a desktop machine with GB of real memory, and virtualised address spaces for each process.

    But if you're trying to alloc/realloc 8MB buffers on an embedded machine with anything less than say 64MB of memory pool space (not counting program code, OS code, all other data), then it's likely to get awkward in a hurry.

    Is it really necessary to store a page in contiguous memory?
    Because if most pages are say less than 10K, a few are 100K, and odd-balls are 1MB, then I would do something like this.
    Code:
    struct pageFragment {
        struct pageFragment *next;
        size_t usedSize;
        char fragment[10240];
    };
    struct page {
      struct pageFragment *head;
      struct pageFragment *tail;
      size_t totalSize;
    };
    Sure there are more allocations (and more frees later on), but the pool manager is going to have a lot more chance of joining adjacent blocks together into larger free blocks to satisfy future requests.

    What you never end up doing is trying to realloc xMB into yMB and finding that it doesn't fit in any free block.
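
    A minimal sketch of how appending to such a fragment list might look, assuming the two structs above. The page_append and page_free functions and the FRAGMENT_SIZE macro are illustrative additions, not part of the original suggestion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FRAGMENT_SIZE 10240

struct pageFragment {
    struct pageFragment *next;
    size_t usedSize;
    char fragment[FRAGMENT_SIZE];
};

struct page {
    struct pageFragment *head;
    struct pageFragment *tail;
    size_t totalSize;
};

/* Append n bytes to the page, adding fixed-size fragments as needed.
 * Returns 0 on success, -1 if a fragment could not be allocated; bytes
 * appended before the failure stay in the page. */
int page_append(struct page *p, const char *src, size_t n)
{
    assert(p != NULL);
    assert(src != NULL || n == 0);
    while (n > 0) {
        struct pageFragment *t = p->tail;
        if (t == NULL || t->usedSize == FRAGMENT_SIZE) {
            /* Tail is missing or full: link in a fresh fragment. */
            struct pageFragment *f = malloc(sizeof *f);
            if (f == NULL)
                return -1;
            f->next = NULL;
            f->usedSize = 0;
            if (t != NULL)
                t->next = f;
            else
                p->head = f;
            p->tail = f;
            t = f;
        }
        size_t room = FRAGMENT_SIZE - t->usedSize;
        size_t take = n < room ? n : room;
        memcpy(t->fragment + t->usedSize, src, take);
        t->usedSize  += take;
        p->totalSize += take;
        src += take;
        n   -= take;
    }
    return 0;
}

/* Free every fragment and reset the page to empty. */
void page_free(struct page *p)
{
    assert(p != NULL);
    struct pageFragment *f = p->head;
    while (f != NULL) {
        struct pageFragment *next = f->next;
        free(f);
        f = next;
    }
    p->head = p->tail = NULL;
    p->totalSize = 0;
}
```

    Every allocation is the same size, so a freed fragment can always be reused for the next one. The trade-off is that the text is no longer one contiguous string, so anything that scans it has to walk the list.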

  5. #5
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by Richardcavell View Post
    I don't like the sound of that. I would like the bot to run on embedded/small processors. Having said that, in order for the bot to do anything useful, it needs to be able to cope with at least several megabytes of text data and HTTP transactions. So running it on a microwave oven is not realistic.
    Memory fragmentation is not caused by malloc() as such. It is more usually caused by repeated reallocations and deallocations. If you can ensure any memory block you need is only allocated once (not allocated once, and later resized), and use techniques such as allocating bigger memory blocks first, it is possible to minimise fragmentation. That often comes down to how you design your program (identifying a realistic upper limit on how much memory needs to be allocated, and sticking to it) not to whether you use malloc() or not.

    malloc() will always be fundamentally limited by the amount of memory resources on the host. There is little point in trying to allocate a 2GB block on a system that only has 16 MB of total memory (virtual and RAM). By any means.

    It is also generally considered poor form for any program to consume most of the available memory resources unless it is specifically designed to be the only application running exclusively on the host machine.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  6. #6
    Registered User
    Join Date
    Feb 2011
    Posts
    144
    Quote Originally Posted by Salem View Post
    Code:
    struct pageFragment {
        struct pageFragment *next;
        size_t usedSize;
        char fragment[10240];
    };
    struct page {
      struct pageFragment *head;
      struct pageFragment *tail;
      size_t totalSize;
    };
    Sure there are more allocations (and more frees later on), but the pool manager is going to have a lot more chance of joining adjacent blocks together into larger free blocks to satisfy future requests.

    What you never end up doing is trying to realloc xMB into yMB and finding that it doesn't fit in any free block.
    Salem,

    I see what you're proposing, but I worry that it's not in the spirit of C. Surely C allows me to store a string in a flat memory space and operate on it directly using <string.h>. If I use your suggestion, then I'll have to wrap all my string-handling functions.

    I figure that any modern operating system that can handle a TCP/IP stack is going to have virtual memory plus other trickery going on. I know that Linux, for example, doesn't truly commit physical memory until it's actually used. If you malloc() and then free() memory without touching it at all, it does the paperwork but doesn't carve off the actual RAM.

    When I was a little boy I had an Amiga, and any time memory was allocated using the OS, it would permanently fragment the memory map until the next reboot. So avoidable allocation/deallocation was a very bad thing, especially when you start off with 512 k. I hope that all modern OS's, including AmigaOS, have moved on from those days.

    Richard

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Richard... why not centralize all incoming data? One function that does all your "getting"... give it a big buffer; malloc 10 MB if needed. Now when something comes in it goes into 1 (one) place and guess what, you know the size! So when your getter returns or signals it can say "327677 bytes ready" and you can then use malloc to create a memory block the right size and memcpy or memmove to scoop up the data. No more guesswork.

    In my experience centralizing stuff like this ultimately leads to simpler, more robust code.
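
    A small sketch of the "scoop up the data" step, assuming the getter has already filled a staging buffer and reported how many bytes are ready. The copy_exact name is hypothetical, and the extra byte for a terminating NUL is an assumption that the data will be treated as a C string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the first n bytes of a staging buffer into a freshly allocated
 * block of exactly n + 1 bytes and NUL-terminate it.
 * Returns NULL on failure; the caller owns (and must free) the result. */
char *copy_exact(const char *staging, size_t n)
{
    assert(staging != NULL);
    if (n == (size_t)-1)
        return NULL;            /* n + 1 would overflow size_t */

    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, staging, n);
    out[n] = '\0';
    return out;
}
```

    The staging buffer itself is allocated once and reused for every download, so only the right-sized copies come and go.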

  8. #8
    Registered User
    Join Date
    Feb 2011
    Posts
    144

    Allocating a buffer

    Tater,

    The HTTP requests return data in pieces, each up to 4 kbytes long. A write callback function is used to put it together. The write callback can do anything it wants - serialize it to disk, print it to screen, checksum it, whatever.

    I guess you could have a persistent buffer for the write callback and put the returned data together sequentially, then malloc(strlen()+1) && strcpy(). But it's just as easy to put the received data directly into the new buffer. I figure that you would code it so that the initial malloc() allocates some fixed value, and when your buffer runs out of memory, you'll allocate up to the next multiple of the fixed value, where the fixed value is something like 4 kbytes. That way you'll realloc() every 4 kbytes but you won't have to realloc() every 20 bytes if your HTTP data comes in 20 byte chunks.
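
    Roughly, the scheme described above could look like the following. This is only a sketch: struct download, on_chunk and GROW_STEP are made-up names, and the callback signature is not that of any particular HTTP library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GROW_STEP 4096   /* assumed granularity, from the "4 kbytes" above */

struct download {
    char  *data;
    size_t len;   /* bytes stored so far */
    size_t cap;   /* bytes allocated; always a multiple of GROW_STEP */
};

/* Hypothetical write callback: appends one received chunk.
 * Returns the number of bytes consumed, so a return of 0 for a
 * non-empty chunk means the buffer could not be grown. */
size_t on_chunk(struct download *d, const char *chunk, size_t n)
{
    assert(d != NULL);
    assert(chunk != NULL || n == 0);

    /* Refuse sizes whose rounded-up capacity would overflow size_t. */
    if (d->len > (size_t)-1 - GROW_STEP ||
        n > (size_t)-1 - GROW_STEP - d->len)
        return 0;

    size_t needed = d->len + n;
    if (needed > d->cap) {
        /* Round up to the next multiple of GROW_STEP. */
        size_t new_cap = (needed + GROW_STEP - 1) / GROW_STEP * GROW_STEP;
        char *q = realloc(d->data, new_cap);
        if (q == NULL)
            return 0;           /* old buffer is still valid */
        d->data = q;
        d->cap = new_cap;
    }
    if (n > 0)
        memcpy(d->data + d->len, chunk, n);
    d->len += n;
    return n;
}
```

    With 20-byte chunks this calls realloc() once per 4096 bytes of data instead of once per chunk. Doubling the capacity each time would cut the number of realloc() calls further on large pages, at the cost of more slack at the end of the buffer.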

    Richard

  9. #9
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    You can use either scheme... but I would favor a getter function that returns the required buffer size... The caller is responsible for creating its own memory space, fetching the data, doing whatever, then disposing of it when done. The reason I favor that is that you eliminate a lot of scoping problems with your temporary buffer pointers; the data is housed in the function operating on it. In a single tasking environment, you could even make the getter's buffer global (yes, sometimes it is appropriate) and operate directly on that one buffer, eliminating all other buffers. In a multitasking environment you would use interprocess signalling to stop the getter until you copy the data out...

    You can use the realloc trick inside the getter, but with 4 and 8 GB machines being popular these days I would hardly think a fixed buffer size of 10 MB is much of an issue. (It's not like you're trying to shoe-horn this into some dodgy old 128 MB antique, is it?)

  10. #10
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    It seems like a dumb question, but have you read the HTTP RFC?
    RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1 (RFC2616)

    For example, responses usually come with a "Content-Length" header.
    If you know this arrives in the response header, and you can parse it out, then you know immediately (and up-front) how much data you're going to get.
    Just allocate the buffer, fill it, and you're done - yes?
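
    A rough sketch of pulling that value out of a response header block. The parse_content_length function is a hypothetical helper, and it assumes the input holds only the status line and headers, separated by CRLF. Responses sent with chunked transfer encoding carry no Content-Length, so the caller still needs a fallback for when this returns -1.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Look for a "Content-Length:" header (names are case-insensitive) and
 * store its value in *out.
 * Returns 0 on success, -1 if the header is missing or malformed. */
int parse_content_length(const char *headers, unsigned long *out)
{
    static const char name[] = "content-length:";
    const size_t name_len = sizeof name - 1;
    assert(headers != NULL && out != NULL);

    for (const char *line = headers; *line != '\0'; ) {
        if (line[0] == '\r' && line[1] == '\n')
            break;                      /* blank line: end of headers */

        /* Case-insensitive prefix match against the header name. */
        size_t i = 0;
        while (i < name_len &&
               tolower((unsigned char)line[i]) == name[i])
            i++;

        if (i == name_len) {
            const char *p = line + name_len;
            while (*p == ' ' || *p == '\t')
                p++;                    /* skip optional whitespace */
            if (!isdigit((unsigned char)*p))
                return -1;
            errno = 0;
            unsigned long v = strtoul(p, NULL, 10);
            if (errno == ERANGE)
                return -1;              /* value does not fit */
            *out = v;
            return 0;
        }

        const char *eol = strstr(line, "\r\n");
        if (eol == NULL)
            break;
        line = eol + 2;                 /* advance to the next header line */
    }
    return -1;
}
```

    On success the caller can allocate the reported number of bytes (plus one for a terminator if needed) and read the body straight into it. It is still worth checking that the amount actually received matches what the header promised.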
