Thread: Custom Allocation

  1. #1
    Registered User
    Join Date
    Apr 2007
    Posts
    141

    Custom Allocation

    Standard practice in C suggests that if you have to allocate a large number of fixed sized blocks, it is best to write a custom allocator, since malloc and free have 'too much' overhead associated with them. Well I went ahead and did this. I created an allocator that allocates a larger fixed size block, cuts it up into smaller pieces and then loads them up into a free list, which is nothing more than an array of pointers.

    Allocation simply returns the next available pointer
    Code:
     
    /** fixed block size allocator.  */
    typedef struct _blkmem {
        /** size of block. */
        int bsize ;
        /** length of free pointer list in memory in bytes. */
        int freelen ;
        /** number of pointers that are free. */
        int nfree ;
        /** array of free pointers */
        void **freeptrs ;
       /** length of malloc block list in memory in bytes. */
        int mlen ;
        /** array of pointers to malloced blocks. */ 
        void **allocs ;
        /**  number of blocks alloced */
        int nalloc ;
    } Fmem ; 
    
    
    
    void *allocF(Fmem *fm)
    {
    if (fm->nfree == 0) {
        moreFmem(fm) ;
    }
    fm->nfree = fm->nfree - 1 ;
    return((fm->freeptrs)[fm->nfree]) ;
    }
    and my custom free simply puts the freed block back into the array and increments
    the array index.

    Code:
    /** push a pointer to a list growing the list as needed. */
    inline void pushptr(void *nptr, void ***ptr, unsigned int *plen, unsigned int *mlen)
    {
    void **parr;
    unsigned int ns ;
    *plen = *plen+1;
    if (*plen * sizeof(void *) > *mlen) {
        /* grow the base array */
        *ptr = growmem(*ptr, *mlen, sizeof(void *), mlen) ;
    }	
    parr = *ptr ;
    parr[*plen-1] = nptr ;
    
    }
    
    
    /** free a fixed block and return it to the free list.*/
    void *freeF(Fmem *fm, void *ptr)
    {
    pushptr(ptr, &(fm->freeptrs), &(fm->nfree), &(fm->freelen)) ;
    }
    Now the shocking thing is that in my benchmarks using
    gcc version 4.1.1 20070105 (Red Hat 4.1.1-51), my custom allocator is only
    about 5-10% faster than malloc and free. The result holds even if I deliberately add a bunch of random mallocs and frees in between each malloc and free associated with the test case. Moreover for this test I deliberately create a situation where the edge conditions that require allocating more large blocks or growing my array of pointers happens only once. Profiling the code does not suggest any egregious performance bottlenecks. In theory I could get some incremental performance improvement by using pointers instead of array indices and perhaps using a shift instead of a multiply.

    Ive noticed that in some of the example code online for custom allocators people use linked lists instead of an array of free pointers, but I would think that would be potentially slower since it requires an additional memory access instead of just pointer or index arithmetic. Is it possible that there are some fantastic optimizations in the newer versions of malloc and free or does my code have some performance issues that I have overlooked? For flexibility I do need to use some indirection for my arrays, but I am frankly quite surprised.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > Standard practice in C suggests that if you have to allocate a large number of fixed sized blocks
    Where did you here that from?
    There's an awful lot of crud advice which is either very old or very compiler specific.
    Perhaps when people say large, then mean a few million, not a few hundred.

    > Is it possible that there are some fantastic optimizations in the newer versions of malloc and free
    Well they've had the same 20+ years to work on the problem as well, so I'm going on a limb here and say "yes".

    > In theory I could get some incremental performance improvement by using pointers
    > instead of array indices and perhaps using a shift instead of a multiply.
    No and No. Yet more premature optimisations which modern compilers are perfectly capable of performing for themselves even in debug builds, never mind release builds.

    > Ive noticed that in some of the example code online for custom allocators people use linked lists...
    I'd say your comment is accurate.
    If you use a similar approach as the library version, you're not going to find a 10x increase.

    My suggestion is you write the code, test it to make sure it works, then profile it with real data before making any attempt to try and figure out where the performance problems are.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by Salem View Post
    > Standard practice in C suggests that if you have to allocate a large number of fixed sized blocks
    Where did you here that from?
    There's an awful lot of crud advice which is either very old or very compiler specific.
    Perhaps when people say large, then mean a few million, not a few hundred.
    Which is why I allocate millions of blocks in my tests. As far as the advice to write you own allocator one might look at some of these links.
    http://www.codeguru.com/cpp/misc/mis...le.php/c13101/
    http://www.codeproject.com/useritems...Heap_Class.asp

    > Is it possible that there are some fantastic optimizations in the newer versions of malloc and free
    Well they've had the same 20+ years to work on the problem as well, so I'm going on a limb here and say "yes".
    I wouldn't be so smug about it. Crappy code has a way of persisting if it works.
    However, perhaps malloc and free in my compiler is using a better algorithm such as this one
    http://g.oswego.edu/dl/html/malloc.html

    I think it ends up doing fixed sized allocations as I do, but using several of them and hashing to the correct size. That might explain things.



    > In theory I could get some incremental performance improvement by using pointers
    > instead of array indices and perhaps using a shift instead of a multiply.
    No and No. Yet more premature optimisations which modern compilers are perfectly capable of performing for themselves even in debug builds, never mind release builds.
    Which is why I didn't bother to implement them.

    My suggestion is you write the code, test it to make sure it works, then profile it with real data before making any attempt to try and figure out where the performance problems are.
    I suppose so,
    however I'd be curious then what this version of malloc and free is doing because usually one has to deal with issues such as memory fragmentation which is usually handled by merging free blocks and one has to have a way of finding the appropriately sized free block.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > I'd be curious then what this version of malloc and free is doing
    Since you're using Linux / gcc / glibc, you can get the source code for your current implementation of malloc and verify what you suspect.

    > http://www.codeguru.com/cpp/misc/mis...le.php/c13101/
    I see a lot of bold claims, and no statement about whether the standard implementation was the 'debug' or 'release' build. One things for sure, if you go all out for performance, you sacrifice an awful lot in terms of say buffer overrun detection (and survival), and leak detection.
    My personal practical experience is the opposite, the MS standard implementation trounced a local attempt to try and do it better. Anyone can cook up a benchmark which makes it look good, but how does it fare in average real-world usage?

    OK, so your malloc is 25x times faster, but if the code only spends 1% of the time in the standard malloc, then no matter how much faster you make it, you're only going to make your program at most 1% quicker. So back to my original comment, write it, profile it, then figure out what really takes the time.

    > http://www.codeproject.com/useritems...Heap_Class.asp
    Which addresses a rather specific problem of how to allocate shared memory. I'd say this is an entirely reasonable reason for wanting to do such a thing. For starters, there would be a need to ensure that critical code in the allocator was correctly guarded for the multi process environment.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    Using a custom allocator is very sensible for certain problems. For example, for processing and representation of large graphs, or sparse matrices, or other particular datastructures. You use less memory and gain speed, especially with a good cache-exploiting garbage collection mechanism (_especially_ if you tell the garbage collector what operations you're going to do next, so that it can arrange nodes in a particular ordering).

    Of course, it's only useful if you need the speed.
    There are 10 types of people in this world, those who cringed when reading the beginning of this sentence and those who salivated to how superior they are for understanding something as simple as binary.

  6. #6
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by Rashakil Fol View Post
    Using a custom allocator is very sensible for certain problems. For example, for processing and representation of large graphs, or sparse matrices, or other particular datastructures. You use less memory and gain speed, especially with a good cache-exploiting garbage collection mechanism (_especially_ if you tell the garbage collector what operations you're going to do next, so that it can arrange nodes in a particular ordering).

    Of course, it's only useful if you need the speed.
    For my stuff I always need speed and yes custom allocation seems to be a very common exercise. I've heard claims that garbage collection can actually be faster than managed memory, I suppose based on the very idea of not having to deal with fragmentation or scanning free lists etc. You just keep allocating new memory until you reach a critical size and then attempt to garbage collect. I've never seen any benchmarks however. Hans Boehm makes some claims about his garbage collector being faster for frequent small allocations. My collector does nothing more than pop the new memory off of a stack! Yet malloc is competitive with this.

  7. #7
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by SevenThunders View Post
    I think it ends up doing fixed sized allocations as I do, but using several of them and hashing to the correct size.
    That's called spin. You were so quick to dismiss their 20 years to get it right that Salem mentioned, but suddenly when things are going good for them, they're copying you, not the other way around.


    Quzah.
    Hope is the first step on the road to disappointment.

  8. #8
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Perhaps you're just not being clever enough about your custom allocator.
    You can store the free-list within the unused portion of the memory block you have reserved. You can also avoid having to initialise the entire free-list beforehand. outside of this reserved block, only two pointers are needed.

    It all depends on what constraints you can place on the allocation system.
    If you're trying to make something too generic and flexible, then you end up re-implementing the one provided by the language, and may as well just use malloc.

  9. #9
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by quzah View Post
    That's called spin. You were so quick to dismiss their 20 years to get it right that Salem mentioned, but suddenly when things are going good for them, they're copying you, not the other way around.


    Quzah.
    This conversation is getting a little bizarre. I ask an innocent question concerning memory allocation performance and suddenly it becomes some kind of exercise in ego and narcissism. Let's just try to stick to the technical issues since I have no idea what you are talking about and even less an idea of what your agenda is.

  10. #10
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by iMalc View Post
    Perhaps you're just not being clever enough about your custom allocator.
    You can store the free-list within the unused portion of the memory block you have reserved. You can also avoid having to initialise the entire free-list beforehand. outside of this reserved block, only two pointers are needed.

    It all depends on what constraints you can place on the allocation system.
    If you're trying to make something too generic and flexible, then you end up re-implementing the one provided by the language, and may as well just use malloc.
    Yup I thought of that, but it seems you have to increment through the free block anyway as you allocate resources. Thus in the end it may be faster to do that incrementing up front when you first grab a block of data from memory. Doing so also helps maintain the locality of reference for the memory that holds your free list as you update the free list.

    It depends on what you are trying to do. If you want to support freeing your memory, then the little chunks you allocated have to be stored back in some kind of free list. Perhaps the two pointers you are referring to are the pointers to the new block and the pointers to the free list? If so then you are maintaining two types of allocation: pointers stored on the free list and a pointer into the fresh block of data allocated. If you use a linked list, then you maintain additional overhead to hold the next pointers. If you can allocate from a free list or from a newly allocated block then you must maintain additional logic.

    I will admit that just maintaining a pointer to the current entry in the free list is probably a little more efficient, but I was expecting much more improvement given all the work that malloc and free has to do. For some reason others have felt that this means I was denigrating their work. Quite the contrary.

  11. #11
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by SevenThunders View Post
    This conversation is getting a little bizarre. I ask an innocent question concerning memory allocation performance and suddenly it becomes some kind of exercise in ego and narcissism. Let's just try to stick to the technical issues since I have no idea what you are talking about and even less an idea of what your agenda is.
    You know exactly what I'm talking about, or rather you should, since you said it:
    Quote Originally Posted by SevenThunders View Post
    I wouldn't be so smug about it. Crappy code has a way of persisting if it works.
    Furthermore, it was an easy question to answer for yourself. "Do you think they might have, in all the time the C language has been having compilers written for it come up with something better than me?" ... Now who's got the Ego?


    Quzah.
    Hope is the first step on the road to disappointment.

  12. #12
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by quzah View Post
    You know exactly what I'm talking about, or rather you should, since you said it:Furthermore, it was an easy question to answer for yourself. "Do you think they might have, in all the time the C language has been having compilers written for it come up with something better than me?" ... Now who's got the Ego?


    Quzah.
    Awesome I can talk about C programming
    and get some abuse on the same forum. Thanks.

  13. #13
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    You'd probably be interested in this memory pool allocator I wrote a while ago:
    http://www.flipcode.com/cgi-bin/fcar...cgi?show=64152

  14. #14
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Does your allocation need work something like this?

    - allocate a large number of small objects
    - do some work, possibly allocating and freeing a much smaller number of objects
    - freeing a large number of small objects
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  15. #15
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by Salem View Post
    Does your allocation need work something like this?

    - allocate a large number of small objects
    - do some work, possibly allocating and freeing a much smaller number of objects
    - freeing a large number of small objects
    Yep. Basically imagine a data structure something like a ternary tree filled with a lot of data. Strings are being added and deleted constantly. Most of the required memory that needs to be allocated are nodes of fixed and smalll size and these nodes need to be freed when an entry is removed.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. dynamic allocation from 1 instead of zero
    By cfdprogrammer in forum C Programming
    Replies: 27
    Last Post: 04-28-2009, 08:21 AM
  2. pointer to array with dynamic allocation
    By cfdprogrammer in forum C Programming
    Replies: 22
    Last Post: 04-07-2009, 09:56 AM
  3. redundant allocation
    By George2 in forum C++ Programming
    Replies: 22
    Last Post: 03-06-2008, 06:43 PM
  4. Stl sets and custom classes.
    By cunnus88 in forum C++ Programming
    Replies: 3
    Last Post: 05-12-2006, 11:58 PM
  5. Dynamic allocation (I thought it would crash)
    By Baaaah! in forum C Programming
    Replies: 16
    Last Post: 11-30-2005, 05:10 PM