Thread: Lazy allocation and *alloc()

  1. #1
    Kung Fu Kitty Angus's Avatar
    Join Date
    Oct 2008
    Location
    Montreal, Canada
    Posts
    115

    Lazy allocation and *alloc()

    Recently I read somewhere that calloc() does lazy allocation, in that it doesn't allocate any real memory, but just virtual memory. Real memory pages are only allocated as writes to it are performed, and any reads performed on uninitialized memory returns 0, just like a sparse file. This made a lot of sense to me, because it seemed strange that ANSI would come up w/a whole new function to do the same thing that malloc() followed by bzero() could do, and besides, I've never found a use for calloc() in my entire career, so far. So I thought that this would be great, because if I have a method that writes a blob to a db, and if the blob column in that db can be up to 1Gb, then my method can lazily allocate a whole Gb, and even if my app only ever handles 10-byte blobs, the most memory that would be wasted is the better part of a page.

    Since then I've tried to read up on this, and what I've read is that malloc() and calloc() allocate memory in the same way, and calloc() does actively initialize the memory. Furthermore, that lazy allocation I'm talking about is system-dependent, and if calloc() on a particular system does it, then so does malloc(). Can anyone confirm any of this for me? I'm mostly interested in the Linux kernel, since that seems to be all I program in these days.

  2. #2
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Angus View Post
    This made a lot of sense to me, because it seemed strange that ANSI would come up w/a whole new function to do the same thing that malloc() followed by bzero() could do
    Several ways to skin a cat. I do not think it is within the scope of the standard to define whether memory allocated should be "real" or "virtual", so unless you find an explicit statement to the contrary, that cannot be the reason for the two functions; calloc() is just a convenience.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  3. #3
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Linux uses an optimistic memory allocation method. It works the way you described in that malloc will return a valid address, but pages will only be allocated as they are used. This allows the OS to use the memory in other applications and I/O paging while it waits for the app that allocated the memory to use it.

    There are some downsides to this approach though. Mainly, you cannot ever be sure that your memory allocation succeeded since malloc returns an address before it actually allocates the memory. If you attempt to use the memory when there is no space available, the OOM killer will simply kill your application (or another one on the system).

    Note that this behavior is configurable, and can be turned off by running:
    # echo 2 > /proc/sys/vm/overcommit_memory
    In which case memory allocations will behave the way they do on Windows.
    bit∙hub [bit-huhb] n. A source and destination for information.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > in that it doesn't allocate any real memory, but just virtual memory.
    All modern OS's allocate virtual memory.

    > Real memory pages are only allocated as writes to it are performed
    The virtual pages become mapped if that's what you mean.

    > This made a lot of sense to me, because it seemed strange that ANSI would come up w/a whole new function to do the same thing that malloc() followed by bzero() could do
    calloc pre-dates ANSI.


    > because if I have a method that writes a blob to a db, and if the blob column in that db can be up to 1Gb, then my method can lazily allocate a whole Gb,
    All this presumes a particular OS (or kind of OS).

    Since you mention Linux, you should be aware of the "OOM" issue. Linux will over-commit VM pages. So whilst your malloc may succeed, the kernel may subsequently fail to map a page of your allocation due to being out of memory.

    So you shouldn't lazily alloc 1GB because you can't be bothered to figure out a better answer. The kernel still has to account for your request in some way, even if it doesn't immediately cause a great deal of activity (it isn't free).
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Kung Fu Kitty Angus's Avatar
    Join Date
    Oct 2008
    Location
    Montreal, Canada
    Posts
    115
    Quote Originally Posted by Salem View Post
    >
    So you shouldn't lazily alloc 1GB because you can't be bothered to figure out a better answer. The kernel still has to account for your request in some way, even if it doesn't immediately cause a great deal of activity (it isn't free).
    Account for my request anymore than it would just, say, 10 bytes?

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well if you ask for 1GB, then you've used up an awful lot (of a finite number) of VM page table entries for no good reason.

    If you use them, then that's fine.
    But if you don't, nobody will be impressed by your code.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Kung Fu Kitty Angus's Avatar
    Join Date
    Oct 2008
    Location
    Montreal, Canada
    Posts
    115
    Quote Originally Posted by Salem View Post
    But if you don't, nobody will be impressed by your code.
    Is that the only drawback? If I was in the racket to impress folk I probably would have chosen pop music instead.

  8. #8
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Quote Originally Posted by Angus View Post
    Is that the only drawback? If I was in the racket to impress folk I probably would have chosen pop music instead.
    The other drawback is that you are relying on the OS behavior instead of proper coding techniques. What happens if you ever want to port your application to a non-Linux platform? What if a Linux user has the overcommit behavior turned off?
    bit∙hub [bit-huhb] n. A source and destination for information.

  9. #9
    Kung Fu Kitty Angus's Avatar
    Join Date
    Oct 2008
    Location
    Montreal, Canada
    Posts
    115
    Then
    1. The use of macros and #if #endif block should cure that
    2. A quick reading of /proc/sys/vm/overcommit_memory should tell me all I need to know

  10. #10
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Quote Originally Posted by Angus View Post
    Then
    1. The use of macros and #if #endif block should cure that
    2. A quick reading of /proc/sys/vm/overcommit_memory should tell me all I need to know
    Why don't you just allocate the amount of memory that you are using? This way you don't need to worry about this issue at all.
    bit∙hub [bit-huhb] n. A source and destination for information.

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    I'm done wasting effort - doesn't want to learn, stuck with a single idea and can't change it.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #12
    Kung Fu Kitty Angus's Avatar
    Join Date
    Oct 2008
    Location
    Montreal, Canada
    Posts
    115
    Quote Originally Posted by bithub View Post
    Why don't you just allocate the amount of memory that you are using? This way you don't need to worry about this issue at all.
    Because I don't always know how much that is, and I risk calling realloc() a lot

    That's a good philosophy, Salem. You should go by it more often
    Last edited by Angus; 09-08-2009 at 10:57 AM.

  13. #13
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Because I don't always know how much that is, and I risk calling realloc() a lot
    And what's wrong with calling realloc() a lot? Calling realloc() is exactly what you want to do if your memory usage is growing over time.
    bit∙hub [bit-huhb] n. A source and destination for information.

  14. #14
    Kung Fu Kitty Angus's Avatar
    Join Date
    Oct 2008
    Location
    Montreal, Canada
    Posts
    115
    Quote Originally Posted by bithub View Post
    And what's wrong with calling realloc() a lot? Calling realloc() is exactly what you want to do if your memory usage is growing over time.
    Well, every time you call realloc() to increase the size of the block you run the risk of having to copy all the data in that block. If you can rely on lazy allocation, though, you'd only have to deal with the expense of the odd page fault.

  15. #15
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Quote Originally Posted by Angus View Post
    Well, every time you call realloc() to increase the size of the block you run the risk of having to copy all the data in that block. If you can rely on lazy allocation, though, you'd only have to deal with the expense of the odd page fault.
    Well if it helps you sleep better at night, the glibc implementation of realloc() on Linux uses the mremap() system call. It rarely has to copy the existing memory, and will usually just allocate new virtual pages that are contiguous with the existing memory.
    bit∙hub [bit-huhb] n. A source and destination for information.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Dynamic array Allocation
    By deepak_j in forum C Programming
    Replies: 3
    Last Post: 08-17-2009, 07:18 AM
  2. dynamic allocation from 1 instead of zero
    By cfdprogrammer in forum C Programming
    Replies: 27
    Last Post: 04-28-2009, 08:21 AM
  3. pointer to array with dynamic allocation
    By cfdprogrammer in forum C Programming
    Replies: 22
    Last Post: 04-07-2009, 09:56 AM
  4. redundant allocation
    By George2 in forum C++ Programming
    Replies: 22
    Last Post: 03-06-2008, 06:43 PM
  5. Dynamic memory allocation.
    By HAssan in forum C Programming
    Replies: 3
    Last Post: 09-07-2006, 05:04 PM

Tags for this Thread