Thread: Custom Memory Allocation

  1. #1
    Registered User
    Join Date
    Sep 2009
    Posts
    48

    Custom Memory Allocation

    I recently discovered that dynamic memory allocation is pretty slow, so I've been thinking of a way to get around that. I've been thinking about going into low level video game engine programming after college/uni (studying Maths/Physics in college, then Computer Science at Uni), and it occurs to me that dynamic memory allocation is a fairly important thing in the engine and therefore, if it's slow, it's a potential bottleneck.

    I imagine one of the problems with dynamic memory allocation is that you have to ask the OS for the memory, then wait for it to find some and give it to you (please feel free to correct me if I'm mistaken). So I came up with the idea of allocating the memory to a buffer during initialisation, then I can use my own custom methods to manage that memory throughout the application. Since I have already allocated the memory, I'm simply passing around addresses from my buffer rather than asking the OS for more memory. I can't imagine this is something new, though, so I'd appreciate it if anyone can give me some advice, show me some material to look at, or point out any problems with this idea (e.g. is there actually a significant performance boost using this method?).

    This is something I quickly threw together as an example, it is by no means supposed to be a real world implementation:

    Code:
    #include <iostream>
    
    // Allocate 1GiB of memory
    char* g_buffer = new char[1024 * 1024 * 1024];
    
    typedef struct
    {
    	float x;
    	float y;
    	float z;
    	float w;
    } test;
    
    // Allocates a block of memory depending on the size requested
    void* AssignMemory(int size)
    {
    	static int offset = 0;
    	offset += size;
    	return (void*)&g_buffer[offset-size];
    }
    
    int main()
    {
    	int* p = (int*)AssignMemory(sizeof(int));
    	double* p2 = (double*)AssignMemory(sizeof(double));
    	float* p3 = (float*)AssignMemory(sizeof(float));
    
    	short* p4 = (short*)AssignMemory(sizeof(short));
    	test* p5 = (test*)AssignMemory(sizeof(test));
    
    	*p = 20;
    	*p2 = 33.23;
    	*p3 = 2.2f;
    	*p4 = 223;
    	
    	p5->x = 2.0f;
    	p5->y = 3.0f;
    	p5->z = 4.0f;
    	p5->w = 5.0f;
    
    	std::cout << "p: " << p << "\t*p: " << *p << std::endl;
    	std::cout << "p2: " << p2 << "\t*p2: " << *p2 << std::endl;
    	std::cout << "p3: " << p3 << "\t*p3: " << *p3 << std::endl;
    	std::cout << "p4: " << p4 << "\t*p4: " << *p4 << std::endl;
    	
    	std::cout << "p5: " << p5 << std::endl;
    	std::cout << "p5->x: " << p5->x << "\tp5->y: " << p5->y << std::endl;
    	std::cout << "p5->z: " << p5->z << "\tp5->w: " << p5->w << std::endl;
    
    	std::cin.get();
    
    	delete [] g_buffer;
    	g_buffer = 0;
    	p = 0;
    	p2 = 0;
    	p3 = 0;
    	p4 = 0;
    	p5 = 0;
    
    	return 0;
    }
    So far, this all seems to work with no apparent issues (each pointer contains the expected address (relative to the first address), and each value is correct).

    EDIT: Arrays are actually just like using malloc(...) in C:

    Code:
    int main()
    {	
    	// ...
    
    	int arrSize = 10;
    	int* arr = (int*)AssignMemory(sizeof(int) * arrSize);
    		
    	for(int i = 0; i < arrSize; ++i)
    		arr[i] = i;
    	
    	for(int i = 0; i < arrSize; ++i)
    		std::cout << "arr[" << i << "]: " << arr[i] << std::endl;
    
    	// ...
    }
    Last edited by Ushakal; 08-22-2011 at 10:35 AM. Reason: Added array example.

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,412
    A search of the Web for "C++ memory pool" and "C++ custom allocator" would probably yield interesting results.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Also when on windows you can use private heaps to speed things up...

    Heap Functions (Windows)

  4. #4
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    Quote Originally Posted by Ushakal View Post
    So I came up with the idea of allocating the memory to a buffer during initialisation, then I can use my own custom methods to manage that memory throughout the application.
    You've just reinvented what malloc does, only you haven't figured out to how free memory yet. Nor have you determined a way to handle fragmentation, nor a way to make your approach cache and TLB friendly. You're also not thread-safe. And you have no way to handle more than 1GB nor a way to even detect you've gone beyond that limit. And if you fix that, you'd need to fix it again to handle more than 2GB on common systems.

    Granted, maybe you intended to leave these out for a performance gain. Have you bench marked your implementation against the standard library?

    See A Memory Allocator for additional points to consider.

    Since I have already allocated the memory, I'm simply passing around addresses from my buffer rather than asking the OS for more memory.
    In modern OS implementations, asking for 1GB of memory doesn't mean it gets allocated. You first have to use it - and that means page faults for your implementation just like malloc.

    I can't imagine this is something new, though, so I'd appreciate it if anyone can give me some advice, show me some material to look at, or point out any problems with this idea (e.g. is there actually a significant performance boost using this method?).
    There are many ways to get it wrong, so unless you have code which hits some strange corner case in the default implementation you're probably wasting your time. Even then you're as likely to make it worse than better unless you have a lot of time to spend on it. It would be an interesting learning experience, though, but remember that you're competing against something that's been in development for decades.

  5. #5
    Registered User
    Join Date
    Sep 2009
    Posts
    48
    Quote Originally Posted by KCfromNC View Post
    Granted, maybe you intended to leave these out for a performance gain. Have you bench marked your implementation against the standard library?
    Like I said, the code I posted isn't meant to be a real world implementation; it's simply an example of what I mean (and the actual amount of memory allocated isn't of importance right now). I want to know if allocating memory into a buffer within my application in order to reassign that memory using my own methods would see a performance gain over using new/delete all the time. Naturally I'd have to account for all of those situations you listed in a real world example, I just want to know if there's anything to gain here. I don't mean to try and replace malloc/new/delete, simply "pre-allocate" the memory so it's readily available, and reassign it when needed.

    In modern OS implementations, asking for 1GB of memory doesn't mean it gets allocated. You first have to use it - and that means page faults for your implementation just like malloc.
    What if I initialise the buffer then? Will that avoid that problem? I tried it with my simple example, by adding this line in main:

    Code:
    // ...
    for(int i = 0; i < 1024 * 1024 * 1024; ++i)
          g_buffer[i] = 0;
    // ...
    On my PC, it takes about 2300-2700 ms to initialise the entire GiB, which may be worthwhile if the performance gains later on in the app are notable. Does doing this fully load the physical memory I've allocated, or will the memory be unloaded if it isn't used again for a while?

    When I tested it using this code:
    Code:
    StartCounter();
    
    double count1 = GetCounter();
    
    g_buffer = new char[1024 * 1024 * 1024];
    
    double count2 = GetCounter();
    
    std::cout << "Time to allocate buffer: " << count2 - count1 << "ms" << std::endl;
    
    for(int i = 0; i < 1024 * 1024 * 1024; ++i)
    	g_buffer[i] = 0;
    
    double count3 = GetCounter();
    
    std::cout << "Time to initialise buffer: " << count3 - count2 << "ms" <<std::endl;
    	
    char* newPtr = (char*)AssignMemory(sizeof(char) * 1024 * 1024 * 512);
    char* newPtr2 = (char*)AssignMemory(sizeof(char) * 1024 * 1024 * 512);
    
    double count4 = GetCounter();
    	
    std::cout << "Time to reassign memory: " << count4 - count3 << "ms" << std::endl;
    The output was consistently in the region of:

    Time to allocate buffer: 280 - 290ms
    Time to initialise buffer: 2300 - 2700ms
    Time to reassign memory: 1.4 - 1.5ms

    Naturally the result may vary depending on the machine/OS, but the results look promising (unless I've made a mistake somewhere). I think I'll look into Memory Pools and Custom Allocators some more information anyway (thanks for the direction laserlight).

    There are many ways to get it wrong, so unless you have code which hits some strange corner case in the default implementation you're probably wasting your time. Even then you're as likely to make it worse than better unless you have a lot of time to spend on it. It would be an interesting learning experience, though, but remember that you're competing against something that's been in development for decades.
    Yeah, I know that far more experienced coders than I have been mulling over a multitude of different ways to improve memory allocation performance, but like you said it could still be a good learning experience to investigate (so long as it's not too much of a waste of time).

    I'm interested in low level systems programming, so things like this are quite interesting to me. Incidentally, do you know of any other good articles that cover other low level systems programming topics?

  6. #6
    Registered User
    Join Date
    Aug 2003
    Posts
    127
    Code:
    for(int i = 0; i < 1024 * 1024 * 1024; ++i)
          g_buffer[i] = 0;
    This snippet of code is real a performance killer. There are some goods way to do it
    Code:
    const size_t size = 1024 * 1024 * 1024;
    memset(g_buffer, 0, size); //about 318ms
    ZeroMemory(g_buffer, size);  //about 308ms
    
    for(int i = 0; i < size / 4; i += 4)    //about 302ms
    {
    	buf[i] = 0;             //instruction level parallelism within a single processor.
    	buf[i + 1] = 0;
    	buf[i + 2] = 0;
    	buf[i + 3] = 0;
    }
    Quote Originally Posted by CommonTater View Post
    Naturally the result may vary depending on the machine/OS, but the results look promising (unless I've made a mistake somewhere). I think I'll look into Memory Pools and Custom Allocators some more information anyway (thanks for the direction laserlight).
    OS will lock the operation and search the free memory during dynamic memory allocation. It means OS will block your allocation till other process which is allocating finishes allocation. Allocating a big-size memory in advance and managing it by yourself is a way to reduce the race condition.
    Additionally, if your program is not allocated/deallocated frequently, the increase of performance is not obvious by using memory pool.
    Nana C++ Library is a GUI framework that designed to be C++ style, cross-platform and easy-to-use.

  7. #7
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Code:
    // ...
    for(int i = 0; i < 1024 * 1024 * 1024; ++i)
          g_buffer[i] = 0;
    // ...
    O_o

    That is just wrong. Memory may still be paged in and out even after using it.

    That said, happily, paging has nothing to do with the performance differences between average use of a general purpose allocator and appropriate use of a purpose built allocator.

    Soma

  8. #8
    Registered User
    Join Date
    Sep 2009
    Posts
    48
    Quote Originally Posted by phantomotap View Post
    That is just wrong. Memory may still be paged in and out even after using it.
    Can you point me in the direction of any decent articles about memory paging? I know it's wrong, I just don't know why. What are the consequences of a page fault and why do they occur? Are there any good programming practices that result in fewer page faults?

  9. #9
    Registered User
    Join Date
    May 2011
    Location
    Around 8.3 light-minutes from the Sun
    Posts
    1,949
    Quote Originally Posted by Ushakal View Post
    Can you point me in the direction of any decent articles about memory paging? I know it's wrong, I just don't know why. What are the consequences of a page fault and why do they occur? Are there any good programming practices that result in fewer page faults?
    This is a pretty big topic, however here are a couple of links that should keep you busy
    A memory allocator
    Writing a memory manager - OSdev wiki
    Slab Allocation - wiki
    Paging - Osdev wiki
    Quote Originally Posted by anduril462 View Post
    Now, please, for the love of all things good and holy, think about what you're doing! Don't just run around willy-nilly, coding like a drunk two-year-old....
    Quote Originally Posted by quzah View Post
    ..... Just don't be surprised when I say you aren't using standard C anymore, and as such,are off in your own little universe that I will completely disregard.
    Warning: Some or all of my posted code may be non-standard and as such should not be used and in no case looked at.

  10. #10
    Registered User
    Join Date
    Sep 2009
    Posts
    48
    Thanks.

  11. #11
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Quote Originally Posted by jinhao View Post
    Code:
    for(int i = 0; i < 1024 * 1024 * 1024; ++i)
          g_buffer[i] = 0;
    This snippet of code is real a performance killer. There are some goods way to do it
    Code:
    const size_t size = 1024 * 1024 * 1024;
    memset(g_buffer, 0, size); //about 318ms
    ZeroMemory(g_buffer, size);  //about 308ms
    
    for(int i = 0; i < size / 4; i += 4)    //about 302ms
    {
    	buf[i] = 0;             //instruction level parallelism within a single processor.
    	buf[i + 1] = 0;
    	buf[i + 2] = 0;
    	buf[i + 3] = 0;
    }

    OS will lock the operation and search the free memory during dynamic memory allocation. It means OS will block your allocation till other process which is allocating finishes allocation. Allocating a big-size memory in advance and managing it by yourself is a way to reduce the race condition.

    Additionally, if your program is not allocated/deallocated frequently, the increase of performance is not obvious by using memory pool.
    Both things you did here are examples of (bad) pre-mature optimization that won't speed things up at all, because any half decent modern optimizer will do this for you.
    Look up "partial loop unrolling" and "common sub-expression elimination".

    It only makes the code harder to read, maintain, and potentially slower, because the compiler/optmizer may get confused by the more complex code, and fail to perform more optimizations.

    Learn what the optimizer can do first, and if you want to optimize further, focus on what it CANNOT do. Most of the time that means algorithmic optimization.

    Also, this is not race condition. Race condition is something else. This is just synchronization bottleneck.

  12. #12
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I want to know if allocating memory into a buffer within my application in order to reassign that memory using my own methods would see a performance gain over using new/delete all the time.
    All modern C standard libraries do this for you already. On POSIX systems malloc() uses sbrk() to allocate large blocks from the OS, and divide them up and give them to you. Something similar probably happens on Windows.

    The reason why malloc() is slow (relatively speaking) is because it knows nothing about your memory usage pattern, and must work reliably and be reasonably fast and space-efficient in all situations (imagine a program that allocates alternating 10GB and 2 bytes blocks, and free them in random order).

    The only way you can make something faster is if you have additional information about your memory usage pattern that you can exploit -
    For example, all allocations are in 4KB blocks, you always deallocate in reverse order of allocation, etc.

    If you can just write a better general purpose allocator, they would have made it malloc() already.

  13. #13
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    If you do many small allocations, then malloc/new will generally be rather slow. This is a known problem, and that is why memory pools exist.
    If you do larger, but fewer allocations, then new will generally be efficient.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Problem with custom dynamic memory allocation routines
    By BLauritson in forum C++ Programming
    Replies: 12
    Last Post: 03-11-2010, 07:26 AM
  2. Memory allocation
    By kernelio in forum C Programming
    Replies: 6
    Last Post: 10-11-2007, 10:19 AM
  3. Custom Allocation
    By SevenThunders in forum C Programming
    Replies: 17
    Last Post: 04-09-2007, 04:10 PM
  4. memory allocation
    By mbooka in forum C Programming
    Replies: 3
    Last Post: 02-28-2006, 03:13 PM
  5. Memory Allocation
    By Marcelo in forum C Programming
    Replies: 1
    Last Post: 10-26-2001, 08:43 AM