Thread: C Optimization for Memory Allocation

  1. #1
    Registered User
    Join Date
    Jun 2008
    Posts
    93

    C Optimization for Memory Allocation

    I was recently viewing a C optimization tutorial:

    C Optimisation tutorial

    I noticed in the misc section that it states:

    "If your library supports the mallopt() function (for controlling malloc), use it. The MAXFAST setting can make significant improvements to code that does a lot of malloc work.If a particular structure is created/destroyed many times a second, try setting the mallopt options to work best with that size. "

    Is this an accurate statement? I have been looking for examples in which people have used mallopt() rather than malloc and i can't seem to find much. If I am trying to optimize my code to it's maximum potential and I am performing a very larger number of mallocs for relatively small amounts of memory is it better to use mallopt()? If so where might I find some examples or more information on this?

  2. #2
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Interesting. Never heard of this. Here's my guesses about it, altho to know for sure I guess you need to do some profiling tests:

    Quote Originally Posted by GNU C ref for mallopt
    M_TOP_PAD
    This parameter determines the amount of extra memory to obtain from the system when a call to sbrk is required. It also specifies the number of bytes to retain when shrinking the heap by calling sbrk with a negative argument. This provides the necessary hysteresis in heap size such that excessive amounts of system calls can be avoided.
    Apparently, sbrk() sets the high end of the data segment, which will influence the low end of the heap. Presuming free() uses sbrk(), I guess you could use this to retain an amount equal to the struct size following free(), meaning the next malloc can conveniently use this empty space at the bottom (start) of the heap and not have to move anything else around????

    Quote Originally Posted by GNU C ref for mallopt
    M_MMAP_MAX
    The maximum number of chunks to allocate with mmap. Setting this to zero disables all use of mmap.
    I would guess not using mmap will be faster. However, the description of
    M_MMAP_THRESHOLD
    All chunks larger than this value are allocated outside the normal heap, using the mmap system call. This way it is guaranteed that the memory for these chunks can be returned to the system on free. Note that requests smaller than this threshold might still be allocated via mmap.
    Implies not using mmap will lead to more wasted space, ie, memory usage. So that's a trade-off.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  3. #3
    Registered User
    Join Date
    Jun 2008
    Posts
    93
    so is mallopt() just an API that allows us to manage how malloc() will handle memory allocation etc? Will I still use malloc() to perform all of my allocations after making the various mallopt calls?

  4. #4
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by fguy817817 View Post
    so is mallopt() just an API that allows us to manage how malloc() will handle memory allocation etc? Will I still use malloc() to perform all of my allocations after making the various mallopt calls?
    mallopt() does not appear to be standard (eg, it is not POSIX compliant) but it is in (at least) the GNU version of malloc.h.

    Malloc Tunable Parameters - The GNU C Library

    It's just a command to "configure" how subsequent malloc() calls will operate, sort of like how fcntl() sets options for subsequent commands that operate on a specific open file descriptor. They alter the default behaviour.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #5
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    If a particular structure is created/destroyed many times a second,
    If it's destroyed and created many times a second, why bother freeing/allocating it? Stick it on a list of unused structures, and pop it off when you need it.


    Quzah.
    Hope is the first step on the road to disappointment.

  6. #6
    Registered User
    Join Date
    Jun 2008
    Posts
    93
    OK. I took your advice and have implemented the following ring buffer. However, after running it through some timing tests I don't really see the performance gain I am going to need. Do you know any way to make the ring buffer faster? I am only seeing speed increases of about 4% when using buffer vs. malloc(). What do you guys think? I apologize in advance for the spacing.

    main.c
    Code:
    #include "ring_buffer.h"
    
    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <pthread.h>
    
    static int ring_buffer_size = 2000000;
    static int block_size = 1600;
    
    ring_buffer_t mem_ring_buffer;
    
    void time_test(void);
    
    int main()
    {
    	time_test();
    	//thread_test();
    	return 0;
    }
    
    void time_test(void)
    {
    	int i;
    	clock_t t1,t2;
    	float f1, f2;
    	
    	float ratio;
    	ratio = 1./CLOCKS_PER_SEC;
    	
    	ring_buffer_init(&mem_ring_buffer, ring_buffer_size, block_size, 0);
    
    	t1 = clock();
    	for(i=0; i<ring_buffer_size; i++)
    	{
    		void* memory_chunk = ring_buffer_remove(&mem_ring_buffer);
    		ring_buffer_add(&mem_ring_buffer, memory_chunk, block_size);
    	}
    	t2 = clock();
    	
    	f1 = ratio*(long)t1 + ratio*(long)t2;
    	printf("Ring Buffer Time = %f \n", f1);
    	
    	t1 = clock();
    	for(i=0; i<ring_buffer_size; i++)
    	{
    		void* memory_chunk = malloc(block_size);
    		free(memory_chunk);
    	}
    	t2 = clock();
    	
    	f2 = ratio*(long)t1 + ratio*(long)t2;
    	printf("Malloc() Time = %f \n", f2);
    	
    	printf("Difference of %f.  Malloc() is %f # slower than ring buffer.\n", f2-f1, 100-(100* (f1/f2)));
    }
    ring_buffer.h
    Code:
    #ifndef RING_BUFFER_H
    #define RING_BUFFER_H
    
    #include <pthread.h>
    
    typedef struct ring_buffer_item
    {
    	void* item;
    	int size;
    }ring_buffer_item_t;
    
    typedef struct ring_buffer
    {
    	ring_buffer_item_t* buffer;
    	unsigned int size;
    	unsigned int getindex;
    	unsigned int putindex;
    	unsigned int num_items;
    	int use_locks;
    	pthread_mutex_t rb_mutex;
    }ring_buffer_t;
    
    int ring_buffer_init(ring_buffer_t* rb, unsigned int num_blocks, int block_size, int use_locks);
    int ring_buffer_add(ring_buffer_t* rb, void* item, int size);
    void* ring_buffer_remove(ring_buffer_t* rb);
    
    #endif
    ring_buffer.c
    Code:
    #include "ring_buffer.h"
    
    #include <stdlib.h>
    #include <stdio.h>
    
    int ring_buffer_init(ring_buffer_t* rb, unsigned int size, int block_size, int use_locks)
    {
    	int success = 1;
    	
    	rb->use_locks = use_locks;
    	
    	pthread_mutex_init(&rb->rb_mutex, NULL);
    	
    	rb->buffer = calloc(size, sizeof(ring_buffer_item_t));
    	
    	if(rb->buffer)
    	{
    		rb->size = size;
    		rb->num_items = 0;
    		rb->getindex = 0;
    		rb->putindex = 0;
    		
    		if(block_size)
    		{
    			int x;
    			for(x =0; x<size; x++)
    			{
    				rb->buffer[x].item = malloc(block_size);
    				rb->buffer[x].size = block_size;
    				
    				if(!rb->buffer[x].item)
    				{
    					printf("Could not allocate ring buffer block.\n");
    					success = 0;
    					break;
    				}	
    			}
    		}
    	}
    	else
    	{
    		printf("Could not initialize ring buffer.\n");
    		success = 0;
    	}
    	
    	return success;
    }
    
    int ring_buffer_add(ring_buffer_t* rb, void* item, int size)
    {
    	if(rb->use_locks)
    		pthread_mutex_lock(&rb->rb_mutex);
    	
    	if (rb->num_items >= rb->size)
    	{
    		if(rb->use_locks)
    			pthread_mutex_unlock(&rb->rb_mutex);
    			
    		return -1;
    	}
    
    	rb->buffer[rb->putindex].item = item;
    	rb->buffer[rb->putindex].size = size;
    
    	rb->putindex = (rb->putindex + 1) % rb->size;
    		
    	rb->num_items++;
    	
    	if(rb->use_locks)
    		pthread_mutex_unlock(&rb->rb_mutex);
    	
    	return 0;
    }
    
    void* ring_buffer_remove(ring_buffer_t* rb)
    {
    	if(rb->use_locks)
    		pthread_mutex_lock(&rb->rb_mutex);
    	
    	if ( !rb->num_items )
    	{
    		if(rb->use_locks)
    			pthread_mutex_unlock(&rb->rb_mutex);
    			
    		return NULL;
    	}
    
    	void* item;
    	rb->num_items--;
    	item = rb->buffer[rb->getindex].item;
    	
    	rb->getindex = (rb->getindex + 1) % rb->size;
    		
    	if(rb->use_locks)
    		pthread_mutex_unlock(&rb->rb_mutex);
    		
    	return item;
    }

  7. #7
    Registered User
    Join Date
    Jun 2008
    Posts
    93
    Anyone have any thoughts on this?

  8. #8
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Actually I was saying just make a pile and stick them there. I don't know that this really is going to be more efficient, but if you're thinking calling free and malloc over and over is painful...
    Code:
    type *myalloc( void )
    {
        type *p;
        if( store )
        {
            p = store;
            store = store->next;
        }
        else
            p = malloc( sizeof *p );
        return p;
    }
    Probably want to do something about your pointers and any old data, but you shouldn't be assuming the structure is valid to use as-is if you call malloc anyway.
    Code:
    void myfree( type *p )
    {
        if( p )
        {
            p->next = store;
            store = p;
        }
        else /* final free */
        {
            type *n;
            for( n = store; n; n = p )
            {
                p = n->next;
                free( n );
            }
        }
    }

    Quzah.
    Hope is the first step on the road to disappointment.

  9. #9
    Registered User
    Join Date
    Jun 2008
    Posts
    93
    So I would need to use something like the following for p? In your example you are calling malloc() on p* but don't you also have to call malloc on p->memory or something similar?

    Code:
    struct store_item
    {
        void* memory;
        store_item* next;
    }
    I didn't see anywhere in your example where you were allocating memory for the actual memory block that is to be stored in each element. And if this is the case and you have to malloc an entry for the structure and for the data you want to allocate I am guessing this woud not be very efficient as you stated above.

    Perhaps you were just posting pseudocode and I am scrutinizing too much.. I apologize if that is the case.

    Thanks for all of your help.

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    A "pool allocator" seems to be what you are after.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  11. #11
    Registered User
    Join Date
    Jun 2008
    Posts
    93
    "pool allocators" are faster than malloc?

  12. #12
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by fguy817817 View Post
    I didn't see anywhere in your example where you were allocating memory for the actual memory block that is to be stored in each element.
    I was just saving nodes of an assumed linked list. I honestly wouldn't bother making new structures to store assorted random data types.

    I'm sure there's something out there that's faster than malloc, but I doubt I'd go looking for it unless that was somehow proven to be the bottleneck in my program.


    Quzah.
    Hope is the first step on the road to disappointment.

  13. #13
    Registered User
    Join Date
    Jun 2008
    Posts
    93
    What is a good profiling tool? I tried gprof but my program is multi-threaded and it loses its mind. There was a workaround but I have yet to get it to work. Do you know of any profilers that work well with multi-threaded apps?

  14. #14
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413
    I've heard good things about valgrind.

    Valgrind Home

  15. #15
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by fguy817817 View Post
    "pool allocators" are faster than malloc?
    Yes. The simplest ones can be barely a half-dozen assembly instructions for each allocation or deallocation.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. dynamic allocation from 1 instead of zero
    By cfdprogrammer in forum C Programming
    Replies: 27
    Last Post: 04-28-2009, 08:21 AM
  2. pointer to array with dynamic allocation
    By cfdprogrammer in forum C Programming
    Replies: 22
    Last Post: 04-07-2009, 09:56 AM
  3. Turn Off Optimization?
    By danlee58 in forum C Programming
    Replies: 6
    Last Post: 12-10-2008, 03:52 AM
  4. need reading material for c++ database optimization
    By elninio in forum C++ Programming
    Replies: 0
    Last Post: 07-24-2008, 11:32 PM
  5. Optimization settings
    By Roaring_Tiger in forum C Programming
    Replies: 4
    Last Post: 02-23-2005, 02:53 AM