Thread: Smarter List Allocation

  1. #1
    Registered User
    Join Date
    Aug 2012
    Location
    Utrecht, Netherlands
    Posts
    18

    Smarter List Allocation

    Hey guys,

    I think this question boils down to dynamically allocating memory. Is it more expensive or more efficient to allocate memory once, if I know the maximum size I will need (even if I'm likely to use only a small fraction of the allocated memory)?

    Let me explain my issue, and my possible approaches:

    Per problem, I have a set number of variables, say 500. As I iterate through them I need to save and keep track of a few variables of interest in a subset list. I have no idea how many that's going to be until I've gone through all of them, so it can be anywhere from 0 to 500. Once that's done, I need to iterate through the newly created subset (by then I know how large the subset is -- and it's likely only about 0-10 vars). However, I don't know how many I need until I've examined each variable, so it's very dynamic.

    Does it make sense to reallocate a new position for my subset list every time I find a variable of interest? Or would it simply be better to allocate a subset array of size 500 initially, only fill up a small part of it, and just never look at the parts I don't need?

    I imagine both approaches are correct... the first seems prettier... but it's really efficiency I'm looking for... and I'm quite a n00b @ C.

    Any thoughts, or alternative approaches very welcome!
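
    To make the comparison concrete, this is roughly what I mean by the two options (just a sketch; NUM_VARS, the function names, and the "of interest" test are placeholders):
    Code:
    #include <stdlib.h>
    
    #define NUM_VARS 500   /* the fixed number of variables per problem */
    
    /* Option A: allocate the worst case once, and only use part of it. */
    static int *collect_fixed(const int *vars, size_t *count)
    {
        int *subset = malloc(NUM_VARS * sizeof *subset);
        size_t used = 0, i;
        if (!subset)
            return NULL;
        for (i = 0; i < NUM_VARS; i++)
            if (vars[i] != 0)                 /* stand-in for "of interest" */
                subset[used++] = (int)i;
        *count = used;
        return subset;
    }
    
    /* Option B: grow the subset by one element every time there is a hit. */
    static int *collect_growing(const int *vars, size_t *count)
    {
        int *subset = NULL, *tmp;
        size_t used = 0, i;
        for (i = 0; i < NUM_VARS; i++) {
            if (vars[i] != 0) {
                tmp = realloc(subset, (used + 1) * sizeof *subset);
                if (!tmp) {
                    free(subset);
                    return NULL;
                }
                subset = tmp;
                subset[used++] = (int)i;
            }
        }
        *count = used;
        return subset;
    }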

  2. #2
    Stoned Witch Barney McGrew
    Join Date
    Oct 2012
    Location
    astaylea
    Posts
    420
    It's implementation dependent, although it's typically more efficient to allocate one large block instead of many smaller ones. Consider getting a profiler and performing some tests.

  3. #3
    and the hat of int overflow Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    How big is each object?
    I mean, if they're only 100 bytes each, then 500 of them are only 50 KB.
    Compare that to how many GB the average desktop machine has available.

    Of course, if each object is 100KB, or you're working on an embedded device (or DOS) with only 1MB to play with, then you might need to modify your approach.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    In your exact case, I'd use a common storage area based on a generic assumption and build into a list after that area has been consumed.

    [Edit]
    I'm actually assuming that you only need to store reference information and that you will ultimately look back at the original variables to do anything with them.
    [/Edit]

    Soma

  5. #5
    Registered User
    Join Date
    Aug 2012
    Location
    Utrecht, Netherlands
    Posts
    18
    It's relatively small... as in bits: num_vars*(sizeof(char)), but the length of it is unknown. My test benchmarks are about 100-2000 variables, but industrial testing can easily be 100x that, so the lists can become quite large. And it's also very likely the subset list will still only include a very small fraction of the original variables (likely not more than 1%, but in theory it can be up to the full variable count, so I have to account for that - forcing the single allocation to be much larger than actually needed).

    From your responses so far, it may make sense to allocate something of, say, 10% of the original list size, and reallocate another 10% if needed. Best of both worlds, sort of.
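
    Something like this is what I had in mind (a rough sketch; the names and the 10% step are just my guess):
    Code:
    #include <stdlib.h>
    
    /* Make room for one more entry, growing by ~10% of num_vars per realloc. */
    static int ensure_room(unsigned char **subset, size_t *capacity,
                           size_t used, size_t num_vars)
    {
        unsigned char *tmp;
        size_t step;
    
        if (used < *capacity)
            return 0;                        /* still room, no allocation needed */
    
        step = num_vars / 10;
        if (step == 0)
            step = 1;
    
        tmp = realloc(*subset, (*capacity + step) * sizeof **subset);
        if (!tmp)
            return -1;                       /* old block is still valid */
    
        *subset = tmp;
        *capacity += step;
        return 0;
    }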

  6. #6
    Registered User
    Join Date
    Aug 2012
    Location
    Utrecht, Netherlands
    Posts
    18
    Quote Originally Posted by phantomotap View Post
    O_o

    In your exact case, I'd use a common storage area based on a generic assumption and build into a list after that area has been consumed.

    [Edit]
    I'm actually assuming that you only need to store reference information and that you will ultimately look back at the original variables to do anything with them.
    [/Edit]

    Soma
    Hm? I think I am. I might have been using the term "list", but I actually mean just a basic array. Unless you mean something else here.

    Oh, and @edit: it's just a bit string, not even references. Sort of like a huge array of switches.
    Last edited by otaconBot; 10-26-2012 at 05:54 AM.

  7. #7
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    Code:
    struct coredata
    {
        /* whatever */
    };
    
    struct harvesteddata
    {
        /* whatever */
    };
    
    struct chunksofharvesteddata
    {
        int count;                               /* how many entries in this chunk */
        struct harvesteddata *entries;           /* pointer to chunk */
        struct chunksofharvesteddata *next;      /* next element in linked list of chunks if any */
    };
    
    struct problemdata
    {
        /* whatever */
        struct coredata *incoming;               /* pointer to incoming data */
        struct chunksofharvesteddata *harvested; /* create a first chunk of size "best guess" and expand as necessary */
    };
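
    For example, appending might go roughly like this (a sketch only; CHUNK_CAPACITY, new_chunk, and harvest_append are made-up names, the per-chunk size is just the "best guess", and harvesteddata is assumed to have real members):
    Code:
    #include <stdlib.h>
    
    #define CHUNK_CAPACITY 64   /* the "best guess" number of entries per chunk */
    
    static struct chunksofharvesteddata *new_chunk(void)
    {
        struct chunksofharvesteddata *c = malloc(sizeof *c);
        if (!c)
            return NULL;
        c->entries = malloc(CHUNK_CAPACITY * sizeof *c->entries);
        if (!c->entries) {
            free(c);
            return NULL;
        }
        c->count = 0;
        c->next = NULL;
        return c;
    }
    
    /* Append one record; start a fresh chunk once the current one is consumed. */
    static int harvest_append(struct chunksofharvesteddata **head,
                              const struct harvesteddata *item)
    {
        if (!*head || (*head)->count == CHUNK_CAPACITY) {
            struct chunksofharvesteddata *c = new_chunk();
            if (!c)
                return -1;
            c->next = *head;            /* newest chunk goes to the front */
            *head = c;
        }
        (*head)->entries[(*head)->count++] = *item;
        return 0;
    }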
    Soma

  8. #8
    Registered User ledow
    Join Date
    Dec 2011
    Posts
    435
    When in doubt I tend to do the following:

    - Allocate enough space for N elements, where N is a decent guess as to the maximum list size I'll need (1 malloc). Call this a "storage array".
    - When an element is needed, use the next free element of that array (so no malloc is required) and keep track of how many you've used up. (By "use the element": you can do this via pointers, but that gets messy if you realloc; you only need to know the INDEX of the element in the allocated array, so I tend to just hand out array indices, which the list code itself uses as a "pseudo-pointer". See the sketch at the end of this post.)
    - When the list grows close to N elements, double the size of N and realloc the storage array (1 realloc).

    So basically you have:

    Memory
    |
    Bunch of elements in a contiguous array that you double in size whenever you run out of them. (Storage array)
    |
    List referring to those elements by index into their storage array.

    That keeps most lists to a single allocation (for the storage array). For most usage it requires no further malloc calls (which *can* be a performance hit, just through function overhead, if you call it often enough). When you *do* exceed the storage array size, you can be fairly sure you won't exceed it again (and if you do, double its size again). If you consistently and madly overrun your estimate for N, the worst that happens is you hit one or two extra reallocs.

    And when it comes to tidying up, one free() and it all disappears.

    Sure it uses a little more memory than strictly necessary but memory is cheap (literally Gigabytes free on any PC you care to run a program on) and memory allocations / CPU time generally aren't.

    I've used it in a program that ran on an ARM chip because I *needed* to do lots of quick list-building and destruction and the individual malloc calls were killing performance. When you have 10,000 elements in an array, you can either get there by 10,000 mallocs (which might be spread throughout memory) or 1 malloc and a bit of handing off yourself.
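
    A rough sketch of the idea, if it helps (the names are made up; error handling is kept minimal):
    Code:
    #include <stdlib.h>
    
    struct elem { unsigned char value; };   /* whatever your element really is */
    
    struct storage {
        struct elem *items;
        size_t       used;
        size_t       capacity;
    };
    
    /* Hand out the index of the next free element, doubling the array when it
       runs out.  The index is the "pseudo-pointer" the list code keeps. */
    static long storage_next(struct storage *s)
    {
        if (s->used == s->capacity) {
            size_t newcap = s->capacity ? 2 * s->capacity : 16;
            struct elem *tmp = realloc(s->items, newcap * sizeof *tmp);
            if (!tmp)
                return -1;               /* existing elements stay valid */
            s->items = tmp;
            s->capacity = newcap;
        }
        return (long)s->used++;
    }
    
    /* One free(storage.items) releases everything when you're done. */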

    - Compiler warnings are like "Bridge Out Ahead" warnings. DON'T just ignore them.
    - A compiler error is something SO stupid that the compiler genuinely can't carry on with its job. A compiler warning is the compiler saying "Well, that's bloody stupid but if you WANT to ignore me..." and carrying on.
    - The best debugging tool in the world is a bunch of printf()'s for everything important around the bits you think might be wrong.

  9. #9
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    Quote Originally Posted by ledow View Post
    When the list grows close to N elements, double the size of N and realloc the storage array (1 realloc)
    I am compelled to mention: "and wait for potentially N copies".

    Quote Originally Posted by ledow View Post
    which *can* be a performance hit, just through function overhead
    No. If you are doing literally anything beyond spinning over an empty function call, that is not going to give you a performance hit.

    Quote Originally Posted by ledow View Post
    literally Gigabytes free on any PC you care to run a program on
    I have less than 1200 MiB free on the PC I'm using at this moment.

    Stupidly using RAM is why the Java server I used to use for compatibility with old kit needed a 4 GiB swap.

    The point of me throwing a couple extra GiB at a box was so that I could run more instances of software; I don't throw extra RAM at a box so moron programmers could misuse it.

    Quote Originally Posted by ledow View Post
    I've used it in a program that ran on an ARM chip because I *needed* to do lots of quick list-building and destruction and the individual malloc calls were killing performance.
    The allocator implementation you used sucked.

    Quote Originally Posted by ledow View Post
    When you have 10,000 elements in an array, you can either get there by 10,000 mallocs (which might be spread throughout memory) or 1 malloc and a bit of handing off yourself.
    An array can't be "spread throughout memory" from the program's perspective.

    [Edit]
    Just to be clear: I don't find a problem with the generic suggestion of "use a doubling array" when an array is actually necessary.

    That said, proper use of theory, allocations, and if necessary better allocator implementations for the win.
    [/Edit]

    Soma
    Last edited by phantomotap; 10-26-2012 at 07:02 AM.

  10. #10
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    (Note: this is very similar to some of the suggestions above; I just like to be explicit.)

    Quote Originally Posted by otaconBot View Post
    It's relatively small... as in bits: num_vars*(sizeof(char))
    In that case, I'd recommend a good-sized initial allocation, then reallocation when necessary. For example:
    Code:
    #include <stddef.h>   /* size_t, NULL */
    #include <stdlib.h>   /* realloc() */
    #include <errno.h>    /* errno, ENOMEM */
    
    struct dataset {
        size_t          size;  /* Allocated */
        size_t          used;  /* Used */
        unsigned char  *data;
    };
    
    #define EMPTY { 0, 0, NULL }
    Assuming your application runs on a typical desktop workstation or server, and has only one or just a few datasets at a time, I'd start with, say, 4096-byte allocations; between 64k and 8M, use a 25% increase; and after 8M, increase by a fixed 2M:
    Code:
    static inline int dataset_need(struct dataset *const  set, const size_t  chars)
    {
        if (set->used >= set->size || set->used + chars > set->size) {
            size_t  need = set->used + chars;
            void *data;
    
            if (need < 65536)
                need = (need | 4095) + 1;    /* Round up to a multiple of 4096. */
            else
            if (need < 8388608)
                need = (5 * need) / 4;       /* Grow by 25%. */
            else
                need = (need | 2097151) + 1; /* Round up to a multiple of 2M. */
    
            data = realloc(set->data, need * sizeof (*(set->data)));
            if (!data)
                return errno = ENOMEM; /* Cannot grow, but data still intact. */
    
            set->size = need;
            set->data = data;
        }
    
        return 0;
    }
    Note that if you initialize your dataset to EMPTY (size and used 0, data NULL), you don't need any initial allocation; just make sure you call dataset_need(&set, chars) before adding chars new values. It will return nonzero (ENOMEM) if the dataset cannot be resized (out of memory). Even then the current data is still accessible and valid; the function just failed to make room for new data. The function is static inline, meaning the compiler will optimize it away just as effectively as if it were a preprocessor macro, so calling it often should not be a (big) performance issue.
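
    For example, appending bytes on top of dataset_need() might look like the following. (Just an illustrative sketch; dataset_append is a made-up name, and it assumes the struct and function above.)
    Code:
    #include <string.h>  /* memcpy() */
    
    static inline int dataset_append(struct dataset *const set,
                                     const unsigned char *const src,
                                     const size_t len)
    {
        /* Make sure there is room for len more values; on failure the
           existing data is untouched and errno is set to ENOMEM. */
        if (dataset_need(set, len))
            return errno;
    
        memcpy(set->data + set->used, src, len);
        set->used += len;
        return 0;
    }
    
    /*
     * Usage:
     *     struct dataset set = EMPTY;
     *     if (dataset_append(&set, buffer, length))
     *         ... handle out of memory ...
     */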

    The rationale for my suggestion is based on the balance between allocated but unused memory, and the number of realloc() calls.

    To grow the data up to 65536 bytes, you do 16 realloc() calls, but you know you always have less than 4096 allocated but unused (wasted) bytes. To grow the data from there to 8M, you do an additional 22, for a total of 38 realloc() calls. At any point, you know you have less than 25% allocated but unused. After that, you do a realloc() call for every new 2M, and are guaranteed to not have more than 2M allocated but unused.

    If this exact strategy does not fit your needs, the reallocation sizes are trivial to modify. (I do not wish to imply this strategy is likely to be the best; it's just the one I'd start with, and then see if it can be made better for typical workloads without penalizing odd cases too much. I wanted to show the example code and example case so you'd see how few of those realloc() calls you need in practice with a typical strategy.)

    That said, if there were a lot of churn -- not just a steady increase from storing incoming data, but also data being discarded -- I'd use a different approach, probably very similar to phantomotap's suggestion.

    Finally, I'd seriously reconsider your algorithm, to see if it is possible to process the data concurrently with the input, i.e. online. For example, see whether you can discard data that is proven not to affect the results, or perhaps use a binary heap or something to keep only the most important data. Algorithmic advances tend to have bigger rewards than any code optimization, in my experience.

  11. #11
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    Quote Originally Posted by otaconBot View Post
    it's just a bit string
    I didn't see this earlier.

    Now, to me "bit string" says "array", so you probably can't use a more complex data structure.

    Every optimization you could make involves trading off unused storage for fewer allocations. (This is well explained in earlier posts.)

    In that case, I would probably rely on `realloc' to do the job for which it was designed. Again, there is nothing wrong with "doubling up" or something finer-grained (`dataset_need'), but a lot of vendors ship a great `realloc' these days.
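
    For example, a bit string grown with plain `realloc' might look roughly like this (`bitset' and `bitset_set' are placeholder names, and doubling is just one reasonable policy):
    Code:
    #include <stdlib.h>
    #include <string.h>
    
    struct bitset {
        unsigned char *bytes;
        size_t         capacity;   /* in bytes */
    };
    
    /* Set bit `index', growing the underlying byte array as needed. */
    static int bitset_set(struct bitset *b, size_t index)
    {
        size_t needed = index / 8 + 1;
    
        if (needed > b->capacity) {
            unsigned char *tmp;
            size_t newcap = b->capacity ? b->capacity : 16;
    
            while (newcap < needed)
                newcap *= 2;
    
            tmp = realloc(b->bytes, newcap);
            if (!tmp)
                return -1;                       /* old bits still intact */
            memset(tmp + b->capacity, 0, newcap - b->capacity);
            b->bytes = tmp;
            b->capacity = newcap;
        }
    
        b->bytes[index / 8] |= (unsigned char)(1u << (index % 8));
        return 0;
    }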

    Soma
