Thread: efficiency issues with downsizing vector or array

  1. #1
    Registered User
    Join Date
    Sep 2006
    Posts
    835

    efficiency issues with downsizing vector or array

    I am presently using an array to store a list of elements such that the exact length is not known in advance, only an upper bound. The elements are progressively added, and when done, it would be nice to tell the OS that the remaining space is not needed. In C, this could be done using realloc(). In C++, Stroustrup says that it's usually easier and just as efficient to use a standard container. I could use a vector and use resize when done to reduce the size to the exact amount used.
    The question is, in this situation (either with realloc(), or with a C++ vector using resize), when the array or vector is downsized, does the OS typically try to copy the data elsewhere, or just leave it where it is? I guess it's a matter of avoiding memory fragmentation vs. the time involved in moving the data. If the data remains in place, it's a no-brainer to release the extra space.
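    For concreteness, the C-style version I have in mind is roughly the sketch below (the names and the element type are made up, and whether the shrinking realloc() moves the block is exactly what I'm asking about):

    Code:
    #include <cstddef>
    #include <cstdlib>

    // n_max is the known upper bound; fill_elements() stands in for the real
    // computation and reports how many slots it actually used.
    std::size_t fill_elements(double *buf, std::size_t n_max);

    double *build_buffer(std::size_t n_max, std::size_t &n_used)
    {
        double *buf = static_cast<double *>(std::malloc(n_max * sizeof *buf));
        if (buf == 0)
            return 0;                       // allocation failed
        n_used = fill_elements(buf, n_max);
        // Hand the unused tail back; realloc() may shrink in place or move the data.
        double *shrunk = static_cast<double *>(std::realloc(buf, n_used * sizeof *buf));
        return shrunk ? shrunk : buf;
    }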

  2. #2
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    I misread the question at first, let me try again.

    If you don't have a good guess at the size of the vector, then don't call reserve at the start. Let it figure it out. If, after you have added all your items, the capacity() is much larger than the size() (which can happen if you reserve too much space at the start or if you let the vector size itself), then reducing the memory to fit the amount of data will be expensive, because I believe the data has to be copied.

    In the overall scheme of things, it is probably more efficient just to let the standard container handle things, and in the rare case that there really is too much memory wasted you can just create a new vector with the appropriate amount of space reserved and copy the data over.
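    Something along these lines (v stands in for the over-allocated vector, and double is just an example element type):

    Code:
    #include <vector>

    // v ended up with far more capacity() than size(); build a right-sized copy.
    std::vector<double> make_right_sized(const std::vector<double> &v)
    {
        std::vector<double> tight;
        tight.reserve(v.size());           // reserve exactly the space that's used
        tight.assign(v.begin(), v.end());  // copy the data over
        return tight;                      // use this one instead of the old vector
    }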

  3. #3
    Registered User
    Join Date
    Sep 2006
    Posts
    835
    I'm inclined now to create a temporary array of the maximum possible size, calculate the elements in it, and once the exact length is known, create a new array of the exact length, copy over, and delete the temporary. The thing is, there will be tens or hundreds of thousands of these arrays existing simultaneously in memory, and it would be expensive both in terms of memory usage and memory fragmentation not to get rid of the excess. From what you've told me, copying is unavoidable, so I may as well just use plain arrays.
    I also read that size isn't the same as capacity, and it's the latter that needs to be reduced. In

    http://www.informit.com/guides/conte...eqNum=268&rl=1

    it says that using swap() is the most efficient way to reduce the capacity, but that involves copying anyway.
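    For reference, the swap() trick from that article boils down to a one-liner (v again being the over-full vector; the temporary usually, though it's not guaranteed, allocates just enough):

    Code:
    #include <vector>

    void shrink_capacity(std::vector<double> &v)
    {
        // The temporary is copy-constructed from v with a tight capacity; swap()
        // then hands that tight buffer to v, and the temporary carries the old,
        // oversized buffer away and frees it at the end of the statement.
        std::vector<double>(v).swap(v);
    }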

  4. #4
    VA National Guard The Brain's Avatar
    Join Date
    May 2004
    Location
    Manassas, VA USA
    Posts
    903
    would a <list> data structure handle these memory issues more efficiently..?
    (yes, i know it's not an array)
    • "Problem Solving C++, The Object of Programming" -Walter Savitch
    • "Data Structures and Other Objects using C++" -Walter Savitch
    • "Assembly Language for Intel-Based Computers" -Kip Irvine
    • "Programming Windows, 5th edition" -Charles Petzold
    • "Visual C++ MFC Programming by Example" -John E. Swanke
    • "Network Programming Windows" -Jones/Ohlund
    • "Sams Teach Yourself Game Programming in 24 Hours" -Michael Morrison
    • "Mathmatics for 3D Game Programming & Computer Graphics" -Eric Lengyel

  5. #5
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Sort of, but not really. Yes, a list will only allocate as many nodes as needed (probably - nothing really prevents an implementation from allocating several nodes at once so it doesn't have to allocate on every add, but that would be an unusual implementation). But it has a considerable per-node size overhead: the two list pointers are 16 bytes on a 64-bit machine, so if you're storing 4-byte ints in that list, 16 out of every 20 bytes - 80% - are wasted.

    I agree with Daved. Create the vectors empty and just add the elements with push_back; the vector will resize as necessary. If the number of elements varies widely from one array to the next, this is often the most efficient approach.
    If not - if, after profiling, you've determined that this is exactly where you're losing time - you can still change the way the vectors are filled.
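    In other words, roughly this (more_elements() and compute_next() are stand-ins for whatever actually produces the data):

    Code:
    #include <vector>

    bool more_elements();     // placeholders for the real element-producing logic
    double compute_next();

    std::vector<double> build_values()
    {
        std::vector<double> values;            // no reserve, no upper-bound guess
        while (more_elements())
            values.push_back(compute_next());  // amortized constant time per push_back
        // Only if profiling shows the leftover capacity actually hurts:
        // std::vector<double>(values).swap(values);
        return values;
    }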
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  6. #6
    Registered User
    Join Date
    Jan 2003
    Posts
    311
    "Premature optimization is the root of all evil"
    -- Donald Knuth

    One of vector's tricks is that excess capacity is uninitialized. What this means is that the OS typically never has to allocate physical page frames for pages you never touch; we don't really care how many pages we have asked for in the virtual address space. It does give the OS a bit of a headache in that it sees your program as potentially larger than it really is, and it gives the heap more entries to track than necessary. But that's mostly just bookkeeping.

    If you can throw away a lot of the entries when you are done, then it's worth your time to copy the rest over to a new vector with an exact capacity.

    as always, profile before you optimize
