Assignment Operator, Memory and Scope

**SevenThunders** · 03-27-2008

Originally Posted by Daved

I'm not very familiar with garbage collection in C++. Why do you need the delete [] arr if you're using garbage collection?

I want to at least understand typical programming practices in C++, especially if at some time in the future I'm forced to pull the garbage collector out. I've not really done that much programming in C++. I've done a lot of C programming, a decent amount of Java, a lot of Matlab, some D and a lot of Haskell, but never really had a chance to program in C++ oddly enough.

Thus for now I'm actually building destructors and even making a passing attempt at freeing arrays that I know need to be freed, or otherwise commenting the places in the code where such things need to happen in the future.

I find the GC convenient however and even comforting in an odd way. It allows for different programming styles and is especially convenient when I have to allocate a lot of small objects and arrays and don't want to worry about memory leaks. Perhaps for now it's just a nice way of easing into the language.

**CornedBee** · 03-27-2008

If you've done a lot of C, you should have the memory management issues pretty much down. In C++, you have the great added advantage of having automatically called constructors, copy assignment operators and destructors to help you in managing your calls to new and delete.
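To make that concrete, here is a minimal sketch of a class that wraps its own new[] and delete[]. The class name `IntArray` and the helper `copy_is_independent` are made up for illustration, and the copy-and-swap assignment is just one common way to write the operator, not the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class for illustration: owns a heap array and releases it
// in the destructor, so callers never write delete[] themselves.
class IntArray {
public:
    explicit IntArray(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Copy constructor: deep copy, so two objects never share a buffer.
    IntArray(const IntArray& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Copy assignment via copy-and-swap: the parameter is already a copy,
    // we take its buffer, and its destructor frees our old one.
    // This is also safe for self-assignment (a = a).
    IntArray& operator=(IntArray other) {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
        return *this;
    }

    ~IntArray() { delete[] data_; }

    int& operator[](std::size_t i) { return data_[i]; }
    const int& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;
};

// Returns true if assignment produced an independent copy.
bool copy_is_independent() {
    IntArray a(3);
    a[0] = 7;
    IntArray b(1);
    b = a;        // b's old buffer is freed, a's contents are copied
    b[0] = 42;    // must not affect a
    return a[0] == 7 && b[0] == 42 && b.size() == 3;
}   // both destructors run here; no explicit delete[] at the call site
```

Without the user-written copy constructor and assignment operator, the compiler-generated ones would copy the raw pointer and both objects would delete[] the same buffer.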

**Elysia** · 03-27-2008

Garbage Collection and destructors don't always agree. Because memory can be deleted at any time, there's no guarantee that destructors will be called "at the right time."
There's an alternative in C++ as well: smart pointers. They usually do reference counting and delete objects when the reference count reaches 0. You can try that, as well. It means no need for freeing memory yourself, and it is similar to (and IMHO better than) a garbage collector.
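A small sketch of the reference counting behaviour, assuming a compiler with C++11's `std::shared_ptr` (`boost::shared_ptr` and `std::tr1::shared_ptr` behave much the same way). The `Tracked` type and the function name are invented for the example; the type only exists so that destruction is observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts live instances so destruction is observable.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Returns the number of live objects after the last owner goes away.
int live_after_last_owner_gone() {
    {
        std::shared_ptr<Tracked> a = std::make_shared<Tracked>();
        std::shared_ptr<Tracked> b = a;   // reference count is now 2
        assert(a.use_count() == 2);
        a.reset();                        // count drops to 1, object survives
        assert(Tracked::live == 1);
    }                                     // b leaves scope: count 0, destructor runs
    return Tracked::live;
}
```

The destructor runs at a known point (when the last owner goes away), which is the difference from a collector being discussed here.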

**CornedBee** · 03-27-2008

I would say that it is effectively impossible to remove automatic garbage collection from a project that uses it. Tracking down every single allocation is hard enough, but GC also leads to programming patterns where alternatives simply aren't possible.

**Elysia** · 03-27-2008

Well, maybe. Maybe not. But at least it's something to consider for other projects.

**SevenThunders** · 03-27-2008

Originally Posted by CornedBee

If you've done a lot of C, you should have the memory management issues pretty much down. In C++, you have the great added advantage of having automatically called constructors, copy assignment operators and destructors to help you in managing your calls to new and delete.

For any large project that you've written in C, have you escaped without memory leaks or writes to the wrong place in memory? Thank God for tools like valgrind or it may take forever to find these bugs.

One of my largest projects implemented a matrix library on a stack, calling underlying BLAS libraries for performance. I allowed multiple views into the same matrix so I used reference counting (a form of garbage collection) to manage matrix memory. I also riddled my code with tons of array bounds and dynamic type checking.

Once that was debugged (and that involved tracking down at least 1 memory leak and a few other pointer issues), I was largely free of memory errors from then on. IMHO memory error free in C is the exception not the rule. The other approach, used for more real time code is to allocate everything up front globally or in a stack frame and NEVER call malloc or free. That approach is what you will often see in real time code hosted by DSPs and other hardware. In fact I think some of those specialized C compilers may not even support malloc.

Thus my experience is that auto-managing your memory (and I include reference counting in that bag of tricks) makes for a far better programming experience, even and perhaps especially in C.

This reminds me of another memory management trick I've used in C that was surprisingly effective albeit not for the faint of heart. I would allocate a buffer for temporary variables with short lifetimes. A suballocator would just loop through the buffer in a circular fashion. Thus eventually anything allocated by this would be overwritten. Memory was viewed as a time limited, decaying commodity, but the suballocator was about as fast as could possibly be coded.

I would then tune the size of the temporary buffer so that it was large enough not to cause data corruption for the application at hand. I remember having the option to either copy the allocated object to more permanent memory or to simply refresh it periodically to keep it from being overwritten.
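A rough sketch of that kind of circular suballocator, written in C++ rather than the original C. The names (`ScratchRing`, `alloc`) are invented for the example, and it ignores alignment, which a real version would have to handle.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical circular scratch allocator: hands out slices of a fixed
// buffer and wraps to the start when it runs out. Nothing is ever freed;
// old allocations are silently overwritten once the cursor comes around.
// Alignment is ignored here to keep the sketch short.
class ScratchRing {
public:
    ScratchRing(unsigned char* buf, std::size_t capacity)
        : buf_(buf), capacity_(capacity), next_(0) {}

    // Returns n bytes, or a null pointer if n can never fit in the buffer.
    void* alloc(std::size_t n) {
        if (n == 0 || n > capacity_) return nullptr;
        if (n > capacity_ - next_) next_ = 0;  // not enough room left: wrap
        void* p = buf_ + next_;
        next_ += n;
        return p;
    }

private:
    unsigned char* buf_;
    std::size_t capacity_;
    std::size_t next_;
};

// Demonstrates the wrap: the third allocation reuses the first one's bytes.
bool third_alloc_overwrites_first() {
    static unsigned char storage[16];
    ScratchRing ring(storage, sizeof storage);
    char* a = static_cast<char*>(ring.alloc(8));
    char* b = static_cast<char*>(ring.alloc(8));
    char* c = static_cast<char*>(ring.alloc(8));  // wraps to offset 0
    return a != b && c == a;
}
```

Allocation is one comparison and one addition, which is why it is so fast. The cost is exactly the one described above: anything still in use when the cursor wraps gets overwritten, so the buffer has to be sized for the application.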

**brewbuck** · 03-27-2008

Originally Posted by SevenThunders

For any large project that you've written in C, have you escaped without memory leaks or writes to the wrong place in memory? Thank God for tools like valgrind or it may take forever to find these bugs.

Valgrind is miraculous. When we started using it, I was actually surprised at how few such problems we had in our (+1 million line) code base. Maybe a dozen or so. Most of them were incredibly bizarre bugs. I love it.

**Daved** · 03-27-2008

>> I want to at least understand typical programming practices in C++
That's why vector is being recommended. Typically, if you're using delete [] arr, you should be using a vector instead. If you're using garbage collection that's fine, but the vector recommendation is in reference to the use of the dynamic array that you are managing yourself.
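As a minimal illustration (the function `squares` is made up for the example), this is the kind of code that would otherwise need new[] and a matching delete [] arr:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the squares 0, 1, 4, ... without any new[] or delete[].
std::vector<int> squares(std::size_t n) {
    std::vector<int> v(n);            // allocates n ints, initialized to 0
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<int>(i * i);
    return v;                         // the caller's vector now owns the storage
}
```

The vector's destructor frees the storage, and its copy constructor and assignment operator make deep copies, so none of that has to be written by hand.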

**SevenThunders** · 03-27-2008

Originally Posted by Elysia

Garbage Collection and destructors don't always agree. Because memory can be deleted at any time, there's no guarantee that destructors will be called "at the right time."
There's an alternative in C++ as well: smart pointers. They usually do reference counting and delete objects when the reference count reaches 0. You can try that, as well. It means no need for freeing memory yourself, and it is similar to (and IMHO better than) a garbage collector.

OK so we've had this discussion on the other thread. As a matter of fact it's quite typical that many of your destructors will NEVER be called. Some garbage collectors, in fact the better ones, sort memory based on locality and/or usage patterns. Such a collector may free up one region and not touch a particular structure because its region is not really full yet, or the program may end prior to the need for a full collection.

So if you want to manage other resources besides memory, you use other techniques. D, as an example, has specific hooks for this. However not needing destructors has some interesting infrastructure implications. There would be no need for the overhead of reference counting, and there is no need for a bunch of excess (implicit) calls to the various destructors as you change scope etc.

In theory you could dispense with a lot of unnecessary copies that are wrapped around the assignment and copy constructors. That's useful for implementing mathematical objects that overload *, + etc. I also see advantages for GC when implementing things like tries, trees, graphs and other complex self referential data structures. Reference counting fails in the presence of cycles and the overhead and extra logic required to manage memory for these beasts is quite extraordinary. That's been my personal experience.
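To illustrate the cycle problem, here is a small sketch using C++11's `std::shared_ptr` and `std::weak_ptr` as a stand-in for hand-rolled reference counting. The `Node` type and both function names are invented for the example.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type that counts live instances.
struct Node {
    static int live;
    std::shared_ptr<Node> next;   // owning link, adds to the count
    std::weak_ptr<Node> back;     // non-owning link, does not add to the count
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Two nodes own each other: the counts never reach zero on their own.
int leaked_by_strong_cycle() {
    int before = Node::live;
    Node* raw = nullptr;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;              // cycle of owning pointers
        raw = a.get();
    }
    int leaked = Node::live - before;   // both nodes are still alive
    // Break the cycle by hand so this demo itself does not leak.
    std::shared_ptr<Node> cut;
    cut.swap(raw->next);
    return leaked;
}

// Same shape, but the back link is weak, so there is no owning cycle.
int leaked_with_weak_back_link() {
    int before = Node::live;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->back = a;
    }
    return Node::live - before;
}
```

Deciding which links are owning and which are weak is easy for a list with back pointers. For a general graph it is the extra logic I was complaining about, and a tracing collector doesn't need it.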

Finally in response to CornedBee, it really depends how you are using your GC. The Hans Boehm collector can be used as a leak detector. You write 'normal' C++ code but use the GC version of new. You can then check to see if there are any uncollected blocks of memory at various parts of your program. However I think you are right in that you either take advantage of your GC or not, and it affects the way you program.

**Elysia** · 03-27-2008

SevenThunders,
Perhaps. But as you say yourself, it can eliminate calls to the destructor and that is not always a good thing at all. So you need to think very carefully before using a GC. It will essentially destroy how C++ manages objects.
However, the garbage collector has its own overhead, as well. It must keep track of all the memory. And if it rearranges stuff in memory, then it needs to update references and pointers, which can be expensive. And if the design is single threaded, then it needs to stop execution at a certain point and do a little housecleaning, which can be hazardous for time critical applications. If it's multi-threaded, then it needs to make sure it can lock each and every pointer so it can perform maintenance on them while they're being used. This introduces a lot of overhead with atomic operations or locking when it's in an update cycle.

I tried writing my own GC a while ago, so I certainly do have some experience. But in the end, it seems that there's just no suitable API for a GC, so I gave up.

GC can, in fact, be dangerous and complicated to use. And they might not be that much better than smart pointers in all situations either.

**SevenThunders** · 03-27-2008

Originally Posted by Elysia

SevenThunders,
Perhaps. But as you say yourself, it can eliminate calls to the destructor and that is not always a good thing at all. So you need to think very carefully before using a GC. It will essentially destroy how C++ manages objects.

It may destroy some common C++ paradigms, that is correct, but then it gives you the freedom to do things a little differently and perhaps helps prevent certain types of runtime errors.

However, the garbage collector has its own overhead, as well. It must keep track of all the memory. And if it rearranges stuff in memory, then it needs to update references and pointers, which can be expensive. And if the design is single threaded, then it needs to stop execution at a certain point and do a little housecleaning, which can be hazardous for time critical applications.

Modern real time garbage collectors can give you guaranteed return times. Collecting unused blocks in one sweep is more efficient than doing a lot of new/delete calls and reference counting.

If it's multi-threaded, then it needs to make sure it can lock each and every pointer so it can perform maintenance on them while they're being used. This introduces a lot of overhead with atomic operations or locking when it's in an update cycle.

I never use multi-threaded code this way so I'll take your word for it. Usually I give my threads complete ownership over inherited data ie I try to avoid directly sharing data, but then my use for threads is probably a bit different than others.

I tried writing my own GC a while ago, so I certainly do have some experience. But in the end, it seems that there's just no suitable API for a GC, so I gave up.

GC can, in fact, be dangerous and complicated to use. And they might not be that much better than smart pointers in all situations either.

I presume you mean there is no suitable API for a GC in C++? It's not an all or nothing proposition really. I choose to make all new classes children of the GC class so that new is automagically GC'd for them. I otherwise don't touch what I use from std and avoid using new with these objects. I also use new (GC) for raw low level array allocation. Works for me so far and is not particularly complicated.

What's complicated actually are the crazy semantics of C++. Once you understand it, it's not so bad, and I'm even starting to enjoy it. However it's quite a bit easier and more productive to program in languages with a built in GC. Compare say, Java, D, or some of the functional languages such as Haskell, Ocaml or even Lisp/Scheme.

Other than D, however, these languages underperform computationally, and D is not yet mature enough to have good development and debugging tools. Thus C++ is still pretty attractive (especially if I can have a garbage collector too).

**Elysia** · 03-27-2008

Originally Posted by SevenThunders

It may destroy some common C++ paradigms, that is correct, but then it gives you the freedom to do things a little differently and perhaps helps prevent certain types of runtime errors.

Prevent runtime errors? It seems to me it will only worsen it since you will have pretty much no idea when the error occurred since objects can be moved around and deleted at any time. What do you mean, exactly?

Modern real time garbage collectors can give you guaranteed return times. Collecting unused blocks in one sweep is more efficient than doing a lot of new/delete calls and reference counting.

Yes, it's probably more effective to do a lot of freeing at the same time, but again, it has other overhead that can negate this...

I never use multi-threaded code this way so I'll take your word for it. Usually I give my threads complete ownership over inherited data ie I try to avoid directly sharing data, but then my use for threads is probably a bit different than others.

Yes, that's good. But you can't have two threads accessing the same data at the same time; that is called a race condition. It means one thread can be writing some data while the other thread is reading it, resulting in data corruption among other things...
A lock operation can take anywhere from 50 to 100 cycles per lock (for critical sections). The Windows API critical section takes about 100 cycles. I found an optimized one at CodeProject that takes around 50 cycles to lock.
I don't know how fast atomic operations are, though. But they are much faster than locking.
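For what it's worth, here is a small sketch of both approaches side by side, assuming a C++11 compiler with `std::thread`, `std::mutex` and `std::atomic` (not the Windows API critical sections mentioned above). The `Counters` struct and the function names are invented for the example, and it says nothing about cycle counts.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical shared state: one counter guarded by a mutex, one atomic.
struct Counters {
    long locked = 0;
    std::mutex m;
    std::atomic<long> lockfree{0};
};

void bump(Counters& c, int times) {
    for (int i = 0; i < times; ++i) {
        {
            std::lock_guard<std::mutex> hold(c.m);  // held only for the update
            ++c.locked;
        }
        c.lockfree.fetch_add(1);                    // single atomic read-modify-write
    }
}

// Two threads update the same counters; neither update is lost.
bool counts_are_exact() {
    Counters c;
    std::thread t1(bump, std::ref(c), 10000);
    std::thread t2(bump, std::ref(c), 10000);
    t1.join();
    t2.join();
    return c.locked == 20000 && c.lockfree.load() == 20000;
}
```

With a plain `++` on an unguarded `long`, the two threads could interleave their read and write steps and lose updates, which is the race described above.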

I presume you mean there is no suitable API for a GC in C++? It's not an all or nothing proposition really. I choose to make all new classes children of the GC class so that new is automagically GC'd for them. I otherwise don't touch what I use from std and avoid using new with these objects. I also use new (GC) for raw low level array allocation. Works for me so far and is not particularly complicated.

What's complicated actually are the crazy semantics of C++. Once you understand it, it's not so bad, and I'm even starting to enjoy it. However it's quite a bit easier and more productive to program in languages with a built in GC. Compare say, Java, D, or some of the functional languages such as Haskell, Ocaml or even Lisp/Scheme.

Other than D, however, these languages underperform computationally, and D is not yet mature enough to have good development and debugging tools. Thus C++ is still pretty attractive (especially if I can have a garbage collector too).

I meant suitable Windows API. There's no good function to release a lot of data at once.
I prefer smart pointers myself, seeing as I do write a lot of object-oriented code.

**cpjust** · 03-27-2008

Originally Posted by SevenThunders

Collecting unused blocks in one sweep is more efficient than doing a lot of new/delete calls and reference counting.

But to do it properly, you'd need to run the destructors for all the objects in the huge block of memory you're deleting, so other than extra function call overheads, I don't see how it could make that much of a difference. Besides, then whenever the GC starts its deletion cycle, you'd have to wait a long time until all the memory is freed instead of just a lot of little waits by cleaning up memory as you go.
There's also the issue about total memory usage. If your program is a big memory hog, it will continue to grow until the system starts running low on RAM and starts paging out some of your data to disk. If everybody was using GC and using a lot of memory, this situation would occur even faster. Sure you could say install more RAM, but most of it is just junk waiting to be garbage collected, so why force people to install more RAM when they shouldn't have to?

**brewbuck** · 03-27-2008

Originally Posted by SevenThunders

Collecting unused blocks in one sweep is more efficient than doing a lot of new/delete calls and reference counting.

Why do you think so? The destructors eventually have to run anyway, so you aren't gaining anything by delaying them.

The argument about time spent deallocating blocks is moot, because the delete operator can do the exact same thing the GC does -- that is, instead of actually freeing blocks, place them on a free list, which is periodically swept, or potentially never swept at all, if there is no memory pressure. The advantage compounds itself, because if you never actually free any blocks, you can't get memory fragmentation.

And you gain the enormous advantage of having your destructors run when they're supposed to: when the pointer gets deleted.
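A minimal sketch of what that could look like with class-specific operator new and operator delete. The `Particle` class and the function names are hypothetical, and a real allocator would also need to think about threads and about derived classes of a different size.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class whose operator delete parks blocks on a free list
// instead of returning them to the heap. The destructor still runs at
// the delete expression; only the release of raw memory is deferred.
class Particle {
public:
    double x, y;
    static int destroyed;

    Particle() : x(0.0), y(0.0) {}
    ~Particle() { ++destroyed; }

    static void* operator new(std::size_t size) {
        if (size == sizeof(Particle) && free_list != nullptr) {
            FreeBlock* b = free_list;       // reuse a parked block
            free_list = b->next;
            return b;
        }
        return ::operator new(size);
    }

    static void operator delete(void* p, std::size_t size) {
        if (p == nullptr) return;
        if (size != sizeof(Particle)) { ::operator delete(p); return; }
        FreeBlock* b = ::new (p) FreeBlock; // reuse the dead object's storage
        b->next = free_list;
        free_list = b;
    }

    // The periodic sweep: hand every parked block back to the heap.
    static std::size_t release_free_list() {
        std::size_t n = 0;
        while (free_list != nullptr) {
            FreeBlock* b = free_list;
            free_list = b->next;
            ::operator delete(b);
            ++n;
        }
        return n;
    }

private:
    struct FreeBlock { FreeBlock* next; };
    static FreeBlock* free_list;
};
int Particle::destroyed = 0;
Particle::FreeBlock* Particle::free_list = nullptr;

// Deleting and re-allocating reuses the same block, and both destructors
// have already run by the time the sweep happens.
bool block_is_reused() {
    int before = Particle::destroyed;
    Particle* a = new Particle;
    void* addr = a;
    delete a;                    // destructor runs now; block goes on the free list
    Particle* b = new Particle;  // served from the free list
    bool reused = (static_cast<void*>(b) == addr);
    delete b;
    std::size_t swept = Particle::release_free_list();
    return reused && swept == 1 && Particle::destroyed - before == 2;
}
```

The sweep can be called whenever there is memory pressure, or never, without changing when the destructors run.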

**SevenThunders** · 03-27-2008

Originally Posted by Elysia

Prevent runtime errors? It seems to me it will only worsen it since you will have pretty much no idea when the error occurred since objects can be moved around and deleted at any time. What do you mean, exactly?

I'm talking about two types of errors that are usually only found during runtime: dangling pointers (pointing to already freed blocks) and memory leaks. These two types of errors go away with GC, which is one of the major attractions of the technology.

Yes, it's probably more effective to do a lot of freeing at the same time, but again, it has other overhead that can negate this...

For apps with large blocks of persistent data and/or that rarely use new and/or delete I agree with you. For apps that have a lot of small blocks of data to allocate and delete I'm not so sure. Let's agree to disagree on this.

Yes, that's good. But you can't have two threads accessing the same data at the same time; that is called a race condition. It means one thread can be writing some data while the other thread is reading it, resulting in data corruption among other things...
A lock operation can take anywhere from 50 to 100 cycles per lock (for critical sections). The Windows API critical section takes about 100 cycles. I found an optimized one at CodeProject that takes around 50 cycles to lock.
I don't know how fast atomic operations are, though. But they are much faster than locking.

My interest in threads is primarily for multiprocessor systems and partitioning computational cycles. Usually you can partition the data with the computations; otherwise you use locking mechanisms like you describe. I like FIFOs for asynchronous communication between threads, but then I often think about how hardware would do things instead of software. By the way, even at the hardware level it is theoretically impossible to create race free asynchronous data transfer. You can only reduce the probability to close to 0.

I know that there are multithreaded versions of various GCs. Boehm has one and how he manages it is an interesting question. I suppose there would be a few ways to do it. You could either clone the GC into each thread, or let the GC run in its own thread and queue services. It's probably an interesting research topic all on its own.

I meant suitable Windows API. There's no good function to release a lot of data at once.
I prefer smart pointers myself, seeing as I do write a lot of object-oriented code.

I'll have to think about this statement a bit. Obviously Java does window management and it has a GC, and of course there are windows APIs for many other garbage collected languages. I think if you are stuck with the API of an existing C++ library that's not using garbage collection, you are probably correct. You are relying on destructors to do more than free memory; you are probably tearing down other resources. You mess with that logic at your peril. I'm curious what sort of garbage collector you tried to write and why?

Also it makes sense at first to use object oriented code for windows management. After all each of those windows and dialogues have their own state, and that's how the libraries are structured. I have found personally some problems with the paradigm however as the complexity increases. Control flow can become nightmarish. Just what method am I executing now? If I want to add behavior X which darn object and wretched method do I use or overload to implement it?

However there are other approaches. Have you ever played around with functional languages? Haskell has some nice wrappers around some C and C++ gui libraries. I found it surprisingly easy to use. It tends to be more event driven, kind of like a giant state machine than pure object oriented. You can check out some examples here

http://en.wikibooks.org/wiki/Haskell/GUI

and here
http://haskell.org/gtk2hs/documentation/#examples