Personally, the only time I use volatile in production code is for variables (or data members) that may be changed asynchronously (for example, by interrupt or signal handlers).
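A minimal C++ sketch of that one legitimate use, assuming a hosted implementation with SIGINT available. The handler only sets a volatile std::sig_atomic_t flag, and the interrupted code reads it later. The names g_stop_requested, on_sigint and stop_flag_set_by_handler are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical names, for illustration only.
namespace {
// Writing a volatile std::sig_atomic_t is one of the few things a portable
// signal handler may do (lock-free std::atomic objects are the other option
// in modern C++).
volatile std::sig_atomic_t g_stop_requested = 0;
}

extern "C" void on_sigint(int /*signum*/) {
    g_stop_requested = 1;  // just record the event; do the real work elsewhere
}

// Installs the handler, raises SIGINT synchronously and reports whether the
// handler's write is visible to the code it interrupted.
bool stop_flag_set_by_handler() {
    g_stop_requested = 0;
    auto previous = std::signal(SIGINT, on_sigint);
    assert(previous != SIG_ERR);
    std::raise(SIGINT);             // the handler runs before raise() returns
    std::signal(SIGINT, previous);  // restore whatever was installed before
    return g_stop_requested == 1;
}
```

In a real program the flag would typically be polled by a main loop, which then shuts down cleanly outside the handler.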
I don't even use the volatile keyword in multi-threaded code (unless one or more of the threads is watching some data that might be changed by an interrupt or signal handler). If data is used in multiple threads, I use synchronisation primitives (critical sections, mutexes, semaphores, etc., depending on the operating system) or atomic types to prevent multiple threads from accessing a variable simultaneously (with a mutex, for example, each thread must wait until another has finished with the variable before it can access it). If I can prove through analysis that all accesses to the data are read-only (e.g. the data is initialised before any threads are spawned, and every subsequent access by any thread is read-only), I don't even bother with synchronisation. If, however, the data is modifiable (i.e. one thread modifies the shared data and others read it), then I always synchronise.
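A rough C++ sketch of both cases, assuming std::thread and std::mutex stand in for whatever the operating system provides. Settings is initialised before the threads start and only read afterwards, so it is accessed without a lock; SharedResults is modified by every thread, so each access goes through its mutex. The types and functions are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical types and functions, for illustration only.

// Written once before any thread starts, read-only afterwards: no locking.
struct Settings {
    int items_per_thread;
};

// Shared, modifiable data: every access goes through the mutex.
struct SharedResults {
    std::mutex mutex;
    std::vector<int> values;
};

void worker(const Settings& settings, SharedResults& shared, int id) {
    for (int i = 0; i < settings.items_per_thread; ++i) {  // unsynchronised read is fine
        std::lock_guard<std::mutex> lock(shared.mutex);    // other threads wait here
        shared.values.push_back(id);
    }
}

std::size_t collect(int thread_count, int items_per_thread) {
    assert(thread_count >= 0 && items_per_thread >= 0);
    const Settings settings{items_per_thread};  // initialised before threads are spawned
    SharedResults shared;
    std::vector<std::thread> threads;
    for (int id = 0; id < thread_count; ++id) {
        threads.emplace_back(worker, std::cref(settings), std::ref(shared), id);
    }
    for (auto& t : threads) {
        t.join();
    }
    return shared.values.size();  // safe: every writer has been joined
}
```

The read-only exemption relies on thread creation: everything written before a std::thread is constructed is visible to that thread, so the settings need no further synchronisation.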
The thing is, in multithreaded code, I have found that volatile is neither necessary nor sufficient.
In terms of not necessary: I've encountered few compilers (even with quite aggressive optimisation) that break properly synchronised multithreaded code, with or without the volatile keyword, simply because compilers are quite pessimistic when they encounter calls to functions whose bodies they cannot see (for example, functions to acquire and release a mutex). They must assume such a call might read or modify any shared variable it could reach, so they cannot keep a stale copy of that variable in a register across the call. If the volatile keyword does make a difference, I have almost always found that analysis exposes bugs in the compiler/optimiser, so there is more benefit in turning down compiler optimisation and eliminating other bugs than there is in keeping the volatile keyword in place.
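A small C++ sketch of that point, assuming the standard C++ threading library: a plain (non-volatile) bool and payload are handed from one thread to another under a mutex and condition variable. The standard specifies the mutex operations as synchronisation points, so the waiting thread is guaranteed to see the producer's writes without volatile. Handoff and handoff_value are hypothetical names, for illustration only.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical names, for illustration only.
// Plain, non-volatile members: the mutex alone orders the accesses.
struct Handoff {
    std::mutex mutex;
    std::condition_variable cv;
    bool ready = false;
    int payload = 0;
};

int handoff_value(int value) {
    Handoff h;
    std::thread producer([&h, value] {
        {
            std::lock_guard<std::mutex> lock(h.mutex);
            h.payload = value;
            h.ready = true;
        }
        h.cv.notify_one();
    });

    int received = 0;
    {
        std::unique_lock<std::mutex> lock(h.mutex);
        // The predicate is re-checked with the mutex held after every wake-up,
        // so spurious wake-ups are harmless.
        h.cv.wait(lock, [&h] { return h.ready; });
        assert(h.ready);  // guaranteed by the predicate above
        received = h.payload;
    }
    producer.join();
    return received;
}
```

Declaring ready or payload volatile here would change nothing about correctness; the lock is what makes it work.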
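In terms of insufficient: I have yet to find an instance where the volatile keyword can reliably compensate for design issues (for example, two threads that access some shared data without any synchronisation, where at least one of the accesses modifies that shared data). volatile stops the compiler from caching or eliding accesses to a variable, but it does not make a read-modify-write such as ++x atomic, nor does it order other memory accesses around it.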
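A minimal C++ sketch of the usual fix, assuming std::atomic is available. If counter were declared volatile int and incremented with ++counter from several threads, each increment would still be a separate load, add and store, so updates could be lost (and the program would have a data race, which is undefined behaviour). With std::atomic, fetch_add performs the whole read-modify-write as one indivisible operation. count_concurrently is a hypothetical name, for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical function, for illustration only.
// Each thread increments a shared counter; std::atomic guarantees no lost
// updates, which a volatile int would not.
int count_concurrently(int thread_count, int increments_per_thread) {
    assert(thread_count >= 0 && increments_per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Relaxed ordering is enough for a pure count; join() below
                // makes the final value visible to this thread.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter.load();
}
```

The result is always thread_count * increments_per_thread, however the threads interleave.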
In practice, in modern code hosted on modern operating systems (Windows, Unix variants, any RTOS, etc.), it is rare that I need to write code that uses interrupts or signal handlers, because some other mechanism that is better suited to the intended purpose is usually available (for example, asynchronous I/O that calls a callback function when an operation completes).
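A rough C++ sketch of the callback style, where start_async_operation is a hypothetical stand-in for a real asynchronous I/O API (its name and signature are assumptions, not any particular OS interface). The "operation" runs on a worker thread and invokes the callback when it finishes; a std::promise/std::future pair carries the result back with the necessary synchronisation, so no volatile flag or signal handler is involved.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for an asynchronous I/O API, for illustration only:
// the work runs on a worker thread and the callback is invoked on that thread
// when it completes. The caller must join the returned thread (a real API
// would typically manage its own threads).
std::thread start_async_operation(std::string request,
                                  std::function<void(std::string)> on_complete) {
    assert(on_complete);  // a callback is required
    return std::thread([request = std::move(request),
                        on_complete = std::move(on_complete)] {
        std::string result = "done: " + request;  // pretend the I/O happened here
        on_complete(std::move(result));
    });
}

std::string run_with_callback(const std::string& request) {
    std::promise<std::string> completed;
    std::future<std::string> result = completed.get_future();

    std::thread worker = start_async_operation(request, [&completed](std::string r) {
        completed.set_value(std::move(r));  // hands the result to the waiting thread
    });

    std::string value = result.get();  // blocks until the callback has run
    worker.join();                     // join before `completed` goes out of scope
    return value;
}
```

With a real API, the callback would usually run on a thread owned by the library or OS, but the hand-off to the rest of the program would look much the same.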
I find that code for device drivers (or board support packages) does sometimes need to be aware of things like memory fences. But such code also needs to do explicitly what is intended and, again, if I need the volatile keyword in such code, it is either a sign of something I've missed in the design, or of a documented compiler "feature" that makes volatile fit for a particular purpose.