This is another case where MS is confusing the hell out of everybody.....
First, from a standards perspective, the volatile keyword is only guaranteed to be useful when applied to a sig_atomic_t object modified from a signal handler. The standard does provide other "semantics" for volatile objects, but those can't be applied to multi-threading. Basically, what the standard provides for volatile objects is a defined ordering of reads and writes with respect to other volatile objects. It's been a great source of debate as to what this really means to compiler implementors and what guarantees it gives the programmer. Because of this, the general consensus I've gathered over the years is that any use of volatile other than "volatile sig_atomic_t" is implementation-defined.
A note on re-ordering.....
There are two ways in which reads and writes to memory can become "re-ordered". The first way is that the compiler itself is free to re-order reads and writes. For example:
Code:
int v1, v2;
...
v1 = 1;
...
v2 = 2;
In this example, the compiler is free to perform "v2 = 2" first. Now consider:
Code:
volatile int v1, v2;
...
v1 = 1;
...
v2 = 2;
Now the compiler must perform "v1 = 1" first. It's also commonly accepted that a compiler implementation should translate each read or write of a volatile object into an actual read or write on the address bus. In other words, volatile objects shouldn't be cached in a CPU register.
The other way in which reads and writes can be re-ordered arises from the hardware itself. On some architectures the processor can re-order the sequence of read and write instructions before executing them. Also, on some architectures the cache-coherency algorithms used between cores can make reads and writes appear as if they were performed in a different order. Both of these issues are solved using memory barriers ("mbars"). In general, a full (or "fence") mbar means that all cores (and associated cache lines) are "in-sync" with each other. There are also "acquire" and "release" mbars, which are less restrictive than a full mbar.
The great debate: to volatile, or not to volatile in MT.....
So what does all this mean with respect to multi-threaded programming? Unfortunately, this has also been a great source of debate - which makes finding a definitive answer on the net and in newsgroups difficult. Under POSIX, things are clearer, since POSIX defines memory coherency guarantees in a multi-threaded environment. Under POSIX, the only uses of volatile are with sig_atomic_t in signal handlers or for memory-mapped I/O (where each read or write must translate to a read or write on the address bus). If you're doing non-POSIX multi-threaded programming, then you're in "implementation defined" territory when it comes to memory coherency guarantees.
Defined Implementation of the Microsoft compiler....
Starting with version 14.0.0 of the MS compiler (Visual Studio 2005), the volatile keyword was given additional semantics for use in a multi-threaded environment. From that version forward, a read from a volatile object generates an implied "acquire mbar" and a write to a volatile object generates an implied "release mbar". Note that this only holds on architectures that support acquire and release mbars. MS-bashers out there like to suggest that MS made this move so that incorrectly written multi-threaded device drivers and applications - ones that only used the volatile keyword as a means of shared-variable synchronization - would simply start working correctly after re-compiling with version 14 or higher. In any case, here's a direct quote from an MS document on the use of volatile:
Originally Posted by Microsoft Document - Multiprocessor Considerations for Kernel-Mode Drivers
If you look at the sample drivers shipped with the Windows DDK, you will see that volatile appears infrequently. In general, volatile is of limited use in driver code for the following reasons:
• Using volatile prevents optimization only of the volatile variables themselves. It does not prevent optimizations of nonvolatile variables relative to volatile variables. For example, a write to a nonvolatile variable that precedes a read from a volatile variable in the source code might be moved to execute after the read.
• Using volatile does not prevent the reordering of instructions by the processor hardware.
• Using volatile correctly is not enough on a multiprocessor system to guarantee that all CPUs see memory accesses in the same order.
Windows synchronization mechanisms are more useful in preventing all these potential problems.
Conclusion? In my opinion, volatile has no use in multi-threaded programming. MSVC is the only compiler I know of that actually defines volatile semantics in a multi-threaded environment - and even then MS themselves admit that "volatile is of limited use".
If you look closely at the MSDN page on volatile, there's a big chunk of the remarks section between "[Microsoft Specific]" and "[End Microsoft Specific]". What it doesn't tell you is that the example code demonstrates, and relies entirely on, the "Microsoft Specific" implementation of volatile. That example code isn't even correct when compiled with an MSVC version prior to 14! The best thing to do is to use the proper synchronization primitives provided by the Windows SDK so that your code is correct regardless of the compiler being used......MSVC new and old, mingw, Borland C++, etc..., etc...
gg