The volatile qualifier has nothing to do with multi-threading and cross-thread synchronization. Never has, never will. The original (and current) intent is to address two things: (1) Memory-mapped I/O and (2) interrupts/signals.
When you have a memory-mapped register at a physical address, you have to forge your pointer as a pointer to volatile to ensure that every read/write of that address is emitted by the compiler as a load/store to that actual address, rather than being cached in a CPU register or optimized away. Also, the compiler may not reorder volatile accesses relative to each other - so the register(s) will be accessed in program order.
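As a minimal sketch of the memory-mapped I/O case: the names below (fake_status_reg, status_reg, STATUS_READY, wait_ready) are made up for illustration, and an ordinary variable stands in for the hardware register so the code runs on a hosted system. On a real target the pointer would be forged from the device's documented address instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for a device status register. On real hardware the
   pointer below would be forged from a fixed address taken from the data
   sheet, not from an ordinary variable. */
static uint32_t fake_status_reg;

#define STATUS_READY 0x1u /* hypothetical "ready" bit */

/* Pointer to volatile: every dereference must be emitted as a real load. */
static volatile uint32_t *const status_reg = &fake_status_reg;

/* Poll the register until the ready bit is set or max_polls is reached.
   Returns the number of polls that saw the bit clear. Because the access is
   volatile, the compiler may not hoist the load out of the loop. */
static unsigned wait_ready(unsigned max_polls)
{
    unsigned polls = 0;
    while ((*status_reg & STATUS_READY) == 0u && polls < max_polls)
        ++polls;
    return polls;
}
```

Without the volatile qualifier the compiler would be free to read the register once and spin on the cached value.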
For signals, the C/C++ and POSIX standards guarantee that an object of type volatile sig_atomic_t is safe to access within the handler (with respect to any other in-program accesses). This means that loads/stores of this type are atomic with respect to CPU context switches - which are required for interrupts and interrupt-based signals. If signals are implemented via OS context switching (typically the same as an interrupt but more thread state is saved/restored), then that must also be atomic with respect to volatile sig_atomic_t accesses (to be POSIX compliant anyway).
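A minimal sketch of the signal case, using only standard C (signal, raise, sig_atomic_t). The names got_signal, on_signal and raise_and_check are made up for illustration, and SIGINT is just a convenient choice of signal.

```c
#include <signal.h>

/* Flag shared between the handler and the rest of the program. */
static volatile sig_atomic_t got_signal = 0;

/* The handler does nothing except store to the flag. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler, raise the signal synchronously, and report the flag.
   Returns 1 if the handler ran, -1 if installing or raising failed. */
static int raise_and_check(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0) /* raise() returns only after the handler does */
        return -1;
    signal(SIGINT, SIG_DFL); /* restore the default disposition */
    return (int)got_signal;
}
```

In a real program the signal arrives asynchronously and the main loop simply tests the flag; raise() is used here only to keep the sketch deterministic.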
Because of this original intent, there are side-effects that can be useful when hand-rolling your own synchronization primitives. For us hand-rollers, it primarily provides (1) guaranteed loads/stores to memory (meaning the compiler will not satisfy the access from a CPU register as an optimization), and (2) a guaranteed compiler ordering of volatile accesses relative to one another (meaning the compiler will generate the assembly of volatile accesses in program order).
>> I really do mean it both ways.
There is nothing wrong with adding a volatile qualifier and accessing it. The compiler just has to obey the rules outlined above, for that access.
Code:
/*
dcas_ptr_safe_assign -
Assign the ABA-counter followed by the pointer, ensuring that the
ABA-counter is always read first - by the compiler and HW.
http://groups.google.com/group/comp.programming.threads/msg/d3fe6c226f685d85
NOTE - nothing currently relies on reading the counter first, but the code has
been written just in case.
*/
#if defined(LIBxxxx_MSVC)
# define dcas_ptr_safe_assign(p1, p2) \
{ ((void**)(p1))[DCAS_CNT] = ((void**)(p2))[DCAS_CNT]; \
_ReadBarrier(); \
((void**)(p1))[DCAS_PTR] = ((void**)(p2))[DCAS_PTR]; }
#elif defined(LIBxxxx_GCC)
# if defined(LIBxxxx_X86)
/* Reads don't move ahead of other reads on x86, so we just need to make sure
the compiler emits the reads in the correct order. GCC will emit a lock
instruction for __sync_synchronize() on x86, which is unnecessary in this
case since we know reads won't move ahead of other reads on x86.
*/
# define dcas_ptr_safe_assign(p1, p2) \
{ ((void**)(p1))[DCAS_CNT] = ((void*volatile*)(p2))[DCAS_CNT]; \
((void**)(p1))[DCAS_PTR] = ((void*volatile*)(p2))[DCAS_PTR]; }
# else /* non-x86, use __sync_synchronize as compiler and hw barrier */
# define dcas_ptr_safe_assign(p1, p2) \
{ ((void**)(p1))[DCAS_CNT] = ((void**)(p2))[DCAS_CNT]; \
__sync_synchronize(); \
((void**)(p1))[DCAS_PTR] = ((void**)(p2))[DCAS_PTR]; }
# endif
#else
# error "dcas_ptr_safe_assign not implemented for this platform"
#endif
Notice the case for GCC on x86. The volatile casts are applied to the loads to ensure the compiler emits them in program order; the hardware already keeps reads ordered on x86, so no barrier instruction is needed there.
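For illustration, here is a hedged usage sketch of the volatile-cast variant. The index values for DCAS_PTR and DCAS_CNT, the dcas_pair type and the snapshot function are assumptions (the real definitions are not shown above), and the do/while wrapper is my own substitution for the bare braces.

```c
/* Assumed layout - hypothetical values, the library's real ones are not shown. */
enum { DCAS_PTR = 0, DCAS_CNT = 1 };

/* Same idea as the GCC/x86 branch: the volatile casts on the source force
   the compiler to emit the counter load before the pointer load. */
#define dcas_ptr_safe_assign(p1, p2) \
    do { \
        ((void **)(p1))[DCAS_CNT] = ((void *volatile *)(p2))[DCAS_CNT]; \
        ((void **)(p1))[DCAS_PTR] = ((void *volatile *)(p2))[DCAS_PTR]; \
    } while (0)

/* Hypothetical two-word pointer/counter pair. */
typedef struct {
    void *word[2];
} dcas_pair;

/* Copy a shared pair into a local one, counter first. */
static dcas_pair snapshot(dcas_pair *shared)
{
    dcas_pair local;
    dcas_ptr_safe_assign(local.word, shared->word);
    return local;
}
```

Only the source is cast to volatile; the destination is a private local copy, so the order of the stores does not matter.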
>> This means that any types on pointers can't be relied upon, ...
No, the compiler must adhere to all types with volatile, just like const. Otherwise, memory-mapped I/O wouldn't work.
gg