Thread: Casting away volatile

  1. #16
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by Codeplug View Post
    >> Volatile prevents compiler optimization.
    Where in your list code do you believe this is preventing a potential bug? There may be another solution that doesn't involve volatile.
    Well, the prototypes on MS require them, so there's a compiler error if they are missing.

    Volatile is not required by the GCC intrinsics, so as far as the compilers are concerned I could #define volatile away and make it optional, or add casts in the MS calls.

    Note that this is not just about the list. All pointers which are targets for atomic operations are declared volatile.

    I have understood (or rationalized, rather, since I don't know enough assembly to meaningfully inspect the compiler's output, so I don't really know what it's doing) volatile to be required because the compiler does not take the actions of other threads into account, and so assumes that the value in the pointer can only be changed by itself, when in fact other threads are also modifying the pointer.

    Memory barriers and so on are also required for these cross-thread operations to work successfully, but I am thinking that without volatile the compiler may end up (say) keeping a copy of a pointer value in a register for a long time (which might in principle mean forever, although it never would in practice), causing problems.
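
    To illustrate what I mean (a minimal sketch with made-up names - this is not the list code itself): without volatile, or an atomic/barrier the compiler understands, the compiler is entitled to read a shared pointer once and then spin on the register copy.
    Code:
    struct node { struct node *next; int value; };

    struct node *shared_head;     /* written by another thread */

    int spin_until_published(void)
    {
        /* An optimizing compiler may load shared_head once and spin on the
           register copy, never re-reading memory, so the loop can spin
           forever even after another thread publishes a node. */
        while (shared_head == NULL)
            ;
        return shared_head->value;
    }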

  2. #17
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by Hobbit View Post
    I'm not sure, I tend to use lock-free structures that are tried and tested and weren't written by me. I've checked the source for my lock-free structures this morning and it seems they all use multibyte compare-and-swap, and they use a preprocessor macro VOLATILE which is defined as empty, with a commented-out /* volatile */ and a note that it should be uncommented only on insane compilers. I think you should be able to remove the volatile throughout the lock-free list.
    I have an understanding of this now, although whether it's a correct understanding is another question.

    The Standard, when discussing access to variables, always talks about the type of the *object*, not the type of the pointers or what-have-you being used to get to the object.

    As such, by the Standard, the volatile-ness of the access depends upon the object itself. If the compiler can work out a given object is *actually* non-volatile, then it is within its rights to access it as non-volatile, regardless of the type of a pointer being used to get to that object.

    In practice, the popular compilers actually honour the type qualifier of the pointer.
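
    A sketch of the distinction I'm drawing (illustrative names only): the object below is not volatile, only the pointer is qualified, so by the object-type reading of the Standard a compiler that can see the definition could treat the access as an ordinary one.
    Code:
    static int counter;                /* plain, non-volatile object          */

    void poll_counter(void)
    {
        volatile int *p = &counter;    /* qualifier added only on the pointer */

        /* By the object-type reading, the compiler may treat *p as a
           non-volatile access; in practice, popular compilers honour the
           qualifier on the lvalue and re-load on every iteration. */
        while (*p == 0)
            ;
    }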

    So the "insane" compiler that library refers to is actually within its rights. It's not actually insane - that's the wrong word.

    The problem with the approach taken by that library is that users will have no idea this issue even exists, let alone how to find out what their compiler does in this regard, or how its behaviour has changed over compiler versions, *and*, in the event of the library configuration being wrong, the problem is hard to test for: even though it may be present, it by no means necessarily will show up on the test platform.

    Correctness is the first requirement of software; if it doesn't do what it's supposed to do, then it has failed.

  3. #18
    Old Took
    Join Date
    Nov 2016
    Location
    Londonistan
    Posts
    121
    The structures I use were written largely by Dr Keir Fraser, a research associate at Cambridge University, whose dissertation can be read here. This dissertation won the 2004 Distinguished Dissertation award from the British Computer Society. The code for the lock-free data structures should still be downloadable from the university site. It'll provide some interesting bedtime reading if nothing else.

  4. #19
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by Hobbit View Post
    The structures I use were written largely by Dr Keir Fraser, a research associate at Cambridge University, whose dissertation can be read here. This dissertation won the 2004 Distinguished Dissertation award from the British Computer Society. The code for the lock-free data structures should still be downloadable from the university site. It'll provide some interesting bedtime reading if nothing else.
    Ya. I think I met him, or one of his group anyway, back when I lived in Cambridge.

    I'm a bit surprised you're using the code, because as far as I remember it's not documented? Is there a test suite?

    Their more capable data structures (btree, etc.) use their MCAS construct, I think. If so, there may be a performance problem, because I have read that MCAS doesn't scale. I think this is probably true, because it seems to offer a generic solution to atomically updating multiple pointers, and yet it was never adopted in the wider field.

    I may be wrong, but I would say their work is a research effort - it's not aimed at, designed for, or treated appropriately for industrial use across a wide range of platforms. I suspect they're viewing the world more-or-less through GCC and Linux lenses.
    Last edited by Toby Douglass; 01-08-2017 at 04:08 PM.

  5. #20
    Old Took
    Join Date
    Nov 2016
    Location
    Londonistan
    Posts
    121
    Yes, it's research code. Yes, it's written for GCC. It isn't documented too well, only in code comments and the dissertation, but at the time it was the only lock-free red-black tree I could find. I used it a fair bit back in the mid-noughties. It was intended to provide library writers with the algorithms for creating lock-free structures.
    At this time I'm not overly worried by scalability. It certainly performs fine with up to 8 cores. The dissertation mentions it was tested with up to 96 cores, I think. A small test harness is provided.
    There are many more options for me now. I mostly use C++ rather than C (it's so long since I used straight C that I'm only really familiar with C89, which is why I don't have much in the way of non-blocking code for C), and C++11 added atomics and a threading library. Boost has a non-blocking queue and stack, and there's always TBB, which gives a queue, pqueue, hash map and vector as well as task-based parallelism - although recently they changed the licence from GPL to a dual licence, so I need to look at the implications of that.
    I have to use other people's non-blocking code. I don't have the confidence that I know enough about the hardware issues to even think about writing my own.

  6. #21
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by Codeplug View Post
    >> Volatile prevents compiler optimization.
    Where in your list code do you believe this is preventing a potential bug?
    I may be wrong, but I think volatile may be necessary. The problem it addresses is the compiler optimizing a value into a register. I think when load barriers operate, copies of values already in registers are not modified; only the caches are affected. As such, the compiler can end up - in theory forever - using an earlier version of a value, never seeing it modified by load barriers. This will break most lock-free data structures.

    I may be wrong, but I think taking a pointer to a non-volatile object, casting that pointer to volatile and then accessing through it is not supported by the Standard, but I think it is supported by modern GCCs. However, I'm not sure which versions, or what other compilers will do.
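
    For what it's worth, the pattern I mean looks something like this (a sketch with invented names; whether a given compiler honours the cast-added qualifier is exactly the open question):
    Code:
    /* Add volatile at the point of access, via a cast. */
    #define READ_ONCE_INT(x)  (*(volatile int *)&(x))

    int shared_flag;                       /* not declared volatile */

    int wait_for_flag(void)
    {
        /* On compilers that honour the qualifier of the lvalue used for
           the access (GCC is generally reported to), this forces a fresh
           load on every iteration. */
        while (READ_ONCE_INT(shared_flag) == 0)
            ;
        return READ_ONCE_INT(shared_flag);
    }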

  7. #22
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> Well, the prototypes on MS require them, so there's a compiler error if they are missing.
    Code:
    #include <Windows.h>
    int main()
    {
        LONG i = 0;
        InterlockedIncrement(&i);
        return 0;
    }//main
    That compiles without warnings or errors. The volatile qualifier behaves like const. Think of strlen(const char*) - you can pass in a char* just fine.
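
    In other words, both qualifiers can be added implicitly at the call (a minimal sketch; take_volatile is just an invented stand-in for the Interlocked prototypes):
    Code:
    #include <string.h>

    void take_volatile(volatile long *p) { (void)p; }   /* invented example */

    int main(void)
    {
        char buf[] = "hello";
        long n = 0;

        (void)strlen(buf);      /* char*  -> const char*    : implicit, fine */
        take_volatile(&n);      /* long*  -> volatile long* : implicit, fine */
        return 0;
    }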

    >> ... taking a pointer to a non-volatile, casting that pointer to volatile and then accessing through the pointer is not supported by the Standard, ...
    That's backwards. See Post #3 of this thread.

    >> The problem it addresses is the compiler optimizing a value into a register.
    You don't have to worry about that since you are using synchronization primitives that the compiler understands - it will not hoist variables into registers across those primitives ("asm volatile", "InterlockedXXX", etc). Using volatile means you will never get those optimizations, so your code may be slower than it needs to be.
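
    For example, the usual GCC spelling of a compiler-only barrier is an empty asm with a memory clobber (a sketch; the names are illustrative):
    Code:
    #define COMPILER_BARRIER()  __asm__ __volatile__("" ::: "memory")

    extern int shared_value;          /* deliberately not volatile */

    int read_twice(void)
    {
        int before = shared_value;
        COMPILER_BARRIER();           /* no instruction emitted, but the compiler
                                         must forget cached register copies ...  */
        int after = shared_value;     /* ... so this is a genuine second load     */
        return after - before;
    }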

    gg

  8. #23
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by Codeplug View Post
    Code:
    #include <Windows.h>
    int main()
    {
        LONG i = 0;
        InterlockedIncrement(&i);
        return 0;
    }//main
    That compiles without warnings or errors. The volatile qualifier behaves like const. Think of strlen(const char*) - you can pass in a char* just fine.
    Ah, yes - I see it.

    I need to make sure const and volatile are always treated the same.

    >> ... taking a pointer to a non-volatile, casting that pointer to volatile and then accessing through the pointer is not supported by the Standard, ...
    That's backwards. See Post #3 of this thread.
    I really do mean it both ways. What I mean to say is that (as I understand it - I may be wrong, since I've not yet read the latest Standard, or all of them) the Standard always talks about the type of the *object*. This implies that if the compiler can know the type of the object, then it's fine for it to ignore the types of the pointers. This means that the types on pointers can't be relied upon, whether it's a volatile pointer to a non-volatile object or a non-volatile pointer to a volatile object.

    >> The problem it addresses is the compiler optimizing a value into a register.
    You don't have to worry about that since you are using synchronization primitives that the compiler understands - it will not hoist variables into registers across those primitives ("asm volatile", "InterlockedXXX", etc). Using volatile means you will never get those optimizations, so your code may be slower than it needs to be.
    I see what you're saying. I thought about this in the past, but I couldn't tell whether it was so or not. My thought was that where the prototypes specify volatile, they were indicating that the compiler did *not* treat the code differently (I had always read that the compiler and processor see the world only in terms of a single thread - special treatment for some instructions would violate that "rule"), and so the volatile type was needed to ensure things went well.

  9. #24
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    The volatile qualifier has nothing to do with multi-threading and cross-thread synchronization. Never has, never will. The original (and current) intent is to address two things: (1) Memory-mapped I/O and (2) interrupts/signals.

    When you have a memory-mapped register at a physical address, you have to forge your pointer as a pointer to volatile to ensure that all reads/writes to that address are emitted by the compiler as loads/stores to that actual address, and not to a register due to compiler optimization. Also, the compiler may not reorder volatile accesses - so the register(s) will be accessed in program-order.
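
    The textbook shape of that looks like this (a sketch; the address and register layout are made up for illustration):
    Code:
    #include <stdint.h>

    #define UART_STATUS  (*(volatile uint32_t *)0x4000A000u)   /* hypothetical */
    #define UART_TX      (*(volatile uint32_t *)0x4000A004u)   /* hypothetical */

    void uart_put(uint32_t c)
    {
        while ((UART_STATUS & 0x1u) == 0)   /* every poll is a real load      */
            ;                               /* (not cached in a register)     */
        UART_TX = c;                        /* a real store, in program order */
    }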

    For signals, the C/C++ and Posix standards guarantee the type volatile sig_atomic_t as safe to access within the handler (with respect to any other in-program accesses). This means that loads/stores of this type are atomic with respect to CPU context switches - which are required for interrupts and interrupt-based signals. If signals are implemented via OS context switching (typically the same as an interrupt but with more thread state saved/restored), then that must also be atomic with respect to volatile sig_atomic_t accesses (to be Posix compliant, anyway).
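
    The corresponding textbook pattern (a sketch; a real program would pause() or sleep rather than spin):
    Code:
    #include <signal.h>

    static volatile sig_atomic_t got_signal = 0;

    static void handler(int sig)
    {
        (void)sig;
        got_signal = 1;        /* single store of sig_atomic_t: safe in a handler */
    }

    int main(void)
    {
        signal(SIGINT, handler);
        while (!got_signal)    /* volatile: the load is repeated on every pass */
            ;
        return 0;
    }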

    Because of this original intent, there are side-effects that can be useful when hand-rolling your own synchronization primitives. For us hand-rollers, it primarily provides (1) guaranteed load/stores to memory (meaning the compiler will not generate stores/loads to/from a cpu register as an optimization), and (2) a guaranteed compiler-ordering of volatile accesses (meaning the compiler will generate the assembly of volatile accesses in program-order).

    >> I really do mean it both ways.
    There is nothing wrong with adding a volatile qualifier and accessing it. The compiler just has to obey the rules outlined above, for that access.
    Code:
    /*
    dcas_ptr_safe_assign - 
        Assign the ABA-counter followed by the pointer, ensuring that the 
        ABA-counter is always read first - by the compiler and HW. 
        http://groups.google.com/group/comp.programming.threads/msg/d3fe6c226f685d85
    
    NOTE - nothing currently relies on reading the counter first, but the code has
           been written just in case.
    */
    #if defined(LIBxxxx_MSVC)
    #   define dcas_ptr_safe_assign(p1, p2) \
            { ((void**)(p1))[DCAS_CNT] = ((void**)(p2))[DCAS_CNT]; \
              _ReadBarrier(); \
              ((void**)(p1))[DCAS_PTR] = ((void**)(p2))[DCAS_PTR]; }
    #elif defined(LIBxxxx_GCC)
    #   if defined(LIBxxxx_X86)
    /* Reads don't move ahead of other reads on x86, so we just need to make sure
       the compiler emits the reads in the correct order. GCC will emit a lock 
       instruction for __sync_synchronize() on x86, which is unnecessary in this 
       case since we know reads won't move ahead of other reads on x86.
    */
    #       define dcas_ptr_safe_assign(p1, p2) \
                { ((void**)(p1))[DCAS_CNT] = ((void*volatile*)(p2))[DCAS_CNT]; \
                  ((void**)(p1))[DCAS_PTR] = ((void*volatile*)(p2))[DCAS_PTR]; }
    #   else /* non-x86, use __sync_synchronize as compiler and hw barrier */
    #       define dcas_ptr_safe_assign(p1, p2) \
                { ((void**)(p1))[DCAS_CNT] = ((void**)(p2))[DCAS_CNT]; \
                  __sync_synchronize(); \
                  ((void**)(p1))[DCAS_PTR] = ((void**)(p2))[DCAS_PTR]; }
    #   endif
    #else
    #   error "dcas_ptr_safe_assign not implemented for this platform"
    #endif
    Notice the case for GCC on X86. The volatile casts are applied to the loads to ensure the compiler emits them in program-order.

    >> This means that any types on pointers can't be relied upon, ...
    No, the compiler must adhere to the volatile qualifier wherever it appears in a type, just like const. Otherwise, memory-mapped I/O wouldn't work.

    gg

  10. #25
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by Codeplug View Post
    The volatile qualifier has nothing to do with multi-threading and cross-thread synchronization. Never has, never will.
    Yes. However, although I may be wrong, I think this mistake is one of the more common mistakes made with regard to volatile, and since understanding others is always something of a barrier, it can be that this matter comes to the forefront even when it is not the matter in hand.

    When you have a memory-mapped register at a physical address, you have to forge your pointer as a pointer to volatile to ensure that all reads/writes to that address are emitted by the compiler as loads/stores to that actual address, and not to a register due to compiler optimization. Also, the compiler may not reorder volatile accesses - so the register(s) will be accessed in program-order.
    Yes.

    This is quite close to the matter in my mind. For the compiler/processor, both of which see the world in terms of a single thread, it can seem entirely reasonable to place a value into a register and use it from there - for an indefinite period of time. However, in reality, where other threads are also writing to the value, the compiler and processor are unaware of changes being made elsewhere.

    To ensure those changes are seen, we appropriately use memory barriers and atomic operations; but my concern is that once a variable has been copied into a register and is being used from there, changes made to that variable by other threads - even where we are correctly using memory barriers and atomic operations - will not propagate to the copy in the register.

    To solve this, we must use volatile, since it prevents the compiler/processor from acting in this way. A value cannot be indefinitely stored in a register. It is the final link in the chain.

    Because of this original intent, there are side-effects that can be useful when hand-rolling your own synchronization primitives. For us hand-rollers, it primarily provides (1) guaranteed load/stores to memory (meaning the compiler will not generate stores/loads to/from a cpu register as an optimization), and (2) a guaranteed compiler-ordering of volatile accesses (meaning the compiler will generate the assembly of volatile accesses in program-order).
    Yes.

    I really do mean it both ways.
    There is nothing wrong with adding a volatile qualifier and accessing it. The compiler just has to obey the rules outlined above, for that access.
    I do not think this is always so. I think it is supported by GCC, and it may (well) be supported by other compilers, but it is *not* compliant with the language of the Standard, and so cannot be assumed to always be so.

    [snip code]

    Notice the case for GCC on X86. The volatile casts are applied to the loads to ensure the compiler emits them in program-order.
    Yes. I think GCC supports this, and if I only supported GCC, I could use it.
    Last edited by Toby Douglass; 01-16-2017 at 09:01 AM.

  11. #26
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by Toby Douglass View Post
    Yes.

    This is quite close to the matter in my mind. For the compiler/processor, both of which see the world in terms of a single thread, it can seem entirely reasonable to place a value into a register and use it from there - for an indefinite period of time. However, in reality, where other threads are also writing to the value, the compiler and processor are unaware of changes being made elsewhere.

    To ensure those changes are seen, we appropriately use memory barriers and atomic operations; but my concern is that once a variable has been copied into a register and is being used from there, changes made to that variable by other threads - even where we are correctly using memory barriers and atomic operations - will not propagate to the copy in the register.

    To solve this, we must use volatile, since it prevents the compiler/processor from acting in this way. A value cannot be indefinitely stored in a register. It is the final link in the chain.
    The compiler and processor are actually different in this case. C has a concept of volatile which the compiler must obey, because it signifies that changes may be external to the program. C has no concept of registers; volatile is a level of abstraction above that, and the register keyword is semantically synonymous with auto. In contrast, the assembly that the processor interprets has no concept of volatile. The compiler may move non-volatile objects into registers. The processor may not, but what it does do is keep frequently accessed memory in fast on-chip storage known as the CPU cache. Both can perform optimizations that reorder instructions. The compiler can reorder instructions liberally in the absence of volatile and atomics. The processor can reorder memory writes provided it obeys its own memory model, which is different from C's. Volatile also does not prevent the compiler from reordering non-volatile accesses near a volatile variable. Except for memory-mapped I/O, which the processor would know about, the processor treats all other (non-atomic) reads and writes the same.

    Atomics provide a different guarantee: that other threads may modify them, and, for non-relaxed atomics, that they can be used to synchronize access to other parts of memory. There probably aren't many compilers that would put an atomic variable in a register, but unlike with volatile, this would be possible if the compiler can prove that only one thread uses that variable. Perhaps some compilers make atomic variables into regular variables if threading is disabled for the build. The processor may have atomic instructions and memory barriers, or, on architectures with stronger memory models, some of the guarantees of atomics may apply to all memory.

    In summary, volatile is not sufficient for thread synchronization; you need atomic operations. Properly used atomic operations are sufficient to ensure synchronization.
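
    If it helps, the same publish/consume idea expressed with C11 atomics rather than volatile might look like this (a sketch with invented names; requires a C11 compiler with <stdatomic.h>):
    Code:
    #include <stdatomic.h>
    #include <stddef.h>

    struct node { struct node *next; int value; };

    static _Atomic(struct node *) head = NULL;

    /* producer: Treiber-style push */
    void publish(struct node *n)
    {
        n->next = atomic_load_explicit(&head, memory_order_relaxed);
        /* release store: the node's fields become visible before the pointer */
        while (!atomic_compare_exchange_weak_explicit(&head, &n->next, n,
                                                      memory_order_release,
                                                      memory_order_relaxed))
            ;   /* on failure, n->next is refreshed with the current head */
    }

    /* consumer */
    struct node *peek(void)
    {
        /* acquire load: no volatile needed, and no stale register copy */
        return atomic_load_explicit(&head, memory_order_acquire);
    }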


    I do not think this is always so. I think it is supported by GCC, and it may (well) be supported by other compilers, but it is *not* compliant with the language of the Standard, and so cannot be assumed to always be so.

    [snip code]


    Yes. I think GCC supports this, and if I only supported GCC, I could use it.
    Volatile reads are guaranteed to preserve order from the compiler's perspective; it sees volatile like I/O. You can't reorder print instructions, and the same goes for volatile. This is true on any compiler. Here the code is explicitly taking advantage of x86's strong memory model, guaranteeing that the processor won't reorder anything either.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  12. #27
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by King Mir View Post
    In summary, volatile is not sufficient for thread synchronization; you need atomic operations. Properly used atomic operations are sufficient to ensure synchronization.
    Yes.

    I may be wrong, but I think what you have discussed does not relate to the point I argued about volatile. Rather, what has been discussed is the most common misunderstanding in the use of volatile.

    Volatile reads are guaranteed to preserve order from the compiler's perspective; it sees volatile like I/O. You can't reorder print instructions, and the same goes for volatile. This is true on any compiler. Here the code is explicitly taking advantage of x86's strong memory model, guaranteeing that the processor won't reorder anything either.
    Yes.

    I may be wrong a second time, but I think what you have said here is also true; it is just not related to the point I argued. I wrote that GCC honours the casting of type qualifiers on pointers, but that the Standard does not.

  13. #28
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> ... but that the Standard does not.
    I'm not aware of anything in the standard that prevents adding volatile via a type cast.

    gg

  14. #29
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by Codeplug View Post
    >> ... but that the Standard does not.
    I'm not aware of anything in the standard that prevents adding volatile via a type cast.
    What I've found when Googling (I've not yet checked the Standard itself) is that the language of the Standard always talks about the type qualifier of the *object*. This is taken to mean that if the compiler can know the type qualifier of the object, it is free to wholly ignore the type qualifiers of the pointers being used.

    GCC I think does not do this; it honours the type qualifiers on the pointer.

    I'd provide some URLs (and indeed go check the Standard) but I'm at work =-)

  15. #30
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by Toby Douglass View Post
    Yes.

    I may be wrong, but I think what you have discussed does not relate to the point I argued about volatile. Rather, what has been discussed is the most common misunderstanding in the use of volatile.
    It is part of my attempt to explain why compilers can't ignore volatile qualifiers.

    Yes.

    I may be wrong a second time, but I think what you have said here is also true; it is just not related to the point I argued. I wrote that GCC honours the casting of type qualifiers on pointers, but that the Standard does not.
    It must honor volatile casts. Consider the following example: an interrupt service routine (ISR) can in general access all memory, but usually you would designate a global variable that it restricts its modifications to. You could also make that variable a pointer, and make the ISR write to or read from the memory pointed to by that global pointer. The object pointed to is only volatile while the pointer is pointing to it. When it is, all other pointers to that object must also be pointers to volatile, but only if they are actually dereferenced. The compiler has no way of knowing such complicated semantics. Instead, it must rely on the programmer to properly apply volatile casts to objects when they may be modified or read by the ISR, and only then.

    Again: volatile means external to the program. Therefore a compiler cannot prove that something it compiles is not actually volatile.
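
    A sketch of that scenario (all names invented): the object is only "volatile in practice" while the global pointer refers to it, so the qualifier is added by cast only on the accesses that can race with the ISR.
    Code:
    static int * volatile isr_target;         /* pointer the ISR writes through */

    void isr(void)                            /* hypothetical interrupt handler */
    {
        if (isr_target)
            *isr_target = 1;                  /* ISR-side store */
    }

    int wait_for_isr(int *slot)
    {
        int v;

        *slot = 0;
        isr_target = slot;                    /* from here on, *slot may change */
        while ((v = *(volatile int *)slot) == 0)
            ;                                 /* re-loaded on every pass        */
        isr_target = NULL;                    /* detach again                   */
        return v;
    }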

    EDIT:
    I took a look at the C standard wording. In my interpretation, what I said above holds true for variables that are not defined as volatile but are accessed through pointers. However, the standard also states:
    6.7.3 par. 6: ... If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined.

    Footnote: This applies to those objects that behave as if they were defined with qualified types, even if they are never actually defined as objects in the program (such as an object at a memory-mapped input/output address).
    In other words, you cannot cast away volatile for an object defined as volatile in your program (unless you don't actually access the object through the non-volatile pointer).
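
    So, concretely (a sketch):
    Code:
    volatile int device_flag;          /* the object is *defined* volatile */

    int read_ok(void)
    {
        return device_flag;            /* accessed through a volatile lvalue */
    }

    int read_ub(void)
    {
        return *(int *)&device_flag;   /* 6.7.3p6: volatile cast away, then
                                          accessed - undefined behaviour    */
    }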
    Last edited by King Mir; 01-17-2017 at 11:07 PM.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.
