Here's another possible order:
Code:
writer            reader
======            ======
store c0
store c1
store barrier
                  load barrier
store c1
                  load c1    // data race!
store c0
store barrier
store c0
                  load c0    // data race again!
store c1
store barrier
After all, there is no barrier between the loads of c0 and c1, so they can be freely reordered. Likewise, the stores of c0 and c1 can be reordered.
Generally, we use an atomic variable, together with either memory fences or the C11 atomic functions, to synchronize access to other variables. So not only do acquire operations need to be paired with release operations, but to pair up they must act on the same memory location. The reader then checks the value it loaded from that location (for example, a flag the writer sets last) to decide whether it may safely touch the other variables, which is what keeps the threads from racing.
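To make that pairing concrete, here is a minimal sketch of the usual message-passing pattern, assuming C11 <stdatomic.h> and POSIX threads are available. The names payload, ready, producer and run_message_passing are hypothetical. The release store and the acquire load both act on the same atomic variable, ready, and the reader uses the value it sees there to decide when it may read the ordinary variable payload.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                 /* ordinary, non-atomic data (hypothetical) */
static atomic_bool ready = false;   /* the shared synchronization location */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;   /* plain store */
    /* Release: every write above becomes visible to a thread whose
       acquire load of ready reads this true. */
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

/* Starts one producer thread and returns the payload it published. */
int run_message_passing(void)
{
    pthread_t t;
    payload = 0;
    atomic_store_explicit(&ready, false, memory_order_relaxed);
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    /* Acquire on the SAME variable the producer released. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;   /* spin until the flag is published */
    int value = payload;   /* ordered after payload = 42: no data race */
    pthread_join(t, NULL);
    return value;
}
```

If the acquire load were done on some other atomic variable, or the flag were never checked, nothing would order the read of payload after the write, and the two threads would race exactly as in the interleavings above.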
If your goal is to make sure that the two counters are always incremented in sync, the easiest way is not to make either of them atomic. Instead, use a third variable to guard them. Something like this untested code:
Code:
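#include <stdio.h>

unsigned long long c1 = 0, c2 = 0;   /* plain counters, guarded by c3 */
static _Bool c3 __attribute__((aligned(128))) = 0;   /* c3 is the lock */

void increment(void){
    _Bool expected;
    do {
        expected = 0;   /* a failed CAS overwrites expected, so reset it */
    } while (!__atomic_compare_exchange_n(&c3, &expected, 1, 1 /* weak */,
                                          __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
    ++c1;
    ++c2;
    __atomic_store_n(&c3, 0, __ATOMIC_RELEASE);   /* unlock */
}

void read_counters(void){   /* named to avoid clashing with POSIX read() */
    _Bool expected;
    do {
        expected = 0;
    } while (!__atomic_compare_exchange_n(&c3, &expected, 1, 1 /* weak */,
                                          __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
    if (c1 != c2)
        printf("uh-oh! c1 is %llu and c2 is %llu\n", c1, c2);
    __atomic_store_n(&c3, 0, __ATOMIC_RELEASE);   /* unlock */
}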
This is all a little bit easier in C11.
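A minimal sketch of what the C11 version might look like, assuming <stdatomic.h> is available: atomic_flag takes the place of the hand-rolled compare-and-swap on c3, since atomic_flag_test_and_set_explicit simply returns the previous value and there is no expected argument to reset. The helper names lock_acquire, lock_release, worker and stress_test, and the NTHREADS/ITERATIONS values, are hypothetical; POSIX threads are used only to drive the demo.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NTHREADS   4        /* hypothetical demo parameters */
#define ITERATIONS 10000

static unsigned long long c1 = 0, c2 = 0;   /* plain counters */
/* The third variable: a C11 spinlock guarding c1 and c2. */
static _Alignas(128) atomic_flag counter_lock = ATOMIC_FLAG_INIT;

static void lock_acquire(void)
{
    /* test_and_set returns the previous value: spin until it was clear. */
    while (atomic_flag_test_and_set_explicit(&counter_lock, memory_order_acquire))
        ;
}

static void lock_release(void)
{
    atomic_flag_clear_explicit(&counter_lock, memory_order_release);
}

void increment(void)
{
    lock_acquire();
    ++c1;
    ++c2;
    lock_release();
}

/* Returns true when the two counters are observed in sync. */
bool read_counters(void)
{
    lock_acquire();
    bool in_sync = (c1 == c2);
    if (!in_sync)
        printf("uh-oh! c1 is %llu and c2 is %llu\n", c1, c2);
    lock_release();
    return in_sync;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; ++i) {
        increment();
        read_counters();
    }
    return NULL;
}

/* Demo driver: true if, after all workers finish, each counter equals
   (threads started) * ITERATIONS. */
bool stress_test(void)
{
    pthread_t threads[NTHREADS];
    int started = 0;
    while (started < NTHREADS &&
           pthread_create(&threads[started], NULL, worker, NULL) == 0)
        ++started;
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);
    /* pthread_join orders the workers' writes before these reads. */
    return c1 == (unsigned long long)started * ITERATIONS && c2 == c1;
}
```

Something like cc -std=c11 -pthread should build it. A spinlock is reasonable for a critical section as short as two increments; for longer sections a pthread mutex is usually the better choice, since it can put waiting threads to sleep instead of spinning.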