Quote Originally Posted by Toby Douglass View Post
Yes. I have seen this, and it confused me when I first saw it, and I now think it wrongly named. An ordinary store, regardless of memory barriers, may never be seen by any other physical core, and I expect it also falls prey as usual to the load-modify-store problem, exemplified by multiple threads incrementing the same counter; counts are lost. In my book, atomic means in such a scenario, the count is correct; no counts are lost.
An atomic increment would indeed be more than just a move instruction, but a store to an atomic variable can be one. Read-modify-write operations require either hardware support or a spinlock. Like plain move instructions, those operations may also require barriers around them.

Similarly, when thinking of this (so-called) atomic load, the invalidation request queue is cleared immediately prior to the load - but then anything can happen, and the queue can become completely full, and so the load can be of an invalid-but-not-yet-invalidated cache line. This problem does not happen with (what I would call) atomic loads. You get what was really there at that moment.
Firstly, invalid cachelines are at a lower level of abstraction than C's memory model requires. Thinking in terms of cachelines seems to be confusing you, so I suggest you set them aside until you understand C's memory model at a higher level.

Secondly, there's no such thing as "what was really there"; C is at a higher level than the hardware, so this definition is meaningless in a C context. Unsynchronized memory accesses can be arbitrarily reordered, provided that data dependencies are preserved. This is true even with relaxed atomics. You can't reason about them as if two threads execute in program order. You can only do that if you use sequentially consistent memory order, which implies the use of memory barriers. Now if you do use sequentially consistent atomic loads and stores, then you don't need explicit memory barriers, because they are already built into the operation.


I'm not sure what you mean. Can you elucidate?
I just meant the code is described as working, so if it doesn't make sense to you, it's probably because you don't understand something.

Quote Originally Posted by Toby Douglass View Post
Yes. However, I think it all reduces down to something simple; whatever you store may never by other physical cores be seen at all, and if it is seen, it may be seen in any order.
That can be true for weak memory architectures. However, working in C, you can reason at a higher level. If your code is synchronized by the right kind of atomics, or by locks, loads and stores will have the order you need.

I have one unanswered question here though, which is to do with stores to the same location. I wonder if these can be sure to be seen in order of stores, or not.
The answer is yes, provided there is no data race. In other words, C disallows two threads from writing to the same memory location at the same time unless the variable is atomic. This allows compiler optimizations, but as mentioned above, once compiled, relaxed atomic writes are often simple store instructions.

I may be wrong, but I think this is not so. Memory barriers do solve the ordering problem, but they do not solve the visibility problem; what you store *if it becomes visible* will become visible in the correct order (which is to say, order as constrained by store barriers) but there is no guarantee it *will become visible*. A forced write to memory (and only a forced write to memory) guarantees that earlier stores will then be visible (as constrained by such store barriers issued up to that point).
A memory barrier does ensure that memory operations become visible; that's part of its purpose. The details depend on which type of memory barrier is used.

You don't actually ever want to force a write to main memory. At most you would invalidate the corresponding lines in other cores' L1 and L2 caches, but this is an implementation detail of memory barriers.