this is what you're saying, if I got you right, and this is the reason why the 2nd assertion can fire
O_o
I don't know how you got any of that information from what I said, but you are wrong regardless.
I said nothing about `ptr.store(p, std::memory_order_release);' being moved. I said that, when using `std::memory_order_relaxed', the visibility of `ptr' could change before the visibility of earlier writes--the character array--changes. The difference in visibility has nothing really to do with moving instructions around. It exists because `std::memory_order_relaxed' only guarantees atomicity. (It does not guarantee any ordering of visibility.) The `std::memory_order_relaxed' option exists for scenarios where no ordering is required, but you want to use it for a scenario where a specific order of visibility is required.
We don't need to assume an elaborate environment to see how the lack of ordering requirements may exhibit differences in visibility. The character array--the focus of earlier writes--likely doesn't live in the same operational block as the `ptr' atomic. A simple environment is sufficient to see problems: one with operations to flush the operational cache to a memory block, where atomic reads and writes are implemented by unbreakable instructions that flush or invalidate the block where the atomic lives.
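For contrast, here is a minimal sketch of a scenario where atomicity alone really is enough: a shared counter that nothing else is published through. The helper name `count_with_relaxed' and its parameters are my own invention for illustration, not anything from the code under discussion.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one shared counter, and no
// thread reads other data "through" the counter, so no ordering is needed.
long count_with_relaxed(int threads, long increments_per_thread)
{
    assert(threads > 0 && increments_per_thread >= 0);
    std::atomic<long> hits(0);
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
    {
        pool.emplace_back([&hits, increments_per_thread]
        {
            for (long n = 0; n < increments_per_thread; ++n)
            {
                // Atomic read-modify-write: no increment is lost, but no
                // ordering is imposed on any other memory.
                hits.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread & t : pool)
    {
        // join() synchronizes with the end of the thread, so the final
        // load below sees every increment.
        t.join();
    }
    return hits.load(std::memory_order_relaxed);
}
```

Calling `count_with_relaxed(4, 100000)' returns 400000 because each increment is atomic. The moment the counter is used to signal that some other memory is ready, this stops being a sufficient guarantee.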
Let's pretend we could insert such invalidation/flushing instructions ourselves; take a look at some slightly modified example code which eliminates spurious context:
Code:
std::atomic<char *> g;
void producer()
{
    char * s(new char[6]);
    s[0] = 'H';
    s[1] = 'e';
    s[2] = 'l';
    s[3] = 'l';
    s[4] = 'o';
    s[5] = '\0';
    FLUSH(s); // 1
    // The change in the memory referenced by `s' becomes visible.
    FLUSH(&g);
    // The `consumer' thread receives the change in `g' value.
    g.store(s, std::memory_order_release);
}
void consumer()
{
    char * s;
    INVALIDATE(&g);
    while (!(s = g.load(std::memory_order_consume)));
    // The `consumer' thread has received the change in `g' value.
    INVALIDATE(s); // 2
    assert(!strcmp(s, "Hello"));
}
We know that the memory regarding `g' and `s' will eventually be copied from processor cache to main memory, but the order in which the visibility changes--which copy happens first--is relevant because the correct operation of the code depends on the memory referenced by `s' being completely written. We know that `1' happens before `2' because the write to `g' isn't allowed to "float above" the earlier writes.
Let's pretend we didn't care about the order by using the `std::memory_order_relaxed' option; the following reordering of the instructions that change visibility is perfectly valid:
Code:
std::atomic<char *> g;
void producer()
{
    char * s(new char[6]);
    s[0] = 'H';
    s[1] = 'e';
    s[2] = 'l';
    s[3] = 'l';
    s[4] = 'o';
    s[5] = '\0';
    FLUSH(&g);
    // The `consumer' thread receives the change in `g' value.
    FLUSH(s); // 1
    // The change in the memory referenced by `s' becomes visible.
    g.store(s, std::memory_order_relaxed);
}
void consumer()
{
    char * s;
    INVALIDATE(&g);
    while (!(s = g.load(std::memory_order_relaxed)));
    // The `consumer' thread has received the change in `g' value.
    INVALIDATE(s); // 2
    assert(!strcmp(s, "Hello"));
}
We told the environment that we don't care about ordering. We don't know that `1' happens before `2' is completed. We know that the environment is, for some definition of parallel, executing the functions in parallel. However, the environment may take longer to complete `FLUSH(s);' than `INVALIDATE(s);', which means that the memory referenced by `s' within `consumer' may not hold the values we expect.
Be fairly warned, I've simplified both example and explanation to the breaking point. One could easily poke holes in both example and explanation. The fact is that the visibility requirements only place an "as if" behavioral expectation with a multitude of different behaviors seen in practice. Despite the arguably frail examples and explanation, the point remains that apparently ordered operations do not necessarily correspond to changes in visibility having the same order. Actually, you could even go a bit further. If the visibility of changes did precisely correspond to the order of operations, you wouldn't need anything other than `std::memory_order_relaxed' to reason about threaded code.
Ultimately, the `std::memory_order_release' and `std::memory_order_acquire' options in particular allow us to reason about code as if the visibility of operations corresponded to the order of operations when correctly used.
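For what it's worth, here is a compilable sketch of that pairing without the pretend `FLUSH'/`INVALIDATE' instructions. The names `g_message' and `run_handoff', the `yield' in the spin loop, and the `std::string' return value are my own additions for illustration; treat it as one way to write the handoff, not the only one.

```cpp
#include <atomic>
#include <cassert>
#include <cstring>
#include <string>
#include <thread>

// Hypothetical name; plays the role of `g' in the earlier examples.
std::atomic<char *> g_message(nullptr);

void producer()
{
    char * s(new char[6]);
    std::memcpy(s, "Hello", 6); // five characters plus the terminator
    // Release: the writes to `s' above become visible to any thread
    // that acquires this value.
    g_message.store(s, std::memory_order_release);
}

std::string consumer()
{
    char * s;
    // Acquire: pairs with the release store in `producer'.
    while (!(s = g_message.load(std::memory_order_acquire)))
    {
        std::this_thread::yield();
    }
    assert(!std::strcmp(s, "Hello")); // cannot fire with release/acquire
    std::string copy(s);
    delete [] s;
    return copy;
}

// Hypothetical driver: runs one producer and one consumer to completion.
std::string run_handoff()
{
    g_message.store(nullptr, std::memory_order_relaxed);
    std::string seen;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

Swap both orderings for `std::memory_order_relaxed' and the code still compiles, and may even appear to work on some hardware, but nothing then guarantees the assertion holds.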
Soma