I think I was pretty clear. It appears barrier() is a function call, is it not? Since we don't know what's in it, and given your comment, I'm assuming it is just there to guarantee some sort of ordering by the compiler. I see no reason why it should necessarily do this. Yes, putting a mutex around things fixes the problem. This is "by the book". Your solution seems convoluted and not guaranteed to work.
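For what it's worth, here is roughly what I mean by "by the book". This is only a minimal sketch in C++ using std::mutex, and the names SharedCounter and run_workers are made up for illustration; they are not from your code. Every access to the shared value goes through the same lock, so nothing depends on what the compiler chooses to reorder.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state, for illustration only.
struct SharedCounter {
    std::mutex lock;
    long value = 0;

    // Every write takes the lock first.
    void increment() {
        std::lock_guard<std::mutex> guard(lock);
        ++value;
    }

    // Reads take the same lock, so they never see a half-finished update.
    long read() {
        std::lock_guard<std::mutex> guard(lock);
        return value;
    }
};

// Starts thread_count threads that each increment the counter
// increments_per_thread times, waits for them, and returns the total.
long run_workers(int thread_count, int increments_per_thread) {
    SharedCounter counter;
    std::vector<std::thread> workers;
    for (int i = 0; i < thread_count; ++i) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int j = 0; j < increments_per_thread; ++j) {
                counter.increment();
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.read();
}
```

Calling run_workers(4, 10000) should always come back with 40000, because the lock serializes the increments. Take the lock_guard lines out and the total can come up short, which is the kind of bug I would rather not have to chase.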
My current game engine project uses threads. I keep thread communication to a minimum and use critical sections where I have to. The threading has turned out to be the least of my problems. I imagine that's because I don't use any circus tricks. I feel overall design simplicity is the key.
I find it interesting that you are the one who is claiming significant debugging complexity with thread programming and you are also the one using funky tricks. I'm thinking that's possibly more than a coincidence.
Then again, it's your code and you can do what you want.