The raw algorithm has been proven correct in an abstract machine that doesn't support concurrency the way it works on most machines in real life and assumes strictly ordered reads and writes.Here is one such algorithm that requires no special instructions
Consider a machine with multiple real threads of execution that don't share real (hardware) cache: real thread one may cache the shared "ownership" array and mutates real cache which is distinct from the real cache owned by real thread two. Without memory barriers being thrown across the shared array the logic fails in any number of ways.
In other words, practical implementations in C and C++ for processors with multiple real threads will still need to use primitives that guarantee atomic operations across the real shared array. You can't do that without help from the hardware or operating system.
On the other hard, the raw algorithm works fine for systems with only one single instruction layer and one real thread of execution.
This is true, but in practice the "i" is computed so as to be unique to a given thread.If one thread half sets a Number[i] and then another thread calls max() there, the value of Number[i] in the second thread will be set to an UB value.