Funny you mention that.
I'm looking at this: https://www.cs.cmu.edu/~410-s05/lect...1_LockFree.pdf
and they also use an atomicCAS() operation, but it's inside a do-while loop. How is a do-while loop that retries until the CAS succeeds more efficient than a locking mechanism? Sure, a mutex per node has a larger storage overhead, but RAM is dirt cheap, so that's almost a non-issue.
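
For reference, the pattern I mean looks roughly like this. It's my own sketch, not the code from the slides: I'm assuming C++ with std::atomic (compare_exchange_weak standing in for atomicCAS()), and Node, head, and push are names I made up for illustration.

```cpp
#include <atomic>

// Hypothetical node type, just for illustration.
struct Node {
    int value;
    Node* next;
};

// Head of a singly linked stack (the name is an assumption).
std::atomic<Node*> head{nullptr};

// Push using a CAS retry loop instead of taking a lock.
void push(Node* n) {
    Node* old_head;
    do {
        old_head = head.load();  // snapshot the current head
        n->next = old_head;      // link the new node in front of it
        // Swap head from old_head to n only if nobody changed it in the
        // meantime; otherwise go around again with a fresh snapshot.
    } while (!head.compare_exchange_weak(old_head, n));
}
```

So a thread only repeats the loop body when another thread changed head between the load and the CAS. That retry is the part I'm comparing against just locking a mutex.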