I find it strange that GCC guarantees atomic long long operations even on 32-bit machines. That must incur a huge performance penalty.

Are you sure it's not Itanium-specific? The page refers to the...