Yes, and combined with 16 registers for SSE, it makes it really easy to generate decent SSE code, whereas it's never been particularly easy for any compiler to generate REALLY good FPU code.
Yes, and it's fairly well publicised that the new quad-core processor has 128-bit units capable of doing SSE as fast as a Core2.
It's true that the Core2 is the first CPU to have 128-bit-wide execution units and thus to be able to do a 128-bit SSE operation in one step, whereas all previous CPUs, including AMD's, need to process it in two 64-bit steps, thus taking twice as many cycles per operation.
What is not true is that AMD's x87 unit is slower than Intel's.
And if memory bandwidth is really the limiting factor, the AMD processors were definitely better before Core2 came out [which is a major change to Intel's top-end architecture: it runs at a lower clock speed (in MHz) than its predecessors, has a shorter pipeline, no hyperthreading, and better memory bandwidth].
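To make the "one step versus two steps" point a bit more concrete, here is a minimal sketch of a packed SSE add on four floats. The function name add4 is just a hypothetical helper for illustration, and the scalar fallback is only there so the snippet also builds on a non-SSE target. The source is the same on every CPU: whether the packed add runs as one 128-bit operation or is split into two 64-bit halves is decided by the hardware, not by the code.

```c
#include <stddef.h>

/* Use SSE intrinsics when the compiler targets SSE, else fall back to scalar. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3. */
void add4(const float *a, const float *b, float *out)
{
#if HAVE_SSE
    __m128 va = _mm_loadu_ps(a);             /* load 4 floats, no alignment needed */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* one packed add (ADDPS) on all 4 lanes */
#else
    for (size_t i = 0; i < 4; ++i)           /* scalar fallback, same result */
        out[i] = a[i] + b[i];
#endif
}
```

Calling add4 with {1, 2, 3, 4} and {10, 20, 30, 40} fills out with {11, 22, 33, 44}. On a CPU with 64-bit SSE units the single ADDPS is internally done in two halves, on one with 128-bit units it is done in one go.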
--
Mats