Originally Posted by
abachler
Actually, you have that backwards. The longer a pipeline is (assuming it's an efficiently implemented one), the more effective it is at minimizing small inefficiencies in the instruction stream. Shorter pipelines are statistically less likely to catch every possible chance to perform out-of-order execution (OOE). They had to shorten the pipelines in the Core 2 because of going multicore, where pipeline refills due to failed branch prediction have a larger impact on cache performance. It's a trade-off that won't always benefit every program.
When you start measuring AMD vs. Intel, all bets are off, because they implement their ALUs in completely different ways. They both execute similar instruction sets, but internally they are like apples and oranges.
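To put rough numbers on the pipeline-depth trade-off mentioned above, here is a minimal back-of-the-envelope sketch. It is not from the original post: the function name `effective_cpi`, the branch fraction, the misprediction rate, and the two depths are all made-up illustrative values, and the model assumes a mispredicted branch costs about one full pipeline refill, which real cores only approximate.

```python
def effective_cpi(depth, branch_fraction, mispredict_rate, base_cpi=1.0):
    """Average cycles per instruction under a simple flush-penalty model.

    Assumption (hypothetical, for illustration only): every mispredicted
    branch costs roughly `depth` cycles to refill the pipeline.
    """
    return base_cpi + branch_fraction * mispredict_rate * depth


# Illustrative values only: 20% of instructions are branches and
# 5% of those branches are mispredicted.
for depth in (14, 31):
    cpi = effective_cpi(depth, branch_fraction=0.2, mispredict_rate=0.05)
    print(f"depth={depth:2d}  CPI={cpi:.2f}")
```

With these assumed inputs the deeper pipeline pays about 0.31 extra cycles per instruction for mispredictions, against about 0.14 for the shorter one. The model says nothing about what a deeper pipeline might gain elsewhere, so it only illustrates the refill cost side of the trade-off.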