Intel introduced something like a 12-stage pipeline and said it would beat AMD's 5- or 6-stage pipeline. I think AMD is winning that battle.
Well, the Core2 beats everything AMD currently has. Still waiting for their next generation ...
It has everything to do with out-of-order execution (OOE). With more decoded instructions in the pipeline, you increase the chance that every section of the ALU can be used to satisfy some portion of an instruction. If you can only precache 6 instructions, then you can only utilize the ALU fully if those 6 instructions access all portions of it. If you cache 12 instructions, you increase the likelihood that one of those 12 instructions will use any given portion of the ALU.
The benefit of a longer pipeline on a single processor is that if the branch prediction fails, the 'lost' cache cycles would have gone unused anyway, so there is no overall loss of performance. On a multiprocessing system, however, the cache cycles needed to fill a longer pipeline are more likely to have been used by another processor had they not been spent filling the now-invalid sections of the pipeline, so a failed branch prediction has a real cost in performance.
By reducing the length of the pipeline, you increase the chance that the ALU will go idle, but you increase the efficiency of the cache usage. On modern systems, the primary limitation is memory bandwidth, which has lagged seriously behind processor speeds for the last 20 years. Therefore, improving cache efficiency has a greater overall impact on system throughput than longer pipelines. It's a critical tradeoff that is endemic to parallel systems.
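To put rough numbers on the 6-versus-12 argument, here is a minimal sketch in C. It rests on a toy assumption that is not from the post: each in-flight instruction independently needs exactly one of `units` execution units, chosen uniformly at random. The function name `unit_busy_probability` and both parameters are hypothetical, and real processors are far more complicated than this.

```c
#include <assert.h>

/* Toy model (an assumption, not a description of any real CPU):
 * each in-flight instruction independently needs one of `units`
 * execution units, picked uniformly at random.
 *
 * Returns the probability that one particular unit is wanted by at
 * least one of `window` in-flight instructions:
 *     1 - (1 - 1/units)^window
 */
double unit_busy_probability(int units, int window)
{
    assert(units > 0 && window >= 0);

    double miss = 1.0 - 1.0 / (double)units; /* one instruction skips the unit */
    double idle = 1.0;                       /* every instruction skips the unit */

    for (int i = 0; i < window; ++i)
        idle *= miss;

    return 1.0 - idle;
}
```

Under that assumption, with 4 units a given unit is wanted with probability of about 0.82 when 6 instructions are in flight and about 0.97 with 12, which is the direction of the effect described above. It says nothing about the cost of refilling after a failed branch prediction.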
Quote:
Well, the Core2 beats everything AMD currently has. Still waiting for their next generation ...
That may be a while. The rumor is that AMD lost a lot of their critical engineering staff to a competitor (not necessarily Intel).
I know it has been a long time since the last reply to this thread, but I've found two working solutions for the situation:
One is a complete rewrite of the code in a language with multithreading support.
...I've rewritten the whole application in Java using JOGL, and all the performance issues were solved, since the application is now completely multithreaded, and so is the OpenGL context.
I guess it's possible to rewrite it in C# using Tao or OpenTK, and it may work just as well as in Java, but I haven't tried that.
...If rewriting the code is not possible, I strongly recommend multithreading most loops in the code; the best tool for that, I guess, is OpenMP. Keep in mind that not all loops should be multithreaded (the OpenMP reference explains that well).
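As a rough illustration of the OpenMP suggestion, here is a minimal sketch in C showing one loop that is a reasonable candidate for multithreading and one that is not. The function names `sum_of_squares` and `prefix_sum` are hypothetical examples, not code from the application discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: a reasonable candidate for OpenMP.
 * The reduction clause gives each thread its own partial sum and
 * combines the partial sums when the loop finishes. */
double sum_of_squares(const double *x, size_t n)
{
    assert(x != NULL || n == 0);

    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i)
        total += x[i] * x[i];

    return total;
}

/* Loop-carried dependency: iteration i reads the value written by
 * iteration i - 1, so putting a plain "parallel for" on this loop
 * would be a data race. It is deliberately left serial. */
void prefix_sum(double *x, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        x[i] += x[i - 1];
}
```

With GCC this would be built with `-fopenmp`; without that flag the pragma is ignored and the first loop simply runs serially, so the result is the same either way. For very small arrays the cost of starting threads can outweigh the gain, which is another reason not to multithread every loop.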