Thread: Weird Performance

  1. #16
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Intel introduced something like a 12-stage pipeline and said it would beat AMD's core 5- or 6-stage pipeline. I think AMD is winning that battle.

  2. #17
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Well, the Core2 beats everything AMD currently has. Still waiting for their next generation ...
    All the buzzt!

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #18
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Quote Originally Posted by matsp View Post
    A long pipeline has nothing to do with OOE (Out Of Order execution).
    It has everything to do with OOE. With more decoded instructions in the pipeline, you increase the chance that all sections of the ALU can be used to satisfy some portion of an instruction. If you can only precache 6 instructions, then you can only utilize the ALU fully if those 6 instructions access all portions of it. If you cache 12 instructions, you increase the likelihood that one of those 12 instructions will use any given portion of the ALU. The benefit of a longer pipeline on a single processor is that if the branch prediction fails, the 'lost' cache cycles would have gone unused to begin with, so there is no overall loss of performance. On a multiprocessing system, however, the cache cycles needed to fill a longer pipeline are more likely to have been used by another processor if they had not been spent filling the now-invalid sections of the pipeline, so a failed branch prediction has a real cost in performance. By reducing the length of the pipeline, you increase the chance that the ALU will go idle, but you increase the efficiency of the cache usage. On modern systems, the primary limitation is memory bandwidth, which has lagged seriously behind processor speeds for the last 20 years. Therefore, improving cache efficiency has a greater overall impact on system throughput than longer pipelines. It's a critical tradeoff that is endemic to parallel systems.

    Quote Originally Posted by CornedBee View Post
    Well, the Core2 beats everything AMD currently has. Still waiting for their next generation ...
    That may be a while. The rumor is that AMD lost a lot of their critical engineering staff to a competitor (not necessarily Intel).
    Last edited by abachler; 10-17-2007 at 09:47 AM.

  4. #19
    Registered User khdani's Avatar
    Join Date
    Oct 2007

    possible solution

    I know it's been a long time since the last reply to this thread, but I've found two working solutions for this situation.
    One is a complete rewrite of the code in a language with built-in multithreading support.
    ...I rewrote the whole application in Java using JOGL, and all the performance issues were solved, since the application is now completely multithreaded, including the OpenGL context.
    I guess it's also possible to rewrite it in C# using Tao or OpenTK, and it may work just as well as in Java, but I haven't tried that.

    ...If rewriting the code is not possible, I'd strongly recommend multithreading most of the loops in the code; the best tool for that, I guess, is OpenMP. Keep in mind that not every loop should be multithreaded (the OpenMP reference explains this well).
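
    To illustrate the OpenMP suggestion, here is a minimal sketch in C. It is not taken from the application discussed in this thread; the function names (scale_array, sum_array, prefix_sum) are made up for the example. The first two loops have independent iterations (or a simple reduction) and are reasonable candidates for "#pragma omp parallel for". The third has a loop-carried dependency, so it is left serial, which is the kind of loop that should not be multithreaded this way.

```c
#include <stddef.h>

/* Hypothetical helper: every iteration writes only out[i], so the
 * iterations are independent and can be split across threads. */
void scale_array(const float *in, float *out, size_t n, float factor)
{
    /* A signed loop index keeps this valid on older OpenMP versions too. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) {
        out[i] = in[i] * factor;
    }
}

/* Hypothetical helper: all iterations update one accumulator, so a
 * reduction clause gives each thread a private partial sum and
 * combines the partial sums when the loop ends. */
double sum_array(const float *in, size_t n)
{
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i) {
        total += in[i];
    }
    return total;
}

/* Hypothetical helper: each iteration reads the value written by the
 * previous one (a loop-carried dependency), so a plain
 * "parallel for" would give wrong results. It stays serial here. */
void prefix_sum(float *data, size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        data[i] += data[i - 1];
    }
}
```

    With GCC or Clang this needs the -fopenmp flag; without it the pragmas are ignored and the loops simply run serially. Whether threading a given loop actually helps depends on how much work each iteration does, so it is worth measuring before and after.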
