PGO is amazing

This is a discussion on PGO is amazing within the Tech Board forums, part of the Community Boards category.

  1. #1
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183

    PGO is amazing

    I just tried out PGO (profile-guided optimization) on my program using GCC 4.3 on Linux. The result was quite amazing.

    Sped up my program (a chess engine, 100% CPU-bound) by 18-20% (repeated a few times).

    Code:
    g++ -O3 -march=native -pg -fprofile-generate ...
    //run my program's benchmark
    g++ -O3 -march=native -fprofile-use ...
    vs
    Code:
    g++ -O3 -march=native ...
    I guess it could be due to the number of hard-to-predict branches in my code (I don't expect GCC to know there are usually no more than 2 queens on the board...).
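
    For anyone wondering what PGO actually feeds the compiler here: branch frequencies, which you could otherwise only hint at by hand with GCC's __builtin_expect. A rough sketch of the idea (the queen-count check and values are made up for illustration, not from my engine):

    Code:
    ```cpp
    #include <cstdio>

    // Hypothetical example: without profile data, the compiler has no idea
    // this branch is almost always false. __builtin_expect is the manual
    // way to tell GCC what -fprofile-use figures out automatically.
    static int evaluate(int num_queens) {
        if (__builtin_expect(num_queens > 2, 0)) {
            return -1000;  // rare promotion-heavy position
        }
        return num_queens * 900;  // common case, laid out on the hot path
    }

    int main() {
        std::printf("%d\n", evaluate(2));  // prints 1800
        return 0;
    }
    ```
    The nice thing about PGO is that it gets these hints for every branch in the program, measured rather than guessed.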

    I remember trying it out earlier (GCC 4.0?), and it didn't really help. Surprised to see such a huge gain this time.

    I'm beginning to fall in love with GCC.

    So my suggestion: try it out if you haven't. It's quite a bit of performance gain for little cost. If your program already has a benchmark function, everything can be coded into the Makefile, and the only cost is that compilation takes longer. Of course, this only applies to CPU-intensive programs.

    I suspect MSVC has something like this, too, but I haven't looked into that.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    I'm not sure MSVC has such an option, but I know some of the other commercial compilers do (Intel, PGI).

    And yes, it can be very effective, not only for hard facts on branches, but also for stuff like "do we inline function X or function Y", as well as using registers in the places that count.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,252
    If only a 20% speed improvement meant that you can search 20% deeper. Of course, it doesn't...

    For years I've hacked on programs which play the game of Amazons. This game has an astounding branching factor of 2176 on the first move. Even assuming perfect alpha-beta, that's a tree growth factor per half-ply of 47 in the opening game.

    What this means, depressingly, is that to search only a single ply deeper (by a ply here I mean a full move: a move followed by a counter-move) in a given time, I must speed up my program by 2176 times.

    Counterintuitively, it is better to instead make your evaluation heuristic more "intelligent" even if it makes it slower. You aren't going to lose much (maybe half a ply) but your estimates are more robust and so you can perform better at those shallower search depths. Or to think of it another way, even if I made my evaluation function 2176 times more complicated, I'd only lose a single ply.

    Chess is somewhat different in that the branching factor is MUCH lower, and it can vary significantly from ply to ply. A 20% speed increase is nothing to sneeze at, but I have to wonder how this has actually affected your game play performance.
    Last edited by brewbuck; 02-05-2009 at 01:08 PM.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  4. #4
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    I don't know if the fish has implemented it, but most chess games are played on the clock, so even taking 20% less time on every move means more moves you can make before going into "panic mode".

  5. #5
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,252
    Quote Originally Posted by tabstop View Post
    I don't know if the fish has implemented it, but most chess games are played on the clock, so even taking 20% less time on every move means more moves you can make before going into "panic mode".
    Yes, but if you could have gotten another ply deeper if you had only searched for 0.3 seconds longer, it would be sad to miss out on the better evaluation (and possibly better move). The problem is being able to gauge how deep you can search from your current position in a given amount of time, and weigh that against how much time you have left on the clock.

    In Amazons for instance, it's pointless to search more than one ply on the first move, because you just ain't going to get it in any realistic amount of time. So say you can make a move within 0.5 seconds. A dozen moves later, you might be able to get an extra ply in 15 seconds -- much longer than 0.5 seconds, but hey, it's an extra ply.

    The complexity is deciding how to make the tradeoff, at what phase of the game, etc.
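
    The usual way to handle that tradeoff is iterative deepening plus a simple time budget: search depth 1, then 2, and only start the next iteration if its projected cost fits the clock. A bare-bones sketch (the growth-ratio numbers are hypothetical, just to show the shape of the decision):

    Code:
    ```cpp
    #include <cstdio>

    // Decide whether to start another iteration of iterative deepening.
    // last_iter_time: seconds the previous depth took; growth: observed
    // ratio between successive iteration times; time_left: clock budget.
    static bool start_next_iteration(double last_iter_time, double growth,
                                     double time_left) {
        double projected = last_iter_time * growth;  // rough cost of depth+1
        // only go deeper if the projection fits a fraction of the budget
        return projected < 0.5 * time_left;
    }

    int main() {
        // previous depth took 2s, iterations grow ~4x, 30s on the clock:
        // 8s projected < 15s, so deepen
        std::printf("%s\n", start_next_iteration(2.0, 4.0, 30.0)
                                ? "deepen" : "move now");
        // same position with only 10s left: 8s projected > 5s, so move
        std::printf("%s\n", start_next_iteration(2.0, 4.0, 10.0)
                                ? "deepen" : "move now");
        return 0;
    }
    ```
    The hard part, as you say, is picking the growth estimate and the budget fraction per game phase, not the mechanism itself.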
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  6. #6
    Dr Dipshi++ mike_g's Avatar
    Join Date
    Oct 2006
    Location
    On me hyperplane
    Posts
    1,218
    I agree that a good heuristic is far more important than bit-poking optimizations for this kind of stuff. But I guess a 20% speed increase is worth having if it's no extra effort.

  7. #7
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    Quote Originally Posted by brewbuck View Post
    If only a 20% speed improvement meant that you can search 20% deeper. Of course, it doesn't...

    For years I've hacked on programs which play the game of Amazons. This game has an astounding branching factor of 2176 on the first move. Even assuming perfect alpha-beta, that's a tree growth factor per half-ply of 47 in the opening game.

    What this means, depressingly, is that to search only a single ply deeper (by a ply here I mean a full move: a move followed by a counter-move) in a given time, I must speed up my program by 2176 times.

    Counterintuitively, it is better to instead make your evaluation heuristic more "intelligent" even if it makes it slower. You aren't going to lose much (maybe half a ply) but your estimates are more robust and so you can perform better at those shallower search depths. Or to think of it another way, even if I made my evaluation function 2176 times more complicated, I'd only lose a single ply.

    Chess is somewhat different in that the branching factor is MUCH lower, and it can vary significantly from ply to ply. A 20% speed increase is nothing to sneeze at, but I have to wonder how this has actually affected your game play performance.
    In chess, the branching factor for a naive implementation is about 30 on average. Assuming 2 Mnps (million nodes per second), which is about right for programs running on a modern PC, a program can search about 5 plies (log(20,000,000)/log(30)) in 10 seconds, or 10 plies at most using alpha-beta with perfect move ordering (realistically 6-7 plies). That is obviously no good. Human masters can often look much further ahead than that.
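
    The arithmetic above, spelled out as a quick sanity check (same rough figures, nothing more):

    Code:
    ```cpp
    #include <cmath>
    #include <cstdio>

    // Depth reachable by plain minimax: solve branching^depth = nodes.
    static double minimax_depth(double nps, double seconds, double branching) {
        return std::log(nps * seconds) / std::log(branching);
    }

    int main() {
        // 2 Mnps for 10 seconds at branching factor 30
        double d = minimax_depth(2e6, 10.0, 30.0);       // ~4.9 plies
        std::printf("plain minimax: ~%.1f plies\n", d);
        // perfect alpha-beta roughly doubles the reachable depth
        std::printf("ideal alpha-beta: ~%.1f plies\n", 2 * d);
        return 0;
    }
    ```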

    Therefore, people made chess engines much more selective about which moves to search deeper (extensions on captures, checks, and promotions, for example) and which to search shallower (reductions on moves that "fail low" in alpha-beta after a shallow search, even when the moving side skips a move). And then there is the transposition table: since a position is likely to be reached multiple times in the tree through different paths (transpositions), the result from a previous search is reused if its depth is sufficient.

    With all these clever tricks, the "effective branching factor" of modern chess engines comes down to about 2-4, which allows them to search about 13-15 plies in the midgame (>20 on supercomputers).

    With an effective branching factor that small, a 20% speed increase is quite substantial. It's estimated that every doubling of computing power (or speed, or time) gives ~50 Elo points, so a 20% speedup brings about 50*log2(1.2) ≈ 13 Elo points, which is definitely measurable (a player stronger by 13 Elo has a winning probability of ~52%).
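
    The Elo arithmetic, worked through (same assumption: ~50 Elo per doubling, which is a rule of thumb, not a measured constant):

    Code:
    ```cpp
    #include <cmath>
    #include <cstdio>

    // Elo gain from a speedup, using the ~50 Elo per doubling rule of thumb.
    static double elo_gain(double speedup) {
        return 50.0 * std::log2(speedup);
    }

    // Expected score of the stronger player under the standard Elo model.
    static double expected_score(double elo_diff) {
        return 1.0 / (1.0 + std::pow(10.0, -elo_diff / 400.0));
    }

    int main() {
        double gain = elo_gain(1.20);  // ~13 Elo from a 20% speedup
        std::printf("gain: %.1f Elo, expected score: %.3f\n",
                    gain, expected_score(gain));  // score works out to ~0.52
        return 0;
    }
    ```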

    My program has a website if anyone's interested
    http://cyberfish.wecheer.com/Brainless/

  8. #8
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    I just tested PGO with ICC 11 (the free Linux version):

    Code:
    Compiler  - Mnps (best of 3 runs)
    GCC       - 2.736
    GCC+PGO   - 3.045
    ICC       - 2.794
    ICC+PGO   - 2.813
    Seems like ICC produces a slightly faster binary by default, and GCC produces a MUCH faster binary with PGO.

    Google says that MSVC 2005 and up also have PGO support (available as a DLL?), but it's only included in the Pro version, so I can't test that.

    For ICC, I used
    Code:
    icc -fast -march=core2 -prof-gen ...
    //benchmark run
    icc -fast -march=core2 -prof-use ...

  9. #9
    Registered User
    Join Date
    Feb 2009
    Posts
    1
    Quote Originally Posted by cyberfish View Post
    I just tested PGO with ICC 11 (free Linux version),

    Code:
    Compiler  - Mnps (best of 3 runs)
    GCC       - 2.736
    GCC+PGO   - 3.045
    ICC       - 2.794
    ICC+PGO   - 2.813
    Seems like ICC produces a slightly faster binary by default, and GCC produces a MUCH faster binary with PGO.

    Google says that MSVC 2005 and up also have PGO support (available as a DLL?), but it's only included in the Pro version, so I can't test that.

    For ICC, I used
    Code:
    icc -fast -march=core2 -prof-gen ...
    //benchmark run
    icc -fast -march=core2 -prof-use ...
    I am working on a project where we are compiling the kernel with ICC: www.linuxdna.com

    We are working on getting PGO working for that... perhaps you are interested?


