Thread: Compiling for performance (GCC)

  1. #1
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413

    Compiling for performance (GCC)

    For the longest time now, I've been keeping my compiler switches pretty minimal. Just about all I do is -O2 and -march=native, which is what I figure is probably a safe amount of optimization without going crazy. There are a ton of options, especially for x86 specifically.

    Are there any other switches that are pretty much the norm for optimization? About the only one I've actually heard of before is -mfpmath=sse, which I've had to use to force floating-point arithmetic to give the same answer between 32-bit and 64-bit builds. Apparently SSE is faster than the 387.

  2. #2
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    -march=native should pick up the supported SSE stuff. But I've also seen it pick a "lesser" march (like "core2" when running on a "corei7").

    You can see what gets picked up with: "gcc -O2 -march=native -Q --help=target".
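    The same information is visible from inside a program, since GCC predefines macros such as __SSE2__ and __AVX__ for whatever the -march/-m flags enabled. A minimal sketch; the function name is made up for illustration, and only a handful of x86 extensions are checked:

```c
/* Report the widest x86 SIMD extension the compiler was told it may use.
   The macros are predefined by GCC according to the -march/-m... flags;
   enabled_simd_level() is a hypothetical name for this sketch. */
const char *enabled_simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none of the x86 extensions checked here";
#endif
}
```

    Printing the result from a build with -march=native and from one with, say, -msse2 shows what each build is allowed to use. "gcc -march=native -dM -E - < /dev/null" dumps the full macro list without compiling anything.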

    Other things you could play with are -O3 and -flto, unless you consider them "going crazy".

    gg

  3. #3
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    My standard flags are
    -W -Wall -O3 -fomit-frame-pointer -msse3
    on my AMD Family 10h CPU (compiles x86-64 SSE3 code). If I want to do a test compile for generic x86-64,
    -W -Wall -O3 -fomit-frame-pointer -m64 -msse2 -march=x86-64 -mtune=generic
    and for most 32-bit (SSE2-capable) recent x86 processors,
    -W -Wall -O3 -fomit-frame-pointer -m32 -msse2 -mtune=generic
    I do a lot of math stuff, vectorizing complex expressions and so on, so SSE2 support is pretty much a requirement for my target machines.

    I don't know if you consider -O3 -fomit-frame-pointer overkill, but for me on GCC it has been a long-time favourite, as a general big hammer that yields very tight results. Individual programs can benefit from additional (and rarely different) flags, but in general, these seem to produce very good results every time I benchmark some code.

    I haven't seen any compilation issues in a very long time in my own code, either. For others' code, I mostly rely on their choices (of compiler flags), at least until I've gone over the code myself.

    It is true that high levels of optimization do bring up "bugs" in the code (traditional behaviour that is more strict than the C89/C90 standard specifies), especially for math expressions. Personally, I've found that the C99/C11 casting rules take care of all that I need to worry about. In short, C99/C11 casting rules state that (int)(a*b) can be optimized by the compiler, as long as it computes that result at most at integer precision (and not, say, at floating-point or infinite precision). Making use of this rule makes implementing algorithms like Kahan summation predictable and reliable regardless of optimization level. To tell GCC that you're using C99, add -std=c99 to the compiler flags.
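    For reference, Kahan summation can be sketched roughly as follows. This assumes IEEE-754 doubles and a build without value-changing math options; the function and variable names are illustrative. Each intermediate is stored in a double variable, which is where the C99 rules pin its precision.

```c
#include <stddef.h>

/* Kahan (compensated) summation of n doubles.
   Function and variable names are illustrative only. */
double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0;   /* running total */
    double c   = 0.0;   /* negated rounding error carried from the last step */

    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;   /* add back what the previous step lost */
        double t = sum + y;    /* low-order bits of y may be rounded away */
        c = (t - sum) - y;     /* (part of y that arrived) - y = -(lost part) */
        sum = t;
    }
    return sum;
}
```

    With an input of 1.0 followed by ten values of 1e-16, a plain loop in double precision never moves off 1.0, because each addend is below half an ulp, while the compensated version ends up close to 1.000000000000001.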

  4. #4
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413
    I have messed around with -flto a bit when I was comparing gcc to icc (Intel C compiler). Guess I could play with that again.

    I used the line Codeplug gave: "gcc -O2 -march=native -Q --help=target". What's weird is that it gives crap results for this laptop (it enables hardly anything and detects the wrong arch), but gives perfect results for my Core i3 desktop.

    For reference, my standard is: -std=c++11 -pedantic -Wall -Wextra -O2 -march=native, which may change soon from this discovery. Just need the SSE part. Looks like the omit-frame-pointer is enabled by default now.

    Thanks guys.

  5. #5
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Epy
    Looks like the omit-frame-pointer is enabled by default now.
    Yes. On GCC-4.6 and later, -O, -O2, -O3, and -Os all include -fomit-frame-pointer, as it no longer hinders debugging on x86 and x86-64. I prefer to include it explicitly, since many of the machines I use still have pre-4.6 GCC.

    Quote Originally Posted by Epy
    -std=c++11
    Which, by the way, has the same numeric precision casting rules as C99/C11. It's only the old versions of C and C++ that need tricks, temporary variables, or extra compiler options to optimize complicated math expressions right.

  6. #6
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Code:
    -ffast-math
    may be useful in floating-point intensive code if you know and can work with the limitations.
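    One such limitation, as a small sketch: -ffast-math turns on -ffinite-math-only, so the compiler may assume NaN and infinity never occur, and a self-comparison NaN test can be folded away. The helper name below is hypothetical.

```c
#include <math.h>

/* Classic NaN test: NaN is the only value that compares unequal to itself.
   In a normal build this returns 1 for NaN and 0 otherwise. Under
   -ffinite-math-only (implied by -ffast-math) the compiler is allowed to
   assume x is never NaN and may reduce the whole function to "return 0". */
int is_nan_by_compare(double x)
{
    return x != x;
}
```

    GCC also predefines __FAST_MATH__ when -ffast-math is in effect, so code that depends on NaN handling can refuse to build with an #error in that case.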

    From my experience, as of right now, auto-vectorization (SSE) is still not very good. I always use GCC intrinsics if I want SSE.

    My standard flags:
    Code:
    -std=gnu++11 -Wall -g -O2 -march=native

  7. #7
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by cyberfish
    From my experience, as of right now, auto-vectorization (SSE) is still not very good.
    Agreed. Fortunately, GNU C code similar to

    Code:
    typedef double  double4  __attribute__((vector_size(32)));
    double4 a, b, c, d;
    
    a = b * c + d;
    using basic arithmetic (+ - * /), unary (-, ~), and binary (| & ^) operators (the bitwise ones on integer vector types), compiles on all architectures, and vectorizes quite well. For example, SSE2 only supports double pairs (V2DF type), so on SSE2 each double4 in the above example occupies a pair of XMM registers.
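    A slightly fuller sketch of the same idea, wrapped in a function so it compiles as-is with GCC. The helper name is made up. Taking pointers rather than passing the vectors by value is just one way to avoid ABI warnings when 32-byte vectors are passed on a target without AVX enabled.

```c
/* Four doubles in one vector type; GCC splits it across whatever
   registers the target has (two XMM registers on plain SSE2). */
typedef double double4 __attribute__((vector_size(32)));

/* Hypothetical helper: element-wise a = b * c + d on all four lanes. */
void muladd4(double4 *a, const double4 *b, const double4 *c, const double4 *d)
{
    *a = *b * *c + *d;
}
```

    Individual lanes can be read back with ordinary subscripts, for example (*a)[0], on compilers that support subscripting vector types.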

    GCC-4.8 has even better support, adding vectorized comparisons. (Square root and reciprocal square root still require intrinsics.) This makes it easy to write 3D vector/matrix functions that vectorize well on many different architectures. To use new extensions, for example AVX, you only need to recompile the sources; no source changes are needed.
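    As an illustration of the intrinsics side, a packed square root over two doubles might look like this with the SSE2 intrinsics from <emmintrin.h>. The function name is hypothetical, and the scalar fallback is only there so the sketch still builds on non-x86 targets.

```c
#include <math.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics: __m128d, _mm_sqrt_pd, ... */
#endif

/* Hypothetical helper: out[i] = sqrt(in[i]) for i = 0, 1. */
void sqrt_pair(double out[2], const double in[2])
{
#ifdef __SSE2__
    __m128d v = _mm_loadu_pd(in);         /* unaligned load of two doubles */
    _mm_storeu_pd(out, _mm_sqrt_pd(v));   /* both square roots in one instruction */
#else
    out[0] = sqrt(in[0]);                 /* plain C fallback */
    out[1] = sqrt(in[1]);
#endif
}
```

    The unaligned load and store variants are used here so that callers need no special alignment. The aligned forms (_mm_load_pd, _mm_store_pd) require 16-byte aligned pointers.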

    Still, I too use intrinsics, at least for the critical parts, and expect to do so in the future, too. But, for a lot of library code, the vectorized types may yield a nice performance boost while keeping the code portable (between architectures, not between compilers).

  8. #8
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    Modern versions of GCC have -Ofast as an option. You could try this as an alternative to -O3 and see if it does anything different.

  9. #9
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    -Ofast is currently -O3 -ffast-math

    Optimize Options - Using the GNU Compiler Collection (GCC)

    Looks like there's also a new -Og which is pretty cool.

  10. #10
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413
    I'll third the SSE comment -- I did some tests last night on my Fortran programs with the SSE options and it didn't speed anything up at all. Disappointed.

    Nominal: I always use either c++11 or c99 for the standard when compiling C/C++ programs, so I'm glad to hear that about numerical precision.

    Last Post: 01-14-2004, 10:34 AM