Best speed switches for gcc


  1. #1
    Registered User
    Join Date
    Oct 2002
    Posts
    46

    Best speed switches for gcc

    So far I've noticed that:

    -O3 and -fomit-frame-pointer significantly speed up my proggies.

    Are there other goodies I missed for maxing speed?

    Barring for now an assembly rewrite of the choke points. I'll get there later.

    [edit]
    Yes, I can RTFM, but what really works, as opposed to what's merely available?
    [/edit]
    Last edited by rafe; 11-01-2002 at 02:30 PM.

  2. #2
    Registered User
    Join Date
    Oct 2002
    Posts
    46
    Agreed!

    But I'm trying to milk an O(n^2) algo for all it's worth. The O(n^2) is written in stone & I know what the hot spot is; in this rare case I knew it before I started coding. Of course there's plenty of stuff a newbie like me isn't doing efficiently, but I'm trying to coast as long as I can before I have to pedal hard.
    Last edited by rafe; 11-01-2002 at 03:09 PM.

  3. #3
    Registered User
    Join Date
    Oct 2002
    Posts
    46
    OK, when I do this I get buried in the output
    -m elf_i386

    Sooo... given that I'm working with an athlon I should reinstall gcc for my target processor? Giving it a better chance to optimize for the chip when I do the -O3? It should also allow it to use all those fancy MMX registers &c. Or is there a better way?

  4. #4
    Registered User
    Join Date
    Oct 2002
    Posts
    46

    Thanks! I'm currently using gcc 3.2 so it has the Athlon switches. I'll experiment & time against my prog.

  5. #5
    Registered User
    Join Date
    Oct 2002
    Posts
    46
    A few extra ticks, thanks. Given how heavily this program will be hit in the next few months, the difference *IS* significant.

    When I use "-O3 -fomit-frame-pointer" the times cluster near:
    real 1m10.404s
    user 1m10.220s
    sys 0m0.170s

    After adding "-mcpu=athlon-xp" the following is fairly typical:
    real 1m5.608s
    user 1m5.500s
    sys 0m0.100s

    Oh yeah, without any of the above switches it takes over 4 times as long.

    This is good to have before getting into hard to maintain coding tricks.

    Thanks again.
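
    A minimal sketch of timing a hot spot from inside the program instead of with time(1). The names hot_loop and cpu_seconds are hypothetical, the loop body is only a stand-in for the real O(n^2) work, and the file name bench.c in the comment is made up. The flags in the comment are the ones quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Build variants compared in this thread (bench.c is a hypothetical name):
 *   gcc bench.c
 *   gcc -O3 -fomit-frame-pointer bench.c
 *   gcc -O3 -fomit-frame-pointer -mcpu=athlon-xp bench.c
 */

/* Hypothetical stand-in for the O(n^2) hot spot: integer math only. */
unsigned long hot_loop(unsigned n)
{
    unsigned long acc = 0;
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            acc += (i ^ j) & 7u;
    return acc;
}

/*
 * Run hot_loop(n) once and return the processor time it consumed, in
 * seconds. The computed value is stored through result so the compiler
 * cannot discard the loop as dead code.
 */
double cpu_seconds(unsigned n, unsigned long *result)
{
    assert(result != NULL);
    clock_t start = clock();
    *result = hot_loop(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

    Calling cpu_seconds(n, &r) several times per build and comparing the clusters, as with the time(1) runs above, is more telling than any single run.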

  6. #6
    Registered User
    Join Date
    Oct 2002
    Posts
    46

    for the morbidly curious only

    I know that program optimization can be fickle. But I'm trying to learn what I can about gcc switches & I figured I'd share my findings. It is odd to see how many of the switches slow things down, but of course they may do the reverse in other situations.

    I was able to lose another second with -fdelete-null-pointer-checks & -fschedule-insns2. But I would be skeptical about using them in other situations & my testing shows little or no gain as a rule.

    On the other hand, the -O3, -fomit-frame-pointer & -mcpu=whatever seem to be winners on all of the compute bound programs I tested. Still a small N tho & limited to integer math.

    Easy-to-read source code optimizations (only in the problem area) have also been big winners.

  7. #7
    Registered User
    Join Date
    Oct 2002
    Posts
    46
    No. While reading the man pages on gcc I read:
    -mcpu=cpu-type
    This is identical to specifying both -march and -mtune.
    So, your hypothesis is that -mtune could have a negative effect on speed? I didn't think of that & seeing how the other options can have a negative effect I'll give it a try.

    Truthfully, my biggest surprise was that the option for pretouching the memory to load the cache line didn't have a net gain. This suggests to me that I'd better make sure that my malloc() structs are properly aligned; I thought that malloc() did that for me by default. This is of course a source code tweak.

    Thanks for all the help. So much for me to learn, so few brain cells to do it with.
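
    A rough sketch of what "pretouching" looks like when done by hand in the source, using GCC's __builtin_prefetch. This assumes the option meant above is -fprefetch-loop-arrays, which asks the compiler to insert similar hints itself. The function name sum_with_prefetch and the look-ahead distance PREFETCH_AHEAD are hypothetical, and the distance is a guess that would need tuning per machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical look-ahead distance in elements; tune by measurement. */
#define PREFETCH_AHEAD 16

/*
 * Sum an int array while hinting that the element PREFETCH_AHEAD slots
 * further on will be read soon. The prefetch is only a hint: it never
 * changes the result, only (possibly) the timing.
 */
long sum_with_prefetch(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 1); /* read, low locality */
        total += a[i];
    }
    return total;
}
```

    On a plain sequential walk like this the hardware often keeps up on its own, which may be one reason a prefetch switch shows no net gain.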

  8. #8
    Registered User
    Join Date
    Oct 2002
    Posts
    46
    Nix that! Whoa! User time dropped from 0m48.910s to 0m37.400s.

    Great catch!

  9. #9
    Registered User
    Join Date
    Oct 2002
    Posts
    46
    A few more source code tweaks & the user time is now at 0m31.610s. I'm only posting this because those same tweaks with the -mcpu vs -march switch were actually *slowing down* the times. This was causing me much confusion because it should have been stuffing more data into the cache for the inner loop & speeding things up. Now I know that I wasn't giving the compiler the correct info. An important lesson.

  10. #10
    Registered User
    Join Date
    Oct 2002
    Posts
    46
    > Well they will be data aligned for sure -
    > meaning you can store a data type with the
    > most restrictive type (usually a double) at
    > the address returned to you.
    Yup, my tests confirm that I was writing nonsense about malloc() alignments. Everything is aligned to the 8s.
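
    A minimal sketch of that kind of alignment test, assuming a two-int cell struct like the one described in this thread. The field names and the helpers is_aligned and all_blocks_aligned are hypothetical, not the actual test code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-int matrix cell; field names are made up. */
struct cell {
    int score;
    int from;
};

/* Nonzero if p is a multiple of alignment (alignment must be nonzero). */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}

/*
 * Allocate rows of 1..count cells with malloc() and report whether every
 * returned address was a multiple of alignment. Returns 0 on the first
 * misaligned block or on allocation failure.
 */
int all_blocks_aligned(size_t count, size_t alignment)
{
    for (size_t i = 1; i <= count; i++) {
        struct cell *row = malloc(i * sizeof *row);
        if (row == NULL)
            return 0;
        int ok = is_aligned(row, alignment);
        free(row);
        if (!ok)
            return 0;
    }
    return 1;
}
```

    all_blocks_aligned(100, 8) is the check that matches "aligned to the 8s"; a larger alignment such as a full cache line is not something malloc() promises.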

    >It might be worth looking at the -fbranch-probabilities option
    Unfortunately, -fbranch-probabilities actually slows the times down by almost exactly one second. Another switch that seems to be context sensitive. The problem area is a nested for loop, so the switch seems to be a logical choice, but I've hoisted about as much as I can out of it. It's pretty lean at this point.

    FYI: I'm doing a variant on the old edit distance problem, a classic dynamic programming problem. This algo searches for a best score in a 2D array. Each cell's score depends upon its neighbors to the North, East, and Northeast. The matrix cell structs are down to 2 ints. A nearly ideal size for caching. With some other tricks I've been able to reduce the "matrix" to 2 rows (well, there is one more for the 1st iteration) which I toggle between. And because I'm not using an actual matrix and the rows are a few hundred cells on average, it tends to be quite cacheable.
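
    A minimal sketch of the two-row toggle, shown on the textbook Levenshtein distance rather than on the variant described above. It uses plain int cells and the usual above, left, and diagonal neighbors, so the scoring and neighbor layout are assumptions; only the row-toggling idea is taken from this post. The names edit_distance and min3 are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/*
 * Textbook Levenshtein distance between s and t, keeping only two rows
 * of the (strlen(s)+1) x (strlen(t)+1) table and toggling between them.
 */
int edit_distance(const char *s, const char *t)
{
    size_t n = strlen(s), m = strlen(t);
    int *row[2];
    row[0] = malloc((m + 1) * sizeof *row[0]);
    row[1] = malloc((m + 1) * sizeof *row[1]);
    assert(row[0] != NULL && row[1] != NULL);

    /* Row 0: distance from the empty prefix of s to each prefix of t. */
    for (size_t j = 0; j <= m; j++)
        row[0][j] = (int)j;

    int cur = 0;
    for (size_t i = 1; i <= n; i++) {
        int prev = cur;
        cur ^= 1;                       /* toggle between the two rows */
        row[cur][0] = (int)i;
        for (size_t j = 1; j <= m; j++) {
            int cost = (s[i - 1] != t[j - 1]);
            row[cur][j] = min3(row[prev][j] + 1,         /* deletion */
                               row[cur][j - 1] + 1,      /* insertion */
                               row[prev][j - 1] + cost); /* substitution */
        }
    }

    int d = row[cur][m];
    free(row[0]);
    free(row[1]);
    return d;
}
```

    With rows of a few hundred ints, both buffers fit comfortably in cache, which is the point of dropping the full matrix.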

    Hm, that gives me another idea... Anyway, with your help I'm learning a lot going thru this exercise. I'm going to have to pull the plug on this sooner or later, but given that you've helped me take a 2 week task down to under one, I think that the time has been well spent so far & deserves a few more edits. Thanks again.
    Last edited by rafe; 11-06-2002 at 11:51 AM.
