My standard flags are
-W -Wall -O3 -fomit-frame-pointer -msse3
on my AMD Family 10h CPU (this produces x86-64 code that uses SSE3). For a test compile for generic x86-64, I use
-W -Wall -O3 -fomit-frame-pointer -m64 -msse2 -march=x86-64 -mtune=generic
and for most 32-bit (SSE2-capable) recent x86 processors,
-W -Wall -O3 -fomit-frame-pointer -m32 -msse2 -mtune=generic
I do a lot of math stuff, vectorizing complex expressions and so on, so SSE2 support is pretty much a requirement for my target machines.
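If you want to double-check what a given set of flags actually targets, one option is to look at the macros GCC predefines. The following is only a rough sketch: describe_target() is a made-up helper name, and it assumes GCC (or a compatible compiler) that defines __x86_64__, __i386__, __SSE2__ and __SSE3__ according to the flags used.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes a one-line summary of the compile-time
   target into buf, based on macros predefined by GCC-compatible
   compilers. Returns the value returned by snprintf(). */
int describe_target(char *buf, size_t size)
{
#if defined(__x86_64__)
    const char *arch = "x86-64";
#elif defined(__i386__)
    const char *arch = "32-bit x86";
#else
    const char *arch = "other";
#endif

#ifdef __SSE2__
    const char *sse2 = "yes";   /* SSE2 code generation is enabled */
#else
    const char *sse2 = "no";
#endif

#ifdef __SSE3__
    const char *sse3 = "yes";   /* SSE3 code generation is enabled */
#else
    const char *sse3 = "no";
#endif

    return snprintf(buf, size, "arch=%s sse2=%s sse3=%s", arch, sse2, sse3);
}
```

Call it with a small char buffer and print the result from main(); compiling the same file with each of the flag sets above should then show the differences between the targets.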
I don't know if you consider -O3 -fomit-frame-pointer overkill, but for me on GCC it has been a long-time favourite, as a general big hammer that yields very tight results. Individual programs can benefit from additional (and rarely different) flags, but in general, these seem to produce very good results every time I benchmark some code.
I haven't seen any compilation issues in a very long time in my own code, either. For others' code, I mostly rely on their choices (of compiler flags), at least until I've gone over the code myself.
It is true that high optimization levels do bring up "bugs" in code that relies on traditional compiler behaviour that is stricter than what the C89/C90 standard specifies, especially in math expressions. Personally, I've found that the C99/C11 casting rules take care of everything I need to worry about. In short, the C99/C11 casting rules state that (int)(a*b) can be optimized by the compiler, as long as the result is computed at no more than integer precision (and not, say, at floating-point or infinite precision). Making use of this rule makes implementing algorithms like Kahan summation predictable and reliable regardless of optimization level. To tell GCC that you're using C99, add -std=c99 to the compiler flags.
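As an illustration of the Kahan summation mentioned above, here is a minimal sketch, not a definitive implementation. The function name kahan_sum and the variable names are my own choices, and it assumes the code is compiled with -std=c99 and without -ffast-math (which would allow GCC to reassociate the arithmetic and optimize the compensation away). The explicit casts mark the points where each intermediate result should be rounded to double.

```c
#include <stddef.h>

/* Kahan (compensated) summation of n doubles.
   Sketch only: assumes -std=c99 and no -ffast-math. */
double kahan_sum(const double *data, size_t n)
{
    double sum  = 0.0;  /* running total */
    double comp = 0.0;  /* compensation for low-order bits lost so far */
    size_t i;

    for (i = 0; i < n; i++) {
        /* Each cast asks for the intermediate value to be rounded
           to double, discarding any excess precision. */
        const double y = (double)(data[i] - comp);
        const double t = (double)(sum + y);

        /* (t - sum) is the part of y that actually made it into t;
           subtracting y leaves the rounding error of this step. */
        comp = (double)((double)(t - sum) - y);
        sum  = t;
    }

    return sum;
}
```

For example, summing 1.0 followed by two values of 1e-16 gives exactly 1.0 with a plain loop, because each small term is rounded away, whereas the compensated version carries the lost bits forward and returns a value slightly above 1.0.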