So far I've noticed that:
-O3 and -fomit-frame-pointer significantly speed up my programs.
Are there other flags I've missed for maximizing speed?
I'm setting aside an assembly rewrite of the bottlenecks for now; I'll get to that later.
[edit]
Yes, I can RTFM, but I'm asking what really works, as opposed to what's merely available.
[/edit]