Have you looked into higher-level optimisations first?
For example, there's this trigonometric identity:
Code:
sin(arctan(x)) = x / sqrt(1 + x*x)
According to:
http://en.wikipedia.org/wiki/List_of...ric_identities
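Of course, on its own that may well be slower, but it depends on how it fits with the rest of the surrounding code. E.g. if you happen to be calculating that exact denominator for some other reason anyway, then the identity only costs you one extra divide and should come out ahead. There's also the fast inverse square root usually credited to Carmack (Q_rsqrt in the Quake III source) to speed up such things: it gives you an approximate 1/sqrt(1 + x*x) directly, so the divide turns into a multiply.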
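If you want to try the inverse square root route, something along these lines should do as a starting point. This is a sketch of the well-known bit trick, not a copy of the original source: the names fast_rsqrt and sin_atan_fast are hypothetical, I'm assuming 32-bit IEEE 754 floats, and I've used memcpy instead of pointer casts to stay clear of aliasing problems. It is only approximate (roughly 0.2% relative error with a single Newton step), so it only makes sense if you can live with that.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(v) for positive, normal v.
 * Assumes float is a 32-bit IEEE 754 value. */
float fast_rsqrt(float v)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &v, sizeof bits);      /* reinterpret the float's bits */
    bits = 0x5f3759dfu - (bits >> 1);    /* magic-constant initial guess */
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson step to refine the guess. */
    return y * (1.5f - 0.5f * v * y * y);
}

/* sin(arctan(x)) via the identity, with the divide replaced by a
 * multiply against the approximate reciprocal square root. */
float sin_atan_fast(float x)
{
    return x * fast_rsqrt(1.0f + x * x);
}
```

Since 1 + x*x is never below 1, the argument is always in the range the trick handles. Do benchmark it against plain sqrtf though; on a lot of current hardware the ordinary square root (or a dedicated rsqrt instruction) is fast enough that this buys you little.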
There's more than one way to skin a cat!