Today I wrote a program to multiply two integers and wanted to see how fast I could make it run, then test it against the built-in multiplication operator (*). When I compile with -O3 (or even -O2), I find that in most cases it's slightly faster than multiplying two numbers with *. I timed how long it took to run each function 1 billion times and averaged that over 1 million runs. Can anyone explain what's going on? Is it simply that clock() isn't accurate enough for this test?
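Roughly, each timing run has this shape (a simplified sketch, not my exact harness; reps, sink, and the operand pattern here are just illustrative):

Code:
#include <stdio.h>
#include <time.h>

/* paste mult() from below here */

volatile unsigned long long sink;   /* keeps each result "live" */

int main(void)
{
    const unsigned long long reps = 1000000000ULL;  /* 1 billion calls per run */

    clock_t start = clock();
    for (unsigned long long i = 1; i <= reps; ++i)
        sink = mult(i, i + 1);      /* other run: sink = i * (i + 1); */
    clock_t stop = clock();

    printf("%.6f seconds\n", (double)(stop - start) / CLOCKS_PER_SEC);
    return 0;
}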

Here is my mult function.

Code:
inline unsigned long long int mult( unsigned long long m, unsigned long long n )
{
    /* Need to check for overflow */

    /* Branchless min/max: x = min(m, n), y = max(m, n), so the loop
       below iterates over the bits of the smaller operand. */
    unsigned long long x = m ^ ((m ^ n) & -(m > n));
    unsigned long long y = m ^ ((m ^ n) & -(m < n));
    unsigned long long a = 0;   /* accumulates the product */

    int i = 0;

    /* Shift-and-add: for each set bit of x, add y shifted left by that
       bit's position. -(x & 1) is an all-ones mask when the low bit is set. */
    do {
        a += (y & -(x & 1)) << i++;
    } while (x >>= 1);

    return a;
}
Unfortunately, I do not know any assembly, so I am unable to get any meaningful information from looking at the generated code.
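In case anyone wants to reproduce this, here is a quick sanity check (an illustrative driver, not part of my timing code) that mult agrees with * on small operands:

Code:
#include <assert.h>
#include <stdio.h>

/* paste mult() from above here */

int main(void)
{
    /* compare mult() against the built-in operator on 4 million pairs;
       operands stay small enough that the product cannot overflow */
    for (unsigned long long m = 0; m < 2000; ++m)
        for (unsigned long long n = 0; n < 2000; ++n)
            assert(mult(m, n) == m * n);

    puts("mult() matches * on all tested pairs");
    return 0;
}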