Thread: Program to Multiply Integers as Fast as Hardware?

  1. #1
    Registered User
    Join Date
    Oct 2015
    Posts
    6

    Program to Multiply Integers as Fast as Hardware?

    Today I wrote a program to multiply two integers and wanted to see how fast I could make it run, and then test it against the on-board multiplication operator (*). When I compile with -O3 (even -O2) I find that in most cases its slightly faster than multiplying two numbers with *. I timed how long it took to run each function 1 billion times and averaged that out over 1 million runs. Is anyone able to explain to me what's going on? Is it simply that clock() isn't accurate enough for this test?

    Here is my mult function.

    Code:
    inline unsigned long long int mult( unsigned long long m, unsigned long long n)
    {
        /* Need to check for overflow */
        // x = min( m, n );
        // y = max( m, n);
        // a = product
        unsigned long long x = m ^ ((m ^ n) & -(m > n)),\
                           y = m ^ ((m ^ n) & -(m < n)),\
                           a = 0;
    
        int i = 0;
    
        do{
              a += (y & -(x&1)) << (i++);
        } while( x >>= 1);
    
        return a;
    }
    Unfortunately, I do not know any assembly so I am unable to get any meaningful information from looking at that.

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    What is your equivalent function that uses operator * in its implementation?
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Oct 2015
    Posts
    6
    sorry I forgot to show my main function which does the testing:

    Code:
    int main( int argc, char** argv ){
        clock_t begin1, end1, begin2, end2;
        long double total;
    
    
        int trials, trials1, i=0, SIZE = 31, MAX = 1000000000;
    
        unsigned long long m = 2, n = 1;     // numbers to be multiplied
    
        double avg1, avg2, var=0;
        FILE *fp;
    
    
        fp = fopen(argv[1], "w");
    
    
        START=clock();
    
        // multiply a most a 31-bit with a 31-bit
        while( i++ < SIZE )
        {
            trials = 1000000;
            avg1 = avg2 = 0;
            while( trials-- )
            {
    
    
                trials1 = MAX;
                begin1 = clock();
    
                while( trials1-- ) m*n;       // timing m*n
    
                end1 = clock();
                avg1 += (double)(begin1 - end1);
    
    
                trials1 = MAX;
                begin2 = clock();
    
                while( trials1-- ) mult(m,n);   // timing mult
    
                end2 = clock();
                avg2 += (double)(begin2 - end2);
    
    
            }
            
            avg1 /= CLOCKS_PER_SEC;
            avg2 /= CLOCKS_PER_SEC;
            m = (m<<1) - 1;
            n <<= 1;
    
    
            fprintf( fp, "%d %g %g\n", i, avg1 / 1000000, avg2 / 1000000 );
        }
        END=clock();
        fclose(fp);
    #endif
    }
    As you can see, I am timing * directly instead of calling a function. If there is something wrong with my method, then please let me know.

    Additionally, I am outputting the data in such a way that I can plot it using gnuplot.

    [EDIT] Sorry, I just realised that in the fprintf() im am dividing by trials, which is now not 1000 000. Just fixed that - I still get the same results though..
    Last edited by plopperzz; 12-27-2015 at 01:27 PM. Reason: some clean up

  4. #4
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Well, one issue with these kind of tests is that you may end up with the compiler applying optimisations that you did not intend, e.g., perhaps it would just remove a loop that has no net effect, and if that is the loop whose body contains precisely the thing you are trying to time, you could well just be timing the resulting reduced code instead of what you wanted to time.

    Also, I noticed that before you tested for operator *, you wrote:
    Code:
    trials1 = 1000000000;
    whereas before you tested smult, you wrote:
    Code:
    trials1 = MAX;
    MAX is not shown, but if MAX happens to be much smaller than 1000000000, then the reason for the unexpected results could lie there.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  5. #5
    Registered User
    Join Date
    Oct 2015
    Posts
    6
    Yeah, sorry (I feel like a fool for this), I think I got that all fixed. I just had macros defined in my source for ease, but I forgot to fix all of that when I copied my code over to here. What you say may very well be true, however I take it I need to look at the assembly in order to make tat conclusion?

  6. #6
    Registered User
    Join Date
    Oct 2015
    Posts
    6
    Here is my full source code, just so there is no confusion - and by that I mean errors caused by me forgetting to make certain changes when copying the code over.

    Code:
    #include <stdio.h>
    #include <stdlib.h> 
    #include <limits.h>
    #include <time.h>
     
    #define MAX 1000000000
    #define NUM_FOR_AVG 1000000
    #define SIZE 31
    
    
    inline unsigned long long int smult( long long m, long long n )
    {
    
        unsigned long long x = m ^ ((m ^ n)& -(m > n)),\
                           y = m ^ ((m ^ n)& -(m<n)),\
                           a=0;
        int i = 0; 
    
    
        do a += (y&-(x&1))<<(i++); while(x>>=1); 
    
    
        return a;
    }
    
    
    int main( int argc, char** argv )
    {
        clock_t begin1, end1, begin2, end2;
        unsigned long long m = 2, n = 1;
        double avg1, avg2;
        int trials, trials1, i=0;
    
    
        FILE *fp;
        fp = fopen(argv[1], "w");
    
    
        while( i++ < SIZE )
        {
            trials = NUM_FOR_AVG;
            avg1 = avg2 = 0;
            while( trials-- )
            {   
    
    
                trials1 = MAX;
                begin1 = clock();
                while( trials1-- ) m*n;
                end1 = clock();
                avg1 += (double)(begin1 - end1);
         
                trials1 = MAX;
                begin2 = clock();
                while( trials1-- ) smult(m,n);
                end2 = clock();
                avg2 += (double)(begin2 - end2);
    
    
            }
            avg1 /= CLOCKS_PER_SEC;
            avg2 /= CLOCKS_PER_SEC;
            m = (m<<1) -1; 
            n <<= 1;
    
    
            fprintf( fp, "%d %g %g\n", i, avg1/NUM_FOR_AVG, avg2/NUM_FOR_AVG );
            printf("%.2g%%\n", (double)i*100/SIZE);
        }   
        fclose(fp);
    }
    Now running this code again with the NUM_FOR_AVG constant I seem to be getting more sensible results, such as mult() taking some constant time longer than the * operator
    Last edited by plopperzz; 12-27-2015 at 01:40 PM.

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by plopperzz
    What you say may very well be true, however I take it I need to look at the assembly in order to make tat conclusion?
    That would certainly be one way of checking. I believe that to avoid such optimisations, people typically do things like vary the input and/or ensure the output is used such that it cannot be optimised away, but doing these with something so fundamental as operator * may well cause these things to dominate instead of the operation.

    For all you know, just swapping the order in which you test may well result in operator * being faster "in most cases".
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #8
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    Try using the calculated values so the optimizer won't remove the calculations altogether. Something like the following.
    Code:
    #include <stdio.h>
    #include <stdlib.h> 
    #include <limits.h>
    #include <time.h>
      
    #define NUM_TRIALS    1000000
    #define NUM_REPS   1000000000
    #define SIZE               31
     
    inline unsigned long long int smult(long long m, long long n) {
      unsigned long long x = m ^ ((m ^ n) & -(m > n)),
        y = m ^ ((m ^ n) & -(m < n)),
        a = 0;
      int i = 0; 
      do
        a += (y & -(x & 1)) << (i++);
      while (x >>= 1);
      return a;
    }
    
    int main(int argc, char** argv) {
      clock_t begin;
      unsigned long long m = 2, n = 1, sum;
      double avg1, avg2;
      int trials, reps, i;
      FILE *fp = fopen(argv[1], "w");
      
      for(i = 0; i < SIZE; i++) {
        avg1 = avg2 = 0;
        sum = 0;
        for(trials = 0; trials < NUM_TRIALS; trials++) {   
    
          begin = clock();
          for(reps = 0; reps < NUM_REPS; reps++)
            sum += m * n;
          avg1 += begin - clock();
    
          begin = clock();
          for(reps = 0; reps < NUM_REPS; reps++)
            sum += smult(m, n);
          avg2 += begin - clock();
        }
        
        fprintf(fp, "%lld %d %g %g\n", sum, i,
            avg1 / CLOCKS_PER_SEC / NUM_TRIALS,
            avg2 / CLOCKS_PER_SEC / NUM_TRIALS);
        printf("%.2g%%\n", (double)i * 100 / SIZE);
        
        m = (m << 1) - 1;
        n <<= 1;
      }   
      
      fclose(fp);
      return 0;
    }
    Depending on how smart the optimizer is, this still might not work.
    BTW, how long does this take to run on your computer?
    Last edited by algorism; 12-27-2015 at 02:15 PM.

  9. #9
    Registered User
    Join Date
    Oct 2015
    Posts
    6
    Quote Originally Posted by algorism View Post
    Try using the calculated values so the optimizer won't remove the calculations altogether. Something like the following.


    Depending on how smart the optimizer is, this still might not work.
    BTW, how long does this take to run on your computer?
    Alright, so I've made the changes and I see what both of you mean now about useless calculations being optimized out. It was a classic facepalm moment when i saw the "sum += m*n".

    Yes, this is taking substantially more time to complete, and to be honest I didn't even bother waiting to see 0% pop up before stopping it and reducing the number of trials and reps down to 100 and 1000000, which now takes around 52 seconds to complete. It did seem terribly odd that I could run it that many times in such a short timeframe - not to mention I didn't see a linear increase in the time taken to perform smult. This is much betterProgram to Multiply Integers as Fast as Hardware?-benchmarks-jpg

    That being said, is it even possible to write a function to perform multiplication as quickly as the on-board multiplication operator?

  10. #10
    Registered User
    Join Date
    Oct 2015
    Posts
    6
    It seems that replacing the do while loop with

    Code:
    while( x > 1 )
    {
        (x & 1) && ( a += y );
        y <<= 1;
        x >>= 1;
    }
    I didn't think it would have been so I didn't even bother testing it, however its a fair bit faster.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. C program multiply matrices
    By YannB in forum C Programming
    Replies: 10
    Last Post: 11-15-2013, 12:06 AM
  2. Replies: 19
    Last Post: 10-01-2013, 10:54 PM
  3. Replies: 11
    Last Post: 08-05-2008, 01:42 PM
  4. program won't multiply right
    By Will_rookwood in forum C++ Programming
    Replies: 5
    Last Post: 04-11-2006, 07:28 PM
  5. program closes fast
    By MrSoy in forum C Programming
    Replies: 5
    Last Post: 04-10-2003, 08:00 PM

Tags for this Thread