Sorry.... In my implementation it's 44 that overflows, not 43... The extreme case, where the input is 43, takes 5639 clock cycles (~1.7 μs) on an i7-4770 @ 3.4 GHz:
Code:
$ for i in {2..43}; do ./lcm <<< $i; done
2 is lcm of the range [1..2]: 76 clock cycles.
6 is lcm of the range [1..3]: 199 clock cycles.
12 is lcm of the range [1..4]: 460 clock cycles.
60 is lcm of the range [1..5]: 644 clock cycles.
60 is lcm of the range [1..6]: 722 clock cycles.
420 is lcm of the range [1..7]: 891 clock cycles.
840 is lcm of the range [1..8]: 1064 clock cycles.
2520 is lcm of the range [1..9]: 1124 clock cycles.
2520 is lcm of the range [1..10]: 1148 clock cycles.
27720 is lcm of the range [1..11]: 1433 clock cycles.
27720 is lcm of the range [1..12]: 1490 clock cycles.
360360 is lcm of the range [1..13]: 1568 clock cycles.
360360 is lcm of the range [1..14]: 1701 clock cycles.
360360 is lcm of the range [1..15]: 1934 clock cycles.
720720 is lcm of the range [1..16]: 1916 clock cycles.
12252240 is lcm of the range [1..17]: 2095 clock cycles.
12252240 is lcm of the range [1..18]: 2146 clock cycles.
232792560 is lcm of the range [1..19]: 2608 clock cycles.
232792560 is lcm of the range [1..20]: 2668 clock cycles.
232792560 is lcm of the range [1..21]: 2762 clock cycles.
232792560 is lcm of the range [1..22]: 2796 clock cycles.
5354228880 is lcm of the range [1..23]: 2941 clock cycles.
5354228880 is lcm of the range [1..24]: 3191 clock cycles.
26771144400 is lcm of the range [1..25]: 3406 clock cycles.
26771144400 is lcm of the range [1..26]: 3542 clock cycles.
80313433200 is lcm of the range [1..27]: 3660 clock cycles.
80313433200 is lcm of the range [1..28]: 3406 clock cycles.
2329089562800 is lcm of the range [1..29]: 3760 clock cycles.
2329089562800 is lcm of the range [1..30]: 3895 clock cycles.
72201776446800 is lcm of the range [1..31]: 4146 clock cycles.
144403552893600 is lcm of the range [1..32]: 4246 clock cycles.
144403552893600 is lcm of the range [1..33]: 4358 clock cycles.
144403552893600 is lcm of the range [1..34]: 4276 clock cycles.
144403552893600 is lcm of the range [1..35]: 4470 clock cycles.
144403552893600 is lcm of the range [1..36]: 4346 clock cycles.
5342931457063200 is lcm of the range [1..37]: 4654 clock cycles.
5342931457063200 is lcm of the range [1..38]: 4733 clock cycles.
5342931457063200 is lcm of the range [1..39]: 5159 clock cycles.
5342931457063200 is lcm of the range [1..40]: 5165 clock cycles.
219060189739591200 is lcm of the range [1..41]: 4884 clock cycles.
219060189739591200 is lcm of the range [1..42]: 5280 clock cycles.
9419588158802421600 is lcm of the range [1..43]: 5639 clock cycles.
@christop's approach using a table lookup is even faster!
Here's the code:
Code:
/* lcm.c */
/*
   Compile with:
     gcc -O2 -o lcm lcm.c
*/
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#include "cycle_counting.h"

uint64_t gcd( uint64_t, uint64_t );
uint64_t lcm( uint64_t, uint64_t );

int main( void )
{
  int32_t max, i;
  uint64_t n, oldn;
  counter_T c;

  /* Read from redirection (or keyboard). */
  if ( scanf( "%" SCNi32, &max ) != 1 )
  {
    fputs( "\033[31;1mError\033[m: Cannot read upper limit.\n", stderr );
    return EXIT_FAILURE;
  }

  if ( max < 2 )
  {
    fputs( "\033[31;1mError\033[m: Upper limit must be greater than 1.\n", stderr );
    return EXIT_FAILURE;
  }

  /* I've checked whether GCC optimizes these calls away and measures
     the wrong portion of code... It measures correctly! */
  c = BEGIN_TSC();

  n = 2;
  for ( i = 3; i <= max; i++ )
  {
    oldn = n;
    n = lcm( n, i );

    /* Need to test for the overflow condition... */
    if ( n < oldn )
    {
      fputs( "\033[31;1mError\033[m: Overflow!\n", stderr );
      return EXIT_FAILURE;
    }
  }

  END_TSC( &c );

  printf( "%" PRIu64 " is lcm of the range [1..%" PRIi32 "]: %" PRIu64 " clock cycles.\n",
          n, max, c );

  return EXIT_SUCCESS;
}

/* Euclidean algorithm (not the most efficient!). */
uint64_t gcd( uint64_t a, uint64_t b )
{
  if ( ! b )
    return a;

  /* This converges very fast to the right result, so there's no harm
     in doing it recursively! GCC turns it iterative anyway. */
  return gcd( b, a % b );
}

uint64_t lcm( uint64_t a, uint64_t b ) { return ( a * b ) / gcd( a, b ); }
PS: I'm running a lot of processes right now, so the [1..41] case just got lucky. All the others suffer from page faults and cache effects. But notice: less than 2 μs in ALL cases (about 100000 times faster than the naive approach).