Sorry.... In my implementation it's 44 that overflows, not 43... The extreme case, where the input is 43, takes 5639 clock cycles (~1.7 μs) on an i7-4770 @ 3.4 GHz:
Code:
$ for i in {2..43}; do ./lcm <<< $i; done
2 is lcm of the range [1..2]: 76 clock cycles.
6 is lcm of the range [1..3]: 199 clock cycles.
12 is lcm of the range [1..4]: 460 clock cycles.
60 is lcm of the range [1..5]: 644 clock cycles.
60 is lcm of the range [1..6]: 722 clock cycles.
420 is lcm of the range [1..7]: 891 clock cycles.
840 is lcm of the range [1..8]: 1064 clock cycles.
2520 is lcm of the range [1..9]: 1124 clock cycles.
2520 is lcm of the range [1..10]: 1148 clock cycles.
27720 is lcm of the range [1..11]: 1433 clock cycles.
27720 is lcm of the range [1..12]: 1490 clock cycles.
360360 is lcm of the range [1..13]: 1568 clock cycles.
360360 is lcm of the range [1..14]: 1701 clock cycles.
360360 is lcm of the range [1..15]: 1934 clock cycles.
720720 is lcm of the range [1..16]: 1916 clock cycles.
12252240 is lcm of the range [1..17]: 2095 clock cycles.
12252240 is lcm of the range [1..18]: 2146 clock cycles.
232792560 is lcm of the range [1..19]: 2608 clock cycles.
232792560 is lcm of the range [1..20]: 2668 clock cycles.
232792560 is lcm of the range [1..21]: 2762 clock cycles.
232792560 is lcm of the range [1..22]: 2796 clock cycles.
5354228880 is lcm of the range [1..23]: 2941 clock cycles.
5354228880 is lcm of the range [1..24]: 3191 clock cycles.
26771144400 is lcm of the range [1..25]: 3406 clock cycles.
26771144400 is lcm of the range [1..26]: 3542 clock cycles.
80313433200 is lcm of the range [1..27]: 3660 clock cycles.
80313433200 is lcm of the range [1..28]: 3406 clock cycles.
2329089562800 is lcm of the range [1..29]: 3760 clock cycles.
2329089562800 is lcm of the range [1..30]: 3895 clock cycles.
72201776446800 is lcm of the range [1..31]: 4146 clock cycles.
144403552893600 is lcm of the range [1..32]: 4246 clock cycles.
144403552893600 is lcm of the range [1..33]: 4358 clock cycles.
144403552893600 is lcm of the range [1..34]: 4276 clock cycles.
144403552893600 is lcm of the range [1..35]: 4470 clock cycles.
144403552893600 is lcm of the range [1..36]: 4346 clock cycles.
5342931457063200 is lcm of the range [1..37]: 4654 clock cycles.
5342931457063200 is lcm of the range [1..38]: 4733 clock cycles.
5342931457063200 is lcm of the range [1..39]: 5159 clock cycles.
5342931457063200 is lcm of the range [1..40]: 5165 clock cycles.
219060189739591200 is lcm of the range [1..41]: 4884 clock cycles.
219060189739591200 is lcm of the range [1..42]: 5280 clock cycles.
9419588158802421600 is lcm of the range [1..43]: 5639 clock cycles.
@christop's approach using a table lookup is even faster!
Here's the code:
Code:
/* lcm.c */
/*
   Compile with:
     gcc -O2 -o lcm lcm.c
*/
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#include "cycle_counting.h"

uint64_t gcd( uint64_t, uint64_t );
uint64_t lcm( uint64_t, uint64_t );

int main( void )
{
  int32_t max, i;
  uint64_t n, oldn;
  counter_T c;

  /* Read from redirection (or keyboard). */
  if ( scanf( "%" SCNi32, &max ) != 1 )
  {
    fputs( "\033[31;1mError\033[m: Cannot read upper limit.\n", stderr );
    return EXIT_FAILURE;
  }

  if ( max < 2 )
  {
    fputs( "\033[31;1mError\033[m: Upper limit must be greater than 1.\n", stderr );
    return EXIT_FAILURE;
  }

  /* I've checked whether GCC optimizes these calls away and measures
     the wrong portion of code... It measures correctly! */
  c = BEGIN_TSC();

  n = 2;
  for ( i = 3; i <= max; i++ )
  {
    oldn = n;
    n = lcm( n, i );

    /* Need to test for the overflow condition... */
    if ( n < oldn )
    {
      fputs( "\033[31;1mError\033[m: Overflow!\n", stderr );
      return EXIT_FAILURE;
    }
  }

  END_TSC( &c );

  printf( "%" PRIu64 " is lcm of the range [1..%" PRIi32 "]: %" PRIu64 " clock cycles.\n",
          n, max, c );

  return EXIT_SUCCESS;
}

/* Euclidean algorithm (not the most efficient!). */
uint64_t gcd( uint64_t a, uint64_t b )
{
  if ( ! b )
    return a;

  /* This converges very fast to the right result, so there's no harm
     in doing it recursively! GCC turns it iterative anyway. */
  return gcd( b, a % b );
}

uint64_t lcm( uint64_t a, uint64_t b ) { return ( a * b ) / gcd( a, b ); }
PS: I'm running a lot of processes right now, so the [1..41] case just got lucky. All the others suffer from page faults and cache effects. But notice: less than 2 μs in ALL cases (about 100000 times faster than the naive approach).