Thread: Micro-optimizing Fibonacci n-th term generator: a journey to perfection

  1. #1
    Registered User Mindaugas's Avatar
    Join Date
    Dec 2014
    Location
    Kaunas, Lithuania
    Posts
    4

    Micro-optimizing Fibonacci n-th term generator: a journey to perfection

    Hi,

    Working through MIT's Introduction to Algorithms, I'm developing a Fibonacci n-th term generator and I want to make it as fast and as compact memory-wise as possible. The story:

    1. Developed a naive, dumb version that does recursive calls:
    Code:
    return (NaiveFibonacciNthTermGenerator (the_xth_fibonacci_num - 1) +
            NaiveFibonacciNthTermGenerator (the_xth_fibonacci_num - 2));
    2. I remembered how recursive calls work, and how a pair of recursive calls like this produces a binary-tree-like call structure

    [Attachment: two_firbonacci_functions.jpg]

    so I switched the terms in the recursive call (my intuition was that it would help a memoized solution, since the base case is reached sooner in the switched scenario, but it turned out to help even the naive one). I have tested many times with different combinations, and I gain up to 1 second for n > 45. The number of times the function is executed is the same in both cases (checked by adding a counter):
    Code:
    return (NaiveFibonacciNthTermGenerator_switchedTerms (the_xth_fibonacci_num - 2) +
            NaiveFibonacciNthTermGenerator_switchedTerms (the_xth_fibonacci_num - 1));
    3. Then I started working on the memoization, but stumbled on the array that will hold the memoized solutions to the subproblems. Trying to optimize it for space, I declared it like this:

    Code:
    unsigned long long int cache[the_xth_fibonacci_num - 3];
    The "-3" is supposed to save me some bytes (it's an array of long longs, after all, so 24 bytes (3*8)); this is allowed by the fact that the base cases need not be stored in cache[]. But now I'm not sure how I should handle it, as this is a variable-length array (VLA).

    Questions:
    1. Why does the naive version with switched terms run faster than the one with unswitched terms? I have also switched the places of the function calls (thinking that some caching might be involved - I'm not sure how) - it is still consistently faster when n > 15. The same number of function calls is made (I checked) - why is it faster?
    2. What is the correct solution for the cache[] problem? I can't declare it as a global fixed-size array, because its size only becomes known in main(), when the user enters "the_xth_fibonacci_num" as the term they want. Should I malloc() it? Is this the best solution for minimizing time and space?
    3. Any other performance tips - for saving both time and space? Any tips for saving time by sacrificing space, and vice versa? I would appreciate them - I have placed the base case for n=2 above the one for n=1, as it is reached more often and so should be checked first (so you see what level of obsession we are talking about here).

    Anything like "check the assembly generated for x", "change the terms", "reorder", "use different data types" - anything would help me a lot.

    Thanks in advance

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    If you only want the nth term, not the terms before it, then consider using the formula to approximate the result: with appropriate rounding you should get the correct integer result with constant space and constant time (okay, not really because you have to compute powers).
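    For illustration, a minimal C sketch of that closed-form idea (the function name here is only illustrative, and with IEEE-754 double the result stays exact only for fairly small n, roughly n <= 78):
    Code:
    #include <math.h>

    /* Binet's formula with rounding: F(n) is approximately phi^n / sqrt(5),
       rounded to the nearest integer. */
    unsigned long long fib_closed_form(unsigned n)
    {
        const double sqrt5 = sqrt(5.0);
        const double phi   = (1.0 + sqrt5) / 2.0;
        return (unsigned long long)(pow(phi, (double)n) / sqrt5 + 0.5);
    }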
    Last edited by laserlight; 12-30-2014 at 03:50 PM.

  3. #3
    Ultraviolence Connoisseur
    Join Date
    Mar 2004
    Posts
    555
    Optimization should be done only as much as necessary and only when it is determined to be needed in the first place. You should also determine where the best places to optimize are (longest running code sections). In this case your algorithm in general needs optimization (as laserlight suggests) far more than the expression of that algorithm in code.

  4. #4
    Registered User
    Join Date
    Dec 2014
    Posts
    11
    If you really want n Fibonacci numbers, and not just the n-th,
    then you can easily implement an iterative algorithm: just write a loop that takes the last two numbers in your array, adds them together, and stores the sum in the array.
    The first two elements I would initialise by hand, since that is easier to implement.

    This has linear time and linear space.
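    A minimal sketch of that loop in C, assuming the caller provides an array of at least n elements (the names are illustrative):
    Code:
    /* Fill fib[0..n-1] with the first n Fibonacci numbers.
       With unsigned long long (64-bit) the values are exact only up to F(93). */
    void fib_fill(unsigned long long fib[], unsigned n)
    {
        if (n > 0) fib[0] = 0;
        if (n > 1) fib[1] = 1;
        for (unsigned i = 2; i < n; i++)
            fib[i] = fib[i - 1] + fib[i - 2];
    }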

  5. #5
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Quote Originally Posted by fendor View Post
    If you really want n Fibonacci numbers, and not just the n-th,
    then you can easily implement an iterative algorithm: just write a loop that takes the last two numbers in your array, adds them together, and stores the sum in the array.
    The first two elements I would initialise by hand, since that is easier to implement.

    This has linear time and linear space.
    But can we make the time logarithmic? Can we?!

  6. #6
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    There is a fairly simple mathematical expression - known since about the 17th century - that can be used. It is easily expressed using three standard functions available since C99, albeit making use of floating point in intermediate steps.

    No recursion, no need for additional storage (except for some intermediate results).

    The root is √5.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  7. #7
    Registered User
    Join Date
    Dec 2014
    Posts
    11
    Quote Originally Posted by MutantJohn View Post
    But can we make the time logarithmic? Can we?!
    No, not as far as I know.
    If you want an array with every Fibonacci number up to the n-th, you have to iterate through the whole list, which gives the linear time. If you want only the n-th number, then I'd second grumpy and nonpuz: there is a mathematical expression which is pretty accurate.

    If you want to prove the mentioned formula, you could have some fun with difference equations; it takes only about two pages, if you write large ^^

  8. #8
    Registered User Mindaugas's Avatar
    Join Date
    Dec 2014
    Location
    Kaunas, Lithuania
    Posts
    4
    OK, thanks - but I'm not doing this to solve the problem of "getting the N-th term in a Fibonacci sequence"; it is just a problem that I'm tackling to learn optimisation and the thinking involved in it. The explicit problem "compute the n-th Fibonacci number" is secondary to the implicit problem of learning optimisation techniques...

    Therefore I will stick with micro-optimisations for the time being (hence the name of the thread); this means that the algorithm will be recursive.
    As a follow-up problem I might code up the approximation approach (i) or the array approach (ii) - but that is for later.

    Any answers on the questions I enumerated? Thanks
    Last edited by Mindaugas; 12-31-2014 at 04:53 AM.

  9. #9
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    You ever heard of premature optimisation?
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  10. #10
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Quote Originally Posted by grumpy View Post
    You ever heard of premature optimisation?
    Yeah, I used to have that problem with my girlfriend when we first started dating. Oh wait... That was a different thing...

  11. #11
    Ultraviolence Connoisseur
    Join Date
    Mar 2004
    Posts
    555
    Quote Originally Posted by Mindaugas View Post
    OK, thanks - but I'm not doing this to solve the problem of "getting the N-th term in a Fibonacci sequence"; it is just a problem that I'm tackling to learn optimisation and the thinking involved in it. The explicit problem "compute the n-th Fibonacci number" is secondary to the implicit problem of learning optimisation techniques...

    Therefore I will stick with micro-optimisations for the time being (hence the name of the thread); this means that the algorithm will be recursive.
    As a follow-up problem I might code up the approximation approach (i) or the array approach (ii) - but that is for later.

    Any answers on the questions I enumerated? Thanks
    You miss the point entirely. What you're doing has absolutely no value now or in the future. You are not learning any valuable skill from doing this, you are not learning anything other than how to waste time. That is why we keep telling you about premature optimization. Also this "micro-optimization" is made up and pointless. If you really want to get into low-level optimization then you should be examining the generated assembly from this program and working from there.

  12. #12
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Mindaugas View Post
    Therefore I will stick with micro-optimisations for the time being (hence the name of the thread); this means that the algorithm will be recursive.
    You don't seem to have any idea how bad the exponential time taken by the naive recursive approach is.
    Doubly-recursively calculating Fib(N+1) takes roughly 1.6 times the effort of calculating Fib(N), so the work grows exponentially with N.
    It's so bad that we're talking over a billion times slower to calculate even Fib(50) recursively rather than iteratively. Should you want to go to, say, Fib(100), it takes so long that you would literally grow old and die before it finished.
    But if done using a simple loop with two variables (and a wide enough integer type), one can easily calculate Fib(100) in less than a millisecond.
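    For reference, a minimal sketch of such a two-variable loop in C, assuming the result fits in an unsigned 64-bit type (so n <= 93; the function name is illustrative):
    Code:
    unsigned long long fib_loop(unsigned n)
    {
        unsigned long long prev = 0, curr = 1;   /* F(0) and F(1) */
        if (n == 0)
            return 0;
        for (unsigned i = 2; i <= n; i++) {
            unsigned long long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return curr;   /* F(n) */
    }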

    Knowing that, I will also tell you that an iterative loop can be translated to a simple form of single-recursion, sort of the opposite of doing the tail-recursion elimination.
    Tail recursion elimination is a useful skill to learn, but going the other way - not so much.
    Last edited by iMalc; 12-31-2014 at 03:16 PM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  13. #13
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    laserlight, grumpy, and fendor refer to Binet's formula, which was actually found by Abraham de Moivre:
    F(n) = ϕ^n/√5 - (-ϕ)^(-n)/√5
    where ϕ is the golden ratio,
    ϕ = (1 + √5) / 2
    Since |(-ϕ)^(-n)/√5| is always less than one half for any nonnegative n, you can use
    F(n) = ⌊ ϕ^n/√5 + 0.5 ⌋
    where ⌊⌋ denotes the floor function (rounding down; for the positive values here, the same as truncating towards zero).

    Using the standard types, both the recursive and the closed form (Binet's formula) are limited to a rather small part of the sequence, as
    F(46) < 2^31, F(47) < 2^32, but F(48) > 2^32
    F(78) < 2^53, but F(79) > 2^53
    F(92) < 2^63, F(93) < 2^64, but F(94) > 2^64
    In other words, unsigned 32-bit integers can calculate up to and including F(47), and signed 32-bit integers up to and including F(46). Unsigned 64-bit integers can calculate up to and including F(93), and signed 64-bit integers up to and including F(92). You should use the C99 uint32_t, int32_t, uint64_t, or int64_t types provided by inttypes.h or stdint.h for these, as the sizes of int, long, and long long are up to the compiler to decide.

    Most architectures nowadays implement the C99 double type using the IEEE-754 Binary64 format, which has a 53-bit mantissa (bits of precision), so you cannot expect Binet's formula (or the rounded version derived from it) to be accurate beyond F(78) or so. The x86 and x86_64 architectures' 387 FPU also provides an 80-bit floating-point format with 64 bits of precision, which should allow accurate evaluation up to F(93) or so. Some architectures may provide an even more precise long double. In C99, you can use pow(FLT_RADIX, DBL_MANT_DIG) or powl(FLT_RADIX, LDBL_MANT_DIG) to find out how large integers the double and long double types, respectively, can represent exactly. (On all machines I know of, except for the IBM 360, FLT_RADIX is 2, and therefore DBL_MANT_DIG and LDBL_MANT_DIG directly give the number of bits of precision in the mantissa.)
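    For example, a small check along those lines (assuming FLT_RADIX is 2, as discussed above):
    Code:
    #include <stdio.h>
    #include <float.h>
    #include <math.h>

    int main(void)
    {
        /* Largest integers that double and long double can represent exactly. */
        printf("double:      %.0f\n",  pow((double)FLT_RADIX, DBL_MANT_DIG));
        printf("long double: %.0Lf\n", powl((long double)FLT_RADIX, LDBL_MANT_DIG));
        return 0;
    }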

    For larger n, you need arbitrary-precision integers. For C, there are lots of libraries that provide all the functionality needed -- I recommend GMP (manual) -- but implementing one yourself for nonnegative integers (supporting only the addition, subtraction, and multiplication operations, plus conversions to/from decimal strings) in C99 is straightforward -- particularly using uint32_t as the base "word" type and uint64_t as a high-precision intermediate type, to make carrying trivial and the code portable.

    If you write your own arbitrary-precision integer stuff, but wish to avoid having to write a divide-by-ten function for conversion to a string, you can pack nine decimal digits (000000000 to 999999999) in a 30-bit integer: your "words" are then groups of nine decimal digits. Although this typically wastes 2 out of every 32 bits, or 6.25% of memory used (and is slower time-wise too, because more "words" are needed to represent a number), it makes the string conversion code -- as well as any other operation that deals with specific decimal digits -- very fast and simple to implement. For addition, subtraction, and multiplication operations, only the numerical limits/constants are affected.
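    As a sketch of what such a "word" layout looks like, here is addition for base-1000000000 words stored least-significant word first (the names are illustrative; a real implementation would also handle operands of different lengths and a final carry):
    Code:
    #include <stdint.h>

    #define WORD_BASE 1000000000u   /* nine decimal digits per uint32_t "word" */

    void bignum_add(uint32_t result[], const uint32_t a[], const uint32_t b[],
                    unsigned nwords)
    {
        uint32_t carry = 0;
        for (unsigned i = 0; i < nwords; i++) {
            uint32_t sum = a[i] + b[i] + carry;   /* at most 1999999999, fits in 32 bits */
            carry = (sum >= WORD_BASE);
            result[i] = carry ? sum - WORD_BASE : sum;
        }
    }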

    In practice, the relation
    F(n) = F(n-1) + F(n-2)
    is rarely used on its own. Instead, the recurrence relations
    F(m+n-1) = F(m) F(n) + F(m-1) F(n-1)
    F(m+n) = F(m) F(n+1) + F(m-1) F(n) = F(m) (F(n) + F(n-1)) + F(m-1) F(n)
    and
    F(2n-1) = F(n)^2 + F(n-1)^2
    F(2n) = F(n) (F(n) + 2 F(n-1))
    are used to calculate the desired Fibonacci number; the simple relation is used only to step from one term to the next when generating a continuous sequence. Using the approach outlined below, this takes O(log n) arithmetic operations -- but note that each arithmetic operation on arbitrary-precision integers is itself at least O(N) operations, where N is the number of digits (or "words") in the number.

    The key is to realize you can write n as a (binary) sum,
    n = b0*2^0 + b1*2^1 + ... + bk*2^k
    which means you can write
    F(n) = F(b0*2^0 + b1*2^1 + ... + bk*2^k)
    In other words, you use the binary representation of n and the power-of-two and power-of-two-minus-one Fibonacci numbers (F(0), F(1), F(2), F(3), F(4), F(7), F(8), F(15), F(16), F(31), F(32), and so on) with the F(m+n) and F(m+n-1) formulas to construct the desired Fibonacci number (and the number preceding it, for the intermediate terms).

    Here is the pseudocode for a generic Fibonacci number function:
    Code:
    function Fibonacci(n):
    
        if (n == 0) then
            return 0 /* fibonacci[0] */
        end if
    
        if (n < 0) then
            n = -n
            negate = (n is even)
        else
            negate = (false)
        end if
    
        if (n == 1) then
            return 1 /* fibonacci[1] == fibonacci[-1] */
        end if
    
        factor_prev = 0 /* fibonacci[0] */
        factor_curr = 1 /* fibonacci[1] */
    
        if (n is odd) then
            result_prev = 0 /* fibonacci[0] */
            result_curr = 1 /* fibonacci[1] */
        else
            result_prev = 1 /* fibonacci[-1] */
            result_curr = 0 /* fibonacci[0] */
        end if
    
        n = n / 2   /* This must be integer division. */
    
        while (n > 0)
    
            /* F(2*n-1) = F(n)*F(n) + F(n-1)*F(n-1) */
            helper_prev = factor_curr * factor_curr + factor_prev * factor_prev
    
            /* F(2*n) = F(n) * ( F(n) + F(n-1) + F(n-1) ) */
            helper_curr = factor_curr * (factor_curr + factor_prev + factor_prev)
    
            /* Double the factor */
            factor_prev = helper_prev
            factor_curr = helper_curr
    
            /* If n has LSB set, combine factor and result: F(result) = F(factor+result) */
            if (n is odd) then
                /* F(m+n-1) = F(m)*F(n) + F(m-1)*F(n-1) */
                helper_prev = factor_curr * result_curr + factor_prev * result_prev
    
                /* F(m+n) = F(m)*F(n+1) + F(m-1)*F(n)
                          = F(m)*(F(n-1) + F(n)) + F(m-1)*F(n) */
                helper_curr = factor_curr * (result_curr + result_prev) + factor_prev * result_prev
    
                /* Combine */
                result_prev = helper_prev
                result_curr = helper_curr
            end if
    
            n = n / 2   /* Integer division here too. */
    
        end while
    
        if (negate) then
            result_curr = -result_curr
        end if
    
        return result_curr
    end function
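    For reference, a compact C sketch of the same doubling idea, restricted to nonnegative n with results that fit in 64 bits (so n <= 93); the function name is illustrative:
    Code:
    #include <stdint.h>

    uint64_t fib_doubling(uint32_t n)
    {
        uint64_t a = 0, b = 1;              /* a = F(k), b = F(k+1), starting at k = 0 */
        for (int bit = 31; bit >= 0; bit--) {
            /* Doubling: F(2k) = F(k)*(2*F(k+1) - F(k)), F(2k+1) = F(k)^2 + F(k+1)^2 */
            uint64_t f2k  = a * (2 * b - a);
            uint64_t f2k1 = a * a + b * b;
            a = f2k;
            b = f2k1;
            if ((n >> bit) & 1u) {          /* this bit of n is set: step k -> k+1 */
                uint64_t t = a + b;         /* may wrap for n = 93, but b is unused then */
                a = b;
                b = t;
            }
        }
        return a;                           /* F(n) */
    }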
    You can precalculate F(2^k) and F(2^k - 1) for some range of k, but the memory use is an issue: F(2^k) has about 0.694242*2^k - 1.160964 binary digits, and requires that many bits of memory. Precalculating for larger k means you reserve a lot of memory for data that ends up being used only once per Fibonacci number calculated.

    The slowest operation for large n is multiplication. This means that if each multiplication takes O(m(n)) time, the arbitrary-precision integer implementation that calculates F(n) takes O(m(n) log n) time. Simple naive multiplication -- normal long multiplication, as done by hand -- is O(N^2) in the number of digits N, so a very straightforward implementation in C99 should be able to calculate an arbitrary Fibonacci number in O(n^2 log n) time.

    In practice, most current libraries and applications use naive multiplication for small arbitrary-precision integers, Toom-Cook multiplication for intermediate-size arbitrary-precision integers, and the Schönhage-Strassen algorithm for the largest arbitrary-precision integers: a discrete Fourier transform is taken of the multiplicands' digits ("words"), these are multiplied together digit-wise ("word"-wise) and converted back using the inverse transform, followed by a carrying pass over the digits ("words"). This has only O(N log N log log N) time complexity (where N is the number of digits in each multiplicand, both assumed to be of the same size), and is noticeably faster than the O(N^2) naive multiplication for larger N. The cutoff points depend on the implementation, and hardware cache architecture tends to add another wrinkle to serious optimization efforts.

    Calculating and printing a sequence of N Fibonacci numbers starting with
    previous = F(k-1)
    current = F(k)
    is trivial; in pseudocode,
    Code:
    repeat N times:
        print current
        temporary = previous
        previous = current
        current = current + temporary
    In my opinion, the above pseudocode should be even more obvious than the recursive function.

    My point: the journey to "perfection" mentioned in the thread title is pretty misleading. Each algorithm and each implementation is a balance between multiple factors, many of which are unknown beforehand. I'm hoping the OP and any others reading this thread realize that full understanding of the strengths and limitations of the implementation (and of the algorithm implemented) is much more important, and much more valuable than any warm-and-fuzzy feelings about perfection.

    (In a perfect world, those insights about strengths and limitations would be documented in comments in the implementation, but I am myself still struggling to learn to do that...)

  14. #14
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by MutantJohn View Post
    Yeah, I use to have that problem with my girlfriend when we first started dating. Oh wait... That was a different thing...
    That depends on the girlfriend, and how early she starts trying to improve you.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  15. #15
    Citizen of Awesometown the_jackass's Avatar
    Join Date
    Oct 2014
    Location
    Awesometown
    Posts
    269
    Eh, I'd like to add one other thing to what Nominal Animal said. Multiplication by a single "digit" (a digit in whatever base you are using -- pow(10,9) in his example) is faster and easier in many cases than when both numbers have multiple "digits" (for one, you need fewer intermediate arbitrary-precision numbers), as long as the numbers you're working with fit within the sane range of O(n^2) multiplication. (That's up to about 10 to 20 thousand base-10 digits.)
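    A sketch of that single-"digit" case, using the same base-1000000000 word layout described above (names are illustrative; digit must be less than the base, and a real implementation would append any remaining carry as extra words):
    Code:
    #include <stdint.h>

    #define WORD_BASE 1000000000u

    /* Multiply an nwords-long big number a[] by a single word-sized digit. */
    void bignum_mul_word(uint32_t result[], const uint32_t a[], unsigned nwords,
                         uint32_t digit)
    {
        uint64_t carry = 0;
        for (unsigned i = 0; i < nwords; i++) {
            uint64_t prod = (uint64_t)a[i] * digit + carry;
            result[i] = (uint32_t)(prod % WORD_BASE);
            carry = prod / WORD_BASE;
        }
    }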
    "Highbrow philosophical truth: Everybody is an ape in monkeytown" --Oscar Wilde
