Thread: Micro-optimizing Fibonacci n-th term generator: a journey to perfection

  1. #1
    Registered User Mindaugas's Avatar
    Join Date
    Dec 2014
    Location
    Kaunas, Lithuania
    Posts
    4

    Micro-optimizing Fibonacci n-th term generator: a journey to perfection

    Hi,

    Going through MIT's Introduction to Algorithms I'm developing a Fibonacci n-th term generator and want to make it as fast and as compact memory-vise as possible. The story:

    1. Developed a naive, dumb version that does recursive calls:
    Code:
    return (NaiveFibonacciNthTermGenerator (the_xth_fibonacci_num - 1) + \ 
            NaiveFibonacciNthTermGenerator (the_xth_fibonacci_num - 2));
    2. I remembered the way recursion calls work and how a a binary-tree like structure having recursive calls work

    Micro-optimizing Fibonacci n-th term generator: a journey to perfection-two_firbonacci_functions-jpg

    so I switched the terms in the recursive call (my intuition was that it would help for a memoized solution as the base case is reached faster in the switched scenario, but it turned out to help even for the naive one). I have tested many times with different combinations and I gain up to 1 second for n > 45. The number of times the function is executed is the same in both cases (checked by adding a counter):
    Code:
    return (NaiveFibonacciNthTermGenerator_switchedTerms (the_xth_fibonacci_num - 2) + \
             NaiveFibonacciNthTermGenerator_switchedTerms (the_xth_fibonacci_num - 1));
    3. Then I started working on the memoization but stumbled upon the array that will contain the memoized solutions to the subproblems. Trying to optimize it for space I declared it like this:

    Code:
    unsigned long long int cache[the_xth_fibonacci_num - 3];
    the "-3" supposed to save me some bytes (it's an array of long longs after all, so 24 bytes (3*8)), this is allowed by the fact that the base cases should not be in the cache[]. But now I'm not sure how should I handle it as this is a variable lenght array VLA.

    Questions:
    1. Why does the naive case with switched terms work faster than the one with not-swotched ones? I have swithed the function calls places (thinking that some caching might be involved - not sure how) - it is still consistently faster when n > 15. The same amount of function calls is made (I checked it) - why is it faster?
    2. What is the correct solution for the cache[] problem? I can't place it as global constant-int-sized array, because it gets the value in the main() as the user enters the "the_xth_fibonacci_num" as the value he wants. Should I malloc it? Is this the best solution for minimizing time and space?
    3. Any other performance tips - for saving time and space? Any tips for saving time sacrificing space and vice versa? I would appreaceate it - I have placed the base case for n=2 to appear on top of n=1 as it will be reached more times and so should be executed first (so you see what level of obsession we are talking about here ).

    Anything like "check the aseambly generated for x", "change the terms", "reorder", "use different datatypes" - anything would help me a lot.

    Thanks in advance

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    27,730
    If you only want the nth term, not the terms before it, then consider using the formula to approximate the result: with appropriate rounding you should get the correct integer result with constant space and constant time (okay, not really because you have to compute powers).
    Last edited by laserlight; 12-30-2014 at 03:50 PM.

  3. #3
    Ultraviolence Connoisseur
    Join Date
    Mar 2004
    Posts
    555
    Optimization should be done only as much as necessary and only when it is determined to be needed in the first place. You should also determine where the best places to optimize are (longest running code sections). In this case your algorithm in general needs optimization (as laserlight suggests) far more than the expression of that algorithm in code.

  4. #4
    Registered User
    Join Date
    Dec 2014
    Posts
    11
    if you really want n fib numbers, and not just the n'th,
    then you can easily implement an iterative algorithm, just make a loop, that takes the last 2 numbers of your array, add it togehter and save it in the array.
    First 2 elements i would init by hand, since this is easier to implement.

    this has linear time and linear space

  5. #5
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Quote Originally Posted by fendor View Post
    if you really want n fib numbers, and not just the n'th,
    then you can easily implement an iterative algorithm, just make a loop, that takes the last 2 numbers of your array, add it togehter and save it in the array.
    First 2 elements i would init by hand, since this is easier to implement.

    this has linear time and linear space
    But can we make the time logarithmic? Can we?!

  6. #6
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    There is a fairly simply mathematical expression - known since about the 17th century - that can be used. Easily expressed using three standard functions since C99. Albeit making use of floating point in intermediate steps.

    No recursion, no need for additional storage (except for some intermediate results).

    The root is 5
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  7. #7
    Registered User
    Join Date
    Dec 2014
    Posts
    11
    Quote Originally Posted by MutantJohn View Post
    But can we make the time logarithmic? Can we?!
    no, not as far as i know
    if you want an array with every fib number till to n'th, you have to iterate through the whole list which causes the linear time. If you want only the n'th number, then i'd recommend grumpy and nonpuz, there is an mathematical expression, which is pretty accurate

    if you want to prove mentioned function, you could have some fun with difference equation takes only about... 2 pages, if you have a large writing ^^

  8. #8
    Registered User Mindaugas's Avatar
    Join Date
    Dec 2014
    Location
    Kaunas, Lithuania
    Posts
    4
    Ok, thanks - but I'm doing this not to solve the problem of "getting the N-th term in a Fib sequence" - this is just a problem that I tackle to learn optimisation and the thinking that is involved in it. The explicit problem "compute the n-th Fib number" is secondary to the implicit problem of learning optimisation techniques...

    Therefore I will stick with microptimisations for the time being (hence the name of the thread ) this means that the algorithm will be recursive.
    As a follow up problem I might codeup the approximation approach (i) or the array (ii) approach - but that is for latter.

    Any answers on the questions I enumerated? Thanks
    Last edited by Mindaugas; 12-31-2014 at 04:53 AM.

  9. #9
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    You ever heard of premature optimisation?
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  10. #10
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Quote Originally Posted by grumpy View Post
    You ever heard of premature optimisation?
    Yeah, I use to have that problem with my girlfriend when we first started dating. Oh wait... That was a different thing...

  11. #11
    Ultraviolence Connoisseur
    Join Date
    Mar 2004
    Posts
    555
    Quote Originally Posted by Mindaugas View Post
    Ok, thanks - but I'm doing this not to solve the problem of "getting the N-th term in a Fib sequence" - this is just a problem that I tackle to learn optimisation and the thinking that is involved in it. The explicit problem "compute the n-th Fib number" is secondary to the implicit problem of learning optimisation techniques...

    Therefore I will stick with microptimisations for the time being (hence the name of the thread ) this means that the algorithm will be recursive.
    As a follow up problem I might codeup the approximation approach (i) or the array (ii) approach - but that is for latter.

    Any answers on the questions I enumerated? Thanks
    You miss the point entirely. What you're doing has absolutely no value now or in the future. You are not learning any valuable skill from doing this, you are not learning anything other than how to waste time. That is why we keep telling you about premature optimization. Also this "micro-optimization" is made up and pointless. If you really want to get into low-level optimization then you should be examining the generated assembly from this program and working from there.

  12. #12
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Mindaugas View Post
    Therefore I will stick with microptimisations for the time being (hence the name of the thread ) this means that the algorithm will be recursive.
    You don't seem to have any idea how bad the exponential time taken by the naive recursive approach is.
    Doubly-recursively calculating the Fib of N+1 takes more than twice the effort of calculating the Fib of N.
    It's so bad that we're talking over a billion times slower to calculate even Fib(50) recursive rather than iteratively. Should you want to go to say Fib(100), it takes so long that you would literally grow old and die before it finished.
    But if done using a simple loop with two variables, one can easily calculate Fib(100) in less than a millisecond.

    Knowing that, I will also tell you that an iterative loop can be translated to a simple form of single-recursion, sort of the opposite of doing the tail-recursion elimination.
    Tail recursion elimination is a useful skill to learn, but going the other way - not so much.
    Last edited by iMalc; 12-31-2014 at 03:16 PM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  13. #13
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Laserlight, Grumpy and fendor refer to Binet's formula, which was actually found by Abraham de Moivre:
    Fn = ϕn/√5 - (-ϕ)n/√5
    where ϕ is the golden ratio,
    ϕ = (1 + √5) / 2
    Since |(-ϕ)n/√5| is always less than one half for any nonnegative n, you can use
    Fn = ⌊ ϕn/√5 + 0.5 ⌋
    where ⌊⌋ denotes rounding towards zero, i.e. floor function.

    Using the standard types, both the recursive and the closed form (Binet's formula) are limited to a rather small sequence, as
    F46 < 231, F47 < 232, but F48 > 232
    F78 < 253, but F79 > 253
    F92 < 263, F93 < 264, but F94 > 264
    In other words, unsigned 32-bit integers can calculate up to and including F47, and signed 32-bit integers up to and including F46. Unsigned 64-bit integers can calculate up to and including F93, and signed 64-bit integers up to and including F92. You should use C99 uint32_t, int32_t, uint64_t, or int64_t types provided by inttypes.h or stdint.h for these, as the size of int and long and long long is up to the compiler to decide.

    Most architectures nowadays implement the C99 double type using IEEE-754 Binary64 format which has 53-bit mantissa (bits of precision), so you cannot expect Binet's formula (or the round-towards-zero version derived from it) to be accurate beyond F78 or so. x86 and x86_64 architectures' 387 FPU also provides an 80-bit floating-point format with 64 bits of precision, which should allow accurate evaluation up to F93 or so. Some architectures may provide even more precise long double. In C99, you can use pow(FLT_RADIX, DBL_MANT_DIG) or powl(FLT_RADIX, LDBL_MANT_DIG) to find out how large integers the double resp. long double type can represent exactly. (On all machines I know of, except for IBM 360, FLT_RADIX is 2, and therefore DBL_MANT_DIG and LDBL_MANT_DIG directly define the number of bits of precision in the mantissa.)

    For larger n, you need arbitrary-precision integers. For C, there are lots of libraries that provide all functionality needed -- I recommend GSL (manual) --, but implementing one yourself for nonnegative integers (and only for addition, subtraction, and multiplication operations, and conversions to/from decimal strings) in C99 is straightforward -- particularly using uint32_t as the base "word" type, and uint64_t as a high-precision intermediate type, to make carrying trivial and code portable.

    If you write your own arbitrary-precision integer stuff, but wish to avoid having to write a divide-by-ten function for conversion to a string, you can pack nine decimal digits (000000000 to 999999999) in a 30-bit integer: your "words" are then groups of nine decimal digits. Although this typically wastes 2 out of every 32 bits, or 6.25% of memory used (and is slower time-wise too, because more "words" are needed to represent a number), it makes the string conversion code -- as well as any other operation that deals with specific decimal digits -- very fast and simple to implement. For addition, subtraction, and multiplication operations, only the numerical limits/constants are affected.

    In practice, the relation
    Fn = Fn-1 + Fn-2
    is rarely used. Instead, the recurrence relations
    Fm+n-1 = Fm Fn + Fm-1 Fn-1
    Fm+n = Fm Fn+1 + Fm-1 Fn = Fm (Fn + Fn-1) + Fm-1 Fn
    and
    F2n-1 = Fn2 + Fn-12
    F2n = Fn (Fn + 2 Fn-1)
    are used to calculate the desired Fibonacci number; only the initial number for continuous sequences. Using the approach outlined below, this takes O(log n) arithmetic operations -- but note that each arithmetic operation on arbitrary-precision integers is now at least O(N) operations, where N is the number of digits (or "words") in the number.

    The key is to realize you can write n as a (binary) sum,
    n = b020 + b121 + ... + bk2k
    which means you can write
    Fn = Fb020 + b121 + ... + bk2k
    In other words, you use the binary representation of n and powers-of-two and powers-of-two-less-one Fibonacci numbers (F0, F1, F2, F3, F4, F7, F8, F15, F16, F31, F32, and so on) with the Fm+n and Fm+n-1 formulas to construct the desired Fibonacci number (and the number preceding it for the intermediate terms).

    Here is the pseudocode for a generic Fibonacci number function:
    Code:
    function Fibonacci(n):
    
        if (n == 0) then
            return 0 /* fibonacci[0] */
        end if
    
        if (n < 0) then
            n = -n
            negate = (n is even)
        else
            negate = (false)
        end if
    
        if (n == 1) then
            return 1 /* fibonacci[1] == fibonacci[-1] */
        end if
    
        factor_prev = 0 /* fibonacci[0] */
        factor_curr = 1 /* fibonacci[1] */
    
        if (n is odd) then
            result_prev = 0 /* fibonacci[0] */
            result_curr = 1 /* fibonacci[1] */
        else
            result_prev = 1 /* fibonacci[-1] */
            result_curr = 0 /* fibonacci[0] */
        end if
    
        n = n / 2   /* This must be integer division. */
    
        while (n > 0)
    
            /* F(2*n-1) = F(n)*F(n) + F(n-1)*F(n-1) */
            helper_prev = factor_curr * factor_curr + factor_prev * factor_prev
    
            /* F(2*n) = F(n) * ( F(n) + F(n-1) + F(n-1) ) */
            helper_curr = factor_curr * (factor_curr + factor_prev + factor_prev)
    
            /* Double the factor */
            factor_prev = helper_prev
            factor_curr = helper_curr
    
            /* If n has LSB set, combine factor and result: F(result) = F(factor+result) */
            if (n is odd) then
                /* F(m+n-1) = F(m)*F(n) + F(m-1)*F(n-1) */
                helper_prev = factor_curr * result_curr + factor_prev * result_prev
    
                /* F(m+n) = F(m)*F(n+1) + F(m-1)*F(n)
                          = F(m)*(F(n-1) + F(n)) + F(m-1)*F(n) */
                helper_curr = factor_curr * (result_curr + result_prev) + factor_prev * result_prev
    
                /* Combine */
                result_prev = helper_prev
                result_curr = helper_curr
            end if
    
            n = n / 2   /* Integer division here too. */
    
        end while
    
        if (negate) then
            result_curr = -result_curr
        end if
    
        return result_curr
    end function
    You can precalculate F2k and F2k-1 for some range of k, but the memory use is an issue: F2k has 0.694242×2k-1.160964 binary digits, and requires that many bits of memory. Precalculating larger k means you reserve a lot of memory for data that ends up being used only once per Fibonacci number calculated.

    The slowest operation for large n is multiplication. This means that if each multiplication takes O(m(n)) time, the arbitrary-precision integer implementation that calculates Fn takes O(m(n) log n) time. Simple naive multiplication -- normal long multiplication, done by hand for example -- is O(N2), so a very straightforward implementation in C99 should be able to calculate an arbitrary Fibonacci number in O(n2 log n) time.

    In practice, most current libraries and applications use naive multiplication for small arbitrary-precision integers, Toom-Cook multiplication for intermediate-size arbitrary-precision integers, and Schönhage-Strassen algorithm for largest arbitrary-precision integers: a discrete Fourier transform is taken from the multiplicands' digits ("words"), multiplied together digit-wise ("word"-wise), converted back using the inverse transform, followed by a carrying pass over the digits ("words"). This has only O(N log N log log N) time complexity (where N is the number of digits in each multiplicand, the multiplicands assumed to be of the same size), and is noticeably faster than the O(N2) for naive multiplication for larger N. The cutoff points depends on the implementation, and hardware cache architecture tends to add another wrinkle to serious optimization efforts.

    Calculating and printing a sequence of N Fibonacci numbers starting with
    previous = Fk-1
    current = Fk
    is trivial; in pseudocode,
    Code:
    repeat N times:
        print current
        temporary = previous
        previous = current
        current = current + temporary
    In my opinion, the above pseudocode should be even more obvious than the recursive function.

    My point: the journey to "perfection" mentioned in the thread title is pretty misleading. Each algorithm and each implementation is a balance between multiple factors, many of which are unknown beforehand. I'm hoping the OP and any others reading this thread realize that full understanding of the strengths and limitations of the implementation (and of the algorithm implemented) is much more important, and much more valuable than any warm-and-fuzzy feelings about perfection.

    (In a perfect world, those insights about strengths and limitations would be documented in comments in the implementation, but I am myself still struggling to learn to do that...)

  14. #14
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by MutantJohn View Post
    Yeah, I use to have that problem with my girlfriend when we first started dating. Oh wait... That was a different thing...
    That depends on the girlfriend, and how early she starts trying to improve you.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  15. #15
    Citizen of Awesometown the_jackass's Avatar
    Join Date
    Oct 2014
    Location
    Awesometown
    Posts
    269
    Eh I'd like to add one other thing to what Nominal Animal said. Multiplication by single "digit" (as in a digit with the base you're using...pow(10,9) in his example) is faster and easier in many cases than when both numbers have multiple "digits" (you would need less intermediate arbritary precision numbers for one) if the numbers youre working with fit within sane range of O(n^2) multiplication. (That's upto about 10 to 20 thousand base 10 digits.)
    "Highbrow philosophical truth: Everybody is an ape in monkeytown" --Oscar Wilde

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Lagged Fibonacci generator
    By drew99 in forum General Discussions
    Replies: 3
    Last Post: 10-21-2012, 07:47 AM
  2. Looking for source code of Lagged Fibonacci generator
    By user_name_void in forum C Programming
    Replies: 5
    Last Post: 02-28-2010, 02:29 PM
  3. A journey to nowhere
    By VirtualAce in forum A Brief History of Cprogramming.com
    Replies: 7
    Last Post: 03-31-2007, 11:55 AM
  4. Magneto's journey to becoming a C++ expert.
    By Magneto in forum C++ Programming
    Replies: 21
    Last Post: 07-17-2005, 08:16 PM
  5. Fibonacci How should I make a generator
    By cprog in forum C Programming
    Replies: 12
    Last Post: 10-06-2002, 04:04 PM

Tags for this Thread