Laserlight, Grumpy and fendor refer to Binet's formula, which was actually found by Abraham de Moivre:
F(n) = ϕ^n/√5 - (-ϕ)^(-n)/√5
where ϕ is the golden ratio,
ϕ = (1 + √5) / 2
Since |(-ϕ)^(-n)/√5| is always less than one half for any nonnegative n, you can use
F(n) = ⌊ ϕ^n/√5 + 0.5 ⌋
where ⌊ ⌋ denotes the floor function. (Since the argument here is always positive, the floor coincides with truncation towards zero, and adding 0.5 before flooring simply rounds ϕ^n/√5 to the nearest integer.)
Using the standard types, both the recursive and the closed form (Binet's formula) are limited to rather small n, as
F(46) < 2^31, F(47) < 2^32, but F(48) > 2^32
F(78) < 2^53, but F(79) > 2^53
F(92) < 2^63, F(93) < 2^64, but F(94) > 2^64
In other words, unsigned 32-bit integers can represent up to and including F(47), and signed 32-bit integers up to and including F(46). Unsigned 64-bit integers can represent up to and including F(93), and signed 64-bit integers up to and including F(92). You should use the C99 uint32_t, int32_t, uint64_t, or int64_t types provided by inttypes.h or stdint.h for these, as the sizes of int, long, and long long are up to the compiler to decide.
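Within those limits, the plain iterative loop is exact. A minimal sketch using uint64_t (the function name is mine):

```c
#include <stdint.h>

/* Iterative Fibonacci using exact unsigned 64-bit arithmetic.
   Valid for 0 <= n <= 93; larger n would overflow uint64_t. */
uint64_t fib_u64(unsigned int n)
{
    uint64_t prev = 0, curr = 1; /* F(0), F(1) */
    if (n == 0)
        return 0;
    for (unsigned int i = 1; i < n; i++) {
        uint64_t next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```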
Most architectures nowadays implement the C99 double type using the IEEE-754 Binary64 format, which has a 53-bit mantissa (53 bits of precision), so you cannot expect Binet's formula (or the rounded version derived from it) to be accurate beyond F(78) or so. On x86 and x86-64, the 387 FPU also provides an 80-bit floating-point format with 64 bits of precision, which should allow accurate evaluation up to F(93) or so. Some architectures may provide an even more precise long double. In C99, you can use pow(FLT_RADIX, DBL_MANT_DIG) or powl(FLT_RADIX, LDBL_MANT_DIG) to find out how large an integer the double or long double type, respectively, can represent exactly. (On all machines I know of, except the IBM 360, FLT_RADIX is 2, and therefore DBL_MANT_DIG and LDBL_MANT_DIG directly give the number of bits of precision in the mantissa.)
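That check can be wrapped in a couple of helper functions (the names are mine, purely illustrative):

```c
#include <float.h>
#include <math.h>

/* Smallest power of the radix above which consecutive integers are no
   longer exactly representable in a double: FLT_RADIX^DBL_MANT_DIG. */
double double_exact_limit(void)
{
    return pow((double)FLT_RADIX, DBL_MANT_DIG);
}

/* Same limit for long double: FLT_RADIX^LDBL_MANT_DIG. */
long double long_double_exact_limit(void)
{
    return powl((long double)FLT_RADIX, LDBL_MANT_DIG);
}
```

At the limit, adding one no longer changes the value, which is exactly the kind of silent precision loss that makes Binet's formula unreliable for large n.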
For larger n, you need arbitrary-precision integers. For C, there are lots of libraries that provide all functionality needed -- I recommend GSL (manual) --, but implementing one yourself for nonnegative integers (and only for addition, subtraction, and multiplication operations, and conversions to/from decimal strings) in C99 is straightforward -- particularly using uint32_t as the base "word" type, and uint64_t as a high-precision intermediate type, to make carrying trivial and code portable.
If you write your own arbitrary-precision integer stuff, but wish to avoid having to write a divide-by-ten function for conversion to a string, you can pack nine decimal digits (000000000 to 999999999) in a 30-bit integer: your "words" are then groups of nine decimal digits. Although this typically wastes 2 out of every 32 bits, or 6.25% of memory used (and is slower time-wise too, because more "words" are needed to represent a number), it makes the string conversion code -- as well as any other operation that deals with specific decimal digits -- very fast and simple to implement. For addition, subtraction, and multiplication operations, only the numerical limits/constants are affected.
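As a sketch of the word-at-a-time approach, here is addition on little-endian arrays of base-10^9 "words", using uint32_t for storage and uint64_t for the carry-producing intermediate, as suggested above. All names are illustrative, not from any particular library.

```c
#include <stdint.h>
#include <stddef.h>

#define WORD_BASE UINT64_C(1000000000) /* each word holds nine decimal digits */

/* Adds two little-endian big numbers (least significant word first).
   result[] must have room for max(alen, blen) + 1 words.
   Returns the number of words written. */
size_t bignum_add(const uint32_t *a, size_t alen,
                  const uint32_t *b, size_t blen,
                  uint32_t *result)
{
    size_t n = (alen > blen) ? alen : blen;
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = carry;
        if (i < alen) sum += a[i];
        if (i < blen) sum += b[i];
        result[i] = (uint32_t)(sum % WORD_BASE);
        carry = sum / WORD_BASE;
    }
    if (carry) {
        result[n] = (uint32_t)carry;
        return n + 1;
    }
    return n;
}
```

For the binary (full 32 bits per word) representation, only WORD_BASE and the modulo/division pair would change; the structure of the loop is identical.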
In practice, the relation
F(n) = F(n-1) + F(n-2)
is rarely used. Instead, the recurrence relations
F(m+n-1) = F(m) F(n) + F(m-1) F(n-1)
F(m+n) = F(m) F(n+1) + F(m-1) F(n) = F(m) (F(n) + F(n-1)) + F(m-1) F(n)
and
F(2n-1) = F(n)^2 + F(n-1)^2
F(2n) = F(n) (F(n) + 2 F(n-1))
are used to calculate the desired Fibonacci number; when generating a continuous sequence, only the initial number needs to be calculated this way. Using the approach outlined below, this takes O(log n) arithmetic operations -- but note that each arithmetic operation on arbitrary-precision integers is itself at least O(N) operations, where N is the number of digits (or "words") in the number.
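The doubling identities are easy to spot-check against the basic recurrence; this little self-test (my own, purely for verification) compares them over a small table:

```c
#include <stdint.h>

/* Spot-check of the doubling identities against the basic recurrence,
   using a table of F(0)..F(20). Returns 1 if all checks pass, 0 otherwise. */
int check_doubling_identities(void)
{
    uint64_t F[21] = { 0, 1 };
    for (int i = 2; i <= 20; i++)
        F[i] = F[i - 1] + F[i - 2];

    for (int n = 1; n <= 10; n++) {
        /* F(2n-1) = F(n)^2 + F(n-1)^2 */
        if (F[2 * n - 1] != F[n] * F[n] + F[n - 1] * F[n - 1])
            return 0;
        /* F(2n) = F(n) * (F(n) + 2 F(n-1)) */
        if (F[2 * n] != F[n] * (F[n] + 2 * F[n - 1]))
            return 0;
    }
    return 1;
}
```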
The key is to realize you can write n as a (binary) sum,
n = b0·2^0 + b1·2^1 + ... + bk·2^k
which means you can write
F(n) = F(b0·2^0 + b1·2^1 + ... + bk·2^k)
In other words, you use the binary representation of n and the power-of-two and power-of-two-less-one Fibonacci numbers (F(0), F(1), F(2), F(3), F(4), F(7), F(8), F(15), F(16), F(31), F(32), and so on) with the F(m+n) and F(m+n-1) formulas to construct the desired Fibonacci number (and the number preceding it, for the intermediate terms).
Here is the pseudocode for a generic Fibonacci number function:
Code:
function Fibonacci(n):
    if (n == 0) then
        return 0 /* fibonacci[0] */
    end if
    if (n < 0) then
        n = -n
        negate = (n is even)
    else
        negate = false
    end if
    if (n == 1) then
        return 1 /* fibonacci[1] == fibonacci[-1] */
    end if
    factor_prev = 0 /* fibonacci[0] */
    factor_curr = 1 /* fibonacci[1] */
    if (n is odd) then
        result_prev = 0 /* fibonacci[0] */
        result_curr = 1 /* fibonacci[1] */
    else
        result_prev = 1 /* fibonacci[-1] */
        result_curr = 0 /* fibonacci[0] */
    end if
    n = n / 2 /* This must be integer division. */
    while (n > 0)
        /* F(2*m-1) = F(m)*F(m) + F(m-1)*F(m-1) */
        helper_prev = factor_curr * factor_curr + factor_prev * factor_prev
        /* F(2*m) = F(m) * ( F(m) + F(m-1) + F(m-1) ) */
        helper_curr = factor_curr * (factor_curr + factor_prev + factor_prev)
        /* Double the factor */
        factor_prev = helper_prev
        factor_curr = helper_curr
        /* If n has its LSB set, combine factor and result */
        if (n is odd) then
            /* F(m+k-1) = F(m)*F(k) + F(m-1)*F(k-1) */
            helper_prev = factor_curr * result_curr + factor_prev * result_prev
            /* F(m+k) = F(m)*F(k+1) + F(m-1)*F(k)
                      = F(m)*(F(k-1) + F(k)) + F(m-1)*F(k) */
            helper_curr = factor_curr * (result_curr + result_prev) + factor_prev * result_curr
            /* Combine */
            result_prev = helper_prev
            result_curr = helper_curr
        end if
        n = n / 2 /* Integer division here too. */
    end while
    if (negate) then
        result_curr = -result_curr
    end if
    return result_curr
end function
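For concreteness, here is a direct C99 transcription of the pseudocode, restricted to nonnegative n and exact uint64_t arithmetic (so the negative-n handling is omitted, and n must not exceed 93):

```c
#include <stdint.h>

/* Fast-doubling Fibonacci, a direct transcription of the pseudocode above.
   Nonnegative n only; valid for 0 <= n <= 93 (uint64_t limit). */
uint64_t fibonacci(unsigned int n)
{
    if (n == 0)
        return 0;
    if (n == 1)
        return 1;

    uint64_t factor_prev = 0, factor_curr = 1; /* F(0), F(1) */
    uint64_t result_prev, result_curr;

    if (n & 1) {
        result_prev = 0; result_curr = 1; /* F(0), F(1) */
    } else {
        result_prev = 1; result_curr = 0; /* F(-1), F(0) */
    }

    n /= 2;
    while (n > 0) {
        /* F(2m-1) = F(m)^2 + F(m-1)^2 */
        uint64_t helper_prev = factor_curr * factor_curr
                             + factor_prev * factor_prev;
        /* F(2m) = F(m) * (F(m) + 2 F(m-1)) */
        uint64_t helper_curr = factor_curr
                             * (factor_curr + 2 * factor_prev);
        factor_prev = helper_prev;
        factor_curr = helper_curr;

        if (n & 1) {
            /* F(m+k-1) = F(m) F(k) + F(m-1) F(k-1) */
            helper_prev = factor_curr * result_curr
                        + factor_prev * result_prev;
            /* F(m+k) = F(m) (F(k-1) + F(k)) + F(m-1) F(k) */
            helper_curr = factor_curr * (result_curr + result_prev)
                        + factor_prev * result_curr;
            result_prev = helper_prev;
            result_curr = helper_curr;
        }
        n /= 2;
    }
    return result_curr;
}
```

With arbitrary-precision integers in place of uint64_t, the same structure handles any n.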
You can precalculate F(2^k) and F(2^k - 1) for some range of k, but the memory use is an issue: F(2^k) has about 0.694242×2^k - 1.160964 binary digits, and requires that many bits of memory. Precalculating larger k means you reserve a lot of memory for data that ends up being used only once per Fibonacci number calculated.
The slowest operation for large n is multiplication. This means that if each multiplication takes O(M(n)) time, the arbitrary-precision integer implementation that calculates F(n) takes O(M(n) log n) time. Simple naive multiplication -- normal long multiplication, as done by hand -- is O(N^2), so a very straightforward implementation in C99 should be able to calculate an arbitrary Fibonacci number in O(n^2 log n) time.
In practice, most current libraries and applications use naive multiplication for small arbitrary-precision integers, Toom-Cook multiplication for intermediate-size arbitrary-precision integers, and the Schönhage-Strassen algorithm for the largest arbitrary-precision integers: a discrete Fourier transform is taken of the multiplicands' digits ("words"), the transforms are multiplied together digit-wise ("word"-wise) and converted back using the inverse transform, followed by a carrying pass over the digits ("words"). This has only O(N log N log log N) time complexity (where N is the number of digits in each multiplicand, the multiplicands assumed to be of the same size), and is noticeably faster than O(N^2) naive multiplication for larger N. The cutoff points depend on the implementation, and hardware cache architecture tends to add another wrinkle to serious optimization efforts.
Calculating and printing a sequence of N Fibonacci numbers starting with
previous = F(k-1)
current = F(k)
is trivial; in pseudocode,
Code:
repeat N times:
    print current
    temporary = previous
    previous = current
    current = current + temporary
In my opinion, the above pseudocode should be even more obvious than the recursive function.
My point: the journey to "perfection" mentioned in the thread title is pretty misleading. Each algorithm and each implementation is a balance between multiple factors, many of which are unknown beforehand. I'm hoping the OP and any others reading this thread realize that full understanding of the strengths and limitations of the implementation (and of the algorithm implemented) is much more important, and much more valuable than any warm-and-fuzzy feelings about perfection.
(In a perfect world, those insights about strengths and limitations would be documented in comments in the implementation, but I am myself still struggling to learn to do that...)