1 Attachment(s)
Micro-optimizing Fibonacci n-th term generator: a journey to perfection
Hi,
Going through MIT's Introduction to Algorithms I'm developing a Fibonacci n-th term generator and want to make it as fast and as compact memory-vise as possible. The story:
1. Developed a naive, dumb version that does recursive calls:
Code:
return (NaiveFibonacciNthTermGenerator (the_xth_fibonacci_num - 1) + \
NaiveFibonacciNthTermGenerator (the_xth_fibonacci_num - 2));
2. I remembered the way recursion calls work and how a a binary-tree like structure having recursive calls work
Attachment 13839
so I switched the terms in the recursive call (my intuition was that it would help for a memoized solution as the base case is reached faster in the switched scenario, but it turned out to help even for the naive one). I have tested many times with different combinations and I gain up to 1 second for n > 45. The number of times the function is executed is the same in both cases (checked by adding a counter):
Code:
return (NaiveFibonacciNthTermGenerator_switchedTerms (the_xth_fibonacci_num - 2) + \
NaiveFibonacciNthTermGenerator_switchedTerms (the_xth_fibonacci_num - 1));
3. Then I started working on the memoization but stumbled upon the array that will contain the memoized solutions to the subproblems. Trying to optimize it for space I declared it like this:
Code:
unsigned long long int cache[the_xth_fibonacci_num - 3];
the "-3" supposed to save me some bytes (it's an array of long longs after all, so 24 bytes (3*8)), this is allowed by the fact that the base cases should not be in the cache[]. But now I'm not sure how should I handle it as this is a variable lenght array VLA.
Questions:
1. Why does the naive case with switched terms work faster than the one with not-swotched ones? I have swithed the function calls places (thinking that some caching might be involved - not sure how) - it is still consistently faster when n > 15. The same amount of function calls is made (I checked it) - why is it faster?
2. What is the correct solution for the cache[] problem? I can't place it as global constant-int-sized array, because it gets the value in the main() as the user enters the "the_xth_fibonacci_num" as the value he wants. Should I malloc it? Is this the best solution for minimizing time and space?
3. Any other performance tips - for saving time and space? Any tips for saving time sacrificing space and vice versa? I would appreaceate it :) - I have placed the base case for n=2 to appear on top of n=1 as it will be reached more times and so should be executed first (so you see what level of obsession we are talking about here :tongue: ).
Anything like "check the aseambly generated for x", "change the terms", "reorder", "use different datatypes" - anything would help me a lot.
Thanks in advance :)