The logic is actually quite simple.

**Each binary tree structure is defined by the sizes of subtrees at each node.**

The values at the nodes are irrelevant.

Consider a binary tree with six values: ◯ ◯ ◯ ◯ ◯ ◯. The overall structure of the tree depends on how many of them are on the left side of the root, and how many on the right side.

Say you pick three to the left subtree. Since one is at root, then you'll have two to put into the right subtree. Using numbers for the number of nodes in subtrees, you get

That defined just the overall structure. To get a definite structure, you need to define the shape of the rest of the tree. So, you recurse into each subtree, choosing how many of the leftover nodes go to the left subtree (and therefore how many to the right) at each node.

If you pick "one to left" for the left subtree, and "zero to left" for the right subtree, you end up with
Code:

◯
╱ ╲
◯ ◯
╱ ╲ ╲
◯ ◯ ◯

This is one of the possible structures for a binary tree containing *N*=6 values.

Let's concentrate on what happens at each node.

Say you have *n* nodes to choose from. One of them must be the current node, so you have *n*-1 to put into the left and right subtrees. Because you can put anywhere from 0 to *n*-1 into the left subtree, you have *n* possible choices.

To count the number of structures possible for a binary tree of this size, you loop over i = 0 .. n-1, and find out the number of structures for the left subtree, L_{i}=countTrees(i) and the number of structures for the right subtree, R_{i}=countTrees(n-i-1) . Since each left side structure can be paired with each right side structure, the number of structures for each i is L_{i}·R_{i} .

(If you loop over i = 1 .. n, then L_{i}=countTrees(i-1) and R_{i}=countTrees(n-i).)

Let's look at actual C code at hand.

Code:

int countTrees(int numKeys)
{
if (numKeys <= 1) {
return 1;
} else ..{

An empty subtree (zero nodes) has only one possible structure. A solitary node has likewise exactly one possible structure. Since a subtree with zero or one nodes has only one possible structure, the function will return one for those cases.

Code:

int sum = 0;
int left, right, root;

Above, sum is the total number of possible structures, left is the number of possible structures for each left subtree, right is the number of possible structures for each right subtree, and root is the loop index variable.

Code:

for (root=1; root <= numKeys; root++) {
left = countTrees(root - 1);
right = countTrees(numKeys - root);

The above loop is written so that root is one plus the number of nodes in the left subtree. The point is that it will do numKeys iterations, just like I said earlier.

Remember, the number of combinations for a tree with a specific number of nodes in the left and right subtrees, is the number of combinations in the left subtree multiplied with the number of combinations in the right subtree, because each left subtree structure can be paired with each right subtree structure.

After the loop, the function simply returns the sum of the number of possible structures.

If you were to write the above function mathematically, you'd get something like

Code:

N_{0} = 1
N_{1} = 1
N_{n} = sum( N_{i-1} N_{n-i}, i = 1..n)

Adjust the indexing to

Code:

⇔ N_{n+1} = sum( N_{i-1} N_{n+1-i}, i = 1..n+1)
⇔ N_{n+1} = sum( N_{i} N_{n-i}, i = 0..n)

and you get the recurrence relation for Catalan numbers:

Code:

C_{n+1} = sum( C_{i} C_{n-i}, i = 0..n)

This is why the countTrees(numKeys) function actually just computes the numKeys'th Catalan number.

Because Catalan numbers also satisfy

Code:

C_{n+1} = C_{n} (4n + 2)/(n + 2)

you could rewrite the function as

Code:

int countTrees(const int numKeys)
{
int total = 1;
int n = 1;
/* Return the numKeys'th Catalan number. */
while (n < numKeys) {
total = (4*n + 2) * total / (n + 2);
n++;
}
return total;
}

This is an excellent example of proper optimization: not at the code level, but at the algorithmic level. This version is easy to implement using just about any bignum library, giving you the result as a very large integer, and works even when n is large.

As it is, this only works for numKeys <= 19 (assuming 32-bit ints); up to numKeys <= 33 if you use uint64_t instead. (The first 30 are listed here, so you could just put them into an array if you only need it for small numKeys.)

Of course, the final value can be calculated directly using binomial coefficients,

Code:

C_{n} = ( (2n)! ) / ( (n+1)! n! )

i.e.

Code:

double catalan(const double n)
{
return tgamma(2.0*n + 1.0) / tgamma(n + 2.0) / tgamma(n + 1.0);
}

but the dividend and divisor are very large numbers, and even double-precision floating-point numbers overflowing at n=86, and only n=0..26 are correct when rounded to the nearest integer, limited precision affecting the larger results.

Fun stuff.