Maybe I invented a new type of Balanced Binary Search Tree?

Printable View

• 04-27-2011
RichSelian
Maybe I invented a new type of Balanced Binary Search Tree?
I wrote this code just for fun, but then I found that it's performance is excellent in both speed (faster then std::map) and memory space, and it's very easy to implement. But I lack some mathemetic knowledge and cannot give any prove or theorem. Can someone have a look on my code?
Code:

```// ############################################################################# // # FileName: topdown.c // # Author:  Zhang Li, CS0901, HUST // ############################################################################# #include <stdio.h> #include <stdlib.h> #include <time.h> // * my topdown balanced tree implementation // ***************************************************************************** typedef struct topdown_tree_node_struct {     size_t m_size;     size_t m_data;     struct topdown_tree_node_struct* m_child[2]; } topdown_tree_node_type; static topdown_tree_node_type nullnode = { 0, 0, { &nullnode, &nullnode }}; static topdown_tree_node_type* destroy(topdown_tree_node_type* root) {     if(root->m_size > 0)     {         destroy(root->m_child[0]);         destroy(root->m_child[1]);         free(root);     }     return &nullnode; } static topdown_tree_node_type* topdown_tree_maintain(         topdown_tree_node_type* root) {     topdown_tree_node_type* r;     topdown_tree_node_type* c;     topdown_tree_node_type* m;     size_t index = (root->m_child[1]->m_size >= root->m_child[0]->m_size);     // balance requirement:     //  child_A.size <= child_B.size * 2 + 1     // once the condition is broke, do a single rotation to repair it     while(root->m_child[index]->m_size > root->m_child[!index]->m_size * 2 + 1)     {         // single rotate         r = root;         c = root->m_child[index];         m = root->m_child[index]->m_child[!index];         c->m_child[!index] = r;         r->m_child[index] = m;         r->m_size = 1 + r->m_child[0]->m_size + r->m_child[1]->m_size;         c->m_size = 1 + c->m_child[0]->m_size + c->m_child[1]->m_size;         root = c;     }     return root; } static topdown_tree_node_type* topdown_tree_insert(         topdown_tree_node_type* root, size_t data) {     topdown_tree_node_type* new_node = malloc(sizeof(topdown_tree_node_type));     topdown_tree_node_type* current_node;     size_t child_index = 0;     new_node->m_data = data;     new_node->m_size = 1;     new_node->m_child[0] = &nullnode;     new_node->m_child[1] = &nullnode;     if(root != NULL)     {         // maintain root (to avoid root changing in the insertion part)         current_node = root = topdown_tree_maintain(root);         while(current_node != new_node)         {             // normal binary search tree's insertion operation             child_index = (data > current_node->m_data);             current_node->m_size += 1;             if(current_node->m_child[child_index]->m_size == 0)             {                 current_node->m_child[child_index] = new_node;               // break;             }             // maintain             current_node->m_child[child_index] = topdown_tree_maintain(                     current_node->m_child[child_index]);             current_node = current_node->m_child[child_index];         }         return root;     }     return new_node; } static topdown_tree_node_type* topdown_tree_search(         topdown_tree_node_type* root, size_t data) {     topdown_tree_node_type* current_node = root;     size_t child_index = 2;     while(current_node->m_size > 0 && current_node->m_data != data)     {         // maintain         child_index = (data > current_node->m_data);         current_node->m_child[child_index] = topdown_tree_maintain(                 current_node->m_child[child_index]);         current_node = current_node->m_child[child_index];     }     if(current_node->m_size > 0)         return current_node;     return NULL; } static topdown_tree_node_type* topdown_tree_remove(         topdown_tree_node_type* root, topdown_tree_node_type* node) {     topdown_tree_node_type* current_node = root;     topdown_tree_node_type* parent_node = &nullnode;     size_t index;     // the tree have only one node, directly delete it     if(root->m_size == 1)     {         free(root);         return &nullnode;     }     // decrease size from root to node     while(current_node != node)     {         current_node->m_size -= 1;         parent_node = current_node;         current_node = parent_node->m_child[node->m_data > parent_node->m_data];     }     // normal deletion operation     while(current_node->m_size > 1)     {         current_node->m_size -= 1;         parent_node = current_node;         for(index = 0; index < 2; index++)         {             if(current_node->m_child[index]->m_size > 0)             {                 current_node = current_node->m_child[index];                 while(current_node->m_child[!index]->m_size > 0)                 {                     parent_node = current_node;                     current_node->m_size -= 1;                     current_node = current_node->m_child[!index];                 }             }         }         node->m_data = current_node->m_data;         node = current_node;     }     // delete node     free(current_node);     parent_node->m_child[(parent_node->m_child[1] == current_node)] = &nullnode;     return root; } // ############################################################################# // # EOF```
• 04-27-2011
RichSelian
This is a simple test code:
Code:

```// ############################################################################# // # FileName: topdown.c // # Author:  Zhang Li, CS0901, HUST // ############################################################################# #include "topdown.h" int main() {     topdown_tree_node_type* root = NULL;     topdown_tree_node_type* node;     size_t i;     size_t test_size = 200000;     // insert random data     for(i = 0; i < test_size; i++)         root = topdown_tree_insert(root, rand() % test_size);     // insert nearly ordered data     for(i = 0; i < test_size; i++)         root = topdown_tree_insert(root, i + rand() % (test_size - i));     // search/delete     for(i = 0; i < test_size; i++)     {         node = topdown_tree_search(root, i);         if(node)         {             root = topdown_tree_remove(root, node);         }     }     // destroy     destroy(root);     return 0; } // ############################################################################# // # EOF```
• 04-27-2011
iMalc
It looks to me to be just a different way of doing an AVL tree, as edivenced by the balance requirement:
Quote:

child_A.size <= child_B.size * 2 + 1
The number of nodes in a subtree is over twice the number of nodes in the other subtree at more or less the same time that the heights of those subtrees differ by more than one.
Therefore the balancing condition is just changed from directly looking for a height imbalance to slightly indirectly looking for a height imbalance. This does mean that instead of only requiring two bits for the height, you potentially require a lot more. However that has negligible practical difference since you usually use more than 2 bits to store the height anyway.
• 04-27-2011
RichSelian
Quote:

Originally Posted by iMalc
It looks to me to be just a different way of doing an AVL tree, as edivenced by the balance requirement:The number of nodes in a subtree is over twice the number of nodes in the other subtree at more or less the same time that the heights of those subtrees differ by more than one.
Therefore the balancing condition is just changed from directly looking for a height imbalance to slightly indirectly looking for a height imbalance. This does mean that instead of only requiring two bits for the height, you potentially require a lot more. However that has negligible practical difference since you usually use more than 2 bits to store the height anyway.

But it's quite faster than a normal avl tree. My balancing operation is top-down and it doesn't require any recursion or parent pointers, so it can save a lot of time and memory space.
The condition 'child_A.size <= child_B.size * 2 + 1' is only satisfied in the insert/search path, That means the height of tree can be O(n) (like splay tree). This implementation provides amorted O(logn) time.

BTW, the size field requires more memory space, but it is very useful in many operations like ranking and selecting, while the avl's height can do little.
• 04-28-2011
iMalc
Quote:

Originally Posted by RichSelian
But it's quite faster than a normal avl tree. My balancing operation is top-down and it doesn't require any recursion or parent pointers, so it can save a lot of time and memory space..

AVL tree implementations do not require a parent pointer. Some implementations do, some don't. My own one does not.
Quote:

The condition 'child_A.size <= child_B.size * 2 + 1' is only satisfied in the insert/search path, That means the height of tree can be O(n) (like splay tree). This implementation provides amorted O(logn) time..
I think you mean "amortised", and you did not state what operations you believe are amortised O(log n).
What could account for this being faster is that it isn't generic. It isn't designed to work with any data type, it doesn't call constructors (heck it isn't even in C++) so you cant really compare them. It doesn't have next / prev operations implemented. The performance test is not very comprehensive, and you haven't listed the corresponding C++ equivalent tests you wrote, so we can't tell what mistakes you may have made in that, and cannot verify your results.
Quote:

BTW, the size field requires more memory space, but it is very useful in many operations like ranking and selecting, while the avl's height can do little.
I don't understand what useful container operation you could be describing with the terms "ranking" and "selecting", perhaps you could elaborate there a little.
The balance factor has the advantage that it is directly related to the search performance and maintaining O(log n) search times. Note that an AVL tree may be implemented by storing balance factor on each node, or by storing the height on each node. The height tends to be used for ease of explanation, and the balance factor tends to be used for the most efficient (space/speed) implementation. Either will work fine.
Storing the node count however, is significantly different. In your case, due to deletions that may have occurred already, a node may have 30 left children and 30 right children, but those on the left may match a perfectly balanced tree, whereas those on the right may have degenerated to something pretty close to just a linked-list. That doesn't normally make for very consistent search performance, but you seem to be doing some rebalancing during the search to counter this. If this were C++ that would involve the need to make the child pointers mutable to maintain the search operation's constness. IMHO it is nicer if the search operation doesn't modify the tree at all. Obviously this isn't possible for a splay tree, and it's probably fine to bend the rules in such cases.

Don't take this all too harshly. I'm hoping that you were expecting close scrutiny.
• 04-28-2011
RichSelian
Quote:

AVL tree implementations do not require a parent pointer. Some implementations do, some don't. My own one does not.
Yes, but you use recursion, that will slow down the speed.

My tree will become unbalanced after times of deletion and one search operation will cost O(n), but searching a deep node will restore most nodes' balance, so it gives O(logn) amortised time.
I can also change the maintain() method and keep balance in deletion, but I found that slower and the code become more complex.
If the O(logn) amortised time can be proved, I think this tree structure can be useful for it simplicity and performance.

Quote:

I don't understand what useful container operation you could be describing with the terms "ranking" and "selecting", perhaps you could elaborate there a little.
Sorry for my poor English. I mean selecting the Nth node from the tree, and given an index, return the corresponding node.
• 04-28-2011
iMalc
Okay I understand, yes those could certainly be useful operations.
Interesting.