Maybe I invented a new type of Balanced Binary Search Tree?

This is a discussion on Maybe I invented a new type of Balanced Binary Search Tree? within the C Programming forums, part of the General Programming Boards category; I wrote this code just for fun, but then I found that it's performance is excellent in both speed (faster ...

  1. #1
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74

    Cool Maybe I invented a new type of Balanced Binary Search Tree?

    I wrote this code just for fun, but then I found that it's performance is excellent in both speed (faster then std::map) and memory space, and it's very easy to implement. But I lack some mathemetic knowledge and cannot give any prove or theorem. Can someone have a look on my code?
    Code:
    // #############################################################################
    // # FileName: topdown.c
    // # Author:   Zhang Li, CS0901, HUST
    // #############################################################################
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    
    // * my topdown balanced tree implementation
    // *****************************************************************************
    typedef struct topdown_tree_node_struct
    {
        size_t m_size;
        size_t m_data;
        struct topdown_tree_node_struct* m_child[2];
    } topdown_tree_node_type;
    
    static topdown_tree_node_type nullnode = { 0, 0, { &nullnode, &nullnode }};
    
    static topdown_tree_node_type* destroy(topdown_tree_node_type* root)
    {
        if(root->m_size > 0)
        {
            destroy(root->m_child[0]);
            destroy(root->m_child[1]);
            free(root);
        }
        return &nullnode;
    }
    
    static topdown_tree_node_type* topdown_tree_maintain(
            topdown_tree_node_type* root)
    {
        topdown_tree_node_type* r;
        topdown_tree_node_type* c;
        topdown_tree_node_type* m;
        size_t index = (root->m_child[1]->m_size >= root->m_child[0]->m_size);
    
        // balance requirement:
        //  child_A.size <= child_B.size * 2 + 1
        // once the condition is broke, do a single rotation to repair it
        while(root->m_child[index]->m_size > root->m_child[!index]->m_size * 2 + 1)
        {
            // single rotate
            r = root;
            c = root->m_child[index];
            m = root->m_child[index]->m_child[!index];
    
            c->m_child[!index] = r;
            r->m_child[index] = m;
            r->m_size = 1 + r->m_child[0]->m_size + r->m_child[1]->m_size;
            c->m_size = 1 + c->m_child[0]->m_size + c->m_child[1]->m_size;
            root = c;
        }
        return root;
    }
    
    static topdown_tree_node_type* topdown_tree_insert(
            topdown_tree_node_type* root, size_t data)
    {
        topdown_tree_node_type* new_node = malloc(sizeof(topdown_tree_node_type));
        topdown_tree_node_type* current_node;
        size_t child_index = 0;
    
        new_node->m_data = data;
        new_node->m_size = 1;
        new_node->m_child[0] = &nullnode;
        new_node->m_child[1] = &nullnode;
    
        if(root != NULL)
        {
            // maintain root (to avoid root changing in the insertion part)
            current_node = root = topdown_tree_maintain(root);
            while(current_node != new_node)
            {
                // normal binary search tree's insertion operation
                child_index = (data > current_node->m_data);
                current_node->m_size += 1;
                if(current_node->m_child[child_index]->m_size == 0)
                {
                    current_node->m_child[child_index] = new_node;
                   // break;
                }
                // maintain
                current_node->m_child[child_index] = topdown_tree_maintain(
                        current_node->m_child[child_index]);
                current_node = current_node->m_child[child_index];
            }
            return root;
        }
        return new_node;
    }
    
    static topdown_tree_node_type* topdown_tree_search(
            topdown_tree_node_type* root, size_t data)
    {
        topdown_tree_node_type* current_node = root;
        size_t child_index = 2;
    
        while(current_node->m_size > 0 && current_node->m_data != data)
        {
            // maintain
            child_index = (data > current_node->m_data);
            current_node->m_child[child_index] = topdown_tree_maintain(
                    current_node->m_child[child_index]);
    
            current_node = current_node->m_child[child_index];
        }
        if(current_node->m_size > 0)
            return current_node;
        return NULL;
    }
    
    static topdown_tree_node_type* topdown_tree_remove(
            topdown_tree_node_type* root, topdown_tree_node_type* node)
    {
        topdown_tree_node_type* current_node = root;
        topdown_tree_node_type* parent_node = &nullnode;
        size_t index;
    
        // the tree have only one node, directly delete it
        if(root->m_size == 1)
        {
            free(root);
            return &nullnode;
        }
    
        // decrease size from root to node
        while(current_node != node)
        {
            current_node->m_size -= 1;
            parent_node = current_node;
            current_node = parent_node->m_child[node->m_data > parent_node->m_data];
        }
    
        // normal deletion operation
        while(current_node->m_size > 1)
        {
            current_node->m_size -= 1;
            parent_node = current_node;
            for(index = 0; index < 2; index++)
            {
                if(current_node->m_child[index]->m_size > 0)
                {
                    current_node = current_node->m_child[index];
                    while(current_node->m_child[!index]->m_size > 0)
                    {
                        parent_node = current_node;
                        current_node->m_size -= 1;
                        current_node = current_node->m_child[!index];
                    }
                }
            }
            node->m_data = current_node->m_data;
            node = current_node;
        }
    
        // delete node
        free(current_node);
        parent_node->m_child[(parent_node->m_child[1] == current_node)] = &nullnode;
        return root;
    }
    // #############################################################################
    // # EOF

  2. #2
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74
    This is a simple test code:
    Code:
    // #############################################################################
    // # FileName: topdown.c
    // # Author:   Zhang Li, CS0901, HUST
    // #############################################################################
    #include "topdown.h"
    
    int main()
    {
        topdown_tree_node_type* root = NULL;
        topdown_tree_node_type* node;
        size_t i;
        size_t test_size = 200000;
    
        // insert random data
        for(i = 0; i < test_size; i++)
            root = topdown_tree_insert(root, rand() % test_size);
    
        // insert nearly ordered data
        for(i = 0; i < test_size; i++)
            root = topdown_tree_insert(root, i + rand() % (test_size - i));
    
        // search/delete
        for(i = 0; i < test_size; i++)
        {
            node = topdown_tree_search(root, i);
            if(node)
            {
                root = topdown_tree_remove(root, node);
            }
        }
    
        // destroy
        destroy(root);
        return 0;
    }
    // #############################################################################
    // # EOF

  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,307
    It looks to me to be just a different way of doing an AVL tree, as edivenced by the balance requirement:
    child_A.size <= child_B.size * 2 + 1
    The number of nodes in a subtree is over twice the number of nodes in the other subtree at more or less the same time that the heights of those subtrees differ by more than one.
    Therefore the balancing condition is just changed from directly looking for a height imbalance to slightly indirectly looking for a height imbalance. This does mean that instead of only requiring two bits for the height, you potentially require a lot more. However that has negligible practical difference since you usually use more than 2 bits to store the height anyway.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74
    Quote Originally Posted by iMalc View Post
    It looks to me to be just a different way of doing an AVL tree, as edivenced by the balance requirement:The number of nodes in a subtree is over twice the number of nodes in the other subtree at more or less the same time that the heights of those subtrees differ by more than one.
    Therefore the balancing condition is just changed from directly looking for a height imbalance to slightly indirectly looking for a height imbalance. This does mean that instead of only requiring two bits for the height, you potentially require a lot more. However that has negligible practical difference since you usually use more than 2 bits to store the height anyway.
    But it's quite faster than a normal avl tree. My balancing operation is top-down and it doesn't require any recursion or parent pointers, so it can save a lot of time and memory space.
    The condition 'child_A.size <= child_B.size * 2 + 1' is only satisfied in the insert/search path, That means the height of tree can be O(n) (like splay tree). This implementation provides amorted O(logn) time.

    BTW, the size field requires more memory space, but it is very useful in many operations like ranking and selecting, while the avl's height can do little.

  5. #5
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,307
    Quote Originally Posted by RichSelian View Post
    But it's quite faster than a normal avl tree. My balancing operation is top-down and it doesn't require any recursion or parent pointers, so it can save a lot of time and memory space..
    AVL tree implementations do not require a parent pointer. Some implementations do, some don't. My own one does not.
    The condition 'child_A.size <= child_B.size * 2 + 1' is only satisfied in the insert/search path, That means the height of tree can be O(n) (like splay tree). This implementation provides amorted O(logn) time..
    I think you mean "amortised", and you did not state what operations you believe are amortised O(log n).
    What could account for this being faster is that it isn't generic. It isn't designed to work with any data type, it doesn't call constructors (heck it isn't even in C++) so you cant really compare them. It doesn't have next / prev operations implemented. The performance test is not very comprehensive, and you haven't listed the corresponding C++ equivalent tests you wrote, so we can't tell what mistakes you may have made in that, and cannot verify your results.
    BTW, the size field requires more memory space, but it is very useful in many operations like ranking and selecting, while the avl's height can do little.
    I don't understand what useful container operation you could be describing with the terms "ranking" and "selecting", perhaps you could elaborate there a little.
    The balance factor has the advantage that it is directly related to the search performance and maintaining O(log n) search times. Note that an AVL tree may be implemented by storing balance factor on each node, or by storing the height on each node. The height tends to be used for ease of explanation, and the balance factor tends to be used for the most efficient (space/speed) implementation. Either will work fine.
    Storing the node count however, is significantly different. In your case, due to deletions that may have occurred already, a node may have 30 left children and 30 right children, but those on the left may match a perfectly balanced tree, whereas those on the right may have degenerated to something pretty close to just a linked-list. That doesn't normally make for very consistent search performance, but you seem to be doing some rebalancing during the search to counter this. If this were C++ that would involve the need to make the child pointers mutable to maintain the search operation's constness. IMHO it is nicer if the search operation doesn't modify the tree at all. Obviously this isn't possible for a splay tree, and it's probably fine to bend the rules in such cases.

    Don't take this all too harshly. I'm hoping that you were expecting close scrutiny.
    Last edited by iMalc; 04-28-2011 at 12:51 AM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  6. #6
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74
    AVL tree implementations do not require a parent pointer. Some implementations do, some don't. My own one does not.
    Yes, but you use recursion, that will slow down the speed.

    My tree will become unbalanced after times of deletion and one search operation will cost O(n), but searching a deep node will restore most nodes' balance, so it gives O(logn) amortised time.
    I can also change the maintain() method and keep balance in deletion, but I found that slower and the code become more complex.
    If the O(logn) amortised time can be proved, I think this tree structure can be useful for it simplicity and performance.

    I don't understand what useful container operation you could be describing with the terms "ranking" and "selecting", perhaps you could elaborate there a little.
    Sorry for my poor English. I mean selecting the Nth node from the tree, and given an index, return the corresponding node.

  7. #7
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,307
    Okay I understand, yes those could certainly be useful operations.
    Interesting.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 0
    Last Post: 11-04-2006, 10:07 AM
  2. A Binary Search Tree of... Binary Search Trees...
    By SlyMaelstrom in forum C++ Programming
    Replies: 5
    Last Post: 12-10-2005, 01:12 PM
  3. BST (Binary search tree)
    By praethorian in forum C++ Programming
    Replies: 3
    Last Post: 11-13-2005, 08:11 AM
  4. Search Engine - Binary Search Tree
    By Gecko2099 in forum C Programming
    Replies: 9
    Last Post: 04-17-2005, 02:56 PM
  5. How do you create a balanced binary tree?
    By Mr_Jack in forum C++ Programming
    Replies: 3
    Last Post: 01-13-2004, 02:02 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21