Request for comments

This is a discussion on Request for comments within the A Brief History of Cprogramming.com forums, part of the Community Boards category; Comments, suggestions, criticism, whatever. You can tell I was getting tired of writing just after I started the deletion section, ...

  1. #1
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796

    Request for comments

    Comments, suggestions, criticism, whatever. You can tell I was getting tired of writing just after I started the deletion section, but it seems to have turned out well.

    Binary Search Trees

    Arrays are useful in that they allow quick access to any item in the data structure, but they fall flat for insertion and deletion of an item anywhere except at the tail end. Linked lists solve this problem by removing the restriction that every item be contiguous in memory, thus introducing the concept if a linked data structure. With a linked list you can insert and delete items anywhere in the list with relative ease, but then the problem of efficient searching arises. With arrays you could at least sort the data and use a binary search to find an item in logarithmic time, but the binary search algorithm requires random access to the data structure. A linked list does not support random access, therefore binary search is not practical. Instead of being restricted to linear search times and severely limiting the usefulness of linked data structures, a linked data structure corresponding to the binary search can be used. These data structures are called, not surprisingly, binary search trees.

    The idea behind a binary search tree is that the median value is the head of the data structure, the item you begin searches with. This item then links to two others, a "left" link, and a "right" link, both of which follow the same structure. The resulting representation looks like this:
    Code:
    Diagram 1:
    
                    5
             3             7
         2       4     6       8
    The item at the head of this tree is 5, which links to 3 and 7, both of which link to two others, 2, 4, and 6, 8. One can easily see the concept behind a binary search tree, and now we can look at some definitions:

    Definition 1: A binary tree is a node with links to two subtrees, called the "left" subtree and "right" subtree, respectively.

    Definition 2: A binary search tree is a binary tree whose left subtree contains values less than itself, and whose right subtree contains values greater than itself.

    A tree node is similar to a double linked list node in that it contains at least a data item, and two links:
    Code:
    Source 1:
    
    struct dlnode {
      struct dlnode *prev;
      struct dlnode *next;
      int data;
    };
    
    struct tree_node {
      struct tree_node *left;
      struct tree_node *right;
      int data;
    };
    As you can see, the two declarations are exactly the same except for different names. The difference between a double linked list and a binary search tree is how the links are interpreted. In all of our code, we will be starting from the following binary search tree definition:
    Code:
    Source 2:
    
    struct pretree_node {
      struct pretree_node *link[2];
      int data;
    };
    
    struct pretree {
      struct pretree_node *header;
      size_t size;
    };
    Notice that instead of two unique links I have chosen to use an array of two links. This is a useful technique as we can now forgo several if..else tests and simply use the boolean value from a comparison to choose whether to chase the left or right link. pretree_node's link[0] is equivalent to tree_node's left pointer, and pretree_node's link[1] is equivalent to tree_node's right pointer. Also notice that I have used a second structure for a single tree. This simplifies some of the algorithms we will be looking at, and also allows us to keep track of useful information that would be too expensive or illogical to keep in each individual node.

    Searching

    As the name implies, binary search trees are designed to accelerate the searching process, so it stands to reason that we should examine the searching algorithm first. Searching a binary search tree is straightforward: Examine the value of a node, if the value being searched for is less then move to the left link, if the value being searched for is greater then move to the right link. If the value is equal to the value being searched for, the search has succeeded. Some readers will have noticed that we are missing a case; what if the value being searched for is not in the tree? The answer depends on how the tree is structured, but for the most part, empty links will be marked by null pointers. If the next link to follow is a null pointer, it is safe to assume that the value does not exist.

    Using the above description, an algorithm is immediate:
    Code:
    Source 3:
    
    1.  int pretree_find (
    2.    struct pretree *tree,
    3.    int search_key )
    4.  {
    5.    struct pretree_node *walk;
    6.    int dir;
    7.
    8.    walk = tree->header;
    9.    for ( ; ; ) {
    10.     /* Fail if a dead link is found */
    11.     if ( walk == NULL )
    12.       return 0;
    13.     /* Succeed if the item is found */
    14.     if ( walk->data == search_key )
    15.       return 1;
    16.     /* Pick a direction and go there */
    17.     dir = walk->data < search_key;
    18.     walk = walk->link[dir];
    19.   }
    20. }
    Some points must be made about this code. First we must guarantee that what we expect to happen actually does happen. We begin by setting out walk pointer to reference the header of the tree. Then we enter an infinite loop because we have no idea how tall the tree is. On the first iteration, an empty tree is tested for at line 11, so we are safe for that boundary condition. We test the header for a successful search too on line 14, so another boundary condition is met. Notice that a null pointer is tested for first because dereferencing a null pointer is a grievous error in C/C++. Provided we walk in the correct direction, we can safely assume that these two tests will handle all possible cases for every node. On line 17 we initialize a boolean value with a simple test. This takes a little getting used to because we want to move to the left link if the test fails, and to the right link if it succeeds, so you will find that I use the opposite test that you would expect. In this case, if walk->data is less than the search key, we move to the right because dir is given the value of 1 for true, otherwise dir is set to 0 for false and we move to the left. I highly recommend you study this algorithm and make sure you understand it completely before moving on, searching is a fundamental part of most binary search tree algorithms.

    Because the definition of a binary search tree is itself recursive (ie. Each link is its own binary search tree), we can use a recursive approach to searching:
    Code:
    Source 4:
    
    1.  static int find_subtree (
    2.    struct pretree_node *header,
    3.    int search_key )
    4.  {
    5.    if ( header == NULL )
    6.      return 0;
    7.    else if ( header->data != search_key ) {
    8.      int dir = header->data < search_key;
    9.      return find_subtree ( header->link[dir], search_key );
    10.   }
    11.
    12.   return 1;
    13. }
    14.
    15. int pretree_find (
    16.   struct pretree *tree,
    17.   int search_key )
    18. {
    19.   return find_subtree ( tree->header, search_key );
    20. }
    This algorithm is less obvious than the one we looked at previously. The first thing you will notice is that I have chosen to use two functions. The first is the client interface, it simply calls a helper function to recursively search the header of tree. This function is used so that a client can simply pass a pretree pointer instead of the header itself. Such a feature helps clients with only a little extra work for the author of the code. While we will examine recursive solutions, iterative solutions are to be preferred for the usual reasons. Our recursive solutions will typically follow a similar pattern of a client interface that calls a recursive helper.

    The recursive algorithm is simple, if not immediately transparent to readers. If the node is a null pointer, we return a failure code. Otherwise, if the value is not equal to the value being searched for, we find a link to follow as with the iterative algorithm, and then return the result of a recursive call to the function. Note that if the node's value is the same as the search key, the if..else construct is terminated and a success code is returned. A recursive call is only made if the node is not null and the value is not found. Be sure to study this algorithm as it is more difficult to follow than the previous one. See Exercise 1 before using these two functions.

    Insertion

    A binary search tree is useless until you have a method for inserting items into it, but before we get into that a little bit of terminology is in order.

    Term 1: The parent of a node is the node which links to it. In the following example, C is the parent of B because B is C's left link (Note: When possible I will use / and \ between two nodes to emphasize the relationship):
    Code:
    Diagram 2:
    
       C
      /
     B
    Term 2: The child of a node is a node that it links to. In the above example, B is the child of C because C links to B.

    Term 3: The root of a tree is a node that has no parent.

    Term 4: A leaf is a node that has no children.

    Insertion into a binary search tree occurs at a leaf node. This suggests that we begin by searching for a null link, then we insert the new item at that location. What follows is an example construction of a binary search tree using the values 4, 7, 1, 2, 5, 6, 3 to show how this works:
    Code:
    Diagram 3:
    
    4:
         4
    
    7:
         4
          \
           7
    
    1:
          4
         / \
        1   7
    
    2:     4
         /   \
       1       7
        \
         2
    
    5:
           4
         /   \
       1       7
        \     /
         2   5
    
    6:
           4
         /   \
       1       7
        \     /
         2   5
              \
               6
    
    3:
           4
         /   \
       1       7
        \     /
         2   5
          \   \
           3   6
    Notice how each insertion follows the guarantee that every value is greater than the values in the left subtree and less than the values in the right subtree. Once again, an algorithm is immediate when we use a similar searching method as before:
    Code:
    Source 5:
    
    1.  int pretree_insert (
    2.    struct pretree *tree,
    3.    struct pretree_node *new_item,
    4.    int search_key )
    5.  {
    6.    struct pretree_node *walk;
    7.    struct pretree_node **save;
    8.    int dir;
    9.
    10.   /*
    11.    * Search for an empty link
    12.    * Return failure if a duplicate is found
    13.    */
    14.   save = &tree->header;
    15.   walk = tree->header;
    16.   for ( ; ; ) {
    17.     if ( walk == NULL )
    18.       break;
    19.     else if ( walk->data == search_key )
    20.       return 0;
    21.     dir = walk->data < search_key;
    22.     save = &walk->link[dir];
    23.     walk = walk->link[dir];
    24.   }
    25.   /*
    26.    * Insert the new item
    27.    */
    28.   *save = new_item;
    29.   tree->size++;
    30.
    31.   return 1;
    32. }
    The first thing you should notice is that we use an extra variable, and double pointer called save. This variable is used to hold the address of the link that we followed to get to the current node. Without this pointer, we would not be able to change the tree because walk would be a null pointer. When walk is a null pointer, we must have a way to afix the new node to the tree, and that is what save does for us. On line 14 we set save to point to the address of the header and walk to the header itself. We then enter the search algorithm that should be familiar by now, the only difference is that if walk is a null pointer, we break instead of return, and we always reset save to point to the address of the link we intend to follow. Note that the assignment to save must be made before the assignment to walk, or we will have lost the address of the link that we followed. If the search key is found, we have encountered a duplicate node. How duplicate nodes are handled depends on the application, but
    we will be ignoring them, so we return failure. When walk is null, we break out of the loop and assign the new node to the link that points there. Note that if we assigned directly to walk, the new node would not be afixed to the tree, it would simply be floating in limbo, and that isn't very useful since upon returning we would lose it. Next we add one to the size of the tree and return success. Study this code very carefully as it introduces the rather difficult to grasp technique of using a double pointer to save the link followed.

    As with simple searching, insertion can be easily converted to a recursive solution:
    Code:
    Source 6:
    
    1.  struct pretree_node *insert_subtree (
    2.    struct pretree_node *header,
    3.    struct pretree_node *new_item,
    4.    int search_key )
    5.  {
    6.    if ( header == NULL )
    7.      header = new_item;
    8.    else if ( header->data != search_key ) {
    9.      int dir = header->data < search_key;
    10.     header->link[dir] = insert_subtree ( header->link[dir], new_item, search_key );
    11.   }
    12.  
    13.   return header;
    14. }
    15.
    16. void pretree_insert (
    17.   struct pretree *tree,
    18.   struct pretree_node *new_item,
    19.   int search_key )
    20. {
    21.   tree->header = insert_subtree ( tree->header, new_item, search_key );
    22.   tree->size++;
    23. }
    The similarities with searching are striking. Once again the algorithm is not as transparent as with an iterative approach, but it is certainly more compact. pretree_insert is careful to reset the header if that needs to be done, resetting the header was implicit in our iterative approach. It also handles adding one to the size, however, it assumes that the item was inserted. This is a problem if a duplicate value is found. insert_subtree is very similar to find_subtree except that when it finds a null link, it assigns the new item to it and returns that pointer instead of success code. The recursive call must be careful to assign its return value to the link that it passed to itself. Aside from these few points, it should be clear how recursive insertion works.

    Traversal

    Now that we have the code necessary to build a binary search tree, we can do things with it. Naturally, you can use the search functions defined in Source 3 and Source 4 to determine if an item is in the tree, but it would be nice to verify that the insertion functions work properly. It would also be nice to be able to perform an operation on every item in the list, such as displaying them. And of course, since we are most likely using dynamic memory to allocate each node in the tree, it would be wise to release that memory at some point. This brings us to the field of tree traversal.

    One would think that visiting every node in a tree would be simple and consist of one, maybe two different cases. Surprisingly, this is not the case. There are actually N! different ways to traverse a binary search tree of N nodes, but most of them are useless. Of the many ways to traverse a binary search tree, we will be looking at two categories: breadth-first, where we will use the level-order traversal as an example, and depth-first, where we will use preorder, inorder, and postorder traversals as examples. We will then go on to discuss more exotic traversal techniques.

    Breadth-First Traversal

    Breadth-first traversal is visiting each node in a binary search tree, starting from the root and moving down level by level. Traditionally, breadth-first traversal visits each node on the same level from left to right. In the following diagram, the breadth-first traversal would visit nodes in the order 5, 3, 7, 2, 6, 8.
    Code:
    Diagram 4:
    
                    5
             3             7
         2             6       8
    This definition gives us another few terms to play with:

    Term 5: The height of a node is the number of nodes in its highest subtree. The height of node 3 in Diagram 4 is 1, while the height of node 5 is 2.

    Term 6: Each node with the same height is on the same level. In Diagram 4, nodes 3 and 7 both have a height of 1, so they are on the same level.

    A breadth-first traversal algorithm is often implemented as the level-order traversal. While there can be differences, most often level-order traversal matches the definition of breadth-first traversal. This is one of the algorithms on a binary search tree that cannot be written recursively. For those that are familiar with recursion, you know that a recursive solution can be simulated through the use of a stack. Level-order traversal requires the use of a queue, however, so recursion is not a practical option.

    The algorithm itself is very simple, visit the first node saved, then save its left and right links. Continue to do this until there are no more links to visit. An example using Diagram 4 follows:
    Code:
    Diagram 5:
    
    save 5,  queue = { 5 }
    visit 5, queue = {}
    save 3,  queue = { 3 }
    save 7,  queue = { 7, 3 }
    visit 3, queue = { 7 }
    save 2,  queue = { 2, 7 }
    visit 7, queue = { 2 }
    save 6,  queue = { 6, 2 }
    save 8,  queue = { 8, 6, 2 }
    visit 2, queue = { 8, 6 }
    visit 6, queue = { 8 }
    visit 8, queue = {}
    The algorithm itself is suprisingly simple:
    Code:
    Source 7:
    
    void pretree_levelorder (
      struct pretree_node *header,
      void (*action) ( struct pretree_node *node ) )
    {
      struct pretree_node *save[50];
      int front = 0;
      int back = 0;
    
      if ( header == NULL )
        return;
      save[front++] = header;
      while ( front != back ) {
        header = save[back++];
        action ( header );
        if ( header->link[0] != NULL )
          save[front++] = header->link[0];
        if ( header->link[1] != NULL )
          save[front++] = header->link[1];
      }
    }
    The queue that I have used is naive and trivial, but it serves its purpose. Since this is a tutorial about trees and not queues, I will refrain from describing how it works and we can simply assume that it does. The code follows the algorithm that I described in Diagram 5 exactly, but you should still study it to make sure that you understand how it works.

    Depth-First Traversal

    Depth-first traversals begin by moving as far left or right as possible, as opposed to breadth-first traversals which begin with the root. These traversals then move up one link and move left or right. This process is repeated until all nodes have been visited. The question of course, is when do the nodes get visited? Since we can only move in one of two directions, we can determine that there are only six possible ways to traverse and visit in the depth-first variation. Recognizing that you can move left, move right, or visit with every operation, the following variants are available:

    1: visit, move left, move right
    2: visit, move right, move left
    3: move left, visit, move right
    4: move right, visit, move left
    5: move left, move right, visit
    6: move right, move left, visit

    Of these six, three are common enough to give standardized names: Traversal 1 is referred to as preorder because the visitation is performed before any movement is made, Traversal 3 is referred to as inorder bcaus the visitation is performed in between the two movements and results in a traversal in the order of the values, and finally, Traversal 5 is called postorder because the visitation is performed after both movements. Each of these can be written using a short and elegant recursive scheme:
    Code:
    Source 8:
    
    void pretree_preorder (
      struct pretree_node *header,
      void (*action) ( struct pretree_node *node ) )
    {
      if ( header == NULL )
        return;
      action ( header );
      pretree_preorder ( header->link[0], action );
      pretree_preorder ( header->link[1], action );
    }
    
    void pretree_inorder (
      struct pretree_node *header,
      void (*action) ( struct pretree_node *node ) )
    {
      if ( header == NULL )
        return;
      pretree_inorder ( header->link[0], action );
      action ( header );
      pretree_inorder ( header->link[1], action );
    }
    
    void pretree_postorder (
      struct pretree_node *header,
      void (*action) ( struct pretree_node *node ) )
    {
      if ( header == NULL )
        return;
      pretree_postorder ( header->link[0], action );
      pretree_postorder ( header->link[1], action );
      action ( header );
    }
    Using yet again Diagram 4, we can follow the execution of these three traversals. The preorder traversal visits a node and then moves, so the nodes would be visited in the order of 5, 3, 2, 7, 6, 8. An inorder traversal moves as far left as possible before visiting the node, so the order would be 2, 3, 5, 6, 7, 8. Postorder traversal would result in 2, 3, 5, 6, 8, 7.

    Despite these wonderfully concise solutions, it is often prudent to take the iterative approach to traversal. These traversals are more difficult because of the dual recursive calls. If you direct your attention to the definition of a level-order traversal you may notice similarities with the preorder traversal if the queue were replaced with a stack. Believe it or not, that is precisely what needs to be done to write an iterative preorder traversal:
    Code:
    Source 9:
    
    void pretree_preorder (
      struct pretree_node *header,
      void (*action) ( struct pretree_node *node ) )
    {
      struct pretree_node *save[50];
      int top = 0;
    
      if ( header == NULL )
        return;
      save[top++] = header;
      while ( top != 0 ) {
        header = save[--top];
        action ( header );
        if ( header->link[1] != NULL )
          save[top++] = header->link[1];
        if ( header->link[0] != NULL )
          save[top++] = header->link[0];
      }
    }
    Inorder traversal is more difficult. We need to walk to the left without losing any of the right links or any of the parents. This implies at least several loops, one to save the links to walk back, one to visit the saved links, and another to manage successive branches. Fortunately, while the logic is rather complex, the code is fairly simple:
    Code:
    Source 10:
    
    void pretree_inorder (
      struct pretree_node *header,
      void (*action) ( struct pretree_node *node ) )
    {
      struct pretree_node *save[50];
      int top = 0;
    
      while ( header != NULL ) {
        while ( header != NULL ) {
          if ( header->link[1] != NULL )
            save[top++] = header->link[1];
          save[top++] = header;
          header = header->link[0];
        }
        header = save[--top];
        while ( top != 0 && header->link[1] == NULL ) {
          action ( header );
          header = save[--top];
        }
        action ( header );
        header = ( top != 0 ) ? save[--top] : NULL;
      }
    }
    Threaded Trees And Traversal

    It is possible to implement depth-first traversals without the aid of a stack or recursion. The biggest concern with either an implicit stack (through recursion) or an explicit stack is that of the extra time and space required to maintain the stack. If the tree is skewed so that the height is equivalent to the number of nodes, the potential exists for stack overflow in recursive and fixed size stack solutions, and for explicit variable size stack solutions, the potential for taking up almost the same amount of space and time as the tree itself is uncomfortably likely.

    To counter this, the concept of threads was developed. Threads are pointers to the successor and predecessor of a node in a predefined sequence, such as inorder traversal. Of course, this would add two pointers to each node of the tree, often an unacceptable increase in size. This problem can be solved by overloading existing unused pointers. While each node would still need a flag of some sort to determine if a link is normal or overloaded, this flag need not take up nearly as much valuable space as a full link. Using threads, tree traversal is simple. Here is some pseudocode for one possible implementation of inorder traversal using threads:
    Code:
    Source 11:
    
    sub threaded_inorder ( tree )
      node walk := tree
      node prev
    
      if walk = null then
        return
      endif
      while walk.left != null do
        walk := walk.left
      loop
      while walk != null do
        visit ( walk )
        prev := walk
        walk := walk.right
        if walk != null then
          if prev.successor = 0 then
            while walk.left != nul do
              walk := walk.left
            loop
          endif
        endif
      loop
    end sub
    Another option for a stackless depth-first traversal is to transform the tree as it is traversed. The obvious disadvantage to this is that several more operations are performed on the tree. One such algorithm can be easily written for an inorder traversal by realizing that inorder traversal is easy if none of the nodes have left subtrees. If a node does have a left subtree, we can fix it so that it does not if we are willing to temporarily destroy the structure of the tree. The idea is that if a node has a left subtree, we find the predecessor of that node (the predecessor is the node with the next smallest value) and make the node the right child of the predecessor. This is not a simple operation, so an example follows:
    Code:
    Diagram 6:
    
        5
       / \
      2   7
     / \
    1   3
    In this diagram, the current node is 5, but 5 has a left subtree. So we find 5's predecessor, which is 3. This is done by moving to 5's left link, then moving as far as possible to the right. We then set 3's right link to point to 5, resulting in the following tree:
    Code:
    Diagram 7:
    
       2
      / \
     1   3
          \
           5
            \
             7
    This would be trivial to write a function for if not for the problem that the tree is in the worst possible structure after completely traversing with such transformations. What we need is a way to return the tree to its original state without using a stack. This can be done easily by using the predecessor pointer and reversing the previous operations:
    Code:
    Source 12:
    
    void xform_inorder (
      struct pretree *tree,
      void (*action) ( struct pretree_node *node ) ) 
    {
      struct pretree_node *walk = tree->header;
      struct pretree_node *heir;
    
      while ( walk != NULL ) {
        if ( walk->link[0] == NULL ) {
          action ( walk );
          walk = walk->link[1];
        }
        else {
          heir = walk->link[0];
          while ( heir->link[1] != NULL
            && heir->link[1] != walk )
          {
            heir = heir->link[1];
          }
          if ( heir->link[1] == NULL ) {
            heir->link[1] = walk;
            walk = walk->link[0];
          }
          else {
            action ( walk );
            heir->link[1] = NULL;
            walk = walk->link[1];
          }
        }
      }
    }
    Deletion

    At some point in a tree's life, it would be nice to be able to remove an item. Item removal is one of the primary reasons that a developer will choose not to use a binary search tree. Deletion is not a trivial task, especially when more advanced trees are used to guarantee logarithmic performance. The problem with deletion is that there are several cases of varying complexity. How difficult it is to remove a node depends heavily on where the node exists in the tree structure. There are three cases:

    Case 1: The node is a leaf, simply set the link followed to it to null.
    Case 2: The node has one child, set the link followed to it to point to the child.
    Case 3: The node has two children. This is the hardest case and can be performed several ways, we will discuss two of them.

    Deletion By Merging.

    This is the most common method of deletion in books and other educational text. It begins by finding the successor (or predecessor) of the node to be removed. Then the node's left (or right) link is spliced into the successor's (or predecessor's) matching link. Reset the link that was followed and the merging is complete. An example follows:
    Code:
    Diagram 8:
    
    (Remove 5)
         |              |
         7              9
        / \            /
       4   9   -->    4
      / \            / \
     2   5          2   5
    Here are all of the cases put together into one working function:
    Code:
    Source 13:
    
    int pretree_remove (
      struct pretree *tree,
      struct pretree_node **old_item,
      int search_key )
    {
      struct pretree_node *walk;
      struct pretree_node **save;
      int dir;
    
      /*
       * Search for a matching link
       * Return failure if not found
       */
      save = &tree->header;
      walk = tree->header;
      for ( ; ; ) {
        if ( walk == NULL )
          return 0;
        else if ( walk->data == search_key )
          break;
        dir = walk->data < search_key;
        save = &walk->link[dir];
        walk = walk->link[dir];
      }
      /*
       * Remove the old item
       */
      if ( walk->link[1] == NULL ) {
        *save = walk->link[0];
        *old_item = walk;
      }
      else if ( walk->link[0] == NULL ) {
        *save = walk->link[1];
        *old_item = walk;
      }
      else {
        struct pretree_node *heir = walk->link[1];
    
        while ( heir->link[0] != NULL )
          heir = heir->link[0];
        heir->link[0] = walk->link[0];
        *save = walk->link[1];
        *old_item = walk;
      }
      tree->size--;
    
      return 1;
    }
    Notice the comfortable old searching logic. Almost every algorithm performed on binary search trees involves searching, so aren't you glad you took the time to fully understand it? Next we move to the three cases, the first two being simple. If either of walk's links are null then we can perform a simple case, just replace walk with it's non-null link. The third case should be easy to find, it involves the heir pointer and a typical search for a node's successor. Once the successor is found, the splicing is done in two lines. Be sure to study this code very carefully and make sure you understand it by working through the logic on sample trees. Deletion will become a thorn in our sides shortly, and any lack of understanding of the basics will be a problem.

    Deletion By Copying

    Deletion by merging is not an optimal solution because it can cause the height of the tree to increase. If at all possible, your tree code should keep a tree at the smallest height possible to accelerate searching. In some cases deletion by merging can result in a highly unbalanced tree, so we should work toward something a little better. Fortunately, such an algorithm has already been devised. If the node has two children, we can reduce the case to one of the simpler cases by replacing the node with its immediate successor or predecessor and removing the successor (or predecessor):
    Code:
    Source 14:
    
    int pretree_remove (
      struct pretree *tree,
      struct pretree_node **old_item,
      int search_key )
    {
      struct pretree_node *walk;
      struct pretree_node **save;
      int dir;
    
      /*
       * Search for a matching link
       * Return failure if not found
       */
      save = &tree->header;
      walk = tree->header;
      for ( ; ; ) {
        if ( walk == NULL )
          return 0;
        else if ( walk->data == search_key )
          break;
        dir = walk->data < search_key;
        save = &walk->link[dir];
        walk = walk->link[dir];
      }
      /*
       * Remove the old item
       */
      if ( walk->link[1] == NULL ) {
        *save = walk->link[0];
        *old_item = walk;
      }
      else if ( walk->link[0] == NULL ) {
        *save = walk->link[1];
        *old_item = walk;
      }
      else {
        struct pretree_node *heir = walk->link[1];
        struct pretree_node *prev = walk;
    
        while ( heir->link[0] != NULL ) {
          prev = heir;
          heir = heir->link[0];
        }
        walk->data = heir->data;
        prev->link[prev == walk] = heir->link[1];
        *old_item = heir;
      }
      tree->size--;
    
      return 1;
    }
    The use of a pointer to the previous node replaces the save pointer used in the other two cases. Of primary interest is the assignment of heir's data to walk's data, this is the swap that transforms a case of having two children to a case of having one or no children. The link we followed is then set to heir's right subtree, completing the splice.

    Deletion by copying does not increase the height of the tree, but it can still result in an unbalanced tree if used frequently with insertions. The reason is because the algorithm is asymmetric, it always removes the successor. The correct number and type of insertions and deletions can result in a left-unbalanced tree. See exercise 16 for a solution.

    Parent Pointers

    A nice feature of any data structure is that you can make it as complex as you want to simplify certain operations. Using a stack seems cumbersome for some iterative tree operations, and recursion should be avoided whenever possible and/or sensible. Recall how adding a pointer to the previous node in linked lists simplified everything so much? Perhaps the same thing would be as effective in binary trees, a pointer to the parent perhaps? It only requires a slight change to our node declaration to get started:
    Code:
    Source 15:
    
    struct pretree_node {
      struct pretree_node *link[2];
      struct pretree_node *parent;
      int data;
    };
    Simple, no? Well then, let's try and do an insertion with parent pointers. The only real difference is fixing the parent pointer of the new node. But, posting all of that code for the same thing you've seen before would be pointless, so here is another variation of insertion.
    Code:
    Source 16:
    
    int pretree_insert (
      struct pretree *tree,
      struct pretree_node *new_item,
      int search_key )
    {
      struct pretree_node *walk;
      int dir;
    
      /*
       * Handle empty tree case
       */
      if ( tree->header == NULL ) {
        tree->header = new_item;
        tree->header->parent = NULL;
        return 1;
      }
      /*
       * Search for an empty link
       * Return failure if a duplicate is found
       */
      walk = tree->header;
      for ( ; ; ) {
        if ( walk->data == search_key )
          return 0;
        dir = walk->data < search_key;
        if ( walk->link[dir] == NULL )
          break;
        walk = walk->link[dir];
      }
      /*
       * Insert the new item
       */
      walk->link[dir] = new_item;
      new_item->parent = walk;
      tree->size++;
    
      return 1;
    }
    This time instead of using a save pointer, we handle the case of an empty tree explicitly and test for the next link to check instead of the current node. This gives us a path back into the tree without using a double pointer, but can cause readers to do a double take. Study, understand, yadda yadda. You know the drill by now. :-)

    So far, parent pointers appear immensely useful and also easy to maintain. Well, unfortunately, any further maintenance such as simple rotations (You will learn about this later) and the splicing required by deletion operations requires considerable thought and care in resetting the parent pointers. As an example, here is one possible deletion function using parent pointers:
    Code:
    Source 17:
    
    int pretree_remove (
      struct pretree *tree,
      struct pretree_node **old_item,
      int search_key )
    {
      struct pretree_node *walk;
      struct pretree_node **save;
      int dir;
    
      /*
       * Search for a matching link
       * Return failure if not found
       */
      save = &tree->header;
      walk = tree->header;
      for ( ; ; ) {
        if ( walk == NULL )
          return 0;
        else if ( walk->data == search_key )
          break;
        dir = walk->data < search_key;
        save = &walk->link[dir];
        walk = walk->link[dir];
      }
      /*
       * Remove the old item
       */
      if ( walk->link[1] == NULL || walk->link[0] == NULL ) {
        dir = walk->link[0] == NULL;
        /* Fix parent pointers */
        if ( walk->link[dir] != NULL ) {
          walk->link[dir]->parent = (*save)->parent;
          if ( walk->link[dir]->link[dir] != NULL )
            walk->link[dir]->link[dir]->parent = *save;
          if ( walk->link[dir]->link[!dir] != NULL )
            walk->link[dir]->link[!dir]->parent = *save;
        }
        *save = walk->link[dir];
        *old_item = walk;
      }
      else {
        struct pretree_node *heir = walk->link[1];
        struct pretree_node *prev = walk;
        /* Find walk's successor */
        while ( heir->link[0] != NULL ) {
          prev = heir;
          heir = heir->link[0];
        }
        walk->data = heir->data;
        prev->link[prev == walk] = heir->link[1];
        /* Fix parent pointers */
        if ( heir->link[1] != NULL )
          heir->link[1]->parent = prev;
        *old_item = heir;
      }
      tree->size--;
    
      return 1;
    }
    This is the same as the delete by copying solution we came up with earlier with two changes. First, the two separate simple cases have been merged into one if statement to avoid duplicating the, rather extensive, code to fix our parent pointers. To follow this, it helps to split the cases up again. The use of dir and !dir is one of the confusing boolean tricks I was talking about when we began. Useful, but it can be difficult to understand at first glance. Rest assured that when you study and attempt to understand the logic of this function, the meaning will become clear. The next change is, of course, the code to fix any changed parent pointers. The logic is obvious, but awkward.

    As we saw in Source 17, parent pointers can be useful, but they are also difficult to maintain properly. As an example of the advantages and disadvantages of parent pointers, we will now look at the concept of root insertions into a binary search tree.

    Root Insertion

    Inserting a new node at the root of a tree at first seems like an exercise in the trivial, but it isn't as easy as one might think at first because there may be a node in one of the subtrees that simply splicing into the root would violate the first rule of a binary search tree, that every node in the left subtree is less than the item and every node in the right subtree is greater than the item.

    To properly insert a node at the root of a tree, we need to consider the concept of rotating nodes in a binary search tree. The concept is simple, the code is simple, but in practice it can be frustrating (Trust me, I have the scars to prove it. ;-)). There are two types of rotations, left and right. When rotating left, the right node becomes the parent of its parent and the parent becomes the left subtree of the right node:
    Code:
    Diagram 9 (left rotation where 1 is C and 7 is B):
    
     C->right = B->left;
     B->left = C;
    
       1               7
      / \             / \
     0   7    -->    1   8
        / \         / \
       4   8       0   4
    Right rotation is the exact opposite of left rotation, the left node becomes the parent of its parent and the parent becomes the right subtree of the left node.
    Code:
    Diagram 10 (right rotation):
    
     C->left = B->right;
     B->right = C;
    
         7           1
        / \         / \
       1   8  -->  0   7
      / \             / \
     0   4           4   8
    Note that the code given in the above diagrams is all you need to rotate, provided C and B are both non-null. To insert at the root of a tree, we can insert as we normally would, then walk back up the tree, performing the necessary rotations to bring the new node higher until it becomes the root node. The key point to remember is that if the new node is the left child of its parent, we rotate right, otherwise we rotate left. A recursive solution to this problem is trivial, but an iterative solution is more difficult. The problem is in managing rotations while still being able to move up the tree, a direction that traditional binary search trees are no designed for.

    Here is an idea, why don't we use our tree implementation with parent pointers? This is a good example of the power of links to the parent, but as described earlier, the maintenance can be tricky, far more than a stack based approach:
    Code:
    Source 18:
    
    int pretree_insert (
      struct pretree *tree,
      struct pretree_node *new_item,
      int search_key )
    {
      struct pretree_node *walk;
      int dir;
    
      /*
       * Handle empty tree case
       */
      if ( tree->header == NULL ) {
        tree->header = new_item;
        tree->header->parent = NULL;
        return 1;
      }
      /*
       * Search for an empty link
       * Return failure if a duplicate is found
       */
      walk = tree->header;
      for ( ; ; ) {
        if ( walk->data == search_key )
          return 0;
        dir = walk->data < search_key;
        if ( walk->link[dir] == NULL )
          break;
        walk = walk->link[dir];
      }
      /*
       * Insert the new item
       */
      walk->link[dir] = new_item;
      new_item->parent = walk;
      tree->size++;
      /*
       * Walk back up and rotate
       */
      while ( new_item != tree->header ) {
        if ( new_item == walk->link[0] ) {
          /* Rotate right */
          walk->link[0] = new_item->link[1];
          new_item->link[1] = walk;
        }
        else {
          /* Rotate left */
          walk->link[1] = new_item->link[0];
          new_item->link[0] = walk;
        }
        /* Notify the rest of the tree of changes */
        if ( walk->parent != NULL ) {
          dir = ( walk == walk->parent->link[1] );
          walk->parent->link[dir] = new_item;
        }
        /* Fix parent pointers */
        new_item->parent = walk->parent;
        walk->parent = new_item;
        if ( walk->link[0] != NULL )
          walk->link[0]->parent = walk;
        if ( walk->link[1] != NULL )
          walk->link[1]->parent = walk;
        /* Reseat header if needed */
        if ( tree->header == walk )
          tree->header = new_item;
        /* Move up and start over */
        walk = new_item->parent;
      }
    
      return 1;
    }
    Notice that the insertion is the same as with our regular parent pointer insertion function. The difference is what we do after inserting the new node. We walk up the tree by looping until walk is pointing to the header, and with every iteration we assign walk to point to the new node's parent, which after all of our voodoo magic, is precisely where we want it to be. The only really tricky parts of this algorithm are making sure that the rest of the tree is aware of our changes, and properly adjusting the parent pointers. Study this code very carefully and analyze its execution in test cases to ensure that everything works properly. For a good test case, I would suggest a sequence of 4, 7, 1, 2, 5, 6, 3.

    As a counterpoint for the use of parent pointers, a stack can be used instead of parent pointers to simplify the logic, and instead of recursion for efficiency and safety reasons. Without the parent maintenance, the logic of root insertion becomes obvious:
    Code:
    Source 19:
    
    int pretree_insert (
      struct pretree *tree,
      struct pretree_node *new_item,
      int search_key )
    {
      struct pretree_node *walk;
      struct pretree_node *up[50];
      int dir;
      int top = 0;
    
      /*
       * Handle empty tree case
       */
      if ( tree->header == NULL ) {
        tree->header = new_item;
        return 1;
      }
      /*
       * Search for an empty link
       * Return failure if a duplicate is found
       */
      walk = tree->header;
      for ( ; ; ) {
        if ( walk->data == search_key )
          return 0;
        dir = walk->data < search_key;
        if ( walk->link[dir] == NULL )
          break;
        up[top++] = walk;
        walk = walk->link[dir];
      }
      /*
       * Insert the new item
       */
      walk->link[dir] = new_item;
      tree->size++;
      /*
       * Walk back up and rotate
       */
      while ( new_item != tree->header ) {
        if ( new_item == walk->link[0] ) {
          /* Rotate right */
          walk->link[0] = new_item->link[1];
          new_item->link[1] = walk;
        }
        else {
          /* Rotate left */
          walk->link[1] = new_item->link[0];
          new_item->link[0] = walk;
        }
        /* Notify the rest of the tree of changes */
        if ( up[top - 1] != NULL ) {
          dir = ( walk == up[top - 1]->link[1] );
          up[top - 1]->link[dir] = new_item;
        }
        /* Reseat header if needed */
        if ( tree->header == walk )
          tree->header = new_item;
        /* Move up and start over */
        walk = up[--top];
      }
      return 1;
    }
    As you can see, the difference is signifigant. Now, instead of cluttering the algorithm with unnecessary link maintenance, we have a cleaner solution. Of course, the usual problems with an explicit stack apply, but so do the usual problems with extra links in every node. Which approach you take depends on the application. Root insertion is a useful technique as many applications will search for recently inserted items. It makes more sense to keep new items closer to the root in such cases.

    Conclusion

    Binary search trees are an excellent data structure for efficient insertion, deletion, and searching operations without a great deal of complexity. The average case for a binary search tree created with random values is logarithmic (O(logN)), which is quite good. Unfortunately, the worst case of binary search trees is linear (O(N)), but there are ways to modify the insertion, deletion, and search algorithms to guarantee logarithmic performance. Such techniques are far more complex and will be covered in future installments of this tutorial.

    Exercises

    Exercise 1: In our recursive insertion function (Source 6), I noted that the size of the tree would increase even if a duplicate value was found and ignored. This would cause the actual size of the tree to differ from the recorded size. Rewrite the two functions to fix this.

    Exercise 2: In Source 7, I used an array based queue to implement level-order traversal. Verify that this queue does what it is supposed to. If it does not, rewrite the function to fix it.

    Exercise 3: Using the code developed in Exercise 2, determine any cases where the function would fail unexpectedly. Rewrite the function to fix any problems found.

    Exercise 4: Verify that Source 9 works as expected.

    Exercise 5: Explain the similarities between level-order traversal and preorder traversal that they can be implemented exactly the same iteratively with only the difference between use of a stack and use of a queue.

    Exercise 6: Show the contents of the stack during the execution of Source 10.

    Exercise 7: Describe any flaws in the implementation of Source 10, make changes as required and verify that the algorithm still works.

    Exercise 8: Write a function that implements postorder traversal iteratively. Verify that it is correct.

    Exercies 9: Write an insertion function for a threaded tree.

    Exercise 10: Write a C/C++ function to search a threaded tree in preorder, inorder, and postorder.

    Exercise 11: Verify that following a node's left link all the way to the right results in finding the predecessor. Does this algorithm also find a node's successor by following the right link all the way to the left? Why?

    Exercise 12: Verify that the algorithm described in Diagrams 6 and 7 works as expected.

    Exercise 13: Analyze Source 12 and diagram its execution on Diagram 4. Can this algorithm be improved? If so, determine how and make the necessary modifications.

    Exercise 14: In Source 13, the first two cases handle a node with a single link. Determine if these two cases can also handle a leaf node and explain how.

    Exercise 15: Analyze the execution of Source 14 and verify that it is indeed an improvement over deletion by merging.

    Exercise 16: To solve the problem of asymmetric deletion by copying, write a function that alternates between replacing the successor and replacing the predecessor. Verify that this is an improvement, and explain why.

    Exercise 17: Can Source 17 be improved? If so, rewrite the function to do so.

    Exercise 18: Verify that simply splicing a new node at the root may violate the definition of a binary search tree.

    Exercise 19: Write a recursive function to insert an item at the root of a binary search tree.

    Exercise 20: Compare and contrast Source 19 and Source 18. Which is the better solution for a general binary search tree library?
    My best code is written with the delete key.

  2. #2
    Registered User
    Join Date
    Jul 2003
    Posts
    450
    I compliment you on this informative post and hope it is stickied. I am currently implementing a binary tree and have posted several questions regarding basic syntax I'll need for my implementation. I did not read the entire post yet, because I want to discover from my own work, but will read further after I attempt traversal and then searching algorithims.

    You show a great deal of initiative in helping inform us more inexperienced programmers educate ourselves. This will simplify research. For instance I currently have three books open regarding the subject and this coallates the information. Again I think you and will bookmark this post. The funny thing is I will not be studying data structures or even C/C++ at the university until next year, so will have a tremendous leap compared to the rest of the students.

  3. #3
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796
    >For instance I currently have three books open regarding the subject
    When I was first learning binary search trees and balancing techniques, I ended up buying and burning hard-stare holes in eleven data structure books. I highly recommend getting as many different perspectives as you can. Everyone describes things differently, and usually you have to hear three or four of them to really "get" it. As for different perspectives, not you have another to add to the mix.

    >The funny thing is I will not be studying data structures or even C/C++ at the university until next year
    Who cares? Data structures are a blast! You don't have to take a class to learn fun new things.
    My best code is written with the delete key.

  4. #4
    ¡Amo fútbol!
    Join Date
    Dec 2001
    Posts
    2,136
    Originally posted by Prelude
    Data structures are a blast
    Errrr.... yeah.

    NOTE TO SELF: Prelude is on LSD again.
    Last edited by golfinguy4; 12-31-2003 at 12:56 PM.

  5. #5
    Registered User sean345's Avatar
    Join Date
    Mar 2002
    Posts
    346
    Very nice tutorial. It seems to explain everything well and is easy to follow.

    - Sean
    If cities were built like software is built, the first woodpecker to come along would level civilization.
    Black Frog Studios

  6. #6
    carry on JaWiB's Avatar
    Join Date
    Feb 2003
    Location
    Seattle, WA
    Posts
    1,972
    Didn't quite get through reading the whole thing but it was pretty good. I would prefer C++, though
    "Think not but that I know these things; or think
    I know them not: not therefore am I short
    Of knowing what I ought."
    -John Milton, Paradise Regained (1671)

    "Work hard and it might happen."
    -XSquared

  7. #7
    l'Anziano DavidP's Avatar
    Join Date
    Aug 2001
    Location
    Plano, Texas, United States
    Posts
    2,738
    well it is already in C, so it should compile for any C++ compiler.

    Excellent tutorial. I already know binary search trees (and have known them for several years), so I just scanned through it to see what topics it covered.

    You seemed to cover quite a lot of topics, and by the looks of it you explained everything with diagrams and code very well. Good job.

    I also see that you covered more traversals than just the pre-order, post-order, and in-order traversals. Most people don't go beyond those 3. Good job.
    My Website

    "Circular logic is good because it is."

  8. #8
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796
    >Prelude is on LSD again.
    If I am, it's because my bosses slipped it into my food. They like when I have fun writing code.

    >I would prefer C++, though
    I prefer not to use proper C++ (ie. a good container class) for data structure tutorials. The scaffolding of the class tends to obscure the logic of the data structure. This makes things harder to follow in general, a problem that simple functions don't have. Then of course there is the advantage of catering to both C and C++ users. Anyone familiar with C++ and the desire to do so can encapsulate the functions into a container class with little difficulty.

    >You seemed to cover quite a lot of topics
    I realized earlier today that I forgot to mention the problem of using deletion by copying when the data items are very large. I'll add that when I have a chance.

    >Most people don't go beyond those 3.
    An annoying tendency to be sure. I got complaints about a previous binary search tree tutorial because it only had the three. I made sure not to make the same mistake twice after I found out how sharp my readers are.
    My best code is written with the delete key.

  9. #9
    Registered User
    Join Date
    Jul 2003
    Posts
    450
    This post is not to be digested in one sitting. I have read up to the point where you implement the iterative algorithim of the Depth first inorder traversal routine. I am going to have to sit down and draw out what is happening before I fully understand it. I thought I would just be able to read the rest of the post at this point, but now realize I will have to do some work before I fill confident. A little better explanation of the logic behind the iterative approach of the inorder routine would have been nice, but this would be a redundant explanation and giving everything to the reader encourages a lazy coder!

  10. #10
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796
    >A little better explanation of the logic behind the iterative approach of the inorder routine would have been nice
    In my experience, no amount of explanation will truly drive the point home. You have to work with the logic yourself to understand completely. Explanation just guides you on the big picture.
    My best code is written with the delete key.

  11. #11
    C++ Developer XSquared's Avatar
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    2,718
    From Source 3,
    >> 17. dir = walk->data < search_key;
    >> 18. walk = walk->link[dir];

    Are you sure that dir is guaranteed to be 1 or 0? I thought that true is defined as anything but 0.

    In Diagram 8 (Deletion), you say remove 5, but you proceed to remove the 7. Also, it'd be nice if you provided an example where both of the subtrees of the node to be deleted had a left and a right subtree, to demonstrate what happens then.
    Naturally I didn't feel inspired enough to read all the links for you, since I already slaved away for long hours under a blistering sun pressing the search button after typing four whole words! - Quzah

    You. Fetch me my copy of the Wall Street Journal. You two, fight to the death - Stewie

  12. #12
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,006
    >Are you sure that dir is guaranteed to be 1 or 0?
    6.5.8 Relational operators
    6 Each of the operators < (less than), > (greater than), <= (less than or equal to), and >= (greater than or equal to) shall yield 1 if the specified relation is true and 0 if it is false.89) The result has type int.
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  13. #13
    C++ Developer XSquared's Avatar
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    2,718
    OK, cool. I wasn't sure about that, but now I am. Thanks.

    BTW, does this apply to C++ too?
    Naturally I didn't feel inspired enough to read all the links for you, since I already slaved away for long hours under a blistering sun pressing the search button after typing four whole words! - Quzah

    You. Fetch me my copy of the Wall Street Journal. You two, fight to the death - Stewie

  14. #14
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796
    >Are you sure that dir is guaranteed to be 1 or 0?
    Quite.

    >In Diagram 8 (Deletion), you say remove 5, but you proceed to remove the 7.
    I sure did, didn't I? I imagine that would cause some confusion, and probably some really strange code for those that followed the diagrams exactly.

    >Also, it'd be nice if you provided an example where both of the subtrees of the node to be deleted had a left and a right subtree, to demonstrate what happens then.
    Good idea.

    >BTW, does this apply to C++ too?
    It does indeed.
    My best code is written with the delete key.

  15. #15
    5|-|1+|-|34|) ober's Avatar
    Join Date
    Aug 2001
    Posts
    4,429
    Maybe this would be most helpful if added to the FAQ or help section of the site?

Page 1 of 2 12 LastLast
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Trouble with client GET request path
    By NuNn in forum C Programming
    Replies: 1
    Last Post: 02-25-2009, 02:34 PM
  2. Can you check what is wrong with this code
    By Ron in forum C++ Programming
    Replies: 4
    Last Post: 08-01-2008, 10:59 PM
  3. my HTTP request handler code correct?
    By George2 in forum C# Programming
    Replies: 0
    Last Post: 04-25-2008, 04:01 AM
  4. denied request
    By Unregistered in forum C++ Programming
    Replies: 1
    Last Post: 09-20-2001, 11:35 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21