![]() |
| | #1 |
| Code Goddess Join Date: Sep 2001
Posts: 9,661
| Tutorial review ). Enjoy, and then rip it to shreds so that I can improve it.Part II - Randomized Binary Search Trees In part I you learned that binary search trees are a powerful data structure for storage and retrieval of data. However, you also learned that basic binary search trees have exceptionally bad worst case performance. There are three worst cases corresponding to data being input in ascending sorted order: Code: Diagram 1:
A
\
B
\
C
\
D
\
E
Code: Diagram 2:
E
/
D
/
C
/
B
/
A
Code: Diagram 3:
A
\
E
/
B
\
D
/
C
A performance of a basic binary search tree is O(log n) on average. The average case is when the data being inserted is sufficiently random to create what is called a balanced tree, where at any given node that is not a leaf, there are two choices of direction: Code: Diagram 4:
B
/ \
A D
/ \
C E
Let us look at this a different way. An average case binary search tree can be created if every item in the data has an equal probability of being the root item. This suggests the use of the root insertion routines we wrote in the last tutorial. At every point in the insertion, make a random selection of root insertion or basic insertion. The end result will be an average case tree. This algorithm is sound, but in this tutorial we will be going with a variation on binary search trees and heaps called a treap. The treap is simple to write, results in the average case with a high probability, and has the added benefit that we have both a binary search tree and a heap (two useful data structures) at our disposal. A treap follows a few rules: Code: Binary search tree rules: For a node u with a left child v and a right child w: v->data < u->data < w->data Heap rules: For a node u with a left child v and a right child w: w->priority < v->priority < u->priority Exercise 1: Can you think of any other combinations of input that would create a degenerate tree? Exercise 2: Verify that by meeting both the binary search tree conditions and the heap conditions, the result is an average case binary search tree. Exercise 3: Determine the probability of a sufficiently degenerate tree using a treap. Bonus Exercise: Write an insertion function that randomly uses root insertion or basic insertion at every node. This algorithm may be difficult because only an informal description is given above. Insertion Insertion into a treap is straightforward. With a new item that has a random priority, simply perform a normal binary search tree insertion and then walk back up the tree performing rotations until the heap condition is met. By using rotations we can guarantee that the binary tree condition will remain valid. The recursive algorithm to do this is trivial: Code: Source 1:
treap_insert ( T, n, k ) begin
if T == null then
T = n
else
dir = k > T.data
T.link[dir] = treap_insert ( T.link[dir], n, k )
if T.priority < T.link[dir].priority then
save = T.link[dir];
T.link[dir] = save.link[!dir];
save.link[!dir] = T;
T = save;
endif
return T
end
Code: Source 2:
struct pretree_node {
struct pretree_node *link[2];
int priority;
int data;
} NullItem = { &NullItem, &NullItem, RAND_MAX, 0 }, *Null = &NullItem;
The pretree type is the same as it has been (the size member has been removed for brevity): Code: Source 3:
struct pretree {
struct pretree_node *header;
};
Code: Source 4:
void pretree_insert (
struct pretree *tree,
struct pretree_node *new_item,
int search_key )
{
tree->header = insert_subtree ( tree->header, new_item, search_key );
}
Code: Source 5:
struct pretree_node *makenode ( int data )
{
struct pretree_node *rn = malloc ( sizeof *rn );
rn->data = data;
rn->priority = rand();
rn->link[0] = rn->link[1] = Null;
return rn;
}
Code: Source 6: pretree_insert ( &t, makenode ( i ), i ); Code: Source 7:
struct pretree_node *insert_subtree (
struct pretree_node *header,
struct pretree_node *new_item,
int search_key )
{
if ( header == Null )
header = new_item;
else {
int dir = search_key > header->data;
header->link[dir] = insert_subtree ( header->link[dir], new_item, search_key );
if ( header->priority < header->link[dir]->priority ) {
struct pretree_node *save = header->link[dir];
header->link[dir] = save->link[!dir];
save->link[!dir] = header;
header = save;
}
}
return header;
}
Code: Source 8:
if ( search_key < header->data ) {
header->link[0] = insert_subtree ( header->link[0], new_item, search_key );
if ( header->priority < header->link[0]->priority ) {
struct pretree_node *save = header->link[0];
header->link[0] = save->link[1];
save->link[1] = header;
header = save;
}
}
else {
header->link[1] = insert_subtree ( header->link[1], new_item, search_key );
if ( header->priority < header->link[1]->priority ) {
struct pretree_node *save = header->link[1];
header->link[1] = save->link[0];
save->link[0] = header;
header = save;
}
}
The internal workings of this algorithm may not be immediately obvious to anyone not familiar with recursion. The nonrecursive solution is more verbose, but it provides insight into what is really going on. It also has a striking similarity to the root insertion function from the last tutorial: Code: Source 9:
int pretree_insert(
struct pretree *tree,
struct pretree_node *new_item,
int search_key )
{
struct pretree_node *walk;
struct pretree_node *up[50];
int dir;
int top = 0;
/*
* Handle empty tree case
*/
if ( tree->header == Null ) {
tree->header = new_item;
return 1;
}
/*
* Search for an empty link
* Return failure if a duplicate is found
*/
walk = tree->header;
for ( ; ; ) {
if ( walk->data == search_key )
return 0;
dir = walk->data < search_key;
if ( walk->link[dir] == Null )
break;
up[top++] = walk;
walk = walk->link[dir];
}
/*
* Insert the new item
*/
walk->link[dir] = new_item;
/*
* Walk back up and rotate
*/
while ( new_item != tree->header ) {
if (walk->priority > new_item->priority)
break;
dir = new_item == walk->link[1];
walk->link[dir] = new_item->link[!dir];
new_item->link[!dir] = walk;
/* Notify the rest of the tree of changes */
if ( top > 0 && up[top - 1] != Null ) {
dir = ( walk == up[top - 1]->link[1] );
up[top - 1]->link[dir] = new_item;
}
/* Reseat header if needed */
if ( tree->header == walk )
tree->header = new_item;
/* Move up and start over */
walk = up[--top];
}
return 1;
}
Exercise 4: Rewrite the treap to use left and right pointers instead of the link array. Is the clarity gained worth the increase in code size? Exercise 5: What possible use could Null be? Try to guess what the reason for using a sentinel over a null pointer is. Exercise 6: Rewrite source 5 and 6 such that the code is more robust and better suited for production code. Exercise 7: Trace the execution of source 7. Verify that it does what you expect. Exercise 8: Rewrite source 4 and 7 to return an appropriate error code on any failure. Exercise 9: Compare source 9 with source 19 of the part I. Note the similarities. Bonus Exercise: Is rand the best way to find a good priority for our treap? Determine a more effective way of calculating random priorities. Removal Removal of a node is not as important as insertion, and many texts omit the code for it. While I can sympathize, I prefer to give at least a little working code whenever it would be of benefit. The algorithm for removing from a treap is exactly the opposite of insertion. Find the item and perform rotations (walking the item down the tree) until it is a leaf, then remove it. The algorithm is short: Code: Source 10:
treap_remove ( T, o, k ) begin
if T != Null then
if k < T.data then
T.link[0] = treap_remove ( T.link[0], o, k )
else if T.data < k then
T.link[1] = treap_remove ( T.link[1], o, k )
else
dir = T.link[0]->priority > T.link[1]->priority
save = T.link[dir]
T.link[dir] = save.link[!dir]
save.link[!dir] = T
T = save;
if T != Null then
T = treap_remove ( T, o, k )
else
o = T.link[1] # Save the old item
T.link[1] = Null;
endif
endif
endif
return T
end
Once again we need a helper function to prepare and finalize the recursive function: Code: Source 11:
void pretree_remove (
struct pretree *tree,
struct pretree_node **old_item,
int search_key )
{
tree->header = remove_subtree ( tree->header, old_item, search_key );
}
The recursive function in C/C++ is a straight translation of the pseudocode: Code: Source 12:
struct pretree_node *remove_subtree (
struct pretree_node *header,
struct pretree_node **old_item,
int search_key )
{
if ( header != Null ) {
if ( search_key < header->data )
header->link[0] = remove_subtree ( header->link[0], old_item, search_key );
else if ( header->data < search_key )
header->link[1] = remove_subtree ( header->link[1], old_item, search_key );
else {
int dir = header->link[0]->priority > header->link[1]->priority;
struct pretree_node *save = header->link[dir];
header->link[dir] = save->link[!dir];
save->link[!dir] = header;
header = save;
if ( header != Null )
header = remove_subtree ( header, old_item, search_key );
else {
*old_item = header->link[1];
header->link[1] = Null;
}
}
}
return header;
}
Code: Source 13:
if ( header != Null )
header = remove_subtree ( header, old_item, search_key );
else {
*old_item = header->link[1];
header->link[1] = Null;
}
Exercise 10: Trace the execution of source 12. Verify that it performs as expected. Exercise 11: Rewrite all source up to this point to use a null pointer instead of a sentinel node. Explain why I chose to use a sentinel for this tutorial. Exercise 12: Rewrite source 12 to use a nonrecursive solution. Exercise 13: Use a walk-down-rotation scheme for basic binary tree removal instead of deletion-by-copying. Explain whether or not this would be a suitable alternative to the deletion methods given in part I. Conclusion Binary search trees are a good start, but they have a nasty worst case that happens to be very common. By using a randomized algorithm to restructure the tree as data are inserted, we can avoid those worst cases. While I chose to describe only one variation of randomized binary search trees (the treap), there are many others. The similarity to the quicksort algorithm should not go unnoticed either. The basic quicksort is like the basic binary search tree: fast, but with a common worst case that degenerates the algorithm badly. With the benefit of randomization, both quicksort and binary search trees can minimize the probability of encountering a worst case situation. In the next installment, we will discuss ways of guaranteeing near optimal balance of a binary search tree at all times.
__________________ My best code is written with the delete key. |
| Prelude is offline |
| | #2 |
| Rad Join Date: Mar 2003
Posts: 942
| I'd be amazed if I understood even one word of it. :P
__________________ -Stephen Cope smc42190@gmail.com http://purevolume.com/step http://myspace.com/stephencope |
| gcn_zelda is offline |
| | #3 |
| Code Goddess Join Date: Sep 2001
Posts: 9,661
| >I'd be amazed if I understood even one word of it. :P That's why there's a part I, this is part II.
__________________ My best code is written with the delete key. |
| Prelude is offline |
| | #4 | |
| Rad Join Date: Mar 2003
Posts: 942
| Quote:
__________________ -Stephen Cope smc42190@gmail.com http://purevolume.com/step http://myspace.com/stephencope | |
| gcn_zelda is offline |
| | #5 |
| pronounced 'fib' Join Date: Aug 2002
Posts: 2,289
| I can't comment on the literary merits as I am not good for much aside from coding, but it is a good read. The worst case of a binary tree (or a quick sort) is something that programmers should be aware of, no question and I think you are doing a service in teaching it. I do commend you for the dedication you have to teaching what you know. I don't have that kind of patience. Where is this going to be posted by the way? I seem to remember a good way of weighting the branches of a binary tree based on how many nodes each branch had. This allowed you to "rotate" left or right depending on how well weighted you were. Or perhaps the difficulty of the rotation makes this one of the painful methods you spoke of.
__________________ "You are stupid! You are stupid! Oh, and don't forget, you are STUPID!" - Dexter Last edited by FillYourBrain; 03-21-2004 at 04:48 PM. |
| FillYourBrain is offline |
| | #6 |
| Unleashed Join Date: Sep 2001
Posts: 1,755
| > Where is this going to be posted by the way? Probably here - http://faq.cprogramming.com/cgi-bin/...ect=1073086407
__________________ The world is waiting. I must leave you now. |
| Shadow is offline |
| | #7 |
| Code Goddess Join Date: Sep 2001
Posts: 9,661
| >Where is this going to be posted by the way? Definitely in the FAQ once I get a final draft finished. >Or perhaps the difficulty of the rotation makes this one of the painful methods you spoke of. Not as much difficulty as inefficiency. Weight-balanced methods typically require more rotations on average than height-balanced solutions. The bookkeeping is also a real pain (worse than balance factors in an AVL tree!), thus more error prone. As far as my knowledge extends to weight-balanced trees, they don't offer enough advantage over height-balanced trees or randomized trees to warrant using them. You'll see this conclusion a lot. It's hard finding good text on them, much less a workable implementation. ![]() However, I would be remiss if I didn't mention that splay trees are close to weight-balanced trees. They maintain balance based on probability of access; more heavily searched items are bubbled up to the root. This is one defining feature of many weight-balanced algorithms, so the similarities are there. The biggest difference is that splay trees have an amortized performance guarantee instead of an optimized guarantee as height or weight-balanced algorithms. And the recursive code for splaying is really easy, so if you want to make the leap to saying that a splay tree is weight-balanced based on probability of access, it isn't all bad.
__________________ My best code is written with the delete key. |
| Prelude is offline |
| | #8 |
| Perverted Join Date: Oct 2001
Posts: 336
| Nice tutorial, but thats what I expected from you, I also like the nice diagrams, everything is better with diagrams. I have actually never heard of Prelude's Corner but it seems pretty cool. This wouldn't happen to be the corner that kids have to standin when they get in trouble, would it?
__________________ Give me a bad reputation!!! |
| unanimous is offline |
| | #9 |
| Code Goddess Join Date: Sep 2001
Posts: 9,661
| >This wouldn't happen to be the corner that kids have to standin when they get in trouble, would it? Yes, they're forced to do the exercises.
__________________ My best code is written with the delete key. |
| Prelude is offline |
| | #10 |
| l'Anziano Join Date: Aug 2001
Posts: 2,573
| Nice tutorial, Prelude. It is very good. I need to start writing tutorials myself to put on my own website....i'm thinking AVL trees is a good one... |
| DavidP is offline |
| | #11 |
| Code Goddess Join Date: Sep 2001
Posts: 9,661
| >i'm thinking AVL trees is a good one... I can give you some recursive insertion and deletion code if you want. Pretty standard, nothing tricky as long as you don't consider the entire algorithm tricky. ![]() I'm still toying with using it in my own AVL tutorial, but it just strikes me as too hackish, and I'm too lazy to write something more elegant. I'll probably end up using Knuth's insertion algorithm and leaving deletion as an exercise.
__________________ My best code is written with the delete key. |
| Prelude is offline |
| | #12 |
| l'Anziano Join Date: Aug 2001
Posts: 2,573
| deletion is horrid...i worked for a few hours trying to get my deletion code to work |
| DavidP is offline |
| Thread Tools | |
| Display Modes | |
|
Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| My new website | joeprogrammer | General Discussions | 19 | 03-17-2006 07:38 PM |
| Please review my first tutorial | Stan100 | C++ Programming | 13 | 06-22-2005 03:06 PM |
| Cprog tutorial: Design Patterns | maes | C++ Programming | 7 | 10-11-2004 01:41 AM |
| Problem with tutorial (Vector class) | OdyTHeBear | C++ Programming | 4 | 12-18-2002 02:49 PM |
| My DirectInput tutorial.... | jdinger | A Brief History of Cprogramming.com | 1 | 06-18-2002 11:32 PM |