Thread: Custom Tree Iterator design help

  1. #1
    Use this: dudeomanodude's Avatar
    Join Date
    Jan 2008
    Location
    Hampton, VA
    Posts
    391

    Custom Tree Iterator design help

    I'd like to write my own iterator for a 234-tree, but I'm a little confused about how to go about this.

    The purpose of having an iterator for it is particularly useful for searching the tree. If you have a Tree_iterator then a user can do more than just display the path it took to get to a particular Key value, like put the information in any data-type they wish (being able to put it into a list, vector, queue, etc. ).

    The problem I'm having is that each Node has a vector in it. This is what Node looks like:
    Code:
    struct Node{
    		
    			std::vector<T> key;
    			
    			Node* parent;
    			Node* one;
    			Node* two;
    			Node* three;
    			Node* four;
    
    			Node( const T& d, Node* p ) 
    			: parent( p ), one( 0 ), two( 0 ), three( 0 ), four( 0 ) {
    					
    				key.push_back( d );
    			};
    			
    			private:
    			
    				/*  NODE IS NON-COPYABLE:
    				*/
    				Node( const Node& rhs );
    				Node& operator=( const Node& rhs );
    
    
    		}; /* END STURCT NODE */
    So far this is what I've got for the Tree_iterator:
    Code:
    class Tree_iterator{
    			
    			public:
    			
    				Tree_iterator( Node* c )
    				: curr( c ), position( c->key.begin() ){};
    				
    				const T& operator*() const { return *position; };
    				T& operator*(){ return *position; };
    				
    				bool operator!=( const Tree_iterator& i ) { return i.position != curr.position; };
                    bool operator==( const Tree_iterator& i ) { return i.position == curr.position; };
                    bool operator<( const Tree_iterator& i ) { return i.position < curr.position; };
                    
                    Tree_iterator& operator++( ){ position = ++position; return *this; };
    			
    			private:
    			
    				Node* curr;
    				typename std::vector<T>::iterator position; 
    			
    			
    		}; /* END CLASS TREE_ITERATOR */
    The way it is above, every time I want to go down a level of the tree, I have to re-assign the iterator, something like this:
    Code:
    Tree_iterator it = begin();
    
    // Do something with the iterator, then when we have to go down a level:
    it = another_Node;
    
    // Where "another_Node is a Node* that also must be incremented in some fashion.
    However this is not the behavior I'd like the iterator to have. I'd like for when a user does this:
    Code:
    ++a_tree_iterator;
    It checks "position", whether its less then or greater than, and navigates not only the internal vector of keys, but will also go to the appropriate Node* (one,two,three or four)

    Any ideas?

    Also, begin() is very straight forward, it's implemented like this:
    Code:
    const Tree_iterator begin() const { return Tree_iterator( root ); };
    where root is a Node* to the highest level Node of the tree.

    but how might you write an end() function? By the very nature of a tree, end() should have a particular key value that you're searching for and also some terminating value that signifies you've reached the end of the tree and haven't found the value. The fact that end() would have to fulfill both of these things is a problem for me.

    Is there a way to write the end() in this manner, or can any of you suggest a better approach to this problem?

    Thanks in advance!
    Ubuntu Desktop
    GCC/G++
    Geany (for quick projects)
    Anjuta (for larger things)

  2. #2
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by dudeomanodude View Post
    but how might you write an end() function? By the very nature of a tree, end() should have a particular key value that you're searching for and also some terminating value that signifies you've reached the end of the tree and haven't found the value. The fact that end() would have to fulfill both of these things is a problem for me.

    Is there a way to write the end() in this manner, or can any of you suggest a better approach to this problem?
    There is no reason the iterator should contain a value -- it need only point at the corresponding node. There is no need for end() to contain anything in particular, since it is not dereferenceable anyway. I usually deal with end() by having the iterator contain a bool which simply indicates whether it is the "end iterator" or not.

    That implies that all end() iterators, even those from different trees, will compare equal to each other, but that's fine. The standard says it's undefined to compare iterators from two different containers, anyway.

  3. #3
    Use this: dudeomanodude's Avatar
    Join Date
    Jan 2008
    Location
    Hampton, VA
    Posts
    391
    There is no reason the iterator should contain a value -- it need only point at the corresponding node.
    So, I'm guessing from this statement that dereferencing the Tree_iterator should return the vector that the Node contains?

    like this:
    Code:
    const T& operator*() const { return curr->key; };
    T& operator*() { return curr->key; };
    Is that what you're saying? And that the Tree_iterator should also only contain a Node* and not worry about having the iterator? like this:
    Code:
    class Tree_iterator{
    
        public:
    
            Tree_iterator( Node* c ) : curr( c ){};
    
            // overloaded functions
    
        private:
    
             Node* curr;
    };
    ???

    Also, how would I increment the Tree_iterator then since it has 2 - 4 options when going to the next Node down the tree?
    Ubuntu Desktop
    GCC/G++
    Geany (for quick projects)
    Anjuta (for larger things)

  4. #4
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Before implementing an iterator, you need to consider these points:
    1) What is the value type of the container elements? For example, for std::vector<T>, the value type is T. For std::map<K, V>, the value type is std:air<const K, V> - a pair consisting of the key (unmodifiable) and the value of the entry. What do you consider an element of your 234 tree, and what is the type of that element? (It's just a B tree, so the the type ought to be std:air<const K, V> - or, if your tree is more set-like, simply a T.)
    2) What is the iteration order? std::vector iterates in index order. std::map iterates in key sort order. std::tr1::unordered_map iterates in unspecified order, as becomes a hash map. Since you're implementing something like a std::set/map, you should iterate in key sort order.
    3) What sort of iteration can I support? Do I support resetting the position (forward iteration)? For containers, the answer to this should always be yes. Do I support going backwards (bidirectional iteration)? For the tree, the answer should be yes, too. Do I support random jumps in constant time (random access iteration)? The answer to that one is no.

    For simplicity, let's assume your interface is a set, not a map. You'll want bidirectional iterators over T in T's sort order.

    The first thing to implement for an iterator is always incrementation. It's the most complex. (Except for output iterators, but they're different.)
    What's the order of the elements in the 234 tree? I'm not sure of the details, but it it can be either value-subnode-value or subnode-value-subnode. Let's assume the first. So when you have a node, the first value you return is the first value of the node. Then you reach the subnode, so you go down into that and return its first value. Recurse. Eventually, the subtree will be exhausted, and you can return the next value. Then there's another subtree. And so on.

    The tricky part is that the iterator must hold a stack of the indexes it's at in the nodes it has partially traversed. Well, no, it doesn't have to. There's also the option of comparing pointers. In a 234 tree, that's probably more efficient. Basically, when you've exhausted a node, you follow its parent pointer. Then you need to find the place in the parent node where you've been. You can do that simply by comparing the pointer of the node you've just been to one, two, three and four. (You might want to make these an array, by the way.)
    Last edited by CornedBee; 04-09-2008 at 11:22 AM.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  5. #5
    Use this: dudeomanodude's Avatar
    Join Date
    Jan 2008
    Location
    Hampton, VA
    Posts
    391
    Thank you CornedBee,

    1) What is the value type of the container elements? For example, for std::vector<T>, the value type is T. For std::map<K, V>, the value type is std:air<const K, V> - a pair consisting of the key (unmodifiable) and the value of the entry. What do you consider an element of your 234 tree, and what is the type of that element? (It's just a B tree, so the the type ought to be std:air<const K, V> - or, if your tree is more set-like, simply a T.)
    I'm not sure I understand this part of your reply.

    The first thing to implement for an iterator is always incrementation. It's the most complex.
    Agreed. This is indeed very complex since my intuition tells me that a useful iterator for this tree must not only iterate through the key ( std::vector<T> key in my case ) but also down the tree to the next node ( one, two, three, or four ).

    The tricky part is that the iterator must hold a stack of the indexes it's at in the nodes it has partially traversed. Well, no, it doesn't have to. There's also the option of comparing pointers. In a 234 tree, that's probably more efficient. Basically, when you've exhausted a node, you follow its parent pointer. Then you need to find the place in the parent node where you've been. You can do that simply by comparing the pointer of the node you've just been to one, two, three and four. (You might want to make these an array, by the way.)
    Looking over the source of other 234-trees, I do see the use of an array, however i rely on key.size() for much of the condition checking in my insertion/promotion functions. So I'm not really sure how to implement with an array...

    For example:
    If I use an array of T's (which has a predefined size of 3 ), what if it has only one T value? How do "invalidate" or "nullify" the array at the other two elements? That's what confuses me about the array design.
    Ubuntu Desktop
    GCC/G++
    Geany (for quick projects)
    Anjuta (for larger things)

  6. #6
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    I'm not sure I understand this part of your reply.
    Well, what do you intend to store in this tree? Just values that you can check for existence, or do you have clearly distinct keys and values?

    So I'm not really sure how to implement with an array...
    Instead of
    Code:
    node *one, *two, *three, *four;
    you simply have
    Code:
    node *subs[4];
    Whenever you have variables named after numbers, that's a very strong sign that you should be using an array instead.

    The values can stay in a vector. Its resizing is indeed quite useful in this case.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  7. #7
    Use this: dudeomanodude's Avatar
    Join Date
    Jan 2008
    Location
    Hampton, VA
    Posts
    391
    Well, what do you intend to store in this tree? Just values that you can check for existence, or do you have clearly distinct keys and values?
    Perhaps the name I gave for the vector was misleading... they are indeed simply values that can be checked for existence as you put it.

    The values can stay in a vector. Its resizing is indeed quite useful in this case.
    I whole-heartedly agree! 234-trees are pretty darned confusing in and of themselves, certainly adding the complexity of dynamic arrays is not something I'd like to tackle! Now that I think about it, the other 234-Tree sources I looked at were in Java, and I imagine an array of objects in Java is quite different than a C++ dynamic array (though I can't really claim to know the difference between Java and C++ arrays...)

    So like you said in your first reply, I need to implement the complex increment operator, but that's where I'm having issues. I want users to be able to use the iterator like this:
    Code:
    Tree_iterator it = begin() // initialized to root->key.begin()
     while( it != some_condition ){
    
        *it; // stick the value T into some container or display it, or whatever.
    
        ++it;
    }
    That's it. That's all I want to do. And so what's difficult is that within the increment operator I can say things like:
    Code:
    Tree_iterator& operator++(){
    
        if( curr->one != 0 ){ // indicates you're at a bottom level Node
    
            if( *position < curr->key[0] ){
    
                position = curr->one->key.begin(); // Takes you to the next Node at curr->one
            }
    
            else if( *position > *( ++position ) ){ // dereferenced value less than the "next" value in the vector
    
                position = ++position;
            }
    
            // etc, etc, etc.
    
    }
    
    return *this;
    };
    I realize that's not complete, and I don't know how to get around saying something like:
    Code:
    else if( *position > *( ++position ) )
    but you get the idea.

    It's the actual "path" of "values" that the iterator follows, that I want to get at. I could easily write a search( const T& d ) function that returns a list or vector of those values without messing around with a custom iterator, but I'd rather have an intuitive iterator that user's could use to stick that path information into whatever container/function they might find useful.

    So my requirement is that it is able to "return" (through dereference) each "value" it visits and checks it's (T's) operator< against.
    Ubuntu Desktop
    GCC/G++
    Geany (for quick projects)
    Anjuta (for larger things)

  8. #8
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    I imagine it would need to look something like this:
    Code:
    class Tree_iterator
    {
      node *m_node;
      int m_idx;
    
      void increment()
      {
        if(m_idx == m_node->keys.size() - 1) {
          // That was the last key in the node, go back up.
          node *oldnode = m_node;
          m_node = m_node->parent;
          for(m_idx = 0; m_node->subs[m_idx] != oldnode; ++m_idx) {
            // This loop finds the index of the child we were at.
          }
          // But we want the next one.
          ++m_idx;
        } else {
          // Not the last, go down the subtree.
          m_node = m_node->subs[m_idx];
          m_idx = 0;
        }
      }
    
      T &dereference()
      {
        return m_node->keys[m_idx];
      }
    };
    This works for forward iteration. Bidirectional iteration is more complicated, because the end iterator is more complex.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  9. #9
    Use this: dudeomanodude's Avatar
    Join Date
    Jan 2008
    Location
    Hampton, VA
    Posts
    391
    So are you suggesting I keep the Node pointers ( one, two, three and four ) in an array as denoted by:
    Code:
    m_node->subs[]
    in your example?

    So the Node class should look like:
    Code:
    class Node{
    
      private:
    
        std::vector<T> key;
        Node* subs[4];
    
        // etc.
    
    };
    I'm guessing that's what you're saying and haha, I guess I didn't listen correctly to one of your previous replies where you told me to store them in an array. Duh!

    Well heck, that's really cool andit should be fun playing around with that.

    Muchos gracias CornedBee!
    Ubuntu Desktop
    GCC/G++
    Geany (for quick projects)
    Anjuta (for larger things)

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Interpreter.c
    By moussa in forum C Programming
    Replies: 4
    Last Post: 05-28-2008, 05:59 PM
  2. Binary Tree, couple questions
    By scoobasean in forum C Programming
    Replies: 3
    Last Post: 03-12-2005, 09:09 PM
  3. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM
  4. Request for comments
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 15
    Last Post: 01-02-2004, 10:33 AM
  5. BST/Red and Black Tree
    By ghettoman in forum C++ Programming
    Replies: 0
    Last Post: 10-24-2001, 10:45 PM