# Thread: Custom Tree Iterator design help

1. ## Custom Tree Iterator design help

The purpose of having an iterator for it is particularly useful for searching the tree. If you have a Tree_iterator then a user can do more than just display the path it took to get to a particular Key value, like put the information in any data-type they wish (being able to put it into a list, vector, queue, etc. ).

The problem I'm having is that each Node has a vector in it. This is what Node looks like:
Code:
struct Node{

std::vector<T> key;

Node* parent;
Node* one;
Node* two;
Node* three;
Node* four;

Node( const T& d, Node* p )
: parent( p ), one( 0 ), two( 0 ), three( 0 ), four( 0 ) {

key.push_back( d );
};

private:

/*  NODE IS NON-COPYABLE:
*/
Node( const Node& rhs );
Node& operator=( const Node& rhs );

}; /* END STURCT NODE */
So far this is what I've got for the Tree_iterator:
Code:
class Tree_iterator{

public:

Tree_iterator( Node* c )
: curr( c ), position( c->key.begin() ){};

const T& operator*() const { return *position; };
T& operator*(){ return *position; };

bool operator!=( const Tree_iterator& i ) { return i.position != curr.position; };
bool operator==( const Tree_iterator& i ) { return i.position == curr.position; };
bool operator<( const Tree_iterator& i ) { return i.position < curr.position; };

Tree_iterator& operator++( ){ position = ++position; return *this; };

private:

Node* curr;
typename std::vector<T>::iterator position;

}; /* END CLASS TREE_ITERATOR */
The way it is above, every time I want to go down a level of the tree, I have to re-assign the iterator, something like this:
Code:
Tree_iterator it = begin();

// Do something with the iterator, then when we have to go down a level:
it = another_Node;

// Where "another_Node is a Node* that also must be incremented in some fashion.
However this is not the behavior I'd like the iterator to have. I'd like for when a user does this:
Code:
++a_tree_iterator;
It checks "position", whether its less then or greater than, and navigates not only the internal vector of keys, but will also go to the appropriate Node* (one,two,three or four)

Any ideas?

Also, begin() is very straight forward, it's implemented like this:
Code:
const Tree_iterator begin() const { return Tree_iterator( root ); };
where root is a Node* to the highest level Node of the tree.

but how might you write an end() function? By the very nature of a tree, end() should have a particular key value that you're searching for and also some terminating value that signifies you've reached the end of the tree and haven't found the value. The fact that end() would have to fulfill both of these things is a problem for me.

Is there a way to write the end() in this manner, or can any of you suggest a better approach to this problem?

2. Originally Posted by dudeomanodude
but how might you write an end() function? By the very nature of a tree, end() should have a particular key value that you're searching for and also some terminating value that signifies you've reached the end of the tree and haven't found the value. The fact that end() would have to fulfill both of these things is a problem for me.

Is there a way to write the end() in this manner, or can any of you suggest a better approach to this problem?
There is no reason the iterator should contain a value -- it need only point at the corresponding node. There is no need for end() to contain anything in particular, since it is not dereferenceable anyway. I usually deal with end() by having the iterator contain a bool which simply indicates whether it is the "end iterator" or not.

That implies that all end() iterators, even those from different trees, will compare equal to each other, but that's fine. The standard says it's undefined to compare iterators from two different containers, anyway.

3. There is no reason the iterator should contain a value -- it need only point at the corresponding node.
So, I'm guessing from this statement that dereferencing the Tree_iterator should return the vector that the Node contains?

like this:
Code:
const T& operator*() const { return curr->key; };
T& operator*() { return curr->key; };
Is that what you're saying? And that the Tree_iterator should also only contain a Node* and not worry about having the iterator? like this:
Code:
class Tree_iterator{

public:

Tree_iterator( Node* c ) : curr( c ){};

private:

Node* curr;
};
???

Also, how would I increment the Tree_iterator then since it has 2 - 4 options when going to the next Node down the tree?

4. Before implementing an iterator, you need to consider these points:
1) What is the value type of the container elements? For example, for std::vector<T>, the value type is T. For std::map<K, V>, the value type is std:air<const K, V> - a pair consisting of the key (unmodifiable) and the value of the entry. What do you consider an element of your 234 tree, and what is the type of that element? (It's just a B tree, so the the type ought to be std:air<const K, V> - or, if your tree is more set-like, simply a T.)
2) What is the iteration order? std::vector iterates in index order. std::map iterates in key sort order. std::tr1::unordered_map iterates in unspecified order, as becomes a hash map. Since you're implementing something like a std::set/map, you should iterate in key sort order.
3) What sort of iteration can I support? Do I support resetting the position (forward iteration)? For containers, the answer to this should always be yes. Do I support going backwards (bidirectional iteration)? For the tree, the answer should be yes, too. Do I support random jumps in constant time (random access iteration)? The answer to that one is no.

For simplicity, let's assume your interface is a set, not a map. You'll want bidirectional iterators over T in T's sort order.

The first thing to implement for an iterator is always incrementation. It's the most complex. (Except for output iterators, but they're different.)
What's the order of the elements in the 234 tree? I'm not sure of the details, but it it can be either value-subnode-value or subnode-value-subnode. Let's assume the first. So when you have a node, the first value you return is the first value of the node. Then you reach the subnode, so you go down into that and return its first value. Recurse. Eventually, the subtree will be exhausted, and you can return the next value. Then there's another subtree. And so on.

The tricky part is that the iterator must hold a stack of the indexes it's at in the nodes it has partially traversed. Well, no, it doesn't have to. There's also the option of comparing pointers. In a 234 tree, that's probably more efficient. Basically, when you've exhausted a node, you follow its parent pointer. Then you need to find the place in the parent node where you've been. You can do that simply by comparing the pointer of the node you've just been to one, two, three and four. (You might want to make these an array, by the way.)

5. Thank you CornedBee,

1) What is the value type of the container elements? For example, for std::vector<T>, the value type is T. For std::map<K, V>, the value type is std:air<const K, V> - a pair consisting of the key (unmodifiable) and the value of the entry. What do you consider an element of your 234 tree, and what is the type of that element? (It's just a B tree, so the the type ought to be std:air<const K, V> - or, if your tree is more set-like, simply a T.)

The first thing to implement for an iterator is always incrementation. It's the most complex.
Agreed. This is indeed very complex since my intuition tells me that a useful iterator for this tree must not only iterate through the key ( std::vector<T> key in my case ) but also down the tree to the next node ( one, two, three, or four ).

The tricky part is that the iterator must hold a stack of the indexes it's at in the nodes it has partially traversed. Well, no, it doesn't have to. There's also the option of comparing pointers. In a 234 tree, that's probably more efficient. Basically, when you've exhausted a node, you follow its parent pointer. Then you need to find the place in the parent node where you've been. You can do that simply by comparing the pointer of the node you've just been to one, two, three and four. (You might want to make these an array, by the way.)
Looking over the source of other 234-trees, I do see the use of an array, however i rely on key.size() for much of the condition checking in my insertion/promotion functions. So I'm not really sure how to implement with an array...

For example:
If I use an array of T's (which has a predefined size of 3 ), what if it has only one T value? How do "invalidate" or "nullify" the array at the other two elements? That's what confuses me about the array design.

Well, what do you intend to store in this tree? Just values that you can check for existence, or do you have clearly distinct keys and values?

So I'm not really sure how to implement with an array...
Code:
node *one, *two, *three, *four;
you simply have
Code:
node *subs[4];
Whenever you have variables named after numbers, that's a very strong sign that you should be using an array instead.

The values can stay in a vector. Its resizing is indeed quite useful in this case.

7. Well, what do you intend to store in this tree? Just values that you can check for existence, or do you have clearly distinct keys and values?
Perhaps the name I gave for the vector was misleading... they are indeed simply values that can be checked for existence as you put it.

The values can stay in a vector. Its resizing is indeed quite useful in this case.
I whole-heartedly agree! 234-trees are pretty darned confusing in and of themselves, certainly adding the complexity of dynamic arrays is not something I'd like to tackle! Now that I think about it, the other 234-Tree sources I looked at were in Java, and I imagine an array of objects in Java is quite different than a C++ dynamic array (though I can't really claim to know the difference between Java and C++ arrays...)

So like you said in your first reply, I need to implement the complex increment operator, but that's where I'm having issues. I want users to be able to use the iterator like this:
Code:
Tree_iterator it = begin() // initialized to root->key.begin()
while( it != some_condition ){

*it; // stick the value T into some container or display it, or whatever.

++it;
}
That's it. That's all I want to do. And so what's difficult is that within the increment operator I can say things like:
Code:
Tree_iterator& operator++(){

if( curr->one != 0 ){ // indicates you're at a bottom level Node

if( *position < curr->key[0] ){

position = curr->one->key.begin(); // Takes you to the next Node at curr->one
}

else if( *position > *( ++position ) ){ // dereferenced value less than the "next" value in the vector

position = ++position;
}

// etc, etc, etc.

}

return *this;
};
I realize that's not complete, and I don't know how to get around saying something like:
Code:
else if( *position > *( ++position ) )
but you get the idea.

It's the actual "path" of "values" that the iterator follows, that I want to get at. I could easily write a search( const T& d ) function that returns a list or vector of those values without messing around with a custom iterator, but I'd rather have an intuitive iterator that user's could use to stick that path information into whatever container/function they might find useful.

So my requirement is that it is able to "return" (through dereference) each "value" it visits and checks it's (T's) operator< against.

8. I imagine it would need to look something like this:
Code:
class Tree_iterator
{
node *m_node;
int m_idx;

void increment()
{
if(m_idx == m_node->keys.size() - 1) {
// That was the last key in the node, go back up.
node *oldnode = m_node;
m_node = m_node->parent;
for(m_idx = 0; m_node->subs[m_idx] != oldnode; ++m_idx) {
// This loop finds the index of the child we were at.
}
// But we want the next one.
++m_idx;
} else {
// Not the last, go down the subtree.
m_node = m_node->subs[m_idx];
m_idx = 0;
}
}

T &dereference()
{
return m_node->keys[m_idx];
}
};
This works for forward iteration. Bidirectional iteration is more complicated, because the end iterator is more complex.

9. So are you suggesting I keep the Node pointers ( one, two, three and four ) in an array as denoted by:
Code:
m_node->subs[]

So the Node class should look like:
Code:
class Node{

private:

std::vector<T> key;
Node* subs[4];

// etc.

};
I'm guessing that's what you're saying and haha, I guess I didn't listen correctly to one of your previous replies where you told me to store them in an array. Duh!

Well heck, that's really cool andit should be fun playing around with that.

Muchos gracias CornedBee!