I’ve been having trouble understanding what Chapter 28, Exercise 3 of Jumping into C++ is asking of me. The description is as follows (warning: it’s rather long):

Create a simple XML parser. XML is a basic formatting language, similar to HTML. The document is a
tree structure of nodes, of the form <node>[data]</node>, where [data] is either some text or another
nested node. XML nodes may have attributes, of the form <node attribute="value"></node>. (The true
XML specification includes many more details, but that would be a lot more work to implement.) Your
parser should accept an interface class with several methods that it calls when something interesting
happens:

1) Whenever a node is read, it calls a method nodeStart, with the name of the node.

2) Whenever an attribute is read, it calls a method, attributeRead; this method should always
be called immediately after the nodeStart method for the node with which the attribute is
associated.

3) Whenever a node has body text, call nodeTextRead, with the content of the text, as a string. If
you have a situation like this <node>text<sub-node>text</sub-node>more text</node>, there
should be separate calls to nodeTextRead for the text before the sub-node and the text
after the sub-node.

4) Whenever an end-node is read, call nodeEnd, with the name of the node.

5) You may treat any < or > character as the start of a node. If an XML author wants < or > to
appear in the text, it should be written as &lt; or &gt; (for less-than and greater-than). Since
ampersands must also be escaped, they must appear as &amp;. You do not need to perform
translation of &lt; and &gt; or &amp; in your code, however.
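
To make the list above concrete, here is a minimal sketch of what such an interface class might look like in C++. The class names XmlListener and RecordingListener, and the exact parameter lists, are assumptions for illustration; the exercise itself only names the four methods.

```cpp
#include <string>
#include <vector>

// Hypothetical interface: the exercise fixes only the four method names,
// not the class name or the parameter types.
class XmlListener {
public:
    virtual ~XmlListener() = default;
    // Called when a node is read, with the node's name.
    virtual void nodeStart(const std::string& name) = 0;
    // Called for each attribute; passing name and value is an assumption.
    virtual void attributeRead(const std::string& name,
                               const std::string& value) = 0;
    // Called for each run of body text.
    virtual void nodeTextRead(const std::string& text) = 0;
    // Called when an end-node is read, with the node's name.
    virtual void nodeEnd(const std::string& name) = 0;
};

// Example implementation that records every call in order, so the
// sequence of events can be inspected afterwards.
class RecordingListener : public XmlListener {
public:
    std::vector<std::string> events;

    void nodeStart(const std::string& name) override {
        events.push_back("start " + name);
    }
    void attributeRead(const std::string& name,
                       const std::string& value) override {
        events.push_back("attr " + name + "=" + value);
    }
    void nodeTextRead(const std::string& text) override {
        events.push_back("text " + text);
    }
    void nodeEnd(const std::string& name) override {
        events.push_back("end " + name);
    }
};
```

Under this reading, a parser would hold a reference to an XmlListener and call these methods as it scans the input, while the code that supplies the listener decides what to do with each event.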

Here are a few example XML documents for you to use as input test data:
<address-book>
<entry>
<name>Alex Allain</name>
<email>[email protected]</email>
</entry>
<entry>
<name>Joe Doe</name>
<email>[email protected]</email>
</entry>
</address-book>
And
<html>
<head>
<title>Doc title</title>
</head>
<body>This is a nice <a href="http://www.cprogramming.com">link</a> to
a website.</body>
</html>

To test that your parser is working correctly, you can write a piece of code that displays each element of
the file as it is parsed, and validate that it gets the elements that you expect. Or you can implement the
next exercise, which will show an example of your parser in use.
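
As a rough illustration of the testing approach just described, here is a sketch of a handler that displays each element, together with a deliberately tiny parser that drives it. Everything here is an assumption rather than the book's solution: the names PrintingHandler, parseStartTag and parseXml are made up, the parser is a template instead of taking an interface class (so it works with any type that has the four methods), attributes must be written exactly as name="value", and whitespace-only text between tags is skipped.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical test handler: writes one line per event to a string stream.
struct PrintingHandler {
    std::ostringstream out;
    void nodeStart(const std::string& name) { out << "start " << name << '\n'; }
    void attributeRead(const std::string& name, const std::string& value) {
        out << "attr " << name << '=' << value << '\n';
    }
    void nodeTextRead(const std::string& text) { out << "text " << text << '\n'; }
    void nodeEnd(const std::string& name) { out << "end " << name << '\n'; }
};

// Handles the inside of a start tag, e.g.  a href="x"
// Reports the node name first, then each attribute in order.
template <class Handler>
void parseStartTag(const std::string& tag, Handler& h) {
    std::size_t i = 0;
    while (i < tag.size() && !std::isspace(static_cast<unsigned char>(tag[i]))) ++i;
    h.nodeStart(tag.substr(0, i));
    while (true) {
        while (i < tag.size() && std::isspace(static_cast<unsigned char>(tag[i]))) ++i;
        std::size_t eq = tag.find('=', i);
        if (eq == std::string::npos) break;
        std::size_t open = tag.find('"', eq);
        if (open == std::string::npos) break;
        std::size_t close = tag.find('"', open + 1);
        if (close == std::string::npos) break;
        h.attributeRead(tag.substr(i, eq - i),
                        tag.substr(open + 1, close - open - 1));
        i = close + 1;
    }
}

// Scans the document left to right, treating every '<' as the start of a tag.
template <class Handler>
void parseXml(const std::string& doc, Handler& h) {
    std::size_t pos = 0;
    while (pos < doc.size()) {
        std::size_t lt = doc.find('<', pos);
        // Everything up to the next '<' is body text.
        std::string text = doc.substr(
            pos, lt == std::string::npos ? std::string::npos : lt - pos);
        if (text.find_first_not_of(" \t\r\n") != std::string::npos)
            h.nodeTextRead(text);
        if (lt == std::string::npos) break;
        std::size_t gt = doc.find('>', lt);
        if (gt == std::string::npos) break;  // unterminated tag: stop
        std::string tag = doc.substr(lt + 1, gt - lt - 1);
        if (!tag.empty() && tag[0] == '/')
            h.nodeEnd(tag.substr(1));
        else
            parseStartTag(tag, h);
        pos = gt + 1;
    }
}
```

Calling parseXml on a string with a PrintingHandler and then printing h.out.str() lists the events in the order they were reported, which can be compared by eye against the input document.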


When it says “when a node is read, call nodeStart”, does it mean reading in just the tag that starts a node, or the whole node? If it’s the former, then the part about calling nodeEnd makes sense. But then the part about calling attributeRead immediately after nodeStart wouldn’t make sense: I’m supposed to call attributeRead as soon as I read an attribute, yet that could never come directly after nodeStart, because I’d have to read in a bunch more characters first.

It also wouldn’t make sense if I’m supposed to call nodeStart after reading in a whole node, since an attribute wouldn’t necessarily follow it.

I’m also confused as to what these functions are supposed to do, exactly. My guess was that they just output what was read (with maybe some formatting), but that also conflicts with the issues I mentioned above. :/

I’d really appreciate help with this.