What exactly don't you understand?
Tokenization:
Code:
word_start = std::find_if(word_end, line.end(), std::ptr_fun<int, int>(std::isalnum));
Finds the first character for which isalnum returns true - start of word.
Code:
word_end = std::find_if(word_start, line.end(), std::not1(std::ptr_fun<int, int>(std::isalnum)));
Finds the first character for which isalnum returns false - start of the delimiter sequence (or line.end() if the word runs to the end of the line).
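To see how the two calls work together, here is a rough sketch of the whole tokenizing loop. The names split_words, is_word_char and not_word_char are made up for this illustration, and the plain helper functions stand in for the ptr_fun/not1 adaptors; the loop in the original program may be arranged differently.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helpers replacing the ptr_fun/not1 adaptors.
// The cast to unsigned char keeps isalnum well-defined for negative char values.
static bool is_word_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

static bool not_word_char(char c)
{
    return !is_word_char(c);
}

// Split a line into maximal runs of alphanumeric characters.
std::vector<std::string> split_words(const std::string& line)
{
    std::vector<std::string> words;
    std::string::const_iterator word_end = line.begin();
    for (;;) {
        // skip delimiters: first alphanumeric character at or after word_end
        std::string::const_iterator word_start =
            std::find_if(word_end, line.end(), is_word_char);
        if (word_start == line.end())
            break; // no more words on this line
        // first non-alphanumeric character after the start of the word
        word_end = std::find_if(word_start, line.end(), not_word_char);
        words.push_back(std::string(word_start, word_end));
    }
    return words;
}
```

Calling split_words("Hello, world! 42") should give the three words "Hello", "world" and "42"; a line with only punctuation gives an empty vector.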
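The third arguments to find_if are probably the hardest part to grasp. They are function objects (more precisely, the results of functions that return function objects) from the header <functional> (which I failed to include :)). ptr_fun turns the standard isalnum function into a suitable function object, and not1 wraps that object in one that returns the opposite result. Note that ptr_fun was deprecated in C++11 and removed in C++17, and not1 was removed in C++20, so newer compilers may reject these lines.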
Instead you could write simple function objects like this:
Code:
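#include <cctype>

struct is_alnum
{
    bool operator()(char c) const
    {
        // cast first: passing a negative char value to isalnum is undefined behaviour
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    }
};
//used
word_start = std::find_if(word_end, line.end(), is_alnum());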
Inserting into the index:
Code:
index[std::string(word_start, word_end)].insert(line_num);
This does many things:
1) construct a string from the found iterator pair
2) use this string as an index into the map. map's operator[] either returns the item corresponding to the key if it exists, or first creates a new entry for this key and then returns the corresponding default-constructed value (in this case an empty set) - exactly what's needed.
3) then it calls the insert method on the returned set to add the line number (again, if the value already exists in the set, nothing is added).
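If it helps, the same one-liner can be pulled out into a small function and tried on its own. This is only a sketch: the typedef word_index and the function add_occurrence are names I made up, and the map type is assumed to be a map from string to set of unsigned, as described in this post.

```cpp
#include <map>
#include <set>
#include <string>

// Assumed index type: word -> set of line numbers on which it occurs.
typedef std::map<std::string, std::set<unsigned> > word_index;

// Record that `word` occurs on line `line_num`.
void add_occurrence(word_index& index, const std::string& word, unsigned line_num)
{
    // operator[] creates an empty set the first time a word is seen;
    // set::insert silently ignores a line number that is already present.
    index[word].insert(line_num);
}
```

Adding "apple" for lines 1, 1 and 3 leaves a single map entry whose set holds just 1 and 3. One thing to watch out for: because operator[] creates missing entries, merely looking up index["nosuchword"] adds an empty set for that word, so use find() when you only want to check.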
Code:
//display what we got
This is a pretty ordinary way to loop over containers with iterators. The only thing to keep in mind is that each element of a map is a std::pair, which has two members: first (the key - in this case a string) and second (the mapped value - in this case a set of unsigneds).
Thinking about it, perhaps using a set for the line numbers is a bit of an overkill. The line numbers are added in sorted order anyway, so we don't need the sorting part of the set's functionality. For the same reason it wouldn't be hard to ensure uniqueness either if we used a somewhat more memory-efficient vector: just look at the last value in the vector (if any) and add the new value only if it is different.
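A minimal sketch of that vector-based variant, assuming the line numbers really do arrive in non-decreasing order (as they do when the file is read top to bottom). The names vector_index and add_line are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumed alternative index type: word -> sorted vector of line numbers.
typedef std::map<std::string, std::vector<unsigned> > vector_index;

// Record that `word` occurs on line `line_num`.
// Relies on line numbers being added in non-decreasing order.
void add_line(vector_index& index, const std::string& word, unsigned line_num)
{
    std::vector<unsigned>& lines = index[word];
    // A duplicate can only be the most recently added number,
    // so comparing against back() is enough to keep values unique.
    if (lines.empty() || lines.back() != line_num)
        lines.push_back(line_num);
}
```

Adding "apple" for lines 1, 1, 3, 3 leaves the vector holding 1 and 3. If the numbers could arrive out of order, this check would no longer be enough and the set would be the safer choice.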