I'm trying to filter a text so that only words consisting entirely of alphabetic characters remain (no punctuation, no numbers, no words mixed with digits, etc.). Now here's the twist: I want the filter to honor the OS locale settings when classifying characters, since I'm going to have to process some text in German and Portuguese.
So here's how I set it up:
Code:
std::locale loc = std::locale("de_DE.utf8"); // the German locale on Ubuntu
const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);

std::string line;
while (input_file_stream >> line)
{
    const char* const c = line.c_str();
    const std::string::size_type st = line.size();

    // keep the word only if every character is classified as alpha
    if (ct.scan_not(std::ctype_base::alpha, c, c + st) == c + st)
        dump_line_in_some_container; // placeholder for inserting into my container
}
Now when I hit a German word with a non-ASCII character, such as "erkläre", I would still expect that word to be added to my container. However, the word fails the conditional and is not added. So is there a way to take international settings into account while still using the standard library? I could define my own filter table, but why do that when there's a standard facility?