-
Strings in C++
I come from a Java background. It's sort of my "native language", since I learned it first in college. In Java, strings are beautiful, flexible, graceful things that are ever so easy to manipulate.
I really hate worrying about having an array of characters. I don't like working with them at all. I much prefer the thought of just having "a string". So I immediately hopped into the C++ string.
...and was disappointed. Are these really all the string functions there are? (I know there are more functions that work with character arrays, but I'd really prefer to use simply "string".) A lot of the character array functions didn't even carry over, such as converting to lower case and getting tokens. Sure, one could convert to lower case in two simple lines (or one, if you want it ugly), but it gets annoying to write my own tokenizer function.
Is this really all there is, or is the list incomplete?
http://www.cppreference.com/cppstring.html
Just wondering.
-
You could look into the string header file, although I do not know anything about it beyond its being able to create a string variable.
-
This is where the C++ multi-paradigm approach can get a bit messy. Functions such as toupper and tolower are considered to belong not to a string object but to a locale: the plain single-character versions live in cctype, and there is a set of locale-aware template versions in the locale header. Also, because the string class provides STL-style iterators, the generic algorithms in the algorithm header can apply a function to each of its characters. Using tolower would look something like this:
Code:
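#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string name = "JOE";
    // Passing bare tolower can fail to compile, because the name also refers
    // to the locale-aware template overload. A lambda picks the cctype
    // version; the unsigned char parameter keeps it defined for all chars.
    transform(name.begin(), name.end(), name.begin(),
              [](unsigned char c) { return static_cast<char>(tolower(c)); });
    cout << name << '\n';
    return 0;
}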
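For the locale side of this, here is a rough sketch using the two-argument std::tolower template from the locale header. The helper name toLowerCopy and its default argument are my own assumptions for illustration, not part of the standard library.
Code:
```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: returns a lower-cased copy of text, using the
// locale-aware std::tolower template from <locale> instead of the
// single-argument C function from <cctype>.
std::string toLowerCopy(std::string text, const std::locale &loc = std::locale())
{
    std::transform(text.begin(), text.end(), text.begin(),
                   [&loc](char c) { return std::tolower(c, loc); });
    return text;
}
```
Calling toLowerCopy(name) with no second argument uses a copy of the current global locale, which is the classic "C" locale unless the program has changed it.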
-
Yikes. That sure can look confusing. What I've done is write a bunch of "utility" functions that I can call without having to think about it, since the messy details stay in the background. I'll stick them in a header so I can use them in other programs I write.
Examples
Code:
//initializes the tokenizer
string tokenize(string str1, const char *str2)
{
    string answer = strtok((char*)str1.c_str(), str2);
    return answer;
}
//finds the next token in the tokenizer
string tokenize(const char *str2)
{
    string answer = strtok(NULL, str2);
    return answer;
}
//returns the number of tokens in a string
int tokensIn(string str)
{
    int count = 1;
    if(str.empty())
        return 0;
    for(int i = 0; i < str.size(); i++)
        if(str.at(i) == ' ')
            count++;
    return count;
}
//converts string to lower case
void stringToLower(string &str)
{
    for(int i = 0; i < str.size(); i++)
        str.at(i) = tolower(str.at(i));
}
-
Don't cast away const, particularly a const you did not write yourself. Your tokenizer can break in a very subtle manner: str1 is passed by value, so its destructor runs when tokenize() returns while strtok's static char* still points into its buffer. If I may suggest:
Code:
#include <string>

std::string extract_token(std::string &str, const std::string &sep=" \t") {
    typedef std::string::size_type pos_t;
    pos_t start = str.find_first_not_of(sep);
    if(start == std::string::npos) { // nothing but separators left
        str.clear();
        return "";
    }
    pos_t end = str.find_first_of(sep,start); // returns npos on failure
    std::string token = str.substr(start,end-start); // if end==npos end-start is huge, and substr clamps it
    str.erase(0,end); // if end==npos this erases the whole string
    return token;
}
This takes a string and a string of separator characters, returns the first token, and strips that token off the string you passed it. This is a lot more useful than strtok(), because you can mix extract_token() calls on different strings independently, rather than having to be sure you have called strtok(NULL) as often as you are ever going to need to before starting on the next string.
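A rough sketch of how it might be used in a loop follows. extract_token is repeated so the snippet compiles on its own; splitTokens is a hypothetical helper name of mine, and it assumes an empty return value means "no tokens left", which holds because a real token always has at least one character.
Code:
```cpp
#include <string>
#include <vector>

// Same function as above, repeated so this sketch is self-contained.
std::string extract_token(std::string &str, const std::string &sep = " \t")
{
    typedef std::string::size_type pos_t;
    pos_t start = str.find_first_not_of(sep);
    if (start == std::string::npos) {
        str.clear();
        return "";
    }
    pos_t end = str.find_first_of(sep, start);
    std::string token = str.substr(start, end - start);
    str.erase(0, end);
    return token;
}

// Hypothetical helper: drains a by-value copy of the input into a vector,
// so the caller's string is left untouched.
std::vector<std::string> splitTokens(std::string text, const std::string &sep = " \t")
{
    std::vector<std::string> tokens;
    for (std::string token = extract_token(text, sep); !token.empty();
         token = extract_token(text, sep)) {
        tokens.push_back(token);
    }
    return tokens;
}
```
Because each call only touches the string it is given, two strings can be tokenized in alternation without interfering with each other, which is exactly what strtok cannot do.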
The other two handy parsing tricks to know about are stringstream, because many tokens are numbers, and the Boost libraries, in particular regex++, though Boost also has a fancy tokenizer.
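As a minimal sketch of the stringstream approach, assuming the tokens are whitespace-separated integers (parseInts is a made-up name for illustration):
Code:
```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every whitespace-separated integer from a line.
// Extraction stops at the first token that is not a valid integer.
std::vector<int> parseInts(const std::string &line)
{
    std::istringstream input(line);
    std::vector<int> values;
    int value;
    while (input >> value) {
        values.push_back(value);
    }
    return values;
}
```
The stream does the splitting and the number conversion in one step, so there is no separate tokenizer call followed by atoi.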
-
Two more nits, 'cause I just can't shut up. First, tokensIn does not count tokens; it counts spaces and adds one, so leading, trailing, or doubled spaces throw the result off. Second, while I applaud paranoid programming, str.at(i) performs a pointless bounds check inside a loop that already stops at str.size(). str[i] is mildly faster, and any time you use at() you really should also have a try/catch block in place. Although obviously it's much better to use .at() too often than not enough :)
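To make both nits concrete, here is a rough sketch. countTokens and charAtOr are hypothetical names of mine, and treating space and tab as the separators is an assumption carried over from extract_token.
Code:
```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical replacement for tokensIn: counts runs of non-separator
// characters instead of counting spaces.
std::size_t countTokens(const std::string &str, const std::string &sep = " \t")
{
    std::size_t count = 0;
    bool inToken = false;
    for (std::size_t i = 0; i < str.size(); ++i) {
        // i is always in range here, so operator[] is enough.
        const bool isSeparator = sep.find(str[i]) != std::string::npos;
        if (!isSeparator && !inToken) {
            ++count; // first character of a new token
        }
        inToken = !isSeparator;
    }
    return count;
}

// at() earns its bounds check when the index comes from somewhere you do
// not control, and then the out_of_range exception should be handled.
char charAtOr(const std::string &str, std::size_t index, char fallback)
{
    try {
        return str.at(index);
    } catch (const std::out_of_range &) {
        return fallback;
    }
}
```
With this version a string such as "a  b" (two spaces) counts as two tokens, where counting spaces and adding one would report three.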
-
Yeah, I noticed that one a while ago on that part with the token counter. When I slapped it together, I overlooked that I needed a bit more code. I'll have to make sure I rewrite those helpful little functions before they cause trouble.
And thanks for pointing out that problem with my tokenizer. I hadn't noticed that one.