Originally Posted by
tabstop
This sounds like a problem for state machines. At any given time, you're in a certain "mode" (reading a number, reading a bracket, reading spaces, etc.) based on what you've seen so far. How you handle input depends on what mode you're in. (For instance, reading '4' when you're in the middle of a number would just add a 4 onto that number, but reading '4' when you've been reading spaces will start a new number, etc.)
Yes, that's true, although in practice the implementation can be simpler than writing out an explicit state machine.
Take a function with the following prototype:
Code:
struct token get_token(const char **ptr);
You could then use it like this:
Code:
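const char *str = "1+2*3^4+5*6*(7+8*9)";
struct token cur_token;  /* "struct" is required in C unless token is typedef'd */
while(1) {
    /* get_token advances str past the token it returns */
    cur_token = get_token(&str);
    if(cur_token.type == TOK_END_OF_STRING)
        break;
    // Handle token here
}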
Now, the implementation would go something like this:
- Skip any whitespace.
- Does ptr point to a digit? If so, read the entire number (every character that is part of a valid number) and return that token.
- Does ptr point to a +? If so, read just that character and return that token. (In a C++ tokenizer this would also check for another + for the ++ operator.)
- Does ptr point to a (? If so, read just that character and return that token.
- And so on for the other characters.
- Before returning, set *ptr to the first character that has not been read yet, so the next call continues from there, and return the token info.
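The steps above could be sketched roughly as follows. This is only one way to do it, under some assumptions of mine: every token type name except TOK_END_OF_STRING is made up for illustration, numbers are plain runs of decimal digits, token_end points one past the last character of the token, and any unrecognized character is returned as a one-character TOK_UNKNOWN token so the caller's loop always makes progress.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; the post only names TOK_END_OF_STRING. */
enum token_type {
    TOK_NUMBER,
    TOK_PLUS,
    TOK_STAR,
    TOK_CARET,
    TOK_LPAREN,
    TOK_RPAREN,
    TOK_UNKNOWN,
    TOK_END_OF_STRING
};

struct token {
    enum token_type type;
    /* Assumption: token_end points one past the last character. */
    const char *token_begin, *token_end;
};

struct token get_token(const char **ptr)
{
    const char *p;
    struct token tok;

    assert(ptr != NULL && *ptr != NULL);
    p = *ptr;

    /* Skip any whitespace. */
    while (isspace((unsigned char)*p))
        p++;

    tok.token_begin = p;

    if (*p == '\0') {
        /* End of input: nothing is consumed. */
        tok.type = TOK_END_OF_STRING;
    } else if (isdigit((unsigned char)*p)) {
        /* Read the entire run of digits. */
        tok.type = TOK_NUMBER;
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        /* Single-character tokens. */
        switch (*p) {
        case '+': tok.type = TOK_PLUS;    break;
        case '*': tok.type = TOK_STAR;    break;
        case '^': tok.type = TOK_CARET;   break;
        case '(': tok.type = TOK_LPAREN;  break;
        case ')': tok.type = TOK_RPAREN;  break;
        default:  tok.type = TOK_UNKNOWN; break;
        }
        p++;
    }

    tok.token_end = p;
    *ptr = p;   /* the next call starts at the first unread character */
    return tok;
}
```

Note that there is no explicit "mode" variable anywhere: the position in the code (inside the digit loop or not) plays the role of the state.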
The token struct might be something like:
Code:
struct token {
    enum token_type type;
    const char *token_begin, *token_end;
};
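Since the struct only stores pointers into the original string, the token text is not NUL-terminated on its own. Here is a small sketch of how you might get at the text, assuming token_end points one past the last character of the token (the post doesn't say which convention is meant). The helper names token_length and token_copy are hypothetical, and the enum is trimmed to two made-up members just so the example compiles by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down enum so this example stands alone. */
enum token_type { TOK_NUMBER, TOK_END_OF_STRING };

struct token {
    enum token_type type;
    const char *token_begin, *token_end;
};

/* Number of characters in the token (token_end is one past the end). */
size_t token_length(struct token tok)
{
    assert(tok.token_end >= tok.token_begin);
    return (size_t)(tok.token_end - tok.token_begin);
}

/* Copy the token text into buf as a NUL-terminated string,
 * truncating if it does not fit. Returns the number of
 * characters copied, not counting the terminator. */
size_t token_copy(struct token tok, char *buf, size_t bufsize)
{
    size_t n = token_length(tok);

    if (bufsize == 0)
        return 0;
    if (n > bufsize - 1)
        n = bufsize - 1;
    memcpy(buf, tok.token_begin, n);
    buf[n] = '\0';
    return n;
}
```

If you only want to print the token, printf("%.*s", (int)token_length(tok), tok.token_begin) does the job without copying anything.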