Tokenizing an expression, strtok or other alternatives?

**_Marcel_** · 03-04-2009

Hello,

This is my first post on this forum, hopefully It'll be according to the posting rules.

For the small (school) project I'm currently working on I am trying to implement an infix-to-postfix translator (Reverse Polish Notation: RPN). T
he reason behind this is that I want to be able to execute expressions on a CUDA-capable GPU (CUDA) which doesn't have support for recursion (so no expression trees etc).

So for example I have an expression like: 3 + 4 * 2 / (1 - 5) ^ 2 ^ 3 and I want to translate that to the postfix notation like: 3 4 2 * 1 5 - 2 3 ^ ^ / +.

The expression will come from a different program and these will include variable names as well, so something like 3 + (2 / x) ^ y ^ z but that is not what I am having problems with now (or, not yet..).

Currently my program can deal with expressions like the ones above, but only because each element in the string literal is a token, so I can walk through the string doing something on each element. However, this does not work when values are floats or negative values. The expression 3.14 / -.2 + (2.5 * x) needs to be split into tokens in a whole different way.

I am not experienced enough to find a method that can split such an expression in the way I would like, namely: '3.14', '/', '-.2', '+', '(', '2.5', '*', 'x', ')'.
This data would then be stored in a struct holding both values and operators. I thought about using a simple Perl regex, but since I plan on distributing the program (later on) I don't feel much about asking people to also install Perl just to run it.

I tried using strtok() for this, but since I can't just split on, say, spaces since I can't be sure that all elements are separated by spaces, especially when there are brackets in the expression. My question is really more about a direction I should go instead of complete working code examples, but if someone knows about an existing infix->postfix translator in plain C (couldn't find one!) that would be very great indeed!

Thanks for your time,
Marcel.

**tabstop** · 03-04-2009

This sounds like a problem for state machines. At any given time, you're in a certain "mode" (reading a number, reading a bracket, reading spaces, etc.) based on what you've seen so far. How you handle input depends on what mode you're in. (For instance, reading '4' when you're in the middle of a number would just add a 4 onto that number, but reading '4' when you've been reading spaces will start a new number, etc.)

**EVOEx** · 03-04-2009

Originally Posted by tabstop

This sounds like a problem for state machines. At any given time, you're in a certain "mode" (reading a number, reading a bracket, reading spaces, etc.) based on what you've seen so far. How you handle input depends on what mode you're in. (For instance, reading '4' when you're in the middle of a number would just add a 4 onto that number, but reading '4' when you've been reading spaces will start a new number, etc.)

Yes, this is true, although the actual implementation can be done easier than actually implementing this machine.

Take a function with the following prototype:

Code:

struct token get_token(const char **ptr);

Then, you could use this the following way:

Code:

const char *str = "1+2*3^4+5*6*(7+8*9)";
token cur_token;
while(1) {
	cur_token = get_token(&str);
	if(cur_token.type == TOK_END_OF_STRING)
		break;

	// Handle token here
}

Now, the implementation would go something like this:
- Skip any whitespace.
- Is the character ptr points to a number? If yes; read the entire number (everything that is a valid number) and return this token.
- Is the character ptr points to a +? If yes; read just that character and return this token. (In a C++ tokenizer this would also check for another + for the ++ operator).
- Is the character ptr points to a (? If yes; read just that character and return this token.
- Etc with the other characters
- Set *ptr to the last character that was unread and return the token info.

The token struct might be something like:

Code:

struct token {
	enum token_type type;
	const char *token_begin, *token_end;
};

**_Marcel_** · 03-04-2009

Thank you very much for your replies, this is definitely something I can work with. It will take me some time to try and implement this though, but if I have any questions (or a working example) I'll post it here.

Thanks again for this very fast and help full reply!!

Marcel.

Thread: Tokenizing an expression, strtok or other alternatives?

Thread Tools

Search Thread

Display

Tokenizing an expression, strtok or other alternatives?

Similar Threads

Help with making a Math Expression DLL

Screwy Linker Error - VC2005

recursion error

Please Help - Problem with Compilers

strtok tokenizing on spaces as well as my delimiter