Thread: Tokenizing an expression, strtok or other alternatives?

  1. #1
    Registered User
    Join Date
    Mar 2009
    Posts
    3

    Tokenizing an expression, strtok or other alternatives?

    Hello,

    This is my first post on this forum, hopefully It'll be according to the posting rules.

    For the small (school) project I'm currently working on I am trying to implement an infix-to-postfix translator (Reverse Polish Notation: RPN). T
    he reason behind this is that I want to be able to execute expressions on a CUDA-capable GPU (CUDA) which doesn't have support for recursion (so no expression trees etc).

    So for example I have an expression like: 3 + 4 * 2 / (1 - 5) ^ 2 ^ 3 and I want to translate that to the postfix notation like: 3 4 2 * 1 5 - 2 3 ^ ^ / +.

    The expression will come from a different program and these will include variable names as well, so something like 3 + (2 / x) ^ y ^ z but that is not what I am having problems with now (or, not yet..).

    Currently my program can deal with expressions like the ones above, but only because each element in the string literal is a token, so I can walk through the string doing something on each element. However, this does not work when values are floats or negative values. The expression 3.14 / -.2 + (2.5 * x) needs to be split into tokens in a whole different way.

    I am not experienced enough to find a method that can split such an expression in the way I would like, namely: '3.14', '/', '-.2', '+', '(', '2.5', '*', 'x', ')'.
    This data would then be stored in a struct holding both values and operators. I thought about using a simple Perl regex, but since I plan on distributing the program (later on) I don't feel much about asking people to also install Perl just to run it.

    I tried using strtok() for this, but since I can't just split on, say, spaces since I can't be sure that all elements are separated by spaces, especially when there are brackets in the expression. My question is really more about a direction I should go instead of complete working code examples, but if someone knows about an existing infix->postfix translator in plain C (couldn't find one!) that would be very great indeed!

    Thanks for your time,
    Marcel.

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    This sounds like a problem for state machines. At any given time, you're in a certain "mode" (reading a number, reading a bracket, reading spaces, etc.) based on what you've seen so far. How you handle input depends on what mode you're in. (For instance, reading '4' when you're in the middle of a number would just add a 4 onto that number, but reading '4' when you've been reading spaces will start a new number, etc.)

  3. #3
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    Quote Originally Posted by tabstop View Post
    This sounds like a problem for state machines. At any given time, you're in a certain "mode" (reading a number, reading a bracket, reading spaces, etc.) based on what you've seen so far. How you handle input depends on what mode you're in. (For instance, reading '4' when you're in the middle of a number would just add a 4 onto that number, but reading '4' when you've been reading spaces will start a new number, etc.)
    Yes, this is true, although the actual implementation can be done easier than actually implementing this machine.

    Take a function with the following prototype:
    Code:
    struct token get_token(const char **ptr);
    Then, you could use this the following way:
    Code:
    const char *str = "1+2*3^4+5*6*(7+8*9)";
    token cur_token;
    while(1) {
    	cur_token = get_token(&str);
    	if(cur_token.type == TOK_END_OF_STRING)
    		break;
    
    	// Handle token here
    }
    Now, the implementation would go something like this:
    - Skip any whitespace.
    - Is the character ptr points to a number? If yes; read the entire number (everything that is a valid number) and return this token.
    - Is the character ptr points to a +? If yes; read just that character and return this token. (In a C++ tokenizer this would also check for another + for the ++ operator).
    - Is the character ptr points to a (? If yes; read just that character and return this token.
    - Etc with the other characters
    - Set *ptr to the last character that was unread and return the token info.

    The token struct might be something like:
    Code:
    struct token {
    	enum token_type type;
    	const char *token_begin, *token_end;
    };
    Last edited by EVOEx; 03-04-2009 at 07:04 AM. Reason: Proof-reading is for pussies

  4. #4
    Registered User
    Join Date
    Mar 2009
    Posts
    3

    Thumbs up

    Thank you very much for your replies, this is definitely something I can work with. It will take me some time to try and implement this though, but if I have any questions (or a working example) I'll post it here.

    Thanks again for this very fast and help full reply!!

    Marcel.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Help with making a Math Expression DLL
    By MindWorX in forum C Programming
    Replies: 19
    Last Post: 07-19-2007, 11:37 PM
  2. Screwy Linker Error - VC2005
    By Tonto in forum C++ Programming
    Replies: 5
    Last Post: 06-19-2007, 02:39 PM
  3. recursion error
    By cchallenged in forum C Programming
    Replies: 2
    Last Post: 12-18-2006, 09:15 AM
  4. Please Help - Problem with Compilers
    By toonlover in forum C++ Programming
    Replies: 5
    Last Post: 07-23-2005, 10:03 AM
  5. strtok tokenizing on spaces as well as my delimiter
    By snowblind37 in forum C++ Programming
    Replies: 2
    Last Post: 06-15-2004, 12:39 AM