Thread: Parsing mathematical function to tree structure

  1. #1
    Registered User
    Join Date
    Oct 2004
    Posts
    19

    Parsing mathematical function to tree structure

    Background Banter:
    As per salem's advice earlier today ive learnt how to convert from infix to postfix using a stack, and how to then evaluate that RPN expression.

    My next question is how to do this using a tree, ive done quite a bit of reading and have figured out how to evaluate an expression already parsed into the tree, but I cant find a resource, book or otherwise documenting how to parse an infix expression into a tree structure. Generally speaking im a little tired and may have missed something but im normally thorough with searching.

    Question:

    1 - With an algebraic, possibly containing complex arithmeitic, infix expression passed as an input, how do you parse this input expression into a tree structure, operators as nodes, variables as child, in order to then evaluate using a recursion algorithm.

    Thanks in advance, any advice is greatly appreciated.

    This topic is technically different to the last one I started, although they are linked by commonality of purpose. My thinking was that this is how the admin's would prefer, if I'm wrong please accept my apologies, delete this and I will append to my earlier post.

  2. #2
    Registered User
    Join Date
    Sep 2004
    Posts
    719
    hey...sounds like we're working on close to the same project....i think you're a bit ahead of me though....

    try this link (go to bottom of page for link about *fix)

    http://www.brpreiss.com/books/opus4/html/page262.html



    you got any helpful resources?

  3. #3
    Registered User
    Join Date
    Sep 2004
    Posts
    719
    i guess i didn't really read your post all that well at first.....

    my best solution so far is..(for simplicity i'll only use number constants)
    i will use the terms brother for one node child and sister for the other
    Code:
    read number - store in root
    read sign - push down root (sign is in root now, first number is in root's child)
    read number - store in roots child
    go to current parent
    
    read sign
    if sign is - or +
          create parent (= root)
          store sign in new parent
          create daughter
          read number
          store in daughter
    else
          create sister (= child on right side if you were to visualize the tree)
          store * or / in new child (sister)
          read number
          store in newly created child

    something like that...i can find were i wrote it down

    the key is when you come across - or + to create a new parent (or push all the other nodes down) .......follow order of operations. the first in the oder should be at the bottom (leaves) of the tree..... parenthisis will be the lowest, then multiplication, then division, then subtraction, then addition will be on top....





    [/code]

  4. #4
    S Sang-drax's Avatar
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    Shouldn't be too hard...

    Find the operator with the lowest priority, store it as the root node with the left and right expressions as subnodes.
    Repeat for both subnodes.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  5. #5
    Registered User
    Join Date
    Sep 2004
    Posts
    719
    you make it sound so easy

  6. #6
    S Sang-drax's Avatar
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    It's not very hard.
    One thing that complicates it a little is characters that can act as both unary and binary operators (-).

    One way to handle parenthesis is to add priority to every operators within () when tokenizing the expression. If you assign priorities 1-10 to your operators, an operator within n levels of () would have an addtional priority of 11*n.
    After this adjustment, the parenthesies are removed.
    Last edited by Sang-drax; 10-02-2004 at 07:28 AM.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  7. #7
    Registered User
    Join Date
    Oct 2004
    Posts
    19
    Misplaced, do you also have to parse complex arithmetic? Whats the title of your project?

    Thanks for that, that pseudo code does look right.

    Sang-drax, the way I understand it I can deal with that unary/binaryop duality issue in my scanner/tokeniser.

    Back to work, misplaced can share my code with you later on in the week if you like.

  8. #8
    Registered User
    Join Date
    Aug 2003
    Posts
    470
    I think you are trying to use operator precidence parsing. I have never used tried to create this kind of parser but you should be able to visualize the algorithm. That is, you have an expression such as "4 * (4 + 6)" and you represent precidence as
    "<<4> * <(<4> + <6>)>>" Of course, it's a little more complicated because most algorithms do this with only one token of lookahead.

    If you know grammars, just write a grammar for your expressions, weed out the left recursion, and write a recursive descent parser.

    But if you convert an infix expression to reverse-polish notation, then you should be able to create a tree. For instance, if you have (+ 4 (* 3 (+ 6 3))) you recursively eval to create a tree. That is, you create a tree with + as its operator, 4 as its left node and the rerturn of eval(* 3 (+ 6 3)) as its right node.

  9. #9
    Registered User
    Join Date
    Sep 2004
    Posts
    719
    i don't have a name for my project yet, but because i'm bored and have 3-4 more semesters of math courses i decided to write a program to simplify equations and solve them..after i get basics i will add more functionality to it....

    i've broken the parsing into 2 parts...the first part seperates the signs from the terms..
    the second part will store it in a tree.. i don't know if this is the best way to do it, but so far i've written the first part of parsing algorithm which returns a vector of strings...each operator (-+/*(){}[]) is it's own string and then each term is its own string in the same order. it looks like this
    | 3 | + | 5xw | + | 2 | / | ( |c |- | d | ) | (3+5xw+2/(c-d))
    | <-one element of vector<string>)-> |

    i've come to a stopping point here because i need to finish my tree...or start it even...
    but my plan from there, after i get the tree done, is to iterator through the vector and depending on what symbol i come across, i will either move up in the tree or down (like in the psuedo code). i doubt you will want to use it, but next time i'm on the unix side of my system i will send over to windows so i can post it here and you can take a look at it.
    i know you probably want the second part, but ...so do i...

  10. #10
    Registered User
    Join Date
    Sep 2004
    Posts
    719
    i'm only in my second semester of programming, so this code may or may not look like a joke to you, but maybe you can look at it and get some ideas (even if they're ideas of what not to do)

    Code:
    #include <string>
    #include <vector>
    
    using namespace std;
    
    //return true if x is  */-+(){}[]=
    //[]{}() precedence operators
    #define isOp(x) (x=='+'||x=='-'||x=='/'||x=='*'||x=='{'||x=='{'||x=='['||x==']'||x=='('||x==')'||x=='='||)
    
    //this function accepts a string to be parsed. the function will
    //then put each part (substring) into a vector and returns the vector.
    // -+\*[](){}'s will be placed in it own string inside the vector 
    // all other characters will be placed together if not seperated by an operator
    // a/b+c(dx - c) = |a| / |b| + |c| ( |dx| - |c| ) |  
    //source is the expression in the form of a string
    vector<string> Xpress::parse(string source, vector<string> &list)
    {
    	string temp("");			//start with empty buffer
    
    	//might want to insert '+' if source[0] is not '-' or '+'
    
    	
    	//loop through source and collect characters in buffer until a operator 
    	//is found. if operator found flush buffer; store operator in buffer; flush; 
             int length = source.length();
             char current;
    	for(int i = 0; i <= length; i++)
    	{
    		current = source[i];
    		if(isOp(current))		//if source[i]==/+-*()[]}or{
    		{
    			list.push_back(temp);     //insert buffer into vect
    			temp = "";                //clear buffer for operator
    			temp = temp + current;  //store source[i] into temp
    			list.push_back(temp);     //insert operator
    			temp = "";                //clear buffer
    		}else{
    			temp = temp + current;  //append character onto temp
    		}			
    	}
    	
    	list.push_back(temp);   //finish the job and clear buffer 1 last time
    	return list;
    }
    oh yea, this stores exponents with the terms....exponents represented by ^ will be parsed by a term class, not by the above function.
    Last edited by misplaced; 10-02-2004 at 11:36 PM.

  11. #11
    Registered User
    Join Date
    Oct 2004
    Posts
    19
    I'll have to come back to this thread later misplaced, at the moment not in a situation to comment. Thanks for the help though.

  12. #12
    Registered User
    Join Date
    Sep 2004
    Posts
    719
    is it neccessary for it to go straight from infix to a tree? because after doing some reading it seems MUCH easier converting infix to prefix and then building the tree. I've been working on the infix to tree algorithm all night and it just ends up becoming one big mess.
    Last edited by misplaced; 10-03-2004 at 07:23 AM.

  13. #13
    Registered User
    Join Date
    Aug 2003
    Posts
    470
    Yes, it's possible, but not necessary, to go from infix to the tree without any intermediate step. I suggest you do this with recursive descent, because the algorithm is easier to understand. If you have a right-recursive grammar such as
    E->T + E | T - E | T
    T->F * T | F / T | F
    F->digit | (E)
    You can write a lexer that will return the next token and a match function that will match a given token, call an error routine if the next token does not match, then scan the next token. At each step, the next token determines which nonterminal you call. If you have ,for instance, the expression "4 + 3$", where '$' is the EOF character, your recursive call sequence looks like as follows:
    next_token == 4
    E()
    T()
    F()
    digit
    match(DIGIT)
    next_token == +
    F() returns
    T() returns
    Were're now at E. Because the next token is a "+", we know that the derivation is E->T + E; we then know that we should match('+'), then call E().
    match('+')
    next_token == 3
    E()
    T()
    F()
    match(DIGIT)
    next_token = $
    F() returns
    T() returns
    E() returns

    You then need to map each nonterminal symbol with a respect function. The function E, corresponding to nonterminal E, would then look like this:
    Code:
    E* E() 
    {
            T* left = T();
            E* right = NULL;
            E* node = NULL;
    
            if (next_terminal == '+') {
                   match('+');
                   right = E();        
                   node = new_tree(left, right);
            } else if (next_terminal == '-') {
                   match('-');
                   right = E();
                   node = new_tree(left, right);
            } else if (next_terminal == '$') {
                   node = left;
            } else {
                    error();
            }
    
            return node;
    }
    You might need to make some changes, however. Both T and E must be derived from a base Node class in order for the nodes to mixed together in the tree. Your evaluation order will be a bit different than usual. "3 + 5 + 6" will be evaluated "3 + (5 + 6)", "3 + (5 / 6)". But if you are able to write the recursive descent code, you should then be ready to make some changes to the grammar.

  14. #14
    Registered User
    Join Date
    Sep 2004
    Posts
    719
    hehe..i was asking Roule is it was necessary to use infix->tree...

    since my last post i have changed my design a bit.......
    the function above will be changed to a stack....and then i will convert to prefix notation....
    then build the tree....

  15. #15
    Registered User
    Join Date
    Oct 2004
    Posts
    19
    Misplaced, are you doing ece4709 by any chance?

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 4
    Last Post: 05-13-2011, 08:28 AM
  2. C interview questions
    By natrajdreams in forum C Programming
    Replies: 7
    Last Post: 12-12-2010, 12:40 PM
  3. BST delete again, but this time I think I'm close
    By tms43 in forum C++ Programming
    Replies: 9
    Last Post: 11-05-2006, 06:24 PM
  4. c++ linking problem for x11
    By kron in forum Linux Programming
    Replies: 1
    Last Post: 11-19-2004, 10:18 AM
  5. BST/Red and Black Tree
    By ghettoman in forum C++ Programming
    Replies: 0
    Last Post: 10-24-2001, 10:45 PM