# Thread: Parsing mathematical function to tree structure

1. ## Parsing mathematical function to tree structure

Background Banter:
As per salem's advice earlier today ive learnt how to convert from infix to postfix using a stack, and how to then evaluate that RPN expression.

My next question is how to do this using a tree, ive done quite a bit of reading and have figured out how to evaluate an expression already parsed into the tree, but I cant find a resource, book or otherwise documenting how to parse an infix expression into a tree structure. Generally speaking im a little tired and may have missed something but im normally thorough with searching.

Question:

1 - With an algebraic, possibly containing complex arithmeitic, infix expression passed as an input, how do you parse this input expression into a tree structure, operators as nodes, variables as child, in order to then evaluate using a recursion algorithm.

This topic is technically different to the last one I started, although they are linked by commonality of purpose. My thinking was that this is how the admin's would prefer, if I'm wrong please accept my apologies, delete this and I will append to my earlier post.

2. hey...sounds like we're working on close to the same project....i think you're a bit ahead of me though....

http://www.brpreiss.com/books/opus4/html/page262.html

3. i guess i didn't really read your post all that well at first.....

my best solution so far is..(for simplicity i'll only use number constants)
i will use the terms brother for one node child and sister for the other
Code:
```read number - store in root
read sign - push down root (sign is in root now, first number is in root's child)
read number - store in roots child
go to current parent

if sign is - or +
create parent (= root)
create daughter
store in daughter
else
create sister (= child on right side if you were to visualize the tree)
store * or / in new child (sister)
store in newly created child```

something like that...i can find were i wrote it down

the key is when you come across - or + to create a new parent (or push all the other nodes down) .......follow order of operations. the first in the oder should be at the bottom (leaves) of the tree..... parenthisis will be the lowest, then multiplication, then division, then subtraction, then addition will be on top....

[/code]

4. Shouldn't be too hard...

Find the operator with the lowest priority, store it as the root node with the left and right expressions as subnodes.
Repeat for both subnodes.

5. you make it sound so easy

6. It's not very hard.
One thing that complicates it a little is characters that can act as both unary and binary operators (-).

One way to handle parenthesis is to add priority to every operators within () when tokenizing the expression. If you assign priorities 1-10 to your operators, an operator within n levels of () would have an addtional priority of 11*n.
After this adjustment, the parenthesies are removed.

7. Misplaced, do you also have to parse complex arithmetic? Whats the title of your project?

Thanks for that, that pseudo code does look right.

Sang-drax, the way I understand it I can deal with that unary/binaryop duality issue in my scanner/tokeniser.

Back to work, misplaced can share my code with you later on in the week if you like.

8. I think you are trying to use operator precidence parsing. I have never used tried to create this kind of parser but you should be able to visualize the algorithm. That is, you have an expression such as "4 * (4 + 6)" and you represent precidence as
"<<4> * <(<4> + <6>)>>" Of course, it's a little more complicated because most algorithms do this with only one token of lookahead.

If you know grammars, just write a grammar for your expressions, weed out the left recursion, and write a recursive descent parser.

But if you convert an infix expression to reverse-polish notation, then you should be able to create a tree. For instance, if you have (+ 4 (* 3 (+ 6 3))) you recursively eval to create a tree. That is, you create a tree with + as its operator, 4 as its left node and the rerturn of eval(* 3 (+ 6 3)) as its right node.

9. i don't have a name for my project yet, but because i'm bored and have 3-4 more semesters of math courses i decided to write a program to simplify equations and solve them..after i get basics i will add more functionality to it....

i've broken the parsing into 2 parts...the first part seperates the signs from the terms..
the second part will store it in a tree.. i don't know if this is the best way to do it, but so far i've written the first part of parsing algorithm which returns a vector of strings...each operator (-+/*(){}[]) is it's own string and then each term is its own string in the same order. it looks like this
| 3 | + | 5xw | + | 2 | / | ( |c |- | d | ) | (3+5xw+2/(c-d))
| <-one element of vector<string>)-> |

i've come to a stopping point here because i need to finish my tree...or start it even...
but my plan from there, after i get the tree done, is to iterator through the vector and depending on what symbol i come across, i will either move up in the tree or down (like in the psuedo code). i doubt you will want to use it, but next time i'm on the unix side of my system i will send over to windows so i can post it here and you can take a look at it.
i know you probably want the second part, but ...so do i...

10. i'm only in my second semester of programming, so this code may or may not look like a joke to you, but maybe you can look at it and get some ideas (even if they're ideas of what not to do)

Code:
```#include <string>
#include <vector>

using namespace std;

//return true if x is  */-+(){}[]=
//[]{}() precedence operators
#define isOp(x) (x=='+'||x=='-'||x=='/'||x=='*'||x=='{'||x=='{'||x=='['||x==']'||x=='('||x==')'||x=='='||)

//this function accepts a string to be parsed. the function will
//then put each part (substring) into a vector and returns the vector.
// -+\*[](){}'s will be placed in it own string inside the vector
// all other characters will be placed together if not seperated by an operator
// a/b+c(dx - c) = |a| / |b| + |c| ( |dx| - |c| ) |
//source is the expression in the form of a string
vector<string> Xpress::parse(string source, vector<string> &list)
{

//might want to insert '+' if source[0] is not '-' or '+'

//loop through source and collect characters in buffer until a operator
//is found. if operator found flush buffer; store operator in buffer; flush;
int length = source.length();
char current;
for(int i = 0; i <= length; i++)
{
current = source[i];
if(isOp(current))		//if source[i]==/+-*()[]}or{
{
list.push_back(temp);     //insert buffer into vect
temp = "";                //clear buffer for operator
temp = temp + current;  //store source[i] into temp
list.push_back(temp);     //insert operator
temp = "";                //clear buffer
}else{
temp = temp + current;  //append character onto temp
}
}

list.push_back(temp);   //finish the job and clear buffer 1 last time
return list;
}```
oh yea, this stores exponents with the terms....exponents represented by ^ will be parsed by a term class, not by the above function.

11. I'll have to come back to this thread later misplaced, at the moment not in a situation to comment. Thanks for the help though.

12. is it neccessary for it to go straight from infix to a tree? because after doing some reading it seems MUCH easier converting infix to prefix and then building the tree. I've been working on the infix to tree algorithm all night and it just ends up becoming one big mess.

13. Yes, it's possible, but not necessary, to go from infix to the tree without any intermediate step. I suggest you do this with recursive descent, because the algorithm is easier to understand. If you have a right-recursive grammar such as
E->T + E | T - E | T
T->F * T | F / T | F
F->digit | (E)
You can write a lexer that will return the next token and a match function that will match a given token, call an error routine if the next token does not match, then scan the next token. At each step, the next token determines which nonterminal you call. If you have ,for instance, the expression "4 + 3\$", where '\$' is the EOF character, your recursive call sequence looks like as follows:
next_token == 4
E()
T()
F()
digit
match(DIGIT)
next_token == +
F() returns
T() returns
Were're now at E. Because the next token is a "+", we know that the derivation is E->T + E; we then know that we should match('+'), then call E().
match('+')
next_token == 3
E()
T()
F()
match(DIGIT)
next_token = \$
F() returns
T() returns
E() returns

You then need to map each nonterminal symbol with a respect function. The function E, corresponding to nonterminal E, would then look like this:
Code:
```E* E()
{
T* left = T();
E* right = NULL;
E* node = NULL;

if (next_terminal == '+') {
match('+');
right = E();
node = new_tree(left, right);
} else if (next_terminal == '-') {
match('-');
right = E();
node = new_tree(left, right);
} else if (next_terminal == '\$') {
node = left;
} else {
error();
}

return node;
}```
You might need to make some changes, however. Both T and E must be derived from a base Node class in order for the nodes to mixed together in the tree. Your evaluation order will be a bit different than usual. "3 + 5 + 6" will be evaluated "3 + (5 + 6)", "3 + (5 / 6)". But if you are able to write the recursive descent code, you should then be ready to make some changes to the grammar.

14. hehe..i was asking Roule is it was necessary to use infix->tree...

since my last post i have changed my design a bit.......
the function above will be changed to a stack....and then i will convert to prefix notation....
then build the tree....

15. Misplaced, are you doing ece4709 by any chance?