Confused about the application of some OOP concepts.

**manasij7479** · 08-20-2011

Suppose I have a class called token.
Another class called Op (for operator) is derived from it.

Polymorphism says that objects of Op can be treated as a token.

But how do I use this when a somewhat reverse situation comes up?

For example:
I have a lexer function that takes in a big std::string and generates substrings which feed the constructor of token.
Is there a way to automatically make the value in question an Op (thus calling Op's constructor), depending on a check made by token's constructor?

**grumpy** · 08-20-2011

No. The decision about the type of an object to be created has to be made before any constructors are invoked.

You need to use some approach that checks the string, and decides what type of object to create.

For example,

Code:

    token *tok;
    if (corresponds_to_Op(the_string))
         tok = new Op(the_string);
    else
         tok = something_else();

    // do stuff with the token

    delete tok;

You might want to look up the factory pattern.

**manasij7479** · 08-20-2011

I came up with the following function.
Are there any memory leaks or anything else I've overlooked?
// Here Keyword, Identifier, Operator, Literal and Punctuator are subclasses of token.

Code:

    // Needs <cctype> and <string>. The caller owns the returned token and must delete it.
    token* token_factory(std::string& input)
    {
        token* t;
        // Cast first: passing a negative char to isalpha/isdigit is undefined behaviour.
        const unsigned char first = static_cast<unsigned char>(input[0]);

        if(is_keyword(input))
        {
            t = new Keyword(input);      // type = keyword
        }
        else if(isalpha(first))
        {
            t = new Identifier(input);   // type = identifier
        }
        else if(is_operator(input))
        {
            t = new Operator(input);     // type = op
        }
        else if(isdigit(first)||first=='"'||first=='\'')
        {
            t = new Literal(input);      // type = literal
        }
        else if(is_punctuator(input))
        {
            t = new Punctuator(input);   // type = punctuator
        }
        else throw(std::string("Unsupported Token: "+input));

        return t;
    }

**Elysia** · 08-20-2011

In case you need to use pointers like this, use smart pointers to avoid leaks.
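As a rough sketch of what that could look like, assuming a C++11 compiler: the factory hands back a std::unique_ptr, so the token is destroyed automatically when the owner goes out of scope. The cut-down token, Identifier and Literal classes below are hypothetical stand-ins, not the classes from the thread, and only two of the five branches are shown.

```cpp
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, cut-down stand-ins for the thread's classes.
struct token {
    explicit token(const std::string& t) : text(t) {}
    virtual ~token() = default;            // needed to destroy derived objects via token*
    virtual const char* kind() const = 0;
    std::string text;
};

struct Identifier : token {
    using token::token;                    // reuse token's constructor
    const char* kind() const override { return "identifier"; }
};

struct Literal : token {
    using token::token;
    const char* kind() const override { return "literal"; }
};

// Same idea as the raw-pointer factory, but the return type carries ownership.
std::unique_ptr<token> token_factory(const std::string& input)
{
    if (input.empty())
        throw std::runtime_error("Unsupported Token: (empty)");

    const unsigned char first = static_cast<unsigned char>(input[0]);
    if (std::isalpha(first))
        return std::unique_ptr<token>(new Identifier(input));
    if (std::isdigit(first) || first == '"' || first == '\'')
        return std::unique_ptr<token>(new Literal(input));

    throw std::runtime_error("Unsupported Token: " + input);
}
```

With this version a caller that ignores the return value no longer leaks, because the temporary unique_ptr deletes the token.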

**whiteflags** · 08-20-2011

The only way this function leaks is if the return value is ignored.

throw(std::string("Unsupported Token: "+input));

I wouldn't throw a string. Make an exception type from std::exception.

Code:

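    #include <exception>
    #include <string>

    class UnsupportedToken: public std::exception {
    public:
       explicit UnsupportedToken (const std::string& input)
          : message("Unsupported Token: " + input) {
       }
       // noexcept and override need C++11; in C++03 write "const throw()" instead
       const char* what () const noexcept override {
          return message.c_str();
       }
    private:
       std::string message;
    };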
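A possible shortcut, if you don't need anything beyond a message: std::runtime_error already stores a string and implements what(), so the class can shrink to a constructor. The name UnsupportedTokenError is made up here to keep it apart from the class above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical variant: std::runtime_error keeps the message and provides what().
class UnsupportedTokenError : public std::runtime_error {
public:
    explicit UnsupportedTokenError(const std::string& input)
        : std::runtime_error("Unsupported Token: " + input) {}
};
```

Either way it can be caught as const std::exception& alongside the standard exceptions.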

**gardhr** · 08-20-2011

The process of first breaking the input into tokens and then assigning a type to them is a bit roundabout: the act of "breaking" itself is precisely the point where the token should be matched. Here's what I mean: suppose you are evaluating the expression "exp(1.0)". The lexer produces the sequence "exp", "(", "1.0", ")". Then you test each token and assign it a type. But the lexer already knows the types a priori, because otherwise it could not have broken them up in the first place! So the token pattern matching and type assignment should ideally occur at the same time. And since the sequence of tokens is only meaningful if syntactically correct, you might as well "line up" the token recognizers in a manner that describes the grammar of the language in question.
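To make that concrete, here is a minimal sketch of a scanner that decides the token's kind in the same branch that finds where the token ends. The Kind enum, the Tok struct and the lex function are all invented for illustration, and the rules are deliberately crude (no keywords, strings or multi-character operators).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Identifier, Number, Punctuator };

struct Tok {
    Kind kind;
    std::string text;
};

// Scans src once; each branch both delimits a token and fixes its kind.
std::vector<Tok> lex(const std::string& src)
{
    std::vector<Tok> out;
    std::size_t i = 0;
    // <cctype> functions need a value representable as unsigned char.
    auto at = [&](std::size_t p) { return static_cast<unsigned char>(src[p]); };

    while (i < src.size()) {
        if (std::isspace(at(i))) { ++i; continue; }   // skip whitespace

        const std::size_t start = i;
        Kind kind;
        if (std::isalpha(at(i)) || src[i] == '_') {
            while (i < src.size() && (std::isalnum(at(i)) || src[i] == '_')) ++i;
            kind = Kind::Identifier;
        } else if (std::isdigit(at(i))) {
            while (i < src.size() && (std::isdigit(at(i)) || src[i] == '.')) ++i;
            kind = Kind::Number;
        } else {
            ++i;                                      // any other single character
            kind = Kind::Punctuator;
        }
        out.push_back(Tok{kind, src.substr(start, i - start)});
    }
    return out;
}
```

Calling lex("exp(1.0)") gives four tokens, "exp", "(", "1.0" and ")", each already tagged, so no second classification pass is needed.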

And as far as "perfectly describing the problem in pure OOP" - forget about it. It's a mental trap that will sap your project of sensibility and efficiency, and waste countless hours that could have been productive had you simply looked at the problem from a modular/utilitarian perspective. I'm not trying to be negative, mind you, just speaking from experience...

**manasij7479** · 08-20-2011

I don't think it'd be possible to do both at the same time (although I'm calling the token_factory function from the lexer itself),
because the lexer only recognizes changes in the type of character it encounters: letters, digits, and other symbols.
How would I, in that case, tell a punctuator apart from an operator, or a literal apart from an identifier?

Can you suggest a way to handle that with the lexer?

**gardhr** · 08-21-2011

Okay, so you see the lexer is actually pre-parsing the input and then passing the result to the tokenizer to be re-parsed! Not only is this unnecessary, it in fact makes things more difficult, because you must constantly keep the logic of the two "in sync" whenever the code changes. Just do away with one or the other. The tokenizing factory can simply return a new token if it matches the current sequence of characters, otherwise null or what have you. Just keep in mind that you may eventually reach a point where an ambiguity in the grammar makes it necessary to be mindful of the ordering of your tokenizers. For example, let's say both "=" and "==" are valid tokens. In that case, you obviously have to check for the longer sequence first, or else you end up with two separate tokens where a single one should have been recognized.
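The ordering point can be sketched like this. The match_operator function and its operator table are hypothetical; the only thing that matters is that longer operators come before their own prefixes.

```cpp
#include <cstddef>
#include <string>

// Returns the operator that starts at src[pos], or an empty string if none does.
// pos must not exceed src.size().
std::string match_operator(const std::string& src, std::size_t pos)
{
    // Hypothetical table, longest first, so "==" is tried before "=".
    static const char* const ops[] = { "==", "!=", "<=", ">=", "=", "<", ">", "!" };

    for (const char* op : ops) {
        const std::string candidate(op);
        // Compare candidate against the same number of characters starting at pos.
        if (src.compare(pos, candidate.size(), candidate) == 0)
            return candidate;
    }
    return std::string();
}
```

With "a==b" and pos 1 this returns "==". If "=" were listed first, the same call would return "=" and the next call would produce a second "=" token.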
