Thread: Confused about the application of some OOP concepts.

  1. #1
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657

    Suppose I have a class called token.
    Another class called Op (for operator) is derived from it.

    Polymorphism says that objects of Op can be treated as a token.

    But how do I use this when a somewhat reverse situation comes up?

    For example:
    I have a lexer function that takes in a big std::string and generates substrings which feed the constructor of token.
    Is there a way to automatically make the value in question an Op (thus calling Op's constructor), depending on a check made by token's constructor?

  2. #2
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    No. The decision about the type of an object to be created has to be made before any constructors are invoked.

    You need to use some approach that checks the string, and decides what type of object to create.

    For example,
    Code:
        token *tok;
        if (corresponds_to_Op(the_string))
             tok = new Op(the_string);
        else
             tok = something_else();
    
        // do stuff with the token
    
        delete tok;
    You might want to look up the factory pattern.

  3. #3
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    I came up with the following function.
    Are there any memory leaks, or anything else I've overlooked?
    // Here Keyword, Identifier, Operator, Literal and Punctuator are subclasses of token.
    Code:
    token* token_factory(std::string& input)
    {
        token* t;
        
        if(is_keyword(input))
        {        
            //type=keyword;
            t = new Keyword(input);
            
        }
        else if(isalpha(static_cast<unsigned char>(input[0]))) // cast: a negative char is undefined behaviour here
        {
            t = new Identifier (input);
            //type=identifier;
        }
        else if(is_operator(input))
        {
            t = new Operator(input);
            //type=op;
        }
        else if(isdigit(static_cast<unsigned char>(input[0]))||input[0]=='"'||input[0]=='\'')
        {
            t = new Literal(input);
            //type=literal;
        }
        else if(is_punctuator(input))
        {
            t = new Punctuator(input);
            //type = punctuator;
        }
        else throw(std::string("Unsupported Token: "+input));
        
        return t;
    }

  4. #4
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    If you need to use pointers like this, use smart pointers to avoid leaks.
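
    A minimal sketch of what that could look like with std::unique_ptr (C++11). The class hierarchy below is a cut-down stand-in, since the thread never shows the real token classes, and is_operator is a hypothetical helper; only two of the five token kinds are modelled.

```cpp
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-in hierarchy, assumed for illustration only.
struct token {
    explicit token(const std::string& s) : text(s) {}
    virtual ~token() {}                    // needed to delete a derived object through token*
    virtual const char* kind() const = 0;
    std::string text;
};

struct Operator : token {
    explicit Operator(const std::string& s) : token(s) {}
    const char* kind() const override { return "operator"; }
};

struct Identifier : token {
    explicit Identifier(const std::string& s) : token(s) {}
    const char* kind() const override { return "identifier"; }
};

// Hypothetical check; a real lexer would know many more operators.
bool is_operator(const std::string& s) {
    return s == "+" || s == "-" || s == "*" || s == "/" || s == "=" || s == "==";
}

// Returns an owning pointer: the token is destroyed automatically when the
// unique_ptr goes out of scope, even if the caller ignores the result.
std::unique_ptr<token> token_factory(const std::string& input) {
    if (is_operator(input))
        return std::unique_ptr<token>(new Operator(input));
    if (!input.empty() && std::isalpha(static_cast<unsigned char>(input[0])))
        return std::unique_ptr<token>(new Identifier(input));
    throw std::invalid_argument("Unsupported Token: " + input);
}
```

    A caller would write something like `std::unique_ptr<token> t = token_factory("==");` and never call delete. The virtual destructor in the base class matters here: without it, destroying a derived token through a base pointer is undefined behaviour.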

  5. #5
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    The function itself doesn't leak. A leak happens only if the caller ignores the return value or never deletes it.

    throw(std::string("Unsupported Token: "+input));

    I wouldn't throw a string. Derive an exception type from std::exception.

    Code:
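    #include <exception>
    #include <string>

    class UnsupportedToken: public std::exception {
    public:
       explicit UnsupportedToken (const std::string& input)
          : message("Unsupported Token: " + input) {}
       // what() must not throw, to match std::exception::what()
       const char* what () const noexcept override {
          return message.c_str();
       }
    private:
       std::string message;
    };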
    Last edited by whiteflags; 08-20-2011 at 04:50 PM. Reason: Guess what, it's called exception.
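
    A sketch of how such a type might be thrown and caught. The message text is borrowed from the original throw; the constructor body and the noexcept on what() are filled in here, and reject and describe_failure are hypothetical names used only for this example.

```cpp
#include <exception>
#include <string>

class UnsupportedToken : public std::exception {
public:
    explicit UnsupportedToken(const std::string& input)
        : message("Unsupported Token: " + input) {}
    const char* what() const noexcept override { return message.c_str(); }
private:
    std::string message;
};

// Hypothetical stand-in for the factory's final else branch.
void reject(const std::string& input) {
    throw UnsupportedToken(input);
}

// Catching by const reference to the base class handles UnsupportedToken
// along with any other std::exception, without slicing.
std::string describe_failure(const std::string& input) {
    try {
        reject(input);
    } catch (const std::exception& e) {
        return e.what();
    }
    return std::string();
}
```

    Deriving from std::exception means a generic handler higher up the call stack can still report the error, which a thrown std::string would slip past.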

  6. #6
    Registered User gardhr's Avatar
    Join Date
    Apr 2011
    Posts
    151
    Quote Originally Posted by manasij7479 View Post
    Suppose I have a class called token.
    Another class called Op (for operator) is derived from it.

    Polymorphism says that objects of Op can be treated as a token.

    But how do I use this when a somewhat reverse situation comes up?

    For example:
    I have a lexer function that takes in a big std::string and generates substrings which feed the constructor of token.
    Is there a way to automatically make the value in question an Op (thus calling Op's constructor), depending on a check made by token's constructor?

    Breaking the input into tokens first and assigning each a type afterwards is a bit roundabout: the act of "breaking" itself is precisely the point where the token should be matched. Here's what I mean: suppose you are evaluating the expression "exp(1.0)". The lexer produces the sequence "exp", "(", "1.0", ")". Then you test each token and assign it a type. But the lexer already knows the types a priori, because otherwise it could not have broken them up in the first place! So the token pattern matching and type assignment should ideally occur at the same time. And since a sequence of tokens is only meaningful if it is syntactically correct, you might as well "line up" the token recognizers in a manner that describes the grammar of the language in question.

    And as far as "perfectly describing the problem in pure OOP" - forget about it. It's a mental trap that will sap your project of sensibility and efficiency, and waste countless hours that could have been productive had you simply looked at the problem from a modular/utilitarian perspective. I'm not trying to be negative, mind you, just speaking from experience...
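
    A rough sketch of the "match while breaking" idea, using the "exp(1.0)" example from above. Kind, Token and lex are names invented for this illustration, and the rules are deliberately simplistic (for instance, the number rule would also accept "1.2.3").

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Identifier, Number, Punctuator };

struct Token {
    Kind kind;
    std::string text;
};

// Single pass: the branch that consumes the characters also decides the kind,
// so no second classification step is needed afterwards.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        const std::size_t start = i;
        if (std::isspace(c)) {
            ++i;                               // whitespace produces no token
            continue;
        }
        if (std::isalpha(c) || c == '_') {
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            out.push_back({Kind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            while (i < src.size() &&
                   (std::isdigit(static_cast<unsigned char>(src[i])) || src[i] == '.'))
                ++i;
            out.push_back({Kind::Number, src.substr(start, i - start)});
        } else {
            ++i;                               // any other character is a one-char token
            out.push_back({Kind::Punctuator, src.substr(start, 1)});
        }
    }
    return out;
}
```

    With this shape, lex("exp(1.0)") yields four tokens whose kinds are already known when they are created, so a polymorphic factory (if one is still wanted) can be called from inside each branch.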

  7. #7
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by gardhr View Post
    Breaking the input into tokens first and assigning each a type afterwards is a bit roundabout: the act of "breaking" itself is precisely the point where the token should be matched. Here's what I mean: suppose you are evaluating the expression "exp(1.0)". The lexer produces the sequence "exp", "(", "1.0", ")". Then you test each token and assign it a type. But the lexer already knows the types a priori, because otherwise it could not have broken them up in the first place! So the token pattern matching and type assignment should ideally occur at the same time. And since a sequence of tokens is only meaningful if it is syntactically correct, you might as well "line up" the token recognizers in a manner that describes the grammar of the language in question.

    And as far as "perfectly describing the problem in pure OOP" - forget about it. It's a mental trap that will sap your project of sensibility and efficiency, and waste countless hours that could have been productive had you simply looked at the problem from a modular/utilitarian perspective. I'm not trying to be negative, mind you, just speaking from experience...
    I don't think it'd be possible to do both at the same time (although I'm calling the token_factory function from the lexer itself),
    because the lexer only recognizes changes in the type of character it encounters: letters, digits, and other symbols.
    In that case, how would I tell a punctuator apart from an operator, or a literal apart from an identifier?

    Can you suggest a way to handle that with the lexer?
    Last edited by manasij7479; 08-21-2011 at 01:01 AM.

  8. #8
    Registered User gardhr's Avatar
    Join Date
    Apr 2011
    Posts
    151
    Quote Originally Posted by manasij7479 View Post
    I don't think it'd be possible to do both at the same time (although I'm calling the token_factory function from the lexer itself),
    because the lexer only recognizes changes in the type of character it encounters: letters, digits, and other symbols.
    In that case, how would I tell a punctuator apart from an operator, or a literal apart from an identifier?

    Can you suggest a way to handle that with the lexer?
    Okay, so you see the lexer is actually preparsing the input and then passing the result to the tokenizer to be reparsed! Not only is this unnecessary, it in fact makes things more difficult because you must constantly keep the logic of the two "in sync" whenever the code changes. Just do away with one or the other. The tokenizing factory can simply return a new token if it matches the current sequence of characters, and null (or what have you) otherwise. Just keep in mind that you may eventually reach a point where an ambiguity in the grammar makes the ordering of your tokenizers matter. For example, let's say both "=" and "==" are valid tokens. In that case, you obviously have to check for the longer sequence ("==") first, or else you end up with two separate tokens where a single one should have been recognized.
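
    The ordering point can be sketched as follows. match_operator and its operator list are hypothetical; the only thing the example is meant to show is that "==" is tried before "=".

```cpp
#include <cstddef>
#include <string>

// Returns the operator that starts at src[pos], or an empty string if none does.
// Candidates are listed longest first, so "==" is recognized before "=".
std::string match_operator(const std::string& src, std::size_t pos) {
    static const char* const candidates[] = { "==", "=", "+", "-", "*", "/" };
    if (pos > src.size())
        return std::string();              // out of range: nothing to match
    for (const char* candidate : candidates) {
        const std::string op(candidate);
        // compare() looks at no more than op.size() characters starting at pos
        if (src.compare(pos, op.size(), op) == 0)
            return op;
    }
    return std::string();
}
```

    If "=" were listed first, the input "a==b" would produce two "=" tokens instead of one "==". The lexer would advance its position by the length of whatever match_operator returns.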
