Thread: Advice on class design.

  1. #1
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657

    Advice on class design.

    I have a base class that should provide an interface for certain string manipulations/matching.
    The constructor would take a 'begin' and 'end' std::string::iterator to work on.
    Each of the derived classes implement a particular functionality and have to produce a std::string as a result, an iterator denoting the current position and a boolean denoting success/failure.

    My questions are:
    1. What is the best place to perform the computation? (derived constructors or a virtual 'magic' function called when required.)
    2. Should I store the three results as protected member variables and have the base class return them using normal methods when required or is a more 'virtual' way (three getters, I guess) better ?
    3. How should I clean up? (the begin iterator has to be reset in case of failure, regardless to which derived class did the work)

    I'm not posting any code now, but would do so after some suggestions or if I decide on an approach.

  2. #2
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    I'd do it a little differently: Have a type that encapsulates the three parts of the result. Then separately have a base class that has a single virtual operator (), that takes two iterators and returns the result type. Derived classes would have no state, and implement the operator only.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  3. #3
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    I have a base class that should provide an interface for certain string manipulations/matching.
    O_o

    I don't understand the thing you are shooting.

    The constructor would take a 'begin' and 'end' std::string::iterator to work on.
    Why should the constructor be responsible for storing information about the "work" `std::string'? Can I reset them at least? Do I have to build a new object to apply the same "manipulations/matching" object (derived class?) to a different string? Why does it even need to operate on a `std::string'? What if I want to do "manipulations/matching" on a file? Do I have to read the entire file in a `std::string'?

    Each of the derived classes implement a particular functionality and have to produce a std::string as a result, an iterator denoting the current position and a boolean denoting success/failure.
    What if I don't need a new `std::string'? Do I call your function and toss the `std::string' I don't need? If the `std::string' generated is the result of manipulating my input, why do I need an iterator to the current position? If the `std::string' is simply a matched instance of the input, the iterator itself should tell me what I need to know to build a string, so why do I need a particular `std::string' result? Is this functionality intended to be applied iteratively? If so, why am I responsible for creating a new object, or resetting the old object, when you've already told me the object itself is aware of its position in the input? The iterator is already going to be an invalid iterator if the "particular functionality" fails to find success, so why do I need a particular "success"/"fail" result when I can just get an iterator? Does the facilities allow partial success? Is the iterator returned a valid iterator to the current position even when the functionality in question fails?

    What is the best place to perform the computation?
    The discussed facility implements "certain string manipulations/matching", correct? I believe a function named after the "certain string manipulations/matching" processes is a perfect place to perform such computations. In the event a given facility is anonymously created, the function `operator ()' is perfectly named.

    derived constructors
    If your class only serves to implement a single, purposed operation with all mechanisms performed in a constructor you do not, in fact, have a class at all; you have a simple function pretending to be a class.

    derived constructors or a virtual 'magic' function called when required
    If I can change neither the "work" string or the operations performed against it, why am I not just calling a function? If the facilities can be referenced and used anonymously, how could you possibly do the work in a derived constructor?

    Should I store the three results as protected member variables and have the base class return them using normal methods when required
    Why do I need to call a method to query the validity of a `std::string::iterator' or `std::string' object, all of which are possibly generated by the same function invocation, just to use the results of "certain string manipulations/matching"? Are these results somehow unrelated? If they aren't unrelated, why don't you just give me the results when I invoke your facility as, seemingly, that is the purpose entirely of the facility in question? If they are unrelated, why are the classes all part of the same hierarchy?

    How should I clean up? (the begin iterator has to be reset in case of failure, regardless to which derived class did the work)
    Why are you going on about mutating a shared iterator when you can just mutate a copy?

    Then separately have a base class that has a single virtual operator (), that takes two iterators and returns the result type.
    Unrelated to the above, virtual operators can go wrong with come compilers; you can use "NVI" or your favorite variation as a workaround if desired.

    Soma

  4. #4
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    I don't understand the thing you are shooting.
    Part of a simple Recursive Descent Parser using PEG notation.
    Why should the constructor be responsible for storing information about the "work" `std::string'? Can I reset them at least?
    What for ?
    Do I have to build a new object to apply the same "manipulations/matching" object (derived class?) to a different string?
    Yes, that seems to be the simplest and cheapest(I think, compared to storing trees) way to implement memoization later.

    Why does it even need to operate on a `std::string'? What if I want to do "manipulations/matching" on a file? Do I have to read the entire file in a `std::string'?
    Good point.
    I guess making them accept generic iterators(this won't need random access, anyway) would solve that problem.

    What if I don't need a new `std::string'? Do I call your function and toss the `std::string' I don't need?
    Not sure when this situation would come up.
    When the input needs to be matched against optional targets or "zero or more (*)" and similar cases, it can(possibly) just return an empty string and still denote success.

    If the `std::string' generated is the result of manipulating my input, why do I need an iterator to the current position?
    If the `std::string' is simply a matched instance of the input, the iterator itself should tell me what I need to know to build a string, so why do I need a particular `std::string' result?
    Good point(later one is relevant), I guess premature optimization creeped into my though process.

    Is this functionality intended to be applied iteratively?
    If so, why am I responsible for creating a new object, or resetting the old object, when you've already told me the object itself is aware of its position in the input?
    The iterator is already going to be an invalid iterator if the "particular functionality" fails to find success, so why do I need a particular "success"/"fail" result when I can just get an iterator?
    Does the facilities allow partial success?
    Is the iterator returned a valid iterator to the current position even when the functionality in question fails?
    Yes.
    To refer back to old objects later.
    Thats why I thought I needed some cleanup code, to reset the current iterator to begin. Because the matching can still succeed without the iterator having moved.
    Not sure..would give some thought to it.
    It returns the iterator it was constructed with.(Anything better ?)

    If your class only serves to implement a single, purposed operation with all mechanisms performed in a constructor you do not, in fact, have a class at all; you have a simple function pretending to be a class.
    If I can change neither the "work" string or the operations performed against it, why am I not just calling a function?
    It also stores the result for later use.
    Inheritance helps to reduce much of the duplicated code in those functions if I overloaded them.

    If the facilities can be referenced and used anonymously, how could you possibly do the work in a derived constructor?
    I need a factory, I think.
    It can then construct the correct object according to the 'rules' specified.
    They can be used anonymously because the results are of the same nature.


    Why do I need to call a method to query the validity of a `std::string::iterator' or `std::string' object, all of which are possibly generated by the same function invocation, just to use the results of "certain string manipulations/matching"?
    Are these results somehow unrelated? If they aren't unrelated, why don't you just give me the results when I invoke your facility as, seemingly, that is the purpose entirely of the facility in question? If they are unrelated, why are the classes all part of the same hierarchy?
    I don't understand this block!


    Why are you going on about mutating a shared iterator when you can just mutate a copy?
    I am using a copy.
    By 'reset', I meant the iterator that was to be returned, so that is not invalid.

  5. #5
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by manasij7479 View Post
    Part of a simple Recursive Descent Parser using PEG notation.
    There are tools that do this for you, you know. Your post describes lexing, for which you can use a too like flex. It's is easy, if a bit cludgy due to being C, to work with. Or if you might want Unicode, try Quex.

    The way flex solves this particular problem is to convert pattern matching into a lookup table based state machine. This is much faster than pattern matching against each possible pattern in order, like you're thinking.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  6. #6
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by King Mir View Post
    Your post describes lexing, for which you can use a too like flex.
    As I understand it, PEG does away with a separate lexing stage and incorporates it into the parsing with simple 'greedy' dynamics.
    As a result of that, it is quite simple to write by hand.
    Parsing expression grammar - Wikipedia, the free encyclopedia
    The way flex solves this particular problem is to convert pattern matching into a lookup table based state machine. This is much faster than pattern matching against each possible pattern in order, like you're thinking.
    Not sure what you mean by this, but I do plan to have a std::map to match Non-Terminals to the Parsing Expressions which are recursive in nature.

  7. #7
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    After clearing my head, I have come up with a new and simpler design.
    Which allows the use of an object to take arbitrary inputs. The constructor decides what its inputs are going to be matched against.
    This uses the objects like functions, but on the downside has a huge inheritance tree of design.

    I'm quite surprised how pencil and paper gave me the design much quicker than the IDE !
    I'll write and post the code tomorrow.

  8. #8
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by manasij7479 View Post
    Not sure what you mean by this, but I do plan to have a std::map to match Non-Terminals to the Parsing Expressions which are recursive in nature.
    I presume that maps patterns to tokens or similar. lexer tools achieve the same thing, but instead of matching against each pattern in the map in turn, they create a state machine. For instance if you have keywords "bool" and "break", instead of checking the stream for both stings in turn, it would check if it started with a "b" then check for "o" or "r", ect.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  9. #9
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by manasij7479 View Post
    After clearing my head, I have come up with a new and simpler design.
    Which allows the use of an object to take arbitrary inputs. The constructor decides what its inputs are going to be matched against.
    This uses the objects like functions, but on the downside has a huge inheritance tree of design.

    I'm quite surprised how pencil and paper gave me the design much quicker than the IDE !
    I'll write and post the code tomorrow.
    Instead of thinking about these details, you might just do a bit of browsing and find a parser generator that already exists.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  10. #10
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Part of a simple Recursive Descent Parser using PEG notation.
    I knew exactly at what project you shot.

    No. I was talking about the implementation.

    I don't understand this block!
    You say you've changed the design. If you want something more from me, post your current ideas.

    This is much faster than pattern matching against each possible pattern in order, like you're thinking.
    O_o

    The two things you are discussing can not be compared with such a trivial mentality.

    That said, recursive descent parsers and packrat parsers may memoize, or partially memoize, for linear complexity.

    And that said, you are talking about using a tool to do the work of only part of the job that PEG over a recursive descent parser is capable.

    And THAT said, I understand your recommending "flex"; I see it a lot. Now, before you recommend it again, the worlds of theory and practicality both would appreciate it if you went and used GLR/PEG tools. "flex", "lex", "yacc", "bison", and kin all need to die.

    Instead of thinking about these details, you might just do a bit of browsing and find a parser generator that already exists.
    Well, here you are just incredibly wrong. You don't get to stop thinking about designing the functionality around a parser when you use a tool to generate the parser.

    Should the "flex" rules be simple functions passing around static data? (I know it isn't an actual parser.)

    No. That's awful. Should they be classes... right back to static data.

    Should they be methods of a class? Now we are getting somewhere, but how do we integrate the errors reported by the parser generated by "flex"? Is the code "flex" generates robust enough to survive exceptions?

    *shrug*

    I could go on for a long time, but I'm saving that for this next bit which is far more interesting and less, I think, a knee-jerk response of "OMG! Use existing tools!".

    lexer tools achieve the same thing, but instead of matching against each pattern in the map in turn, they create a state machine.
    A mutually recursive descent parser is unambiguously a finite state machine.

    What those tools achieve is automation: they transform a grammar from a reference-based state machine into a table-driven state machine.

    For instance if you have keywords "bool" and "break", instead of checking the stream for both stings in turn, it would check if it started with a "b" then check for "o" or "r", ect.
    That's a great example of a poorly designed phase of lexical analysis.

    Both "break" and "bool" are keywords, but they have different meanings, validity, and semantics.

    Consider instead transforming both inputs, and all other "trivial named labels", into a single token of the form "UNKNOWN_LABEL".

    The words "enum","and","static", and so on as parse against this "UNKNOWN_LABEL" grammar rule which means no breaking and no transitions of any kind.

    The tokenizing is done; we can move into semantic analysis. (This can conceptually be done at the same level the lexer lives. Yes, you can also get some semantics "for free" with a lexer if you massage your grammar.) The token is available; we don't need to move backwards. (We may need to move forward for the next token to make a decision. If we memoize both this token and the next token we can look ahead "forever" without repeating a lexical analysts phase almost regardless of the flavor of lexer we use. It is entirely possible that this process is how we got to this token.) We can then make a decision about whether this is likely to be a declaration ("bool" is found when a declaration is valid), a control statement ("break" if a control statement is valid), or an error (neither statement is valid).

    Now, not every grammar can be massaged into a form so suitable, but many can be made into an intentionally more ambiguous lexical problem without costing anything when the nature of the grammar requires "fix ups" in any event.

    Granted, that was a bad example.

    Let us consider instead something more meaningful to your side of the argument "{" or "(".

    Well, here again, we don't need to cloud our lexical analysis with this condition. (No, we really don't. It matters yes, but they may both be considered "grouping" tokens which simplifies the C lexical analysis by foisting the examination onto, again, semantic considerations which we will ultimately need to perform to produce anything meaningful in the face of errors.)

    Still though, that's a bad example. Let's consider a more generic "UNKNOWN_LABEL" and "GROUPING_OPTIONAL".

    Here again we don't face a problem; if they are unambiguous no extra analysis need be performed regardless of the method of parsing.

    Okay. Let's consider a "UNKNOWN_LABEL" only grammar like a terribly child "BASIC".

    Here everything is a token. That could be problematic if strictly ordered with significant overlay with a recursive descent parser.

    For the sake of argument, let's assign "BEGIN" (define block), "BEGINERROR" (function level `catch' as in C++), "BEGINIF" (control flow), "BEGINWHILE" (control flow), and other such horrors as you imagine as keywords while still allowing things like "BEGIN123" as generic labels.

    Is even this a problem for a recursive descent parser (Here I'm implying that backtracking is both significant and costly.)?

    Yes. Sure, you can get around the issue with specific parsing expressions.

    Code:
    BAD_HACK => 'B','E','G','I','N'
    KEYWORDS => (BAD_HACK,ERROR)|(BAD_HACK,IF)
    But honestly that is an awful thing to do to a grammar.

    The point is, even this can be resolved without backtracking in the lexer by spending more effort on the analysis of the grammar itself.

    Absolutely, none of that is necessary with a parser generated by tools like "flex".

    However, none of that is necessary with a parser generated by tools like "flex" which happens to include GLR/PEG type parsers.

    Ultimately though, it is better, where possible, to change the grammar to be reasonable so that you don't have to go to such lengths to parse it.

    Soma

  11. #11
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    That's an interesting account of issues with flex, but there are also PEG parser generators out there. A quick google reveals there is peg and chilon, and ANTLR is PEG-like. Unfortunately ANTLR C++ documentation is lacking.

    A few other points:
    Well, here you are just incredibly wrong. You don't get to stop thinking about designing the functionality around a parser when you use a tool to generate the parser.

    Should the "flex" rules be simple functions passing around static data? (I know it isn't an actual parser.)

    No. That's awful. Should they be classes... right back to static data.

    Should they be methods of a class? Now we are getting somewhere, but how do we integrate the errors reported by the parser generated by "flex"? Is the code "flex" generates robust enough to survive exceptions?
    If a meta language like flex is used, then there's no need for a direct external interface to rules. They're just switch cases.


    With regards to the bad hack to make key words contextual, I agree that's a hack, but between worrying about how to write a pattern matcher and writing out keywords as grammar rules, I'd recommend the latter.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  12. #12
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    That's an interesting account of issues with flex, but there are also PEG parser generators out there.
    You mean such tools as I referenced more than once? Yes. I am aware that they exist. Yes. They all have problems. (Poor documentation is common enough it isn't really fair to single one out for that.) Generally, they also need to be paired with yet more tools to be useful as a parser generator.

    If a meta language like flex is used, then there's no need for a direct external interface to rules.
    Sorry. That's my fault for your confusion. I used "rule" in two entirely different contexts.

    I'm talking about the "flex" production targets for responding to "flex" as it parses and generates its results like "%%keyword printf("%s", lex_text");". (It has been ages since I've used "flex". I'm certain that syntax isn't valid. I'm just showing you what I was talking about.)

    With regards to the bad hack to make key words contextual, I agree that's a hack, but between worrying about how to write a pattern matcher and writing out keywords as grammar rules, I'd recommend the latter.
    I'm sure you would. You'll find that a lot of people just want to get the job done.

    There is no harm in that for small projects or even large projects parsing a language born of isolation.

    If you want to develop a great grammar, you can't take that shortcut.

    As far as "worrying about how to write a pattern matcher"? If you understand the theory, you don't need to worry. ^_^

    Soma

  13. #13
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by phantomotap View Post
    "flex" ... get the job done... for small projects or even large projects parsing a language born of isolation.
    This is why I brought it up.

    Certainly existing tools are not perfect for every use, but writing a parser for a domain specific language is a pretty common problem, and there are tools out there to do it easily.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  14. #14
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by King Mir View Post
    Certainly existing tools are not perfect for every use, but writing a parser for a domain specific language is a pretty common problem, and there are tools out there to do it easily.
    Well, that does not help me much while trying to learn the theory.
    I may try to create such a generator based on my own code in future, though.

    Anyway, here is the revised design (code not rigorously tested yet..).
    It is few steps forward(much simpler, same object to work on arbitrary strings, etc) and a few back back from the original(uses the input iterator a a reference,huge inheritance tree ..etc).
    Also, it still uses std::string iterators. I'm having some trouble combining templates and inheritance..but they should go away after a few hello world experiments.

    The giant looks like this
    Code:
    ParsingExpression
        Atomic
            Terminal
                 Empty
                 SingleChar
                 Range
            NonTerminal
        Compound
            Sequence
            Choice
            ZeroOrMore
            OneOrMore
            Optional
            AndPredicate
            NotPredicate
    I'm describing it using 4 classes, the base, and three of the leaves.
    Code:
    //Base
        typedef std::string::iterator sit;
        class ParsingExpression
        {
        public:
            virtual bool operator()(sit& begin,sit end)=0;
        };
    Code:
    //Matching ranges
        class Range:public Terminal
        {
        public:
            Range(char start_,char stop_):start(start_),stop(stop_){}
            bool operator()(sit& begin,sit end)
            {
                char i = *begin;
                if(i>=start&&i<=stop)
                {
                    ++begin;
                    return true;
                }
                else return false;
            }
        private:
            char start;
            char stop;
        };
    Code:
    //Sequence 
        class Sequence:public Compound
        {
        public:
           // Sequence(std::vector<ParsingExpression*> s):subjects(s){} //Should I let a 'friendly' factory worry about this?
            bool operator()(sit& begin,sit end)
            {
                sit ori = begin;
                bool result=true;
                for(int i=0;i<subjects.size() && result && begin!=end;++i)
                {
                    result = subjects[i]->operator()(begin,end);
                }
                if(!result) begin = ori;
                return result;
            }
        private:
            std::vector<ParsingExpression*> subjects;
        };
    //Priority Choice
        class Choice:public Compound 
        {
        public:
            bool operator()(sit& begin,sit end)
            {
                for(auto e:subjects)
                {
                    if(e->operator()(begin,end))
                        return true;
                }
            };
        private:
            std::vector<ParsingExpression*> subjects;
        };

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 7
    Last Post: 08-06-2011, 11:58 AM
  2. Design guidelines advice - IRC Bot class
    By IceDane in forum C# Programming
    Replies: 2
    Last Post: 12-05-2008, 01:21 AM
  3. some design advice
    By viaxd in forum Game Programming
    Replies: 0
    Last Post: 08-03-2006, 06:08 AM
  4. need advice with class design
    By DonFiasco in forum C++ Programming
    Replies: 4
    Last Post: 12-27-2004, 08:29 PM
  5. Web Design advice
    By sean in forum A Brief History of Cprogramming.com
    Replies: 4
    Last Post: 04-01-2003, 04:34 PM