Thread: Preprocessor: Separation of class member declaration and implementation

  1. #1
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446

    Preprocessor: Separation of class member declaration and implementation

    I know some here are in the process of writing their own C++ compilers (or have dabbled in it in the past, or have hands-on experience with BNF). I'm currently writing an article in which I critique C# partial classes. But there's a piece of information I'd like to have on the matter:

    How big an impact can the decision to separate member declarations from implementation have on the preprocessor? That is, does this choice have a significant impact when compared to the decision to implement a "monolithic" code structure in which a class can't forward declare its members? I'm guessing the impact is not so big... but I don't really know.

    Forget for a moment the semantics behind this in C++. That's a given. I'm just curious about this choice's impact on the preprocessor.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  2. #2
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    Can you give a little more information about what you are asking about?

    The preprocessor doesn't know or care about declarations.

    Declarations are a phase of compilation necessary to fit the rules of C and C++ validation that has little to do with parsing.

    The preprocessor is only a tool used to facilitate the use of declarations in a multiple translation unit situation.

    If you don't mind manually doing the declarations, and need no macros, pragmas and such, you can build a translation unit and link them without ever using a preprocessor. The declarations themselves must still be used if, for the sake of example, function `A' calls function `B' and function `B' calls function `A'. (I'm ignoring the lighter rules C has in place related to implicit declaration.)
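
    A minimal sketch of the mutual-call case described above, with no preprocessor involved. The function names is_even and is_odd are made up for illustration; the only point is that the first function cannot compile unless the second has already been declared.

```cpp
// Manually written declaration: is_even() calls is_odd() before the
// compiler has seen the definition of is_odd(), so the name must be
// declared first. No #include or macro is needed for this.
bool is_odd(unsigned n);

// "Function A": defined first, calls "function B".
bool is_even(unsigned n) {
    return n == 0 ? true : is_odd(n - 1);
}

// "Function B": defined second, calls "function A" in turn.
bool is_odd(unsigned n) {
    return n == 0 ? false : is_even(n - 1);
}
```

    Calling is_even(10) walks back and forth between the two functions until n reaches zero. Removing the first line makes the call inside is_even() an error in C++, which is the semantic requirement the post refers to.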

    Soma

  3. #3
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    I've always assumed one of the preprocessor tasks was to join header and implementation files into a single translation unit that could later be handled. But in any case, for clarification, I'm considering the whole of the processes taking place before code generation and optimization starts.

  4. #4
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Quote Originally Posted by Mario F.
    I've always assumed one of the preprocessor tasks was to join header and implementation files into a single translation unit that could later be handled.

    It absolutely is, but only because the final translation unit needs those declarations in order to follow the semantic rules of the language. (Does this function require two parameters or only one? The syntax parses exactly the same either way, but without that semantic information the required behavior is ambiguous.) It doesn't, however, require any C or C++ parsing technology. Though, now that you've clarified, it is true that in practice some compiler suites with multiple stages do process names while in the preprocessor stage. It just isn't required.

    Well, anyway, let us assume we are talking about a language derived from C but structurally different in that it always requires separation by mandating declarations for all objects. (Yes, in this case, a preprocessor that knows the language as a means of generating the declarations from source would be useful, but keep in mind that this is a distinct phase and that the declarations here will always be required. The preprocessor here only serves to simplify their creation, whereas in real C and C++ it only serves to simplify the reuse of manually crafted declarations.)

    In such a language, even considering the strange semantically constrained context free grammar, forcing the declaration would make parsing faster, but not necessarily easier for third parties to implement. A name table used in conjunction with memoizing parsers could potentially eliminate large parts of the parsing possibilities by knowing that the token being parsed is (at least so far) a function name instead of just a variable name; knowing this, the compiler knows to expect the rest of the function call syntax instead of a continued mathematics expression syntax, for example. This has the potential to speed up parsing of the language considerably, even more so as long as the semantic rules imposed on naming are followed.
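
    A rough sketch of the name table idea, under heavy assumptions: the types NameKind, NameTable and NextParse and the function choose_parse are all hypothetical, and a real parser would have far more states than this. It only shows how knowing what a declared name is lets the parser commit to one branch instead of trying all of them.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical categories a declared name can fall into.
enum class NameKind { Variable, Function, Type };

// Hypothetical name table, filled in from the mandatory declarations
// before the bodies are parsed.
using NameTable = std::unordered_map<std::string, NameKind>;

// What the parser should attempt after reading an identifier.
enum class NextParse { CallSyntax, ExpressionTail, DeclarationTail, TryEverything };

// Prune the parsing possibilities using what is already known about the name.
NextParse choose_parse(const NameTable& names, const std::string& identifier) {
    auto it = names.find(identifier);
    if (it == names.end()) {
        return NextParse::TryEverything;  // undeclared name: nothing can be pruned
    }
    switch (it->second) {
        case NameKind::Function: return NextParse::CallSyntax;       // expect "(" args ")"
        case NameKind::Variable: return NextParse::ExpressionTail;   // expect operators
        case NameKind::Type:     return NextParse::DeclarationTail;  // expect a declarator
    }
    return NextParse::TryEverything;  // not reached for valid enum values
}
```

    With a table containing {"f", NameKind::Function}, choose_parse(table, "f") returns NextParse::CallSyntax, so the expression and declaration branches are never explored for that token. The sketch deliberately ignores cases such as taking the address of a function, where a function name is not followed by call syntax.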

    This constraint also implies significant advantages in many other similar areas; modestly, it would eliminate entire trees on a character by character basis without maintaining the transition tables of "LR(n)" or the verbosity of certain grammar specification languages.

    It is a real win, and more importantly, some of this applies to parsing actual C and C++, especially in the face of classic parsing techniques. (It doesn't really help with "GLR" and "Packrat".) If the compiler, because it parses the grammar as context free but semantically relevant, is in the "this is expected to name a type" phase and a type is not found, the compiler can generate more meaningful messages. Another example would be parsing while in the "this is expected to be a declaration OR a definition OR a nested invocation" phase: knowing that the only semantically meaningful option is invocation OR invalid due to multiple definition would let the compiler ignore the "this is expected to be a definition" side entirely.

    On the other hand, some parsing technology (as applied to compiler technology in general, just not really C and C++) obsoletes the necessity for such things. "GLR" parsing of C will, to take the above example of a one or two argument function, parse the source as "it could be either - fix me up later". Some techniques (I can't recall the name at the moment) will instead "fork and shadow" using, of all things, tree shadowing algorithms, resulting in two separate parse trees literally sharing much of the same state. They continue to follow this "fork and shadow" approach throughout the source and, if and when a shadowy branch is found to be the "one true" branch, the invalid shadows are destroyed and parsing from their perspective stops.

    On the other hand, I AM GORO, a lot of older table generated parsing techniques just don't need these additional fixtures to be performant. (Actually, because some of these techniques are relatively new, the "old hat" still works best once the setup is in place to generate those tools.) A huge number of character by character translation tables may seem insane, but when you can eliminate entire categories with every step, you are going to have a pretty decent parser on your hands.

    *shrug*

    As a matter of interest, as of a few years back anyway, no grammar specification and automatic grammar tool has yet been created that can parse all of C++ without manually either fixing up the tables, adding semantic tags to the result, or editing the generated parser. C++ is a beast to parse.

    O_o

    Granted, most of the "hand written" parsers don't do a great job either. My point isn't to disparage those tools. (Bison is a great tool if you don't try to press it too far.) I'm only saying that C and C++ shouldn't be used as models of "what looks good".

    Soma

    [Edit]
    "climate" firefox? really?
    [/Edit]
    Last edited by phantomotap; 05-20-2011 at 04:31 PM. Reason: none of your business

  5. #5
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Yeah, my objective here is to understand the impact of moving a language that currently only allows monolithic class implementations (e.g., C#) to a new version that allows optional forward declarations of class members (as C++ does). Note that in a language like C# this wouldn't introduce any new semantics. It's simply a code styling feature. So I must admit there is a bit of "what looks good" in here, but motivated by a desire to actually make C# class implementations saner than they are right now.

    But I get from your post that whether this can be manageable or not, will be largely dependent on what is implemented in the compiler already since I suppose the introduction of some of the techniques you mention come with prerequisites, no?

  6. #6
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I actually think it should be the other way.

    The C++ way requires the programmer to maintain both the header file and the implementation, and they have to be in sync.

    The header file is more like just documentation for class users, so they don't have to scroll through all the code to find the function they want.

    IMHO, the header file should somehow be maintained programmatically, or maybe even not at all (like in Java). Instead, the IDE could parse the cpp file, and generate the list of things in the class for the programmer in real time (serving the purpose of a header file). It removes the redundancy between header and implementation files.

    Of course, it won't work technically in modern C++ for many reasons, but this is more of a high level idea.

  7. #7
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Trust me, I often debate with myself over that. But my rationale eventually always ends up converging to this:

    The focus is on increased code readability and reduced complexity (analytical complexity). And it is entirely a question of code formatting. As far as the structure is concerned, you don't even need to have a header file. A class can be defined and implemented in the same file, only it happens at different places. But monolithic structured classes (if you allow me the audacity of coming up with my own term to define classes that don't support member forward declarations) are very dependent on editor features, most notably object browsers, declaration finders and code completion. Sharing code on public forums, printing code, or reading it in a featureless text editor makes it more difficult to observe the class public interface as a whole at a glance and isolate it from its implementation. While this may actually be seen as an advantage, since it tends to nudge programmers into being more strict about Separation of Concerns (an indirect benefit of trying to make their classes smaller and more readable), it's just a simple fact of life that some classes (many?) will just be more complex. I'd go even further and argue that because feature rich editors facilitate ad-hoc class inspection, some of this benefit is lost because there's no longer a tendency to make them smaller.
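
    To make the "same file, different places" point concrete, here is a small sketch. The Account class and its members are invented purely for illustration. The class definition at the top is the skeleton one can read, print or paste on its own; the member implementations follow further down in the same file.

```cpp
#include <string>
#include <utility>

// Declaration: the whole public interface is visible at a glance,
// with no implementation detail mixed in.
class Account {
public:
    explicit Account(std::string owner);
    void deposit(long cents);
    bool withdraw(long cents);
    long balance() const;
    const std::string& owner() const;

private:
    std::string owner_;
    long balance_ = 0;
};

// Implementation: same file, but kept apart from the declaration above.
Account::Account(std::string owner) : owner_(std::move(owner)) {}

void Account::deposit(long cents) {
    if (cents > 0) balance_ += cents;  // silently ignore non-positive amounts
}

bool Account::withdraw(long cents) {
    if (cents <= 0 || cents > balance_) return false;  // reject overdrafts
    balance_ -= cents;
    return true;
}

long Account::balance() const { return balance_; }

const std::string& Account::owner() const { return owner_; }
```

    The price of this layout is that every signature is written twice and the two copies have to be kept in sync by hand.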

    But you may argue that all I'm speaking about is anecdotal evidence that doesn't really have an echo in the real world. After all, one wants their code in an editor, and a good editor at that. Copy-pasting code to web forums or reading it in Notepad or on a sheet of paper isn't what programming is about. I'm ready to accept that. But the thing is, these languages are about as multi-purpose and cross-compatible as one can get. It's entirely reasonable to imagine a language like C# or Java running on systems which don't have the benefit of a feature rich editor. Heck, C# (at least) doesn't even require a backend framework in its specification, nor compilation to intermediate code. On the other hand, in languages like C#, forward declaration of class members has no effect on semantics, so it's an entirely optional process. The programmer may choose whether or not to take advantage of it. The advantage it gives, if they do, is that they can display a class skeleton for any kind of ad hoc consumption:

    a) How many times have we seen in these forums a full C++ class definition with only a few methods implemented? How beneficial was the full class definition, not only to understand the overall class structure, but also to better evaluate the context of the implementation shown? This type of at-a-glance observation is impossible in Java or C# without extensive post-editing of the forum post. And the same will hold in other kinds of code perusal situations, even in the feature rich editor, since there's only so much the object browser can do for us.

    b) While not intended as such, C++ class definitions do also make it much easier to immediately observe the class public interface and isolate it from the rest. In fact, many of us do tend to approach them that way, reordering declarations so public members are kept together. So, let's give this a thought for a moment: if we instinctively do this, isn't it exactly because we recognize an advantage for us and others in doing so?

    c) Would you like to code C# classes on no less feature rich editors like Vim or Emacs? Or would you prefer to code C++ classes there?
    Last edited by Mario F.; 05-20-2011 at 07:25 PM. Reason: typos, clarity fixing, the usual...

  8. #8
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    That makes sense. I didn't think about tool-less readability.

    Maybe it would be better if tools are used to generate header files from implementation files instead? But that also increases complexity of the process.

    I have tried coding Java using primitive editors, so I know how painful it is. I suppose Java is designed to be used with an IDE (or at least "intelligent" editors).

    I think this is much like TOC in books - nice to have around, but should be maintained automatically.

  9. #9
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Quote Originally Posted by Mario F.
    But I get from your post that whether this can be manageable or not, will be largely dependent on what is implemented in the compiler already since I suppose the introduction of some of the techniques you mention come with prerequisites, no?

    Correct. The grammar and semantic analysis capabilities have a large impact on whether or not those techniques will work or even if they will be useful.

    I should have mentioned: my exposure to "CSharp" is limited to one medium sized professional project and some personal experimentation every time the language gets a new feature. (I'm always looking out for a language that has all the facilities I want.) I'm only speculating on the grammar and what it may allow.

    Quote (from post #6)
    Of course, it won't work technically in modern C++ for many reasons, but this is more of a high level idea.

    Not to change the topic, but those reasons are an artifact of "it is because it was". No reason would exist if the C++ inclusion and linking model was changed to a four component process with dynamic generation of semantic state being one stage and generics inspection and propagation being the other addition.

    Quote Originally Posted by Mario F.
    I'd go even further and argue that because feature rich editors facilitate ad-hoc class inspection, some of this benefit is lost because there's no longer a tendency to make them smaller.

    I see what you are getting at, but to my mind this is more a product of a poor education in the areas of component design and general feature creep or code bloat.

    I don't think manually maintaining a declaration will help in those areas. Actually, I think it would probably make it worse. I will not name names, but look at how many arguments I've been involved with where some individuals insist on cramming as many interfaces as possible into a location solely because it is convenient to stick them there instead of separating them into a utility of their own. (In some cases, the reason being as simple as "But everything else I need to remember is already there.".)

    *shrug*

    Your mileage may vary.

    Soma

  10. #10
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by phantomotap View Post
    I see what you are getting at, but to my mind this is more a product of a poor education in the areas of component design and general feature creep or code bloat.

    I don't think manually maintaining a declaration will help in those areas. Actually, I think it would probably make it worse. I will not name names, but look at how many arguments I've been involved with where some individuals insist on cramming as many interfaces as possible into a location solely because it is convenient to stick them there instead of separating them into a utility of their own. (In some cases, the reason being as simple as "But everything else I need to remember is already there.".)

    *shrug*

    Your mileage may vary.
    Couldn't agree more if you said it again. But our mileage does vary. And despite all my efforts to the contrary (singular purpose is an issue dear to me), I sometimes won't hesitate to put convenience ahead of correctness, particularly because I'm only willing to take the concept so far. That is, I'm not willing to end up with an object model that is a veritable haystack. Good gracious, no! Maybe if I was in the business of designing libraries... but I'm not.

    But generally speaking, indeed this is mostly controlled by us and our willingness to accept code correctness.
