Thread: hardest part of parsing c++?

  1. #1
    Registered User
    Join Date
    Jan 2005
    Posts
    108

    hardest part of parsing c++?

    So for a university project I'm thinking of writing a c++ parser for analytical purposes.

    Now, there are open-source parsers that can do this (clang's and g++'s, though probably only the former can be reused easily), but since this is a parsing-themed project I can't use them.

    I asked around for a bit and someone said that the hardest thing about parsing C++ is NOT the #defines, #ifdefs and templates, but something else. I never got an answer on what it was, so what do people think the hardest part of parsing C++ is?

  2. #2
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    > So for a university project I'm thinking of writing a c++ parser for analytical purposes.
    For the sake of your grades, do something else.

    Soma

  3. #3
    Registered User
    Join Date
    Jan 2005
    Posts
    108
    I forgot to mention, I have two semesters (a year) and the promise of a 100% grade for this. Well, if it does what I want it to do, that is. That, and I don't have to attend many classes, as opposed to taking some other courses.

    In short, it's like a super thesis. Would it still be worth it?

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    I'd say that would depend on your deadline...

    Parsing a name out of a string is one thing... parsing source code, especially with a couple of dozen keywords, operators, scopes, classes, templates, and over a thousand library functions, is not something you do in a few days... or even a month.

    Unless you pick some extremely elemental task such as counting the frequency of keywords or checking bracket matching, you're in for a big job and a long haul.

    Yes it would be a great project... but can you do it in the time allotted?

  5. #5
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    > Would it still be worth it?
    Unless they only care if it parses some given percent (like 83%), you are asking to fail.

    Edison Design Group, the GCC community, Microsoft, Sun, and I could name dozens more, have all failed to parse all of C++ correctly after trying for years.

    Soma

  6. #6
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    > Parsing a name out of a string is one thing... parsing source code, especially with a couple of dozen keywords, operators, scopes, classes, templates, and over a thousand library functions, is not something you do in a few days... or even a month.
    O_o

    You only need to parse the language; if you can parse the language, you can parse "over a thousand library functions" by definition.

    Soma

  7. #7
    Registered User
    Join Date
    Jan 2005
    Posts
    108
    Quote Originally Posted by phantomotap
    > Unless they only care if it parses some given percent (like 83%), you are asking to fail.
    >
    > Edison Design Group, the GCC community, Microsoft, Sun, and I could name dozens more, have all failed to parse all of C++ correctly after trying for years.
    Hmm, I see. Well, I doubt I'll have to parse absolutely everything, just enough to do code analysis on a code base of roughly 100k SLOC.

    From what you said though, it sounds like actually getting one working would be akin to finding gold... and there's actually stuff that GCC can't parse? That's crazy.

  8. #8
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    It is actually a topic for debate whether gcc (and most other compilers) have trouble parsing the language, or can parse the language and then have trouble processing the parser's output in order to produce the final result (for example, the compiled object). Where the problems sit depends on the architecture of the particular compiler, that is, on where it draws the boundaries between the front end (which is normally where a C++ grammar parser might be considered to reside), the middle end (which notionally receives some form of intermediate representation of the program from the front end and performs various transformations), and the back end (which squirts out the final compiled code).

    In terms of your project, you would be better off designing a small language that is complete and unambiguous, and writing a parser for that. Practically, this is called "limit your scope to something achievable in the time you have". Or pick another language (for example Pascal) for which writing a compiler is often considered to be less complex.
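
To illustrate what "a small language that is complete and unambiguous" can look like, here is a minimal recursive-descent sketch for integer arithmetic. The grammar, the `ExprParser` class, and its member names are all hypothetical and chosen only for illustration; nothing here comes from the thread, and it evaluates while parsing rather than building a syntax tree.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy grammar (an assumption for illustration):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class ExprParser {
public:
    explicit ExprParser(std::string text) : text_(std::move(text)) {}

    // Parses the whole input and returns its value; throws on a syntax error.
    long parse() {
        long value = expr();
        skip_spaces();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    void skip_spaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    // Consumes c if it is the next non-space character.
    bool accept(char c) {
        skip_spaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    long expr() {
        long value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    long term() {
        long value = factor();
        for (;;) {
            if (accept('*')) {
                value *= factor();
            } else if (accept('/')) {
                long rhs = factor();
                if (rhs == 0) throw std::runtime_error("division by zero");
                value /= rhs;
            } else {
                return value;
            }
        }
    }

    long factor() {
        if (accept('(')) {
            long value = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        skip_spaces();
        if (pos_ >= text_.size() ||
            !std::isdigit(static_cast<unsigned char>(text_[pos_])))
            throw std::runtime_error("expected number");
        long value = 0;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_])))
            value = value * 10 + (text_[pos_++] - '0');
        return value;
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

With this sketch, `ExprParser("1 + 2 * 3").parse()` yields 7 and `ExprParser("(1 + 2) * 3").parse()` yields 9. Each grammar rule maps to exactly one function and one character of lookahead always decides which rule applies, which is the property that makes a language like this tractable for a course project.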

  9. #9
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by grumpy
    > In terms of your project, you would be better off designing a small language that is complete and unambiguous, and writing a parser for that. Practically, this is called "limit your scope to something achievable in the time you have".
    Agreed. I recall writing a static analysis tool in C++, including the parser, for a simplified version of C as a group project in university. Team of 6 students, 2 modules (out of a usual 5) worth of workload each, yet we barely completed it on time (with various bugs). That said, most of my team were rather new to C++, so they had to cross that hurdle too, which took a bit of time.

  10. #10
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > That, and I don't have to attend many classes, as opposed to taking some other courses.
    > In short, it's like a super thesis. Would it still be worth it?
    Is there a large market for C++ grammar wizards in your area?
    Other courses might be more work (though I doubt it compared to what you propose), and you'll have more choices later on.

    Also, if you're not really interested in the parsing, but the analysis which follows, then consider GCC-XML.

    Writing another (bad) parser for C++ won't get you very far - there are enough C++ parsers in the world to be getting on with.
    But new and interesting analysis tools might be unique and useful.

  11. #11
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Pascal is a much easier language to write a parser for, especially if you're familiar with the language.
