Thread: A C program to parse a C program

  1. #1
    Dino (Jack of many languages)
    Join Date: Nov 2007
    Location: Chappell Hill, Texas
    Posts: 2,339

    A C program to parse a C program

    I'm giving myself an exercise in writing a basic parser for reading C programs. My logic involves state flags (in_string, inside_parens, in_block_comment, etc.) and looking at individual characters, since I'm using fgetc() for input.

    I'm asking about the proper approach for parsing the file. I've written several hacked-out parsers for various needs over the years, but those were mostly short and sweet and disposable. My objective for writing a parser this time is to isolate all the function calls, standalone or nested.

    I usually take a divide-and-conquer approach, removing the low-hanging fruit I don't care about, such as comments, the contents of strings, and trailing blanks.

    As I read characters, I set states and keep stats: how many characters, how many "words" (which could mean many things, depending on your definition of a word), how many lines, and so on.

    Once I find an interesting character, like a double quote, a single quote, or a backslash inside a pair of apostrophes, I change states.

    As my parser loop advances one character at a time, should it branch first on the state I'm in, or should the character I just read drive the state logic?

    For example:
    Code:
    while ((c = fgetc(fp)) != EOF) {
       if (in_string) { /* ... */ }
    }
    or
    Code:
    while ((c = fgetc(fp)) != EOF) {
       if (c == '"') { /* ... */ }
    }
    What's your opinion?
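    One common answer is to branch on the state first and look at the character only inside each state's branch, so each state's rules live in one place. A minimal sketch of that shape, covering string and character literals only: the enum names and blank_literals() are hypothetical, and it reads from an in-memory string instead of fgetc() so it is easy to test (the loop has the same structure with while ((c = fgetc(fp)) != EOF)).
    Code:
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical lexer states; comments are left to a separate pass here. */
    enum lex_state { ST_CODE, ST_STRING, ST_CHAR };

    /*
     * Copies `src` to `out`, dropping the contents of string and character
     * literals but keeping their quotes. The outer switch is on the state;
     * the character is examined only inside the branch for that state.
     * Returns the number of characters written (not counting the '\0').
     */
    size_t blank_literals(const char *src, char *out, size_t cap)
    {
        enum lex_state state = ST_CODE;
        size_t n = 0;

        assert(src != NULL && out != NULL && cap > 0);
        for (size_t i = 0; src[i] != '\0' && n + 1 < cap; i++) {
            char c = src[i];

            switch (state) {
            case ST_CODE:
                if (c == '"')
                    state = ST_STRING;
                else if (c == '\'')
                    state = ST_CHAR;
                out[n++] = c;                 /* code and opening quotes are kept */
                break;
            case ST_STRING:
            case ST_CHAR:
                if (c == '\\' && src[i + 1] != '\0') {
                    i++;                      /* skip the escaped char, e.g. \" or \' */
                } else if ((state == ST_STRING && c == '"') ||
                           (state == ST_CHAR && c == '\'')) {
                    out[n++] = c;             /* keep the closing quote */
                    state = ST_CODE;
                }
                break;                        /* other literal characters are dropped */
            }
        }
        out[n] = '\0';
        return n;
    }
    ```

    With this, puts("say \"hi\""); c = '\''; comes out as puts(""); c = '';. The character-first version tends to accumulate conditions like c == '"' && !in_comment && !in_char as more states are added, which is the usual argument for putting the state first.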

    I'm finding it convenient to keep track of the previous character. I think it would also be handy to keep track of the next character. For example...

    previous is '/'
    current is '*'
    next is '/'

    Right now, if I'm working with current, an asterisk, I can see that previous is a slash. If I'm not in a string, then this is the start of a block comment. When I advance to the next character, previous is now '*' and current is '/'. If I consider only current and previous, it looks like the end of a block comment, but it isn't: that '*' was part of the opening "/*". So having "next" would be handy in this case.
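    A one-character lookahead does help, and the detail that fixes the "/*/" case is to consume a recognized two-character delimiter as a unit, so its '*' is never examined again. A minimal sketch, with a hypothetical strip_comments() that reads from a string, where "next" is just src[i + 1]; with fgetc() the same peek can be done with ungetc(), which ISO C guarantees for at least one character of pushback. It ignores string literals, so a real pass has to combine it with literal handling (otherwise "/*" inside a string would start a comment).
    Code:
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /*
     * Removes comments from `src` into `out`. A block comment becomes one space;
     * a line comment is dropped up to, but not including, its newline.
     * Decisions use the current character plus one character of lookahead, and a
     * recognized two-character delimiter is consumed as a unit, so the '*' in an
     * opening slash-star can never also close the comment.
     * String and character literals are NOT handled in this sketch.
     */
    size_t strip_comments(const char *src, char *out, size_t cap)
    {
        enum { IN_CODE, IN_BLOCK, IN_LINE } state = IN_CODE;
        size_t i = 0, n = 0;

        assert(src != NULL && out != NULL && cap > 0);
        while (src[i] != '\0' && n + 1 < cap) {
            char c = src[i];
            char next = src[i + 1];       /* safe: at worst this is the '\0' */

            if (state == IN_CODE && c == '/' && (next == '*' || next == '/')) {
                state = (next == '*') ? IN_BLOCK : IN_LINE;
                i += 2;                   /* consume both delimiter characters */
                continue;
            }
            if (state == IN_BLOCK && c == '*' && next == '/') {
                state = IN_CODE;
                out[n++] = ' ';           /* the whole comment becomes one space */
                i += 2;
                continue;
            }
            if (state == IN_LINE && c == '\n')
                state = IN_CODE;          /* the newline itself is kept below */
            if (state == IN_CODE)
                out[n++] = c;
            i++;
        }
        out[n] = '\0';
        return n;
    }
    ```

    With this approach a/*/b*/c becomes "a c": the '/' right after the opening delimiter is treated as ordinary comment text.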
    Mainframe assembler programmer by trade. C coder when I can.

  2. #2
    Dino (Jack of many languages)
    Y'all can ignore this. I've decided that for the purposes of just finding all the function calls, I don't need this level of parsing (tokenizing). I can probably just set up a regex to find function names once I de-comment it and isolate the strings, and I have both of those working already.
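    For reference, a regex along those lines might look like this minimal sketch using POSIX <regex.h> (regcomp()/regexec(), available on Unix-like systems but not part of ISO C). The function name and the fixed 64-byte name buffers are assumptions for illustration. It matches any identifier followed by optional whitespace and '(', so it also reports keywords such as if and while, and it cannot tell a call from a declaration.
    Code:
    ```c
    #include <assert.h>
    #include <regex.h>
    #include <string.h>

    #define NAME_MAX_LEN 64   /* hypothetical fixed buffer size per name */

    /*
     * Scans de-commented, literal-free source text for "identifier (" and copies
     * each identifier into names[]. Returns how many were found (at most max),
     * or -1 if the pattern fails to compile.
     */
    int find_call_candidates(const char *text, char names[][NAME_MAX_LEN], int max)
    {
        regex_t re;
        regmatch_t m[2];      /* m[0]: whole match, m[1]: the identifier */
        const char *p = text;
        int count = 0;

        assert(text != NULL && names != NULL);
        /* ERE: group 1 is the identifier; "\\(" in C source is a literal '('. */
        if (regcomp(&re, "([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*\\(", REG_EXTENDED) != 0)
            return -1;

        while (count < max && regexec(&re, p, 2, m, 0) == 0) {
            size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);

            if (len >= NAME_MAX_LEN)
                len = NAME_MAX_LEN - 1;   /* truncate overly long names */
            memcpy(names[count], p + m[1].rm_so, len);
            names[count][len] = '\0';
            count++;
            p += m[0].rm_eo;              /* resume just after the '(' */
        }
        regfree(&re);
        return count;
    }
    ```

    On if (x) printf(s, f (y)); it reports if, printf, and f, which shows both the appeal and the weakness of the approach.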

  3. #3
    Registered User
    Join Date: Apr 2021
    Posts: 144
    Actually, you cannot.

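    It is well known that C is ambiguous with respect to declarations vs. expressions. For example:

    A * B;

    If A is a type name, this declares B as a pointer to A; if A is a variable, it is an expression statement that multiplies A by B and discards the result.

    To decide which it is, you must have a symbol table with type information, because the tokens are the same in both cases. Regular expressions are not powerful enough for this.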
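    A minimal sketch of that idea, often called the "lexer hack": keep a table of names introduced by typedef, and consult it when a statement starts with identifier * identifier. The table, its fixed size, and the function names are hypothetical, and real C also needs scoping (an inner declaration can hide a typedef name), which this flat table ignores.
    Code:
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    #define MAX_TYPEDEFS 32

    /* Names introduced by typedef so far (a flat table: no scopes). */
    static const char *typedef_names[MAX_TYPEDEFS];
    static size_t n_typedefs;

    void add_typedef(const char *name)
    {
        assert(n_typedefs < MAX_TYPEDEFS);
        typedef_names[n_typedefs++] = name;   /* caller keeps the string alive */
    }

    int is_type_name(const char *name)
    {
        /* A few built-in type keywords; a real table would list them all. */
        static const char *const builtin[] = {
            "char", "short", "int", "long", "float", "double", "void"
        };

        for (size_t i = 0; i < sizeof builtin / sizeof builtin[0]; i++)
            if (strcmp(name, builtin[i]) == 0)
                return 1;
        for (size_t i = 0; i < n_typedefs; i++)
            if (strcmp(name, typedef_names[i]) == 0)
                return 1;
        return 0;
    }

    /*
     * For a statement shaped like `first * second ;` the token stream is the
     * same either way; only the symbol table decides how to read it.
     */
    const char *classify_star_statement(const char *first)
    {
        return is_type_name(first) ? "declaration" : "expression";
    }
    ```

    So A * B; reads as an expression until the lexer has recorded a typedef for A, after which the same tokens read as a declaration.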

    You may be able to use a regex and coding standards to get 95% of the way there. In fact, that is what C programmers historically did. But with the advent of ANSI C and more and more things involving parentheses, the remaining 5% has started to get bigger and bigger.

    If you write a lexer to do tokenization, plus the rudimentary symbol tracking necessary to recognize types, you will find yourself able to accomplish a surprising amount. And it will be useful code you might be able to re-use, as opposed to a snake-pit of regular expressions you won't understand three weeks from now.
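    To make that concrete, here is a minimal hand-written tokenizer sketch (not code from this thread; all names are hypothetical). It assumes comments and literal contents were already stripped, treats every character that does not start an identifier or number as a one-character punctuator, and counts identifiers followed by '(' that are not in a small, deliberately incomplete keyword list. Declarations such as int f(void); still count; separating those out is where the type tracking comes in.
    Code:
    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical token kinds for this sketch. */
    enum tok_kind { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_END };

    struct token {
        enum tok_kind kind;
        const char *start;   /* points into the source text */
        size_t len;
    };

    /* Returns the token at *pos and advances *pos past it. */
    struct token next_token(const char **pos)
    {
        const char *p = *pos;
        struct token t;

        while (isspace((unsigned char)*p))
            p++;
        t.start = p;
        if (*p == '\0') {
            t.kind = TOK_END;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            t.kind = TOK_IDENT;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
        } else if (isdigit((unsigned char)*p)) {
            t.kind = TOK_NUMBER;
            while (isalnum((unsigned char)*p) || *p == '.')
                p++;                  /* loose: accepts 0x1F, 1.5, 10UL */
        } else {
            t.kind = TOK_PUNCT;       /* single-character punctuators only */
            p++;
        }
        t.len = (size_t)(p - t.start);
        *pos = p;
        return t;
    }

    static int is_keyword(const char *s, size_t len)
    {
        /* Partial list: keywords that are commonly followed by '('. */
        static const char *const kw[] = { "if", "while", "for", "switch", "return", "sizeof" };

        for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
            if (strlen(kw[i]) == len && strncmp(s, kw[i], len) == 0)
                return 1;
        return 0;
    }

    /* Counts non-keyword identifiers immediately followed by a '(' token. */
    size_t count_call_candidates(const char *src)
    {
        const char *p = src;
        struct token prev = next_token(&p);
        size_t count = 0;

        while (prev.kind != TOK_END) {
            struct token cur = next_token(&p);

            if (prev.kind == TOK_IDENT && cur.kind == TOK_PUNCT &&
                *cur.start == '(' && !is_keyword(prev.start, prev.len))
                count++;
            prev = cur;
        }
        return count;
    }
    ```

    On if (ok) x = f(g(1), h (2)) + sizeof (y); it counts f, g, and h, skipping if and sizeof. Printing the names instead is a matter of printf("%.*s\n", (int)prev.len, prev.start) at the point where count is incremented.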
