Thread: A C program to parse a C program

  1. #1
    Dino (Jack of many languages)
    Join Date
    Nov 2007
    Chappell Hill, Texas

    A C program to parse a C program

    I'm giving myself an exercise in writing a basic parser for reading C programs. My logic uses state flags (in_string, inside_parens, in_block_comment, etc.) and looks at individual characters, as I'm using fgetc() for input.

    I'm asking about the proper approach for parsing the file. I've written several hacked-out parsers for various needs over the years, but those were mostly short and sweet and disposable. My objective for writing a parser this time is to isolate all the function calls, standalone or nested.

    I usually take a divide-and-conquer approach, first removing the low-hanging fruit that I don't care about: comments, the contents of strings, trailing blanks, and so on.

    Once I start reading characters, I'm setting states and keeping stats: how many characters read, how many "words" read (which, depending on your definition of a word, could mean many things), how many lines read, and so on.

    Once I find an interesting character, like a double quote, a single quote, or a backslash inside a pair of apostrophes, I change states.

    As my parser loop advances with each new character, should it look first at the state I'm in and then at the character I read, or should the character drive the state logic?

    For example, state first:
    while ((c = fgetc(fp)) != EOF) {
       if (in_string) { .... }
    }
    versus character first:
    while ((c = fgetc(fp)) != EOF) {
       if (c == '"') { .... }
    }
    What's your opinion?
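
One way to picture the state-first option is the minimal sketch below. It switches on the current state and only then looks at the character, so a quote inside a string or a character constant is handled by the state that owns it. The names lex_state, next_state and count_string_literals are illustrative and not from the post, comments are left out on purpose, and the input is a string rather than fgetc() only to keep the sketch easy to try.

```c
#include <assert.h>
#include <stddef.h>

/* Lexical states for this sketch; the names are illustrative. */
enum lex_state { IN_CODE, IN_STRING, IN_STRING_ESC, IN_CHAR, IN_CHAR_ESC };

/* State-first dispatch: switch on where we are, then inspect the character. */
enum lex_state next_state(enum lex_state s, int c)
{
    switch (s) {
    case IN_CODE:
        if (c == '"')  return IN_STRING;
        if (c == '\'') return IN_CHAR;
        return IN_CODE;
    case IN_STRING:
        if (c == '\\') return IN_STRING_ESC;   /* next char is escaped */
        if (c == '"')  return IN_CODE;
        return IN_STRING;
    case IN_STRING_ESC:
        return IN_STRING;                      /* escaped char, whatever it is */
    case IN_CHAR:
        if (c == '\\') return IN_CHAR_ESC;
        if (c == '\'') return IN_CODE;
        return IN_CHAR;
    case IN_CHAR_ESC:
        return IN_CHAR;
    }
    return s;
}

/* Count string literals by counting IN_CODE -> IN_STRING transitions. */
size_t count_string_literals(const char *src)
{
    enum lex_state s = IN_CODE;
    size_t n = 0;

    assert(src != NULL);
    for (; *src != '\0'; ++src) {
        enum lex_state t = next_state(s, (unsigned char)*src);
        if (s == IN_CODE && t == IN_STRING)
            ++n;
        s = t;
    }
    return n;
}
```

With this shape, the escaped quote in "a\"b" and the double quote inside '"' never reach the IN_CODE case, which is the main argument for letting the state drive the loop.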

    I'm finding it convenient to keep track of the previous character. I think it would also be handy to keep track of the next character. For example...

    previous is '/'
    current is '*'
    next is '/'

    Right now, if current is an asterisk, I can see that previous is a slash. If I'm not in a string, then this is the start of a block comment. On the next pass through the loop, previous is '*' and current is '/', so judging by current and previous alone I would appear to be at the end of a block comment, but that's not actually the case. So having "next" would be handy here.
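
A "next" character can be had from fgetc() by reading one character and pushing it back with ungetc(), which the C standard guarantees for at least one character. The sketch below is one possible arrangement, not the poster's code: peek_char and strip_block_comments are hypothetical names, and string and character literals are deliberately not tracked. Consuming both characters of the opener means the '*' in "/*/" can never be reused as the start of a closer.

```c
#include <assert.h>
#include <stdio.h>

/* Return the next character without consuming it (EOF at end of input). */
int peek_char(FILE *fp)
{
    int c = fgetc(fp);
    if (c != EOF)
        ungetc(c, fp);
    return c;
}

/* Copy in to out, replacing each block comment with a single space.
   Simplification: string and character literals are not tracked here. */
void strip_block_comments(FILE *in, FILE *out)
{
    int c;
    int in_comment = 0;

    assert(in != NULL && out != NULL);
    while ((c = fgetc(in)) != EOF) {
        if (!in_comment) {
            if (c == '/' && peek_char(in) == '*') {
                fgetc(in);          /* consume '*' so it cannot also close */
                in_comment = 1;
                fputc(' ', out);
            } else {
                fputc(c, out);
            }
        } else if (c == '*' && peek_char(in) == '/') {
            fgetc(in);              /* consume the closing '/' */
            in_comment = 0;
        }
    }
}
```

For the input a/*/b*/c this writes "a c": the slash right after the opener is treated as comment text, and the comment ends only at the later "*/".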
    Mainframe assembler programmer by trade. C coder when I can.

  2. #2
    Dino (Jack of many languages)
    Join Date
    Nov 2007
    Chappell Hill, Texas
    Y'all can ignore this. I've decided that for the purposes of just finding all the function calls, I don't need this level of parsing (tokenizing). I can probably just set up a regex to find function names once I de-comment it and isolate the strings, and I have both of those working already.

  3. #3
    Registered User
    Join Date
    Apr 2021
    Actually, you cannot.

    It is well-known that C is ambiguous with respect to declarations vs. expressions, like:

    A * B() = C;

    In order to decide if that is a declaration or an expression statement, you must have a symbol table with type information. Regular expressions are not powerful enough for this.

    You may be able to use a regex and coding standards to get 95% of the way there. In fact, that is what C programmers historically did. But with the advent of ANSI C and more and more things involving parentheses, the remaining 5% has started to get bigger and bigger.

    If you write a lexer to do tokenization, plus the rudimentary symbol tracking necessary to recognize types, you will find yourself able to accomplish a surprising amount. And it will be useful code you might be able to re-use, as opposed to a snake-pit of regular expressions you won't understand three weeks from now.
