Thread: Parse a program for functions, variables

  1. #1
    Registered User
    Join Date
    Feb 2006

    Parse a program for functions, variables

    For a school project, my group is trying to write a Linux C program that documents the relationships between files, global functions, and global variables in a large program. Rather than parsing the .c files ourselves and guessing which functions and variables are accessed from each function, we were hoping there was some tool (maybe gcc intermediate output) that could do this for us and be easier to read.

    I've tried looking at stabs and DWARF-2 assembly files, and at output from nm, readelf, and objdump, but I can't find anything that records which functions and variables are accessed from a given function. Is this possible at all?

    Thanks for any help, and let me know if this needs clarification.

  2. #2
    Salem
    Join Date
    Aug 2001
    The edge of the known universe
    Here are a couple of free tools which may be of some interest.

    Source Navigator already does some of what you want to do, but its internal data contains a lot more information than can be displayed using the UI. Perhaps you could extend it.

    SN data files can also be output as text files for further analysis.

  3. #3
    filker0, Sr. Software Engineer
    Join Date
    Sep 2005
    West Virginia
    What you're writing is called a "Cross Reference" program. What you want (assuming that you're allowed to use code that you don't write yourselves to parse the source text) is an ANSI C parser, of which there are several available in open-source (and public domain) projects, both old and new.

    Though you can build cross references from object files, this is somewhat less useful than you might think. To get the detail that you want, you need to go into the debugging information, and even that can differ from what the source says because of optimization. Furthermore, object file analysis only works if the source in question compiles without errors and thus produces an object file. A source-based cross reference program can deal with bad code to some extent, if written appropriately.

    Generally, a good cross-reference program can tell you, in tabular form, the name of the identifier, the type and qualifiers of the identifier, what file it is defined in, what line of that file it is defined in, whether it is initialized at its definition point, the files and functions in which it is referenced (it may be referenced outside of a function, as in an initializer to some other identifier), the type of reference (modify/set/reference/dereference/sizeof), and other pertinent details as appropriate to the kind of identifier it is (parameters for functions, etc.).
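
    As a rough illustration, one row of such a table could be modeled as below. This is only a sketch: the names xref_entry, xref_ref, ref_kind, xref_add_ref, and xref_print are hypothetical, and the fixed limit on references is an arbitrary simplification rather than anything a real tool does.

```c
#include <stdio.h>

/* Kinds of reference, following the list in the post above. */
enum ref_kind { REF_MODIFY, REF_SET, REF_REFERENCE, REF_DEREFERENCE, REF_SIZEOF };

/* One place where an identifier is used. */
struct xref_ref {
    const char *file;      /* file containing the reference */
    const char *function;  /* enclosing function, or NULL at file scope */
    int line;
    enum ref_kind kind;
};

#define XREF_MAX_REFS 16   /* arbitrary limit to keep the sketch simple */

/* One row of the cross-reference table (hypothetical layout). */
struct xref_entry {
    const char *name;      /* the identifier */
    const char *type;      /* type and qualifiers as text, e.g. "static int" */
    const char *def_file;  /* NULL if no definition was seen */
    int def_line;
    int initialized;       /* nonzero if initialized at its definition */
    struct xref_ref refs[XREF_MAX_REFS];
    int nrefs;
};

const char *ref_kind_name(enum ref_kind k)
{
    switch (k) {
    case REF_MODIFY:      return "modify";
    case REF_SET:         return "set";
    case REF_REFERENCE:   return "reference";
    case REF_DEREFERENCE: return "dereference";
    case REF_SIZEOF:      return "sizeof";
    }
    return "unknown";
}

/* Append a reference; returns 0 on success, -1 if the row is full. */
int xref_add_ref(struct xref_entry *e, const char *file,
                 const char *function, int line, enum ref_kind kind)
{
    if (e->nrefs >= XREF_MAX_REFS)
        return -1;
    struct xref_ref *r = &e->refs[e->nrefs++];
    r->file = file;
    r->function = function;
    r->line = line;
    r->kind = kind;
    return 0;
}

/* Print one row and its references in a simple tabular form. */
void xref_print(const struct xref_entry *e, FILE *out)
{
    fprintf(out, "%s (%s) defined at %s:%d%s\n", e->name, e->type,
            e->def_file ? e->def_file : "(no definition seen)", e->def_line,
            e->initialized ? ", initialized" : "");
    for (int i = 0; i < e->nrefs; i++) {
        const struct xref_ref *r = &e->refs[i];
        fprintf(out, "  %s:%d in %s [%s]\n", r->file, r->line,
                r->function ? r->function : "(file scope)",
                ref_kind_name(r->kind));
    }
}
```

    A parser would fill in one xref_entry per identifier as it sees the definition, then call xref_add_ref for each use it encounters; xref_print(&entry, stdout) dumps the row.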

    Know the difference between identifiers and variables -- all variables are identifiers, but not all identifiers are variables (functions, structs, unions, typedefs, structure member tags, and enum-members, for example).
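
    A small file shows how many identifiers are not variables. The names below (item_id, point, color, scale, origin_x) are made up for illustration.

```c
typedef unsigned long item_id;      /* item_id: typedef name */

struct point { int x; int y; };     /* point: struct tag; x, y: member names */

enum color { RED, GREEN };          /* color: enum tag; RED, GREEN: enum members */

int scale(int v) { return v * 2; }  /* scale: function; v: parameter */

int origin_x = 0;                   /* origin_x: the only global variable here */
```

    A cross reference program that only tracked variables would report a single entry for this file and miss everything else.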

    Also, know the difference between declarations and definitions. A declaration describes the identifier, but a definition creates the object the identifier refers to. An extern declaration or a function prototype tells the compiler what the object looks like, but not where it is. Know what a "soft definition" and a "hard definition" are -- the first acts like a declaration if another definition is found, and can occur in multiple places; the second is always a definition and must be unique.
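
    The distinction can be seen in a single file. This sketch assumes that "soft definition" corresponds to what the C standard calls a tentative definition (a file-scope object declared without an initializer and without extern); the identifiers are invented for the example.

```c
/* Declarations: describe the identifier but create no object. */
extern int shared_count;            /* extern declaration */
int add(int a, int b);              /* function prototype */

/* Tentative ("soft") definitions: no initializer, may be repeated. */
int tentative_total;
int tentative_total;                /* legal in C: still the same object */

/* "Hard" definitions: each must appear exactly once. */
int shared_count = 3;               /* definition with an initializer */
int add(int a, int b) { return a + b; }
```

    A cross reference program would want to record all of these lines, marking which ones are declarations, which are tentative, and which one actually defines the object. Note that the repeated tentative definition is valid C but not valid C++.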

    Keep in mind that, in C, a variable can be defined without an initializer in several files, but given a static initialization in at most one. (Strictly speaking, this relies on the traditional "common" linkage model that many Unix toolchains support; the C standard itself allows only one external definition per object.) This should not cause a problem for the cross reference program, even if the variable is initialized in several files -- you just have to handle that case properly. Also, you may never see a definition for a global identifier that is used within a source file, even if you find a declaration for it. A cross reference program can be used to figure out why something isn't linking.

    You should ask your professor whether you need to worry about header files; typedefs, structs, and enums are often declared in header files that are then included by many source files. The same question applies to #defined preprocessor symbols. You'd be amazed at just how much harder this makes the project. If you look at the man page for cxref(1), you might get some ideas about the features of similar programs.
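
    To see why macros complicate things, consider this sketch. REPORT, error_count, and check are hypothetical names chosen for the example.

```c
#include <stdio.h>

int error_count;                    /* global that the macro touches */

/* Hypothetical macro: every use hides a modification of error_count. */
#define REPORT(msg) (error_count++, puts(msg))

int check(int value)
{
    if (value < 0)
        REPORT("negative value");   /* "error_count" never appears here */
    return value >= 0;
}
```

    A tool that scans the text of check() without expanding macros will not see that it modifies error_count. Running the source through the preprocessor first (for example with gcc -E) exposes the reference, at the cost of losing the macro names and pulling in the contents of every included header.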
