Thread: Any tricks to understanding a source code repository?

  1. #1
    Registered User
    Join Date
    May 2006
    Posts
    57

    Any tricks to understanding a source code repository?

    I'm wandering around the files in the trunk of an open source repository. It's pretty big.
    Are there any naming conventions to help me find my way around? In particular, I want to find the 'main' files, the ones that are run and use the classes. Otherwise it will take me years to figure out what is going on!

    Thanks

  2. #2
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    I don't see what that has to do with a source code repository? Is your question more of "How do I read and understand someone else's source code?"?

  3. #3
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    Understanding code isn't science. It's an art.

    I've spent half a year at a company resolving bugs in a huge PHP codebase (millions of lines of code). I've read dozens of open source projects (or even assembly dumps) looking for bugs.

    I can give you one piece of advice if needed though: make notes on the code. Write down, in your own words, what functions do. And if functions are terribly hard to understand, rewrite them to make them easier to understand, but be careful not to change anything important (I actually never did the latter, but I can imagine code being so horrible that you'll need to do this to understand it).

  4. #4
    Registered User jeffcobb's Avatar
    Join Date
    Dec 2009
    Location
    Henderson, NV
    Posts
    875
    One thing that I find helpful (if the code is in reasonably-good shape OO-wise) is a Linux tool called Umbrello. I can point this at a directory of H and CPP files and it can build complete UML diagrams from the source so you can see in a graphical manner this class is a member of that one, this class is derived from that and called from the other, etc. Failing that any good source code cross-reference tool can help you understand a codebase.

    Finally, building and then single-stepping through code can help you understand the flow of logic.

    Beyond this it is a matter of patience and persistence. There are few shortcuts.
    C/C++ Environment: GNU CC/Emacs
    Make system: CMake
    Debuggers: Valgrind/GDB

  5. #5
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Definitely your tools come in handy here. View the source in your editor/IDE of choice and use its functionality to help you find your way around. E.g., in vim, if I put the cursor on a variable, function name, define, or anything else I don't recognize and press "[I", I get a list of the lines in the current file and its included files where it is referred to (e.g., PATH_MAX):

    /usr/include/bits/stdlib.h
    1: 35 |___|___ "least PATH_MAX bytes long buffer");
    2: 42 #if defined _LIBC_LIMITS_H_ && defined PATH_MAX
    3: 43 if (__bos (__resolved) < PATH_MAX)
    ~/C/test.c
    4: 12 |___PATH_MAX
    Doesn't completely solve that one but it helps. Any decent editor/IDE should have at least one way to do this kind of thing. In vim there are quite a few: you can jump to the file, you can jump to other instances in this file, etc.

    I actually wrote a GUI a few years ago that will present all the pages in a directory tree with a single term highlighted; I think most IDEs have some functionality akin to this. Or you can just use simple tools like "grep -R", depending on your platform.
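
    If grep isn't available on your platform, the same idea can be sketched in a few lines of Python. This is only a rough illustration: the function name find_identifier and the list of file extensions are made up for the example, and it matches whole words only, so searching for PATH_MAX will not report MY_PATH_MAX.

```python
import os
import re


def find_identifier(root, name, exts=(".c", ".h", ".cpp", ".hpp")):
    """Yield (path, line_number, line) for each whole-word match of name.

    Hypothetical helper: walks every directory under root and scans only
    files whose names end with one of the given extensions.
    """
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    for dirpath, _dirs, files in os.walk(root):
        for fname in sorted(files):
            if not fname.endswith(exts):
                continue
            path = os.path.join(dirpath, fname)
            # errors="replace" keeps the scan going on odd encodings.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if pattern.search(line):
                        yield path, lineno, line.rstrip("\n")
```

    Calling list(find_identifier("trunk", "PATH_MAX")) would give you every file and line number to look at, assuming "trunk" is where your local copy lives.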

    You can use a debugger and set breakpoints, etc., to trace execution too. Intuition and experience are big factors; doing this for the first time on a big source tree might be a little too frustrating -- good luck. As for "the main files" -- well, there is only one main()...usually a good place to start.
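
    To locate that main() in a tree with hundreds of files, something like the following Python sketch might do. It is a heuristic only: the regular expression assumes a conventional "int main(" or "void main(" at the start of a line, so it can miss unusual styles and can be fooled by commented-out code. The names MAIN_RE and find_entry_points are invented for the example.

```python
import os
import re

# Heuristic pattern for a C/C++ main() definition at the start of a line.
MAIN_RE = re.compile(r"^\s*(?:int|void)\s+main\s*\(", re.MULTILINE)


def find_entry_points(root, exts=(".c", ".cc", ".cpp", ".cxx")):
    """Return a sorted list of source files under root that appear to define main()."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if not fname.endswith(exts):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8", errors="replace") as fh:
                if MAIN_RE.search(fh.read()):
                    hits.append(path)
    return sorted(hits)
```

    A project with several executables, or with test programs, will return more than one file; the build files usually tell you which one is the real program.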
    Last edited by MK27; 04-10-2010 at 11:02 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  6. #6
    Registered User
    Join Date
    May 2006
    Posts
    57
    Thanks for the ideas.

    The trunk folder in this repository has about twenty folders, and some of the folders have twenty or so folders, and some of these folders have twenty or so source code files. So it's quite a bit of stuff to work through.

    I've been trying to backtrack included files through their work notes and the emails about improvements and updates they have been doing. That is the only search function this website has. Sometimes the file doesn't seem to be where the email says it is...

    To use the tools you guys mentioned to do searches, I have to have the entire trunk folder copied into my computer, correct?
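
    Once a local copy exists, a quick count of source files per top-level folder can show where the bulk of a tree like this sits. The following is a minimal Python sketch; the function name files_per_top_folder and the extension list are assumptions made for the example, not anything this project provides.

```python
import os
from collections import Counter


def files_per_top_folder(root, exts=(".c", ".h", ".cpp", ".hpp")):
    """Count source files under each top-level folder of root.

    Files directly in root are counted under ".". Folders that contain
    no matching files still appear, with a count of 0.
    """
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += sum(1 for f in files if f.endswith(exts))
    return counts
```

    Sorting the result with counts.most_common() puts the largest folders first, which is often a reasonable order in which to start reading.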

  7. #7
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote: Originally posted by darsunt
    To use the tools you guys mentioned to do searches, I have to have the entire trunk folder copied into my computer, correct?
    Yep -- well, you need all the files you would need to successfully compile the project, since it would be plain stupid to start hacking code that you have not even compiled yet.

    Are you sure you need the entire trunk and not just a branch?

  8. #8
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Finally, building and then single-stepping through code can help you understand the flow of logic.
    This is really the only way to understand the code, provided that you understand the language it is written in. You must compile it, get it running, and single step through it in the debugger. This will save you a lot of code spelunking.

  9. #9
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Quote: Originally posted by Bubba
    This is really the only way to understand the code, provided that you understand the language it is written in. You must compile it, get it running, and single step through it in the debugger. This will save you a lot of code spelunking.
    I wouldn't say it's the only way. I prefer a couple of UML diagrams, then read the code in the hairy parts. Sure it takes time, but everything does. But you've got a problem when the documentation is out of sync with the code.

  10. #10
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    I'm used to working in code lines with little to no documentation or with incorrect and out of date documentation. In this type of situation the only tool left in the box is to single step the code and figure out exactly what is going on and when. In a code base with millions of lines of code it is a bit unrealistic to think you can manually go through it and figure it all out. It's not impossible but it will take a lot of time and you will still probably miss some things.

  11. #11
    Registered User
    Join Date
    May 2006
    Posts
    57
    Thanks, I never thought of downloading and compiling the entire source code. Lack of real world experience, I suppose. But that might be the trick I need to understand this monster. I've been trying to figure out just one class, it would be great to know everywhere it gets called during program execution.

    I don't need to understand the program thoroughly. I just need to understand it well enough to decide if I want to put time into contributing to it.
