How do you read code?
I've programmed in C for about 7 years now (and have been a Linux user for 10+ years) and would like to help out the community. I like the whole idea of "open source", and now I think it's time that I put my skills to use in the programming world.
My question is:
On these big open source projects (30,000-line projects), what's the best way to jump into the code? These projects can be very difficult and complex to understand, so what's the best way to jump right in? Is there a neutral program out there that can just step through code line by line like a debugger does?
How do you guys do it? I've been looking for books on this topic for a couple of years now but haven't had any luck. What do you guys think?
Do you guys use any tools to help you read code better? Let me know your strategies.
I find that this is less about C programming and more about reading (other people's) code in general, so I have moved this to the General Discussion board.
Personally, I have only grasped parts of large projects (e.g., parts of a library that interest me) or entire small projects (e.g., the TUT unit testing framework, a nightmare that I eventually gave up on). Since they were libraries rather than end-user programs, it was a matter of looking at the API and then peeking at the source code to see how it was implemented.
There are tools called "code browsers", which read the source code and cross-reference it, then allow you to jump from one piece of code to another.
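To make the idea of a code browser a bit more concrete, here is a deliberately crude sketch in C of the "find all references" part: it scans a source string and records the line numbers where an identifier appears as a whole word. The function name xref_find is made up for this example. Real cross-referencers actually parse the code, so unlike this sketch they won't report matches inside comments or string literals.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns nonzero if c can appear in a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Hypothetical helper: records the 1-based line number of every
 * whole-word occurrence of `ident` in the NUL-terminated text `src`.
 * It does not parse C, so matches in comments and strings count too.
 * Returns the total number of matches; at most `max` are stored.
 */
size_t xref_find(const char *src, const char *ident, int *lines, size_t max)
{
    size_t len = strlen(ident);
    size_t found = 0;
    int line = 1;

    if (len == 0)
        return 0;

    for (const char *p = src; *p != '\0'; p++) {
        if (*p == '\n') {
            line++;
            continue;
        }
        /* A match must start at a word boundary ... */
        if (p != src && is_ident_char(p[-1]))
            continue;
        /* ... spell out the identifier ... */
        if (strncmp(p, ident, len) != 0)
            continue;
        /* ... and end at a word boundary. */
        if (is_ident_char(p[len]))
            continue;
        if (found < max)
            lines[found] = line;
        found++;
        p += len - 1; /* skip the rest of the match */
    }
    return found;
}
```

Real code browsers build a database of definitions and references up front, so a lookup doesn't have to rescan the whole tree each time.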
The other thing, however, is that you can't just take a 30K-line software project, read all of it, and understand it [well, at least I know I can't - and I've been working with software for over 20 years now - including working on large open-source projects].
The best path, I think, is to have a project with a goal, and then implement it along with suitable test code to prove that the code works - hopefully there is already some existing test code that proves you haven't broken anything else. To achieve this goal, you need to study the code and understand the details of the part that needs modification. As a starter project, you don't want something that changes the structure of the entire source code - that most likely requires too much knowledge to be gathered before you can start changing things.
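A minimal sketch of "implement the goal along with test code", assuming a hypothetical count_words() function in the project and a goal of making it treat tabs and newlines as word separators, not just spaces. The old test has to keep passing, and the new one proves the change does what you intended. All names here are invented for illustration; a real project will have its own test framework and conventions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical function in the project you're modifying. */
size_t count_words(const char *s)
{
    size_t words = 0;
    int in_word = 0;

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) { /* the change: any whitespace separates */
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}

/* Existing test: must still pass after your change. */
static void test_count_words_spaces(void)
{
    assert(count_words("") == 0);
    assert(count_words("one two  three") == 3);
}

/* New test: proves the new behaviour works. */
static void test_count_words_other_whitespace(void)
{
    assert(count_words("one\ttwo\nthree") == 3);
}

/* Runs both tests; assert() aborts on the first failure. */
int run_count_words_tests(void)
{
    test_count_words_spaces();
    test_count_words_other_whitespace();
    return 0;
}
```

Build the tests without NDEBUG defined, since that turns assert() into a no-op.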
The points I'm making:
1. You usually don't need to understand the entire project in detail.
2. Start small - make small changes to begin with. Growing with the project, increasing your knowledge and skill as you go along, is just about the only way to understand "all" of the project [and in some cases, you simply can't understand ALL of it anyway].
The goal should be one that, given your current overall programming skills, you believe you can achieve without any major difficulty, and it should be small enough that you can set a reasonable time to achieve it [say 3 days or 5 weeks - depending on the time you are willing to spend, and the difficulty of the task itself]. There is, obviously, no point thinking that you can modify the scheduler in the Linux kernel if you have never worked on schedulers or in kernel code before - that's setting yourself up for failure.
You make a good point. Given your experience as a kernel hacker, I have a scenario for you:
Originally Posted by matsp
Say you had the job of "porting" a kernel (let's say the Linux kernel) to another architecture (let's say the ARM microprocessor).
Now, the kernel was originally written for the x86 architecture.
In order to do something like this, you obviously need to understand two things:
The x86 architecture and the ARM architecture.
So here's my question:
Where do you start with that? I mean, kernels are typically huge and complex. You need an understanding of the entire codebase to recognize what needs changing and manipulating to "make it work" on another architecture.
I chose the Linux kernel because it's said to be very portable.
Based on your experience, what are your thoughts on this? Tell me how you would approach this situation.
Some suggestions (not all free).
One of the better code editors I've used is Source Insight (http://www.sourceinsight.com/), which will create a cross-reference of a project and allow you to browse it interactively.
Another tool is Source Navigator (http://sourcenav.sourceforge.net/). The GUI is clunky to say the least, but the database it generates is very accessible if you want to do some kind of analysis which is outside the scope of what is provided.
The Linux kernel has its own cross-referencing tool (http://lxr.linux.no/), which can be used for other projects as well.
> Where do you start with that?
Given that it's already been ported to several architectures, perhaps there's a "howto" around, or perhaps even some kind of design documentation.
Also, anywhere the asm keyword (inline assembler) is used will need to be ported. In conjunction with that, look out for any #ifdef conditionals that refer to architecture-specific symbols, e.g.
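As a hypothetical illustration of the kind of architecture-dependent conditional worth grepping for, assuming GCC or Clang (which predefine macros such as __x86_64__, __i386__, __aarch64__ and __arm__; other compilers use different names). The function build_arch is made up for the example.

```c
/* Hypothetical: reports which architecture this file was compiled for,
 * using GCC/Clang predefined macros. */
const char *build_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#else
    return "unknown"; /* an unported architecture ends up here */
#endif
}
```

Real code often ends such a chain with #error instead of a silent fallback, so that building for an unported architecture fails loudly rather than quietly doing the wrong thing.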
Where you start is by looking at the general structure and finding the "arch" (short for architecture) dependent parts. Since Linux has been ported to MANY different architectures, it's unlikely that any non-portable code is found outside the "arch/xxx" tree. Just find an architecture that you are familiar with [e.g. x86] [or alternatively, find an architecture similar to ARM] and start porting the parts that are different between x86 and ARM.
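A minimal sketch of the split described above, assuming GCC or Clang: generic code calls a small arch_* interface, and each architecture supplies its own implementation, often in inline assembler. When porting, the arch-specific half is what you write for the new CPU; the generic half shouldn't need to change. The names arch_cpu_relax and spin_wait are hypothetical (the kernel keeps similar per-architecture hooks, e.g. cpu_relax(), under arch/), and the #if blocks stand in for what would be separate files in separate arch directories.

```c
/* ---- architecture-specific part: what a new port must supply ---- */
#if defined(__x86_64__) || defined(__i386__)
static inline void arch_cpu_relax(void)
{
    __asm__ __volatile__("pause" ::: "memory"); /* x86 spin-wait hint */
}
#elif defined(__aarch64__)
static inline void arch_cpu_relax(void)
{
    __asm__ __volatile__("yield" ::: "memory"); /* AArch64 spin-wait hint */
}
#else
static inline void arch_cpu_relax(void)
{
    /* Fallback for an unported architecture: just a compiler barrier. */
    __asm__ __volatile__("" ::: "memory");
}
#endif

/* ---- generic part: never mentions a specific CPU ---- */
/* Spins until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting. */
unsigned long spin_wait(volatile const int *flag, unsigned long max_spins)
{
    unsigned long n = 0;

    while (!*flag && n < max_spins) {
        arch_cpu_relax();
        n++;
    }
    return n;
}
```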
Originally Posted by someprogr
You will need to be quite familiar with the OS (Linux in this case) architecture, have good knowledge of both the "from" and "to" processors that you are porting between - particularly the "to" processor - and be able to read up on the "from" processor sufficiently to understand what the often strange concoctions in the inline assembler do.
This is definitely not a trivial task - my first OS port, which wasn't a full-fledged Linux, took several months. The second one had less assembler to begin with, and thanks to the first port I understood the "to" architecture better, so it took about two weeks to get it running and 95% functional.
 According to this page, http://www.treblig.org/Linux_kernel_source_finder.html, there is already an ARM project for Linux, so I expect that you will not actually NEED to perform this task as such - but the above explains how to do the task in generic terms.
Since, as has been mentioned, it's not really possible to understand the whole code base of a large open source project, why not pick a project you would like to work on, check its bug reports, get the sources, and start fixing bugs? That way:
you will learn the code base by actively working on it to track down the bugs.
you will become a recognized contributor to the code for that project.
you get to help a project you think is worthwhile.
the project gets the bugs fixed.
the project gets another person developing for it.
the project grows better in every way.