You may find this interesting. Note that I ran this example on Linux, but almost all of these commands (including nm and probably ldd etc) are available with MinGW and definitely with Cygwin under Windows.
First I'll write a simple program.
Code:
$ vim hello.c
$ cat hello.c
#include <stdio.h>
int main() {
    printf("Hello, World!\n");
    return 0;
}
$ gcc hello.c -o hello
$ ./hello
Hello, World!
$
Now I want to find out where printf() comes from to make this program work. The first tool I'm going to use is called ldd. It shows which libraries are dynamically loaded by the program when it starts up.
Code:
$ ldd hello
linux-gate.so.1 => (0xb7f11000)
libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb7da3000)
/lib/ld-linux.so.2 (0xb7f12000)
$
The linux-gate and ld-linux entries show up for almost every executable as part of the necessary plumbing to get a program running. (linux-gate is a virtual library that the kernel maps into each process, which is why ldd shows no path for it, and ld-linux is the dynamic loader, which loads the other libraries.) But libc is, as kona49er has pointed out, the library which actually contains printf(). Let's have a look, using the program nm to print out all the symbols defined by a library.
Code:
$ nm /lib/i686/cmov/libc-2.7.so
nm: /lib/i686/cmov/libc-2.7.so: no symbols
$
Drat. My libc has been stripped, so its regular symbol table (along with any debugging symbols) is gone. (This saves space but makes it more difficult to debug.) The dynamic symbols that other programs link against are still there, and nm -D would list them, printf included. Since I have the package libc6-dbg installed, I actually do have the full symbols in /usr/lib/debug, but that's a bit of a side detour. I'll try nm on the program I created instead.
Code:
$ nm hello
080494b8 d _DYNAMIC
0804958c d _GLOBAL_OFFSET_TABLE_
0804848c R _IO_stdin_used
w _Jv_RegisterClasses
080494a8 d __CTOR_END__
080494a4 d __CTOR_LIST__
080494b0 D __DTOR_END__
080494ac d __DTOR_LIST__
080484a0 r __FRAME_END__
080494b4 d __JCR_END__
080494b4 d __JCR_LIST__
080495ac A __bss_start
080495a4 D __data_start
08048440 t __do_global_ctors_aux
08048320 t __do_global_dtors_aux
080495a8 D __dso_handle
w __gmon_start__
0804843a T __i686.get_pc_thunk.bx
080494a4 d __init_array_end
080494a4 d __init_array_start
080483d0 T __libc_csu_fini
080483e0 T __libc_csu_init
U __libc_start_main@@GLIBC_2.0
080495ac A _edata
080495b4 A _end
0804846c T _fini
08048488 R _fp_hw
08048274 T _init
080482f0 T _start
080495ac b completed.5706
080495a4 W data_start
080495b0 b dtor_idx.5708
08048380 t frame_dummy
080483a4 T main
U puts@@GLIBC_2.0
$ nm hello | grep U
U __libc_start_main@@GLIBC_2.0
U puts@@GLIBC_2.0
$
The toolchain defines quite a few symbols even in a simple program like this one. But the undefined symbols (marked with U) are the ones which have to be supplied by other libraries when the program is loaded. (grep U happens to work here because no other line contains a capital U; nm -u lists only the undefined symbols.)
But we called printf(), and the only thing here looks like a call to puts(), plus __libc_start_main, which is the glibc startup routine that ends up calling main(). Maybe we should look at the assembly for our C program to see what it looks like. I can ask GCC to emit assembly with -S, or I can even take the generated executable and disassemble it with objdump. [The grep -A 20 just shows the 20 lines following <main> in the output.]
Code:
$ gcc -S hello.c
$ cat hello.s
.file "hello.c"
.section .rodata
.LC0:
.string "Hello, World!"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $4, %esp
movl $.LC0, (%esp)
call puts
movl $0, %eax
addl $4, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
.section .note.GNU-stack,"",@progbits
$ objdump -D hello | grep -A 20 '<main>'
080483a4 <main>:
80483a4: 8d 4c 24 04 lea 0x4(%esp),%ecx
80483a8: 83 e4 f0 and $0xfffffff0,%esp
80483ab: ff 71 fc pushl -0x4(%ecx)
80483ae: 55 push %ebp
80483af: 89 e5 mov %esp,%ebp
80483b1: 51 push %ecx
80483b2: 83 ec 04 sub $0x4,%esp
80483b5: c7 04 24 90 84 04 08 movl $0x8048490,(%esp)
80483bc: e8 13 ff ff ff call 80482d4 <puts@plt>
80483c1: b8 00 00 00 00 mov $0x0,%eax
80483c6: 83 c4 04 add $0x4,%esp
80483c9: 59 pop %ecx
80483ca: 5d pop %ebp
80483cb: 8d 61 fc lea -0x4(%ecx),%esp
80483ce: c3 ret
80483cf: 90 nop
080483d0 <__libc_csu_fini>:
80483d0: 55 push %ebp
80483d1: 89 e5 mov %esp,%ebp
$
The disassembled code is less useful; I just put it in there for fun to show it could be done. In GCC's generated assembly, look at the "call puts" line. Clearly it's calling puts(). And even if you don't know much assembly, you can look at the previous line and see a reference to .LC0, which is clearly defined as our Hello, World! string.
But notice something strange. The string doesn't have the newline at the end of it anymore. Time to read up on puts().
Code:
$ man puts
...
int puts(const char *s);
...
puts() writes the string s and a trailing newline to stdout.
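So GCC took our call to printf(), noticed that no format specifiers were used (%d, %s, or whatever) and that there was a trailing newline, and changed it to a call to puts() with the newline stripped from the string. That is cheaper, because puts() doesn't have to scan a format string at run time. Note that I didn't even ask for optimization on the command line. It's amazing what a compiler will do for you without you even realizing it.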
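If you want to play with the rule, here is a rough sketch in C of the check as I described it above: no '%' anywhere in the format string, and a newline at the very end. The function name looks_like_puts_candidate is my own invention for illustration. This is not GCC's actual code, and GCC handles more cases than this one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of GCC): returns true when a printf()
 * format string matches the case described in this post, i.e. it
 * contains no '%' at all and ends with a newline. Anything with a '%'
 * is rejected, including "%%", so the check is deliberately conservative.
 */
bool looks_like_puts_candidate(const char *fmt)
{
    size_t len = strlen(fmt);

    /* puts() always appends a newline, so the format must end with one. */
    if (len == 0 || fmt[len - 1] != '\n')
        return false;

    /* Any '%' means printf() may have formatting work to do. */
    return strchr(fmt, '%') == NULL;
}
```

Calling it on "Hello, World!\n" returns true, while "%d\n" and "Hello, World!" (no newline) return false. To see what the compiler really does, change the printf() call in hello.c and rerun gcc -S; compiling with -fno-builtin-printf should leave the original printf() call alone.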
Hope you got something out of this post, and happy coding.