Thread: Simple, but I want to know.

  1. #1
    jadaces

    This is a simple question, but although I've searched Google I can't find any information on it.

    I want to know which library (.a) file GCC links against for stdio.h, and, if someone could point me to a site, which .a file goes with each of the other STANDARD headers.

    I know GCC links it automatically, but I want to know for knowledge's sake.

    Thank you

  2. #2
    Salem
    gcc -v foo.c
    will show you all of it, including the linker command and the libraries GCC passes to it.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    whiteflags
    If you want to see the linking process, my suggestion is to run the linker yourself with ld --verbose. I've never really appreciated the work that goes into a port like MinGW, but there I can find things like libmsvcr90.a. That only makes sense because it's a port for Windows.

    I doubt the answer is terribly obvious to people who don't work on GCC, and it is bound to be different for each port to another operating system. So again, if you want to know, run the linker yourself.

  4. #4
    jadaces
    Thanks, all, for the help. I'll try both ideas.

    /*******************************************************************************/
    The magic is to know how exactly it works, before using it to do your work.
    /*******************************************************************************/

  5. #5
    Registered User
    Quote (originally posted by jadaces):
    This is a simple question, but although I've searched Google I can't find any information on it.

    I want to know which library (.a) file GCC links against for stdio.h, and, if someone could point me to a site, which .a file goes with each of the other STANDARD headers.

    I know GCC links it automatically, but I want to know for knowledge's sake.

    Thank you

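    Simpler than you think.

    The standard C library is called libc. On a typical GNU/Linux system the functions declared in stdio.h live there: libc.a when linking statically, libc.so.6 when linking dynamically. Search for libc on wikipedia.org for background.
    Also, on some Unix-like systems the command "man libc" will provide details.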

  6. #6
    dwks
    You may find this interesting. Note that I ran this example on Linux, but almost all of these commands (including nm, and probably ldd) are also available with MinGW, and definitely with Cygwin, under Windows.

    First I'll write a simple program.
    Code:
    $ vim hello.c
    $ cat hello.c
    #include <stdio.h>
    
    int main() {
        printf("Hello, World!\n");
        return 0;
    }
    $ gcc hello.c -o hello
    $ ./hello
    Hello, World!
    $
    Now I want to find out where printf() comes from to make this program work. The first tool I'm going to use is called ldd. It shows which libraries are dynamically loaded by the program when it starts up.
    Code:
    $ ldd hello
            linux-gate.so.1 =>  (0xb7f11000)
            libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb7da3000)
            /lib/ld-linux.so.2 (0xb7f12000)
    $
    linux-gate.so.1 and ld-linux.so.2 show up in almost every dynamically linked executable as part of the plumbing needed to get a program running: linux-gate is a virtual library that the kernel maps into each process (the vDSO), and ld-linux is the loader, which dynamically loads the other libraries. But libc is, as kona49er has pointed out, the library that actually contains printf(). Let's have a look, using the program nm to print out all the symbols defined by a library.
    Code:
    $ nm /lib/i686/cmov/libc-2.7.so
    nm: /lib/i686/cmov/libc-2.7.so: no symbols
    $
    Drat. My libc has been stripped: its regular symbol table was removed, along with any debugging information. (This keeps the file smaller but makes it more difficult to debug.) Since I have the package libc6-dbg installed, I actually do have these symbols in /usr/lib/debug, but that's a bit of a side detour. I'll try nm on the program I created instead.
    Code:
    $ nm hello
    080494b8 d _DYNAMIC
    0804958c d _GLOBAL_OFFSET_TABLE_
    0804848c R _IO_stdin_used
             w _Jv_RegisterClasses
    080494a8 d __CTOR_END__
    080494a4 d __CTOR_LIST__
    080494b0 D __DTOR_END__
    080494ac d __DTOR_LIST__
    080484a0 r __FRAME_END__
    080494b4 d __JCR_END__
    080494b4 d __JCR_LIST__
    080495ac A __bss_start
    080495a4 D __data_start
    08048440 t __do_global_ctors_aux
    08048320 t __do_global_dtors_aux
    080495a8 D __dso_handle
             w __gmon_start__
    0804843a T __i686.get_pc_thunk.bx
    080494a4 d __init_array_end
    080494a4 d __init_array_start
    080483d0 T __libc_csu_fini
    080483e0 T __libc_csu_init
             U __libc_start_main@@GLIBC_2.0
    080495ac A _edata
    080495b4 A _end
    0804846c T _fini
    08048488 R _fp_hw
    08048274 T _init
    080482f0 T _start
    080495ac b completed.5706
    080495a4 W data_start
    080495b0 b dtor_idx.5708
    08048380 t frame_dummy
    080483a4 T main
             U puts@@GLIBC_2.0
    $ nm hello | grep U
             U __libc_start_main@@GLIBC_2.0
             U puts@@GLIBC_2.0
    $
    GCC defines quite a few symbols even in a simple program like this one. But the undefined symbols (marked with U) are the ones which are pulled in from other libraries.

    But we called printf(), and the only relevant thing here looks like a call to puts(), plus __libc_start_main, which is the C library's startup routine that eventually calls main(). Maybe we should look at the assembly for our C program to see what it looks like. I can ask GCC to stop after generating assembly with -S, or I can take the generated executable and disassemble it with objdump. [The grep -A just shows the 20 lines following <main> in the output.]
    Code:
    $ gcc -S hello.c
    $ cat hello.s
            .file   "hello.c"
            .section        .rodata
    .LC0:
            .string "Hello, World!"
            .text
    .globl main
            .type   main, @function
    main:
            leal    4(%esp), %ecx
            andl    $-16, %esp
            pushl   -4(%ecx)
            pushl   %ebp
            movl    %esp, %ebp
            pushl   %ecx
            subl    $4, %esp
            movl    $.LC0, (%esp)
            call    puts
            movl    $0, %eax
            addl    $4, %esp
            popl    %ecx
            popl    %ebp
            leal    -4(%ecx), %esp
            ret
            .size   main, .-main
            .ident  "GCC: (Debian 4.3.2-1.1) 4.3.2"
            .section        .note.GNU-stack,"",@progbits
    $ objdump -D hello | grep -A 20 '<main>'
    080483a4 <main>:
     80483a4:       8d 4c 24 04             lea    0x4(%esp),%ecx
     80483a8:       83 e4 f0                and    $0xfffffff0,%esp
     80483ab:       ff 71 fc                pushl  -0x4(%ecx)
     80483ae:       55                      push   %ebp
     80483af:       89 e5                   mov    %esp,%ebp
     80483b1:       51                      push   %ecx
     80483b2:       83 ec 04                sub    $0x4,%esp
     80483b5:       c7 04 24 90 84 04 08    movl   $0x8048490,(%esp)
     80483bc:       e8 13 ff ff ff          call   80482d4 <puts@plt>
     80483c1:       b8 00 00 00 00          mov    $0x0,%eax
     80483c6:       83 c4 04                add    $0x4,%esp
     80483c9:       59                      pop    %ecx
     80483ca:       5d                      pop    %ebp
     80483cb:       8d 61 fc                lea    -0x4(%ecx),%esp
     80483ce:       c3                      ret
     80483cf:       90                      nop
    
    080483d0 <__libc_csu_fini>:
     80483d0:       55                      push   %ebp
     80483d1:       89 e5                   mov    %esp,%ebp
    $
    The disassembled code is less useful; I put it in there just for fun, to show it can be done. In GCC's generated assembly, the two lines that matter are movl $.LC0, (%esp) and call puts. Clearly it's calling puts(). And even if you don't know much assembly, you can see that the line before the call refers to .LC0, which is clearly defined as our Hello, World! string.

    But notice something strange. The string doesn't have the newline at the end of it anymore. Time to read up on puts().
    Code:
    $ man puts
    ...
           int puts(const char *s);
    ...
           puts() writes the string s and a trailing newline to stdout.
    So GCC took our call to printf(), noticed that no format specifiers were used (%d, %s, or whatever) and that there was a trailing newline, and changed it to a call to puts(), which is more efficient! It's amazing what a compiler will do for you without you even realizing it.

    Hope you got something out of this post, and happy coding.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  7. #7
    Registered User
    I know we're drifting a bit off topic from the original question, but I think the following link is in line with dwks' post. A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux is a great tutorial on what an ELF binary actually requires and what typically goes into one.

  8. #8
    Sebastiani
    Quote (originally posted by dwks):
    So GCC took our call to printf(), noticed that no format specifiers were used (%d, %s, or whatever) and that there was a trailing newline, and changed it to a call to puts(), which is more efficient! It's amazing what a compiler will do for you without you even realizing it.
    That's more troubling than impressive - what if you were linking against some *other* printf implementation (eg: a user defined function, or such)? Surprise!
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  9. #9
    dwks
    Meh, if you want to link with your own printf() you have to disable linking with libc, which requires a few flags, and that would probably disable this optimization as well. Anyway, I can't imagine many people want to do this. GCC does quite a few optimizations like this: small memcpy() calls with a constant size are often inlined, for example. So your call to memcpy() turns into . . . no call at all. And if you look at malloc() calls -- if the size of the block is small the block will be located in a static data segment instead of on the heap. Although I guess this is more of a glibc optimization than a compiler one.

    That's a nice link by the way -- I think I've seen it before but I read it again anyway for amusement value . . . .

    BTW, I've discovered that if you pass -D to nm it can extract dynamic symbols even from a stripped library.
    dwk

  10. #10
    Registered User
    Quote (originally posted by dwks):
    Meh, if you want to link with your own printf() you have to disable linking with libc, which requires a few flags and this would probably disable this optimization as well.
    No, you shouldn't have to. The linker is smart: the printf(3) in your own objects will override the printf(3) coming from libraries.

    I don't understand how optimizing printf("%s\n", s) to puts(s) could be bad in ANY case, whether you link against one implementation or another (I didn't fully understand what you meant by the "original" implementation and the other one). Standard puts(3) is always faster than printf(3) (let's not dig up some VM that has a hardware printf(3)), and if you link with -lc, both are available.

    Quote (originally posted by dwks):
    And if you look at malloc() calls -- if the size of the block is small the block will be located in a static data segment instead of on the heap. Although I guess this is more of a glibc optimization than a compiler one.
    What does that mean? Many malloc() implementations use sbrk() to grow the data segment, so everything ends up there anyway. Or do you mean the blocks are allocated in .data in the executable itself, and you just weren't thinking?
    Last edited by fronty; 12-29-2010 at 04:30 PM.

  11. #11
    dwks
    Quote (originally posted by fronty):
    No, you shouldn't have to. Linker is smart: the printf(3) in your objects will override the printf(3) coming from libraries.
    You're probably right -- I know that with dynamically loaded libraries (selected with e.g. LD_LIBRARY_PATH, or preloaded with LD_PRELOAD) you can replace libraries or override individual symbols entirely, so it stands to reason the same would happen with libraries intentionally linked in, as long as you were careful to list the libraries in the right order.

    Quote (originally posted by fronty):
    I don't understand how optimizing printf("%s\n", s) to puts(s) could be bad in ANY case, whether you link against one implementation or another (I didn't fully understand what you meant by the "original" implementation and the other one). Standard puts(3) is always faster than printf(3) (let's not dig up some VM that has a hardware printf(3)), and if you link with -lc, both are available.
    Which is precisely why the compiler does this optimization. I think Sebastiani's concern was that if a printf() was overridden but puts() was not, someone calling printf() and expecting their actual printf() implementation to be called [perhaps invoking some special behaviour] would be surprised. But of course if the implementation conforms to the standard it has to supply both puts() and printf() and if you're serious about overriding output functions you should override them all.

    Quote (originally posted by fronty):
    What does that mean? Many malloc() implementations use sbrk() to grow the data segment, so everything ends up there anyway. Or do you mean the blocks are allocated in .data in the executable itself, and you just weren't thinking?
    Most likely I'm not thinking. I only meant that if you call malloc() with a small enough size it uses a different allocation scheme, which I meant as an example that library functions are doing different optimizations that you as a programmer don't need to be aware of.
    Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2). MMAP_THRESHOLD is 128 kB by default, but is adjustable using mallopt(3).
    From malloc(3) - Linux manual page
    dwk

  12. #12
    Registered User
    Quote (originally posted by dwks):
    Which is precisely why the compiler does this optimization. I think Sebastiani's concern was that if a printf() was overridden but puts() was not, someone calling printf() and expecting their actual printf() implementation to be called [perhaps invoking some special behaviour] would be surprised. But of course if the implementation conforms to the standard it has to supply both puts() and printf() and if you're serious about overriding output functions you should override them all.
    I call raptors on any programmer who gives his functions the names of standard functions but makes them behave differently.

    Quote (originally posted by dwks):
    Most likely I'm not thinking. I only meant that if you call malloc() with a small enough size it uses a different allocation scheme, which I meant as an example that library functions are doing different optimizations that you as a programmer don't need to be aware of.
    That clarifies things.
