Thread: Why do compilers produce such bloated executables?

  1. #1
    Registered User
    Join Date
    Oct 2021
    Posts
    138

    Why do compilers produce such bloated executables?

    Hello! First of all, let me link this repo: [GitHub - vishen/go-x64-executable: Generate ELF Linux 64-bit (x86-64) executable manually]. It is a program written in the Go programming language that manually creates a minimal ELF executable (for Linux). The generated executable just writes a message to STDOUT and then exits, and its size is `232` bytes! The "tiny" executable only has the ".data" and ".text" segments, so it is close to the smallest possible program.

    OK, the absolute minimum is not what we want in practice, so we would expect compilers to produce something less minimal and agree on a standard. So, how much bigger would you expect the equivalent executable to be? 3 times bigger? 5 times bigger? Well, let's see! I used the following code:

    Code:
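    const char* msg = "Hello, world!\n";
    
    void _start(void) {
      long ret = 1;  /* rax: syscall number in (1 = write), result out */
      asm volatile(
        "syscall"
          : "+a" (ret)
          : "D" (1), "S" (msg), "d" (14)  /* fd 1, buffer, 14 bytes */
          : "rcx", "r11", "memory"        /* syscall overwrites rcx and r11 */
      );
    
      asm volatile(
        "syscall"
          :: "a" (60), "D" (0)            /* 60 = exit, status 0; never returns */
          : "rcx", "r11", "memory"
      );
    }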
    I compiled using "GCC" and "TCC" to have a comparison between different compilers. The commands I used to compile are the following:

    GCC:
    Code:
    gcc -Ofast test.c -o test_gcc -nostdlib
    TCC:
    Code:
    tcc test.c -o test_tcc -nostdlib
    The results I got were pretty shocking to me! I used `du` with the 'b' flag to measure the apparent file size (the output below is in rounded, human-readable units). The results were the following:

    ```
    14K test_gcc
    5,1K test_tcc
    232 go-x64-executable/tiny-x64
    ```

    What's going on here? The executable produced with GCC is roughly 60 (!) times bigger than the 232-byte program! That's crazy! The program produced with TCC is roughly 22 times bigger. This is still a huge difference but, funnily enough, compared to GCC it is not as crazy.
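
    For reference, the ratios can be worked out from exact byte counts rather than rounded ones. This is only a sketch: the figures are the ones reported elsewhere in this thread (`du -b` on the GCC build before and after `strip`, the GAS + GOLD build, and the 232-byte baseline), and `size_ratio` is an illustrative helper name, not part of any tool.

    ```c
    #include <stdio.h>

    /* How many times larger `size` is than `baseline` (both in bytes). */
    double size_ratio(long size, long baseline) {
        return (double)size / (double)baseline;
    }

    int main(void) {
        const long baseline = 232;            /* go-x64-executable/tiny-x64 */
        const struct { const char *name; long bytes; } builds[] = {
            { "test_gcc",            14312 }, /* du -b, as reported in the thread */
            { "test_gcc (stripped)", 13520 },
            { "test_as (GAS+GOLD)",   1016 },
        };

        for (size_t i = 0; i < sizeof builds / sizeof builds[0]; i++)
            printf("%-22s %6ld bytes  %5.1fx the baseline\n",
                   builds[i].name, builds[i].bytes,
                   size_ratio(builds[i].bytes, baseline));
        return 0;
    }
    ```

    The TCC build is left out because only its rounded size (5,1K) appears in the thread.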

    So again, what's going on here? What is all that data the compilers generate? This may be very specific, but I thought I'd take my shot and ask in case anyone knows.

    For comparison (and out of curiosity of course), I also tried to create an executable in assembly using "GAS" to assemble and "GOLD" to link. The code is the following:

    Code:
    .global _start
    
    .data
      msg: .string "Hello, world!\n"
    
    .text
    _start:
      mov $1,   %eax
      mov $1,   %edi
      mov $msg, %rsi
      mov $14, %edx
      syscall
    
      mov $60, %eax
      mov $0,  %edi
      syscall
    and the command used to compile is:
    Code:
    as test.asm -o test.o && ld.gold test.o -o test_as && ./test_as
    The size of the final executable is 1016 bytes. This is much more reasonable! Of course, when using an assembler and linker directly, I would also expect to be able to include only the segments I am writing (I wasn't able to find out how to do that in the "man" pages), but that's another topic for another time and another place!

  2. #2
    Registered User rstanley
    Join Date
    Jun 2014
    Location
    New York, NY
    Posts
    1,115
    rempas:

    Why are you so concerned with the size of an executable? With most modern hard drives measured in multiple Terabytes, and ultra fast 8+ core processors, with programs running on a hosted system, the size of the executable is not an issue.

    Embedded systems are a different matter.

    Concentrate on what the program actually does, and the efficiency of the code, not the size.

    This is not the '80's with 8 and 16 bit systems! ;^) You probably never dealt with "16-bit x86 segmented memory architecture"!

  3. #3
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,662
    Code:
    $ gcc -Ofast foo.c -o test_gcc -nostdlib
    $ du -b test_gcc 
    14312	test_gcc
    $ size test_gcc
       text	   data	    bss	    dec	    hex	filename
        287	    280	      0	    567	    237	test_gcc
    The size of the file is not the size of the program.
    The `size` program gives you a better idea of what ends up in memory.
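
    A rough way to see the same gap from C is to add up the file bytes covered by the loadable (PT_LOAD) program headers and compare that with the file size. This is a minimal sketch, assuming a 64-bit ELF file in native byte order and the Linux `<elf.h>` header; `loadable_bytes` is a made-up helper name, and its number is related to, but not the same as, what `size` prints (which works from sections).

    ```c
    #include <elf.h>
    #include <stdio.h>
    #include <string.h>

    /* Sum of p_filesz over all PT_LOAD segments of a 64-bit ELF file.
     * Stores the total file size in *file_size. Returns -1 on any error. */
    long loadable_bytes(const char *path, long *file_size) {
        FILE *f = fopen(path, "rb");
        if (!f)
            return -1;

        Elf64_Ehdr eh;
        long total = -1;
        if (fread(&eh, sizeof eh, 1, f) != 1 ||
            memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
            eh.e_ident[EI_CLASS] != ELFCLASS64)
            goto done;

        total = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)eh.e_phoff + (long)i * eh.e_phentsize;
            if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
                total = -1;
                goto done;
            }
            if (ph.p_type == PT_LOAD)
                total += (long)ph.p_filesz;
        }

        if (fseek(f, 0, SEEK_END) != 0 || (*file_size = ftell(f)) < 0)
            total = -1;
    done:
        fclose(f);
        return total;
    }

    int main(int argc, char **argv) {
        /* With no argument, inspect this program's own executable (Linux). */
        const char *path = argc > 1 ? argv[1] : "/proc/self/exe";
        long file_size = 0;
        long loaded = loadable_bytes(path, &file_size);
        if (loaded < 0) {
            fprintf(stderr, "%s: not a readable 64-bit ELF file\n", path);
            return 1;
        }
        printf("file size:         %ld bytes\n", file_size);
        printf("PT_LOAD file data: %ld bytes\n", loaded);
        return 0;
    }
    ```

    Running it as `./a.out test_gcc` should show how little of the file is actually covered by loadable segments.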

    Code:
    $ strip test_gcc 
    $ du -b test_gcc 
    13520	test_gcc
    $ size test_gcc
       text	   data	    bss	    dec	    hex	filename
        287	    280	      0	    567	    237	test_gcc
    Stripping out some of the unnecessary fluff helps a little.

    Code:
    $ objdump -x test_gcc | less
    $ hd -v test_gcc | less
    objdump has many options.
    If you just hexdump it, there's a lot of zero padding going on.
    That's just the nature of the file format.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by rstanley View Post
    rempas:

    Why are you so concerned with the size of an executable? With most modern hard drives measured in multiple Terabytes, and ultra fast 8+ core processors, with programs running on a hosted system, the size of the executable is not an issue.

    Embedded systems are a different matter.

    Concentrate on what the program actually does, and the efficiency of the code, not the size.

    This is not the '80's with 8 and 16 bit systems! ;^) You probably never dealt with "16-bit x86 segmented memory architecture"!
    OK, first of all, why are you concluding that I'm bothered by the executable size? Can't I just ask a question out of curiosity and interest, my friend?

    However, one reason I may consider it important is that I love it when there aren't tons of unnecessary things (in general). Yeah, I get your points, but they don't say anything to me. With that logic, why don't we start writing kernels in Python? I hope you see my point. And like I said, 60 times bigger is a huge difference. I would consider 3-8 times fine, but anything bigger than that needs to be explained so I can understand it.

  5. #5
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Salem View Post
    Code:
    $ gcc -Ofast foo.c -o test_gcc -nostdlib
    $ du -b test_gcc 
    14312    test_gcc
    $ size test_gcc
       text       data        bss        dec        hex    filename
        287        280          0        567        237    test_gcc
    The size of the file is not the size of the program.
    The size program gives you a better idea of what ends up in memory.
    Yeah, sorry! I was talking about the size of the file. I thought it was clear.

    Quote Originally Posted by Salem View Post
    Code:
    $ strip test_gcc 
    $ du -b test_gcc 
    13520    test_gcc
    $ size test_gcc
       text       data        bss        dec        hex    filename
        287        280          0        567        237    test_gcc
    Stripping out some of the unnecessary fluff helps a little.
    Is the "unnecessary" stuff the sections? If I'm not wrong, the final executable needs only the segments, right?
    If the answer to both of my questions is "yes", then why isn't the linker doing that automatically for us? Also, stripping seems to have done a better job on the executable produced with GAS and GOLD than on the ones produced with GCC and TCC.
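
    On the sections-versus-segments question: on Linux the kernel maps an executable using the program headers (segments), while the section header table is there for linkers, debuggers and tools such as objdump. As a small sketch, assuming a 64-bit ELF file and the Linux `<elf.h>` header, the size of both tables can be read straight from the ELF header. The two helper names below are made up for illustration.

    ```c
    #include <elf.h>
    #include <stdio.h>
    #include <string.h>

    /* Bytes occupied by the section header table described by an ELF header. */
    unsigned long section_table_bytes(const Elf64_Ehdr *eh) {
        return (unsigned long)eh->e_shnum * eh->e_shentsize;
    }

    /* Bytes occupied by the program header table (what the loader reads). */
    unsigned long program_table_bytes(const Elf64_Ehdr *eh) {
        return (unsigned long)eh->e_phnum * eh->e_phentsize;
    }

    int main(int argc, char **argv) {
        /* With no argument, inspect this program's own executable (Linux). */
        const char *path = argc > 1 ? argv[1] : "/proc/self/exe";
        FILE *f = fopen(path, "rb");
        Elf64_Ehdr eh;

        if (!f || fread(&eh, sizeof eh, 1, f) != 1 ||
            memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
            eh.e_ident[EI_CLASS] != ELFCLASS64) {
            fprintf(stderr, "%s: not a readable 64-bit ELF file\n", path);
            if (f)
                fclose(f);
            return 1;
        }
        fclose(f);

        printf("program header table: %lu bytes at offset %lu\n",
               program_table_bytes(&eh), (unsigned long)eh.e_phoff);
        printf("section header table: %lu bytes at offset %lu\n",
               section_table_bytes(&eh), (unsigned long)eh.e_shoff);
        return 0;
    }
    ```

    The table sizes are only part of the picture, since the sections those headers describe take up space too.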

    Quote Originally Posted by Salem View Post
    Code:
    $ objdump -x test_gcc | less
    $ hd -v test_gcc | less
    objdump has many options.
    If you just hexdump it, there's a lot of zero padding going on.
    That's just the nature of the file format.
    I know that zero padding is necessary in some fields in the ELF header but why is there a lot of padding in other places as well?

  6. #6
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,662
    It's just a file format.

    You could try using the 'objcopy' program to copy only text/data/bss and see what the result is.

    Between objdump and objcopy, there's lots of stuff to try that should keep you busy for days.

    > I know that zero padding is necessary in some fields in the ELF header but why is there a lot of padding in other places as well?
    Ask the authors of the file format.
    But a lot of the reason might be that there are other measures of efficiency besides simply how many bytes there are in a file.

    Like keeping things on page boundaries:
    Code:
    00001000  f3 0f 1e fa b8 01 00 00  00 48 8b 35 f0 2f 00 00  |.........H.5./..|
    00001010  ba 0e 00 00 00 89 c7 0f  05 b8 3c 00 00 00 31 ff  |..........<...1.|
    00001020  0f 05 c3 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    ...
    00002000  48 65 6c 6c 6f 2c 20 77  6f 72 6c 64 21 0a 00 00  |Hello, world!...|
    00002010  01 1b 03 3b 14 00 00 00  01 00 00 00 f0 ef ff ff  |...;............|
    00002020  30 00 00 00 00 00 00 00  14 00 00 00 00 00 00 00  |0...............|
    00002030  01 7a 52 00 01 78 10 01  1b 0c 07 08 90 01 00 00  |.zR..x..........|
    00002040  10 00 00 00 1c 00 00 00  b8 ef ff ff 23 00 00 00  |............#...|
    You could always try
    gcc -v -Ofast foo.c -o test_gcc -nostdlib
    to see what options the compiler / assembler / linker are using.
    Maybe you can alter those from the command line.
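
    The gap in the hex dump above can be checked with a little arithmetic. This is a sketch that assumes a 4 KiB page size and takes the offsets from the dump: the last code byte (c3) sits at 0x1022, and the string starts at 0x2000. `align_up` is an illustrative helper, not something taken from the linker.

    ```c
    #include <stdio.h>

    /* Round `offset` up to the next multiple of `align` (a power of two). */
    unsigned long align_up(unsigned long offset, unsigned long align) {
        return (offset + align - 1) & ~(align - 1);
    }

    int main(void) {
        const unsigned long page     = 0x1000; /* assumed 4 KiB page size */
        const unsigned long code_end = 0x1023; /* one past the last code byte */

        unsigned long next = align_up(code_end, page);
        printf("next page boundary: 0x%lx\n", next);
        printf("zero padding:       %lu bytes\n", next - code_end);
        return 0;
    }
    ```

    So 35 bytes of code are followed by 4061 bytes of zeros before the next page starts, and each further page-aligned piece of the file can add a similar gap.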

  7. #7
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    I can see a number of reasons why, some general, some specific to GCC.

    First, much of it has to do with the linker, rather than the compiler per se. Part of it also has to do with the libraries, specifically the difference between static libraries and dynamically shared libraries. When linking against a static library, the linker copies in every object file it needs from the archive, which can be more than just the functions actually used; with shared libraries (the default with GCC on most Linux systems) the library code stays out of the executable. While I can't say for certain that the tool you are using changes the linkages to use shared libraries (the most likely case), I am guessing that if it isn't, it is at least removing the unused library code.

    Also, most code includes a lot of extraneous information for debugging purposes. With ELF, the file includes all of the symbol information, in order to allow the code to be debugged symbolically. This information can be, and for production code often is, removed using the strip utility.
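
    To see which parts of a file are candidates for stripping, one option is to list the sections that are not loaded at run time, i.e. those without the SHF_ALLOC flag (typically things like .symtab, .strtab and .comment). The following is a sketch only, assuming a well-formed 64-bit ELF file in native byte order and the Linux `<elf.h>` header; `read_file` and `non_alloc_bytes` are made-up names, and not everything it lists is necessarily removed by strip.

    ```c
    #include <elf.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Read a whole file into a malloc'd buffer; returns NULL on error. */
    static unsigned char *read_file(const char *path, size_t *len) {
        FILE *f = fopen(path, "rb");
        if (!f)
            return NULL;
        unsigned char *buf = NULL;
        long n = 0;
        if (fseek(f, 0, SEEK_END) == 0 && (n = ftell(f)) > 0) {
            rewind(f);
            buf = malloc((size_t)n);
            if (buf && fread(buf, 1, (size_t)n, f) != (size_t)n) {
                free(buf);
                buf = NULL;
            }
            if (buf)
                *len = (size_t)n;
        }
        fclose(f);
        return buf;
    }

    /* Total file bytes held by sections without SHF_ALLOC, i.e. sections that
     * are not mapped into memory at run time. Returns -1 for a malformed image.
     * If `verbose` is nonzero, prints each such section. */
    long non_alloc_bytes(const unsigned char *img, size_t len, int verbose) {
        Elf64_Ehdr eh;
        if (len < sizeof eh || memcmp(img, ELFMAG, SELFMAG) != 0 ||
            img[EI_CLASS] != ELFCLASS64)
            return -1;
        memcpy(&eh, img, sizeof eh);
        if (eh.e_shnum == 0)
            return 0;                      /* no section header table at all */
        if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shoff > len ||
            (size_t)eh.e_shnum * sizeof(Elf64_Shdr) > len - eh.e_shoff ||
            eh.e_shstrndx >= eh.e_shnum)
            return -1;

        Elf64_Shdr names;                  /* section holding the name strings */
        memcpy(&names, img + eh.e_shoff + eh.e_shstrndx * sizeof names,
               sizeof names);

        long total = 0;
        for (unsigned i = 0; i < eh.e_shnum; i++) {
            Elf64_Shdr sh;
            memcpy(&sh, img + eh.e_shoff + i * sizeof sh, sizeof sh);
            if (sh.sh_type == SHT_NULL || sh.sh_type == SHT_NOBITS ||
                (sh.sh_flags & SHF_ALLOC))
                continue;                  /* empty, no file data, or loaded */
            total += (long)sh.sh_size;
            if (verbose) {
                /* Never read a name past the end of the file image. */
                size_t off = names.sh_offset + sh.sh_name;
                int max = off < len ? (int)(len - off < 20 ? len - off : 20) : 0;
                printf("%-20.*s %8lu bytes\n", max,
                       max ? (const char *)img + off : "",
                       (unsigned long)sh.sh_size);
            }
        }
        return total;
    }

    int main(int argc, char **argv) {
        /* With no argument, inspect this program's own executable (Linux). */
        const char *path = argc > 1 ? argv[1] : "/proc/self/exe";
        size_t len = 0;
        unsigned char *img = read_file(path, &len);
        long total = img ? non_alloc_bytes(img, len, 1) : -1;
        free(img);
        if (total < 0) {
            fprintf(stderr, "%s: not a readable 64-bit ELF file\n", path);
            return 1;
        }
        printf("%-20s %8ld bytes\n", "total", total);
        return 0;
    }
    ```

    Comparing its output before and after running strip on the same file shows which of those sections actually went away.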

    Finally, with GCC specifically, you need to understand that GCC isn't a compiler, exactly: it is a driver program which can call on the compiler, the assembler, and the linker as needed. While this design is flexible and well-suited for a compiler suite such as GCC, it does involve some inherent overhead both in the compilation process and in the resulting executables, as the linker (ld) doesn't, by default, trim unused functions out of the object files it links.

    And it could be worse, much worse. There was a notorious Eiffel compiler for Windows in the 1990s which produced a 1 MiB executable for "Hello, World!", because it would link in all of the Eiffel runtime code into every single program.

  8. #8
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    A few years ago when I was working on a RISC-V CPU design, I had this minimal bit of assembly to prepare the system to run test fixtures written in C:
    Code:
        .text
        .align  2
        .globl  _start
        .type   _start, @function
        .org    0
    _start:
        lui     a1, 0x10001
        # Set the stack address to 0x10000FFC
        addi    sp, a1, -4
        # Call the main function
        jal     ra, test_program
        j       _start
    So four instructions were all that was needed to be able to run C code.
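
    The stack address in the comment can be checked by modelling the two instructions in C. This is a sketch that assumes 32-bit registers (RV32); `lui` and `addi` here are plain C stand-ins for the instructions, nothing more.

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* LUI: place a 20-bit immediate in bits 31..12, low 12 bits zero (RV32). */
    uint32_t lui(uint32_t imm20) {
        return (imm20 & 0xFFFFFu) << 12;
    }

    /* ADDI: add a sign-extended 12-bit immediate, wrapping modulo 2^32. */
    uint32_t addi(uint32_t rs1, int32_t imm12) {
        return rs1 + (uint32_t)imm12;
    }

    int main(void) {
        uint32_t a1 = lui(0x10001);   /* lui  a1, 0x10001 */
        uint32_t sp = addi(a1, -4);   /* addi sp, a1, -4  */
        printf("a1 = 0x%08X\n", (unsigned)a1);
        printf("sp = 0x%08X\n", (unsigned)sp);
        return 0;
    }
    ```

    So a1 holds 0x10001000 and sp ends up at 0x10000FFC, matching the comment in the assembly.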

    FWIW this was the test program for the serial peripheral - no "stdio.h" or anything, just talking to the bare metal:


    Code:
    char text[] = "Hello world!\r\n";
    char az[] = "Text ";
    char bz[] = " characters long\r\n";
    
    /* Memory-mapped serial port and GPIO registers */
    volatile char *serial_tx        = (char *)0xE0000000;
    volatile char *serial_tx_full   = (char *)0xE0000004;
    volatile char *serial_rx        = (char *)0xE0000008;
    volatile char *serial_rx_empty  = (char *)0xE000000C;
    volatile int  *gpio_value       = (int  *)0xE0000010;
    volatile int  *gpio_direction   = (int  *)0xE0000014;
    
    int getchar(void) {
      // Wait until the receive buffer is no longer empty
      while(*serial_rx_empty) {
      }
    
      // Return the received character
      return *serial_rx;
    }
    
    int putchar(int c) {
      // Wait until the transmit buffer is no longer full
      while(*serial_tx_full) {
      }
    
      // Output character
      *serial_tx = c;
      return c;
    }
    
    /* Write a string, returning the number of characters sent */
    int puts(const char *s) {
        int n = 0;
        while(*s) {
          putchar(*s);
          s++;
          n++;
        }
        return n;
    }
    
    /* Length of a NUL-terminated string */
    int mylen(const char *s) {
        int n = 0;
        while(*s) {
          s++;
          n++;
        }
        return n;
    }
    
    int test_program(void) {
      puts("System restart\r\n");
    
      /* Check some junk in memory */
      puts("String is ");
      putchar('0'+mylen(az));
      puts(" characters long\r\n");
    
      puts(az);
      putchar('0'+mylen(az));
      puts(bz);
    
      /* Run a serial port echo */
      *gpio_direction = 0xFFFF;
      while(1) {
        putchar(getchar());
        *gpio_value = *gpio_value + 1;
      }
      return 0;
    }

    But it worked - you could run very small test fixtures in simulation and in hardware, some as small as a few dozen bytes.

    Once you start adding the linkage information for the C runtime and libraries, the data space for stdin, stdout, stderr, the standard interrupt handlers and so on, along with the need to align things to memory page boundaries so they can be mapped into memory, the executable size just ramps up.

  9. #9
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Salem View Post
    It's just a file format.

    You could try using the 'objcopy' program to copy only text/data/bss and see what the result is.

    Between obdump and objcopy, there's lots of stuff to try that should keep you busy for days.

    Thank you! I will not spend days but I will spend some hours for sure!

  10. #10
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Schol-R-LEA-2 View Post
    I can see a number of reasons why, some general, some specific to GCC.

    First, much of it has to do with the linker, rather than the compiler per se. Part of it also has to do with the libraries, specifically the difference between static libraries and dynamically shared libraries. With most C compilers, the default is to use static libraries, and in most instances it will link in the whole library, rather than just the portions used. While I can't say for certain that the tool you are using changes the linkages to use shared libraries (the most likely case), I am guessing that if it isn't, it is at least removing the unused library code.

    Also, most code includes a lot of extraneous information for debugging purposes. With ELF, it includes all of the symbol information in the code, in order to allow the code to be debugged symbolically. This information can - and often is, for production code - removed using the strip utility.

    Finally, with GCC specifically, you need to understand that GCC isn't a compiler, exactly: it is a driver program which can call on the compiler, the assembler, and the linker as needed. While this design is flexible and well-suited for a compiler suite such as GCC, it does involve some inherent overhead both in the compilation process and in the resulting executables, as the linker (ld) doesn't trim the unused parts of the libraries.

    And it could be worse, much worse. There was a notorious Eiffel compiler for Windows in the 1990s which produced a 1 MiB executable for "Hello, World!", because it would link in all of the Eiffel runtime code into every single program.
    You are actually right! Changing the linker to "gold" really does reduce the size. Not by a lot, but it does. I can't tell about the libraries, as my example didn't link any library, but that will be a great discussion for a practical, big program.
    And as for the debug symbols, we can indeed strip them with "-s" (and "-S" strips something else in the GOLD linker), but I wonder why the tools don't do it automatically in non-debug builds (where the "-g" option is NOT used).

  11. #11
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by hamster_nz View Post
    Once you start adding the linkage information for the C runtime and libraries, the data space for stdin, stdout, stderr, the standard interrupt handlers and so on, along with the need to aligning things to memory page boundaries so they can be mapped into memory the executable size just ramps up.
    That's awesome! In my example I didn't link with "libc", though, so I wonder if the alignment alone makes this huge difference.
