Thread: Compilation steps in detail?

  1. #1
    Registered User
    Join Date
    Jan 2005
    Location
    Estonia
    Posts
    131

    Compilation steps in detail?

    Hello.

    Can someone please explain the compilation process to me in detail: which files get written, and in what order?

    g++ main.cpp makes a program called a.exe

    But g++ -c main.cpp makes a .o file

    There is also this possibility to create a main.s file by g++ -S main.cpp

    I want to know the steps g++ takes to get from .cpp to .exe, and in which order the .o and .s files and the rest are generated.

    I searched Google, but could not find a single tutorial that answered my questions.
    (Maybe you could give me a link to a good place)

    Thanks in advance

  2. #2
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    The birds-eye view:

    The compiler grabs a cpp file and makes it into a .o file. This is a bytecode file.
    The linker grabs the .o file and makes it into an .exe file. This is a machine-code file.

    g++.exe can act both as a compiler and a linker. It is decided by the arguments passed to it from the command line.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  3. #3
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Hmm... Reading your question more carefully.

    The steps in detail are:

    - The preprocessor prepares the file for compilation. The result is a preprocessed file.
    - The compiler compiles the file. The result is an assembly code file.
    - The assembler assembles the file. The result is an object file.
    - The linker links the file. The result is an executable file.

    I think this is what you are asking

  4. #4
    Registered User
    Join Date
    Jan 2005
    Location
    Estonia
    Posts
    131
    Quote Originally Posted by Mario F.
    Hmm... Reading your question more carefully.

    The steps in detail are:

    - The preprocessor prepares the file for compilation. The result is a preprocessed file.
    - The compiler compiles the file. The result is an assembly code file.
    - The assembler assembles the file. The result is an object file.
    - The linker links the file. The result is an executable file.

    I think this is what you are asking
    What is the extension of the preprocessed file? If I type g++ -E main.cpp, then it writes the preprocessed file into the output, but I see no output when the -E is done in the background (when I only use g++ main.cpp), thus, it should write it to a file.

    then it compiles the preprocessed file into assembler code(this is human readable, right?).
    getting only the assembler code should be g++ -S main.cpp and it will produce main.s. Correct?

    Then, the file is assembled - that means, the assembler code from the main.s file will be translated into pure machine code, that is not human-readable anymore and the result is written into main.o

    The linker takes main.o and starts looking for the #includes and stuff (#include is somehow marked in the .o file), to link all the necessary files together.

    Are my views correct ?

  5. #5
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    > What is the extension of the preprocessed file? If I type g++ -E main.cpp, then it writes the preprocessed file into the output, but I see no output when the -E is done in the background (when I only use g++ main.cpp), thus, it should write it to a file.

    You need to specify the -o option followed by a filename. By default the output of -E is sent to the standard output.

    > then it compiles the preprocessed file into assembler code(this is human readable, right?).
    getting only the assembler code should be g++ -S main.cpp and it will produce main.s. Correct?

    I'm not sure what happens in this step; I have never done it. But there's nothing like trying it out yourself.

    The default, however, is to write the output to a .s file.

    > Then, the file is assembled - that means, the assembler code from the main.s file will be translated into pure machine code, that is not human-readable anymore and the result is written into main.o

    It will be translated into bytecode. Not machine code. The result is an object file, yes.

    > linker takes main.o and starts looking for the #includes and stuff, to link all the necessary files together.

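    Sort of, yes. The linker merges all the object files together, pulls in the code it needs from static libraries or records references to dynamic ones, resolves the symbol names and function calls between them, and produces a machine-code executable. It works from symbol names, not from #include directives, which the preprocessor has already consumed by then.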

  6. #6
    Registered User
    Join Date
    Jan 2005
    Location
    Estonia
    Posts
    131
    Quote Originally Posted by Mario F.
    > Then, the file is assembled - that means, the assembler code from the main.s file will be translated into pure machine code, that is not human-readable anymore and the result is written into main.o

    It will be translated into bytecode. Not machine code. The result is an object file, yes.
    What is the difference between bytecode and machine code?

  7. #7
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Machine code is readily understandable by the processor without the need of anything in between to make the translation. Bytecode is a binary representation of the code that still needs to be translated to machine-code so that the processor understands it...

    However, I'm reading up on it as we speak, and I think I'm wrong: an object (.o) file is not bytecode. http://en.wikipedia.org/wiki/Object_file

  8. #8
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    This is part of a tutorial I started writing. Forgive the superfluous DocBook markup.

    [quote] <section>
    <title>Breaking Down the Build Process</title>

    <para>Let's look in more detail at what happens when compiling. For this purpose, we'll use GCC. Other command-line tools will also work, but you'll have to look up the commands yourself if you want to follow along. Alternatively, you can just read this section.</para>
    <para>As I mentioned at the start of this chapter, building a C++ program involves four steps: preprocessing, compiling, assembling and linking.</para>
    <para>Download hello.cpp and put it into a directory of its own. Get a console and change into that directory.</para>
    <para>The first step is preprocessing. The preprocessor is basically a text replacement tool. It looks for preprocessor directives (lines starting with #) and acts accordingly. It also performs a few other replacements, none of which need to concern us now.</para>
    <para>We can invoke the preprocessor using the command "g++ -E".</para>
    <screen>$ g++ -E hello.cpp -o hello.ii</screen>
    <para>This command generates the file hello.ii by preprocessing hello.cpp. The -o option specifies the name of the output file. We have to do this because by default, the preprocessed source is written to the standard output. The .ii ending is a convention indicating preprocessed C++ code.</para>
    <para>The resulting file is very large (over 700k on my system). The reason is the #include preprocessor directive. It instructs the preprocessor to include another file verbatim into the source. This is used to get system definitions into your own programs.</para>
    <para>The next step is compilation proper. We invoke it using the command "g++ -S".</para>
    <screen>$ g++ -S hello.ii</screen>
    <para>This generates the file hello.s, a file containing assembly code. For practical purposes, most compilers for higher-level languages emit assembly code as their output, instead of writing machine code directly. This way, the machine-specific details have to be implemented only once, in the assembler, instead of again and again in every compiler. This is not the only way, though: the first C++ compiler emitted C code.</para>
    <para>The next step is assembling, i.e. invoking the assembler to create binary object code from the assembly. The command for this is "g++ -c".</para>
    <screen>$ g++ -c hello.s</screen>
    <para>This generates a file called hello.o. This object file is no longer human-readable. It contains actual machine code, but it is not yet a program. It still needs to be combined with system libraries and some support code, and perhaps other object files if your project consists of several, in the final step of program creation, called linking. The command to link is simply "g++".</para>
    <screen>$ g++ hello.o -o hello</screen>
    <para>We use the -o option again because the default output filename is a.out.</para>
    <para>Actually, not all these steps are really performed by the g++ program. Only preprocessing and compilation proper are done by g++. (Although cpp is a standalone preprocessor, combining these two steps into a single one is faster.) For assembling, g++ calls upon as or gas, the GNU assembler. For linking it calls upon ld, the GNU or system linker. However, the command lines for these tools are very complicated, so it is easier to let g++ handle everything.</para>
    <para>Of course, these steps need not be done one by one. In fact, all of them can be done at once:</para>
    <screen>$ g++ hello.cpp -o hello</screen>
    <para>In practice, however, each source file will be separately turned into an object file, and all object files will then be linked together. This allows rebuilding only those files that need it.</para>
    <para>Well, now that you know exactly how hello.cpp is turned into hello (or hello.exe), it's time to look at the contents of this file to see what it actually does.</para>
    </section>[/quote]
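
To watch the text-replacement nature of the preprocessor described in the tutorial without wading through 700k of headers, it helps to use a file that has a macro but no #include. This is just a sketch: the directory stages_demo, the file macro_demo.cpp and the macro GREETING are all made up for the example, and -P is the GCC option that omits line markers from the preprocessor output.

```shell
# Hypothetical scratch directory and file names, chosen for this example only.
mkdir -p stages_demo
cat > stages_demo/macro_demo.cpp <<'EOF'
#define GREETING "hi"
const char* message = GREETING;
EOF

# -E stops after preprocessing and prints to standard output;
# -P leaves out the "# 1 ..." line markers.
# In the output, the macro name should be replaced by its text.
g++ -E -P stages_demo/macro_demo.cpp
```

The #define line itself disappears from the output, because directives are consumed by the preprocessor and never reach the compiler proper.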
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  9. #9
    Registered User
    Join Date
    Jan 2005
    Location
    Estonia
    Posts
    131
    "Assembly languages use mnemonic codes to refer to machine code instructions. Such a more readable rendition of the machine language is called an assembly language and consists of both binary numbers and simple words whereas machine code is composed only of the two binary digits 0 and 1."

    "An object file consists of machine code "

    Thus: .o files are bytecode (i.e. machine code),
    and .o files are made from .s files, so a .o file is nothing more than a machine-code representation of the .s file (which contains assembly code).

  10. #10
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Yes, except that bytecode and machine code usually mean different things, as Mario pointed out.

    Bytecode usually refers to the intermediate binary form used by virtual machines such as the JVM or the .Net CLR, or even the PHP or Python interpreters.

    Machine code refers to the binary form that is directly executable by an existing processor.

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Prepare for a nice ride

    g++ -v main.cpp

    It shows you all the steps which happen.

    Compare with
    g++ -v -c main.cpp
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #12
    Registered User
    Join Date
    Jan 2005
    Location
    Estonia
    Posts
    131
    Quote Originally Posted by CornedBee
    Yes, except that bytecode and machine code usually mean different things, as Mario pointed out.

    Bytecode usually refers to the intermediate binary form used by virtual machines such as the JVM or the .Net CLR, or even the PHP or Python interpreters.

    Machine code refers to the binary form that is directly executable by an existing processor.
    Oh, so bytecode != machine code. But in our case, .s gets turned into a machine code, not a bytecode, right?


    And thanks CornedBee, your tutorial is great

    Oh, another thing:

    if the .ii file is 600 KB and then we invoke the compiler to turn that preprocessed code into assembly code, the resulting .s file will only be like 4 KB - where has all the size gone?
    .o is made out of .s and is about 2 KB
    but .o -> .exe will result in a 400 KB executable (Where does the data hide itself ?)

  13. #13
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    > Oh, so bytecode != machine code. But in our case, .s gets turned into a machine code, not a bytecode, right?

    That's right; C++ doesn't use bytecode. Well, not usually; some compilers probably do.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell

  14. #14
    Registered User
    Join Date
    Jan 2005
    Location
    Estonia
    Posts
    131
    Quote Originally Posted by hardi
    Oh, so bytecode != machine code. But in our case, .s gets turned into a machine code, not a bytecode, right?


    And thanks CornedBee, your tutorial is great

    Oh, another thing:

    if the .ii file is 600 KB and then we invoke the compiler to turn that preprocessed code into assembly code, the resulting .s file will only be like 4 KB - where has all the size gone?
    .o is made out of .s and is about 2 KB
    but .o -> .exe will result in a 400 KB executable (Where does the data hide itself ?)
    Regarding the size question: it might be that the .s file contains only the code I wrote plus some other small stuff, and that when ld.exe is invoked, the other libraries (I used the iostream library) are linked in, so the data comes out of hiding and the resulting executable is 400 KB. Does that mean the .s file carries some information about what to include at link time, instead of the included file being turned into assembly code? (And hey, in that case, when is the included file turned into .s?)

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Your .s file will have unresolved symbols of one sort or another, which causes the linker to do its thing.
