Do most good compilers generate assembly code first or do they go directly to machine code?
A good compiler will do a correct translation between the source code and whatever target language it is built for. Depending on the programming language, what constitutes the build process and the target system of the compiler, a compiler may not even translate source code into assembly or machine code.
I agree with Mario's explanation, but I'd like to clarify:
C and C++ compilers most often do not produce assembler code, but go directly to machine code in an object file. Most, however, have an option to produce assembler code instead of object code.
--
Mats
GCC is an important exception, in that nearly all backends go through assembly code and don't even have machine code emitters. GCC relies on the system assembler (GNU as on Linux) for generating machine code.
I was asking this because I was really interested in programming language development and wanted to know whether to learn asm or machine code. Since it seems machine code might be a safer bet, any tutorials or good books to start with?
P.S. I tried Google for tutorials but couldn't find any.
Asm and machine code are almost the same thing. One is with letters and numbers, the other is in binary or hex. Learn assembly first, because it's more accessible.
For most compilers - machine code first. Outputting assembly code from machine code is relatively easy compared to assembling the assembly code into machine code.
>>Asm and machine code are almost the same thing. One is with letters and numbers, the other is in binary or hex. Learn assembly first, because it's more accessible.
That seems a little unreasonable in my head. Why would you want to do that? This seems to me like a learn-C-before-C++ kind of thing.
Because
mov ax, bx
is more readable than
DF541231234324FDEABAA etc.
And because there are tutorials for learning assembly, but the resources for learning machine code assume that you already know assembly.
I'm not even sure if it makes sense to learn machine code if the hardware manufacturer provided an assembly language.
Most assemblers can show you the hex values of the opcodes. Once you understand assembler a bit, you will realize that machine code and assembler are quite interchangeable if you read the hex values of the opcodes. Unless you are writing something that needs to produce the machine code, however, I do not see why one would want to use it (I hate to even use the word learn, because there isn't technically anything to learn... it's more an issue of which values represent which opcode).
Typically, in programs I have written that are capable of altering machine code, I will use macros such as
Example:
Code:
#define LDIV (0x0138)
#define LMUL (0x013C)
Thus eliminating the need to look at something nasty and in a binary form.
Is machine code more or less portable than asm? I would imagine asm is more portable among operating systems but machine code is more portable among hardware.
Assembler is not portable at all. It just gives easy-to-read names to all the opcodes, which are just hexadecimal values otherwise. It is nothing more, and nothing less. It would make little sense not to learn assembler. Once you know and understand assembler, you could theoretically write machine code. But to be honest, it's hard to read machine code.
NASM is portable among operating systems isn't it?
Across operating systems, yes sure. But to an assembler there is no such thing as an operating system :)
Well... OK, here is the deal. Machine code is portable across operating systems, since it is just the way the CPU encodes its instructions. However, the machine code that runs on a MIPS processor is vastly different from that which runs on an 8088. Executable files, on the other hand, differ across operating systems: each format embeds different information in its headers and in the body of the executable. The sections that contain machine code are going to be the only similarity. So again, no. Assembler is just about the least portable language you can use.
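To illustrate the point about headers, here is a minimal sketch in C that guesses the container format from the first bytes of a file. The function name guess_exe_format is made up for this example. It relies only on two well-known signatures: ELF files (Linux and most Unix-like systems) start with the bytes 0x7F 'E' 'L' 'F', and DOS/Windows executables start with 'M' 'Z'. It does not look at anything else in the header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guess the executable container format from the
 * leading bytes of a file that the caller has already read into buf. */
const char *guess_exe_format(const unsigned char *buf, size_t len)
{
    static const unsigned char elf_magic[4] = { 0x7F, 'E', 'L', 'F' };

    assert(buf != NULL);

    if (len >= 4 && memcmp(buf, elf_magic, 4) == 0)
        return "ELF";      /* Linux and most Unix-like systems */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "MZ/PE";    /* DOS and Windows executables */
    return "unknown";
}
```

The same x86 instruction bytes can sit inside either container; only the wrapper around them changes.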
Assembler and machine code have a 1:1 relationship. One instruction [as long as we stay with one processor architecture, e.g. x86] will produce one set of instruction bytes. Some instructions have synonyms (that is, there is more than one way to form a valid instruction that does the same thing), but the output from the assembler will have the exact same meaning for either of those synonyms. It may be a byte or two longer, but other than that, it's the same result.
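As a small illustration of such synonyms, here is a sketch in C that encodes the 32-bit register-to-register move "mov eax, ebx" (Intel syntax) in two ways. The function and enum names are invented for this example. It assumes 32-bit operand size and covers only the register-to-register form, using opcode 0x89 (MOV r/m32, r32) and opcode 0x8B (MOV r32, r/m32). In this particular case both encodings happen to be the same length.

```c
#include <assert.h>
#include <stdint.h>

/* x86 32-bit register numbers as they appear in the ModRM byte. */
enum reg32 { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* Build a register-to-register ModRM byte: mod = 11, then reg, then rm. */
static uint8_t modrm_rr(enum reg32 reg, enum reg32 rm)
{
    return (uint8_t)(0xC0 | (reg << 3) | rm);
}

/* "mov dst, src" using opcode 0x89: the destination is the rm field. */
void encode_mov_89(enum reg32 dst, enum reg32 src, uint8_t out[2])
{
    assert(out != 0);
    out[0] = 0x89;
    out[1] = modrm_rr(src, dst);
}

/* "mov dst, src" using opcode 0x8B: the destination is the reg field. */
void encode_mov_8b(enum reg32 dst, enum reg32 src, uint8_t out[2])
{
    assert(out != 0);
    out[0] = 0x8B;
    out[1] = modrm_rr(dst, src);
}
```

Calling encode_mov_89(EAX, EBX, buf) gives the bytes 89 D8, while encode_mov_8b(EAX, EBX, buf) gives 8B C3. Both copy EBX into EAX; an assembler simply picks one of them.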
Sure, NASM, MASM and CPP + GAS have slightly different syntax for macros and other pseudo-operations. [Pseudo-ops are things that aren't actual machine-code instructions, but rather instructions to the assembler as to what you want done and where you want it, e.g. .data and .code are instructions to the assembler that "what comes next should go in the data section or the code section".]
Also, if you use gas on windows, the code will work on Windows, and as long as you don't call the OS, you can assemble the same thing in Linux and get the same result.
There is absolutely no point in remembering that 8B is MOV, 75 is JNZ, 90 is NOP, EA is a JMP, CC is INT3, etc. [But if you want to check those out, I think I got most of them about right]. That is of course x86. On 68000 the instructions have somewhat different names, and completely different opcodes.
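For what it's worth, the opcode values listed above can be put in a small lookup table. The following is only a sketch in C with made-up names (opcode_name, known_opcodes, mnemonic_for). It looks at the first byte alone, so it ignores prefixes, operands and every opcode not in the list, which a real disassembler would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one x86 opcode byte and its mnemonic. */
struct opcode_name {
    uint8_t byte;
    const char *mnemonic;
};

static const struct opcode_name known_opcodes[] = {
    { 0x8B, "mov"  },  /* MOV r32, r/m32 */
    { 0x75, "jnz"  },  /* short conditional jump */
    { 0x90, "nop"  },
    { 0xEA, "jmp"  },  /* far jump, not valid in 64-bit mode */
    { 0xCC, "int3" },  /* breakpoint */
};

/* Return the mnemonic for the opcode byte at code[0], or "(unknown)". */
const char *mnemonic_for(const uint8_t *code)
{
    assert(code != NULL);

    for (size_t i = 0; i < sizeof known_opcodes / sizeof known_opcodes[0]; i++) {
        if (known_opcodes[i].byte == code[0])
            return known_opcodes[i].mnemonic;
    }
    return "(unknown)";
}
```

This is the sense in which the assembler does the remembering for you: the names and the byte values are just two spellings of the same table.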
The following code should assemble, and make an executable file that works, for both Linux and Windows:
Code:
    .text
    .globl _main
_main:
    mov $hello, %eax
    push %eax
    call _puts
    add $4, %esp
    xor %eax, %eax
    ret
    .data
hello: .string "Hello, World!\n"
    .end
Code:
as -o asm.o asm.s
gcc -o asm.exe asm.o
--
Mats
I think the more important issue, which he is not fully comprehending, is that even though the opcodes are going to remain the same regardless of operating system constraints, the underlying hardware is one thing that you cannot necessarily count on, unless you are programming for something that doesn't allow any sort of variation in the underlying hardware, such as a gaming console or DVR or something like that.
So it's like you're portable among hardware, or you're portable among operating systems, but without a portable library you can't be portable among both. Am I getting this straight?
Also, matsp. Your code will not assemble correctly. I keep on getting an error message stating entry point _start can't be found and that _puts is an undefined reference. Any help?
Change _main: to _start: and link to the C standard library....
That sounds dangerous. gcc uses the entry symbol _start to call initialization, and initializing glibc is part of that, I think. Shouldn't you keep using main, but link in GCC's startup files?