Can I ask where you are getting the assembler keywords? Are you using a website or a book?
Thanks
You ended that sentence with a preposition...Bastard!
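What "doesn't work" about it? If I do
Code:
.section .data
stringVar:
    .string "Hello World"
    .comm copyString,12,1       # 12 uninitialized bytes, 1-byte alignment
hello:
    .string "World"
.section .text
.global _start
_start:
    movl $stringVar, %esi       # source address
    movl $copyString, %edi      # destination address
    movl $0, %ebx               # loop counter
copyLoop:
    cmpl $12, %ebx              # copied all 12 bytes?
    je exit
    movb (%esi), %cl
    movb %cl, (%edi)
    incl %esi
    incl %edi
    incl %ebx
    jmp copyLoop
exit:
    movl $5, %edx               # length of "World"
    movl $0, %ebx               # file descriptor 0 (1, stdout, is the usual choice)
    movl $hello, %ecx           # buffer to write
    movl $4, %eax               # sys_write
    int $0x80
    movl $1, %eax               # sys_exit, status taken from %ebx
    int $0x80
I see "World" like I would expect to see.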
Alignment is important, because generally for multi-byte reads the computer can only access certain positions directly; for instance, for a 4-byte read the address has to end in 0x0, 0x4, 0x8, or 0xc. If you try to read four bytes from an address like 0x11, usually that means the hardware has to do two reads and then piece the result together. In this case, we'll only be reading one byte at a time, so we don't need to worry about the address being a multiple of anything. (So we specify that the address has to be a multiple of 1, because every address is a multiple of 1.)
Quote: Does it mean each byte of the 12 is exactly one byte?
No, it means that that segment of memory will line up at an address that's a multiple of 1, i.e. any address, which is fine for character data, which is guaranteed to be a single byte.
Quote: So if it was an integer type we could say
Code:
.comm CopyString,12,4
You would probably say that for an integer type, but not strictly because ints are 4 bytes. That 4 means the data has to start at an address that is a multiple of 4, like 12 or 16 (not 13, 14 or 15). Alignment restrictions are requirements of the underlying architecture (x86, 68k, ARM, etc), and generally require an alignment boundary that is a multiple of the data size. A single byte is aligned on a 1-byte boundary (anywhere), a 2-byte piece of data on a 2-byte (even) boundary, a 4-byte piece of data on a 4-byte boundary, etc. Violating alignment restrictions will often produce what's known as a bus error, which will probably crash your program in a manner similar to a seg fault.
@Tabstop: I was declaring a label
Code:
copyString:
    .comm copyString,12,1
that was why it wasn't working...
you had yours like
Code:
stringVar:
    .string "Hello World"
    .comm copyString,12,1
In your declaration, is copyString still a memory location, the same as doing this
copyString: ?
What is the difference? (except that one works and the other doesn't)
@Anduril, Tabstop
I get the point about the alignment. Thanks
Last edited by Eman; 12-31-2010 at 12:57 PM.
Comm - Using as
(EDIT to add: A label is simply a label for the contents of your file. I.e. using a label is a little bit like a line number in BASIC or Fortran. That works for a string literal, since that string literal must also appear somewhere in your file. For a chunk of memory like a variable, not so much.)
(More edit: I suppose you could put something like
Code:
CopyString:
    .byte 0,0,0,0,0,0,0,0,0,0,0,0
to set aside 12 bytes labeled with that label.)
Last edited by tabstop; 12-31-2010 at 01:05 PM.
the reason I am asking is if I wanted to print the contents of copyString as I have done with label hello: , would it work? (I am going to try it now)
Thanks for the link.
------
EDIT
Cool, everything works now. System calls are a bit clearer than before.
But to clarify about the label:
so the label is used to store the address of the line number of the first content?
Is it fair to compare it to C, where calling a function pushes the return address (the line number to return to)?
So a label is just like that, and it is a bad idea to use one for declaring memory, unless it is explicitly declared?
To declare a proper variable I have to use .comm then?
Last edited by Eman; 12-31-2010 at 01:10 PM.
The assembler should have complained that copyString was already defined:
$ as eman.s
eman.s: Assembler messages:
eman.s:6: Error: symbol `copyString' is already defined
Yes, in both cases copyString is a label that refers to an address (so the assembler turns $copyString into the corresponding address when generating the code). As long as it refers to a chunk of memory of sufficient size, your program will work. The .comm directive requires an identifier as the first item though, so you can't use a label with the same name as the identifier. You could have achieved the same result with something like:
Code:
copyString:
    .ascii "            "    # 12 spaces
and ended up with a 12-byte string initialized to all spaces, the start of which you refer to with $copyString. Then you would copy over the spaces instead of the zero bytes that are probably put in by the .comm directive.
A label is just a marker for a place in your object file (technically, once your program gets loaded into memory, it refers to that relocated spot, but the point is the same), as opposed to "normal" storage, like the stack. For instance, if you did the blank-string or 0,0,0,0,0 approach, you would have 24 bytes at the beginning of your object file that say "Hello World0000000000000", and those labels would refer to those spots in the file, just like copyLoop refers to that point in your object file where that cmp instruction lives. In fact, you could do "jmp stringVar" in place of "jmp copyLoop" and everything would be happy (until you try to run it). The two types of labels are actually the one type of label.
Normally, when you declare variables in a C program, those go on the stack -- I don't have hundreds of bytes of zeroes in my source code when I declare an array of doubles. But the compiler might translate the starting point of my array to, I don't know, ebp-8 or whatever (ebp usually points to the base of the current stack frame) and we go from there.
As to the debugger, if you've gotten to here you almost certainly have gdb hanging around. You'll have to assemble with the -g flag (and you might need to specify the debugger format too, like -gstabs) if you want to use the debugger with it.
To expand on tabstop's post, local variables with automatic storage (which is the default for a local function variable including those local to main) go on the stack. Global variables (which are generally evil) and local variables with static storage generally go in the data or BSS section, depending on whether they're initialized (data) or not (BSS - all zeros).
ebp refers to the start of the current stack frame, which represents the function you are currently in. Every function call bumps ebp down the stack (to a lower address) to give each function its own separate space to work in. The stack grows down from the high memory addresses to the low ones, thus main's base pointer has a larger address than that of, e.g. foo, which is called from within main. When a function returns, ebp is restored to its previous value to put you back in the frame of reference of the calling function. Parameters to a function in x86 are referred to as ebp+8, ebp+12, etc. (ebp+4 holds the return address), while local variables are referred to as ebp minus some number. This is because parameters are typically pushed on the stack by the calling function, thus they are on there before ebp is changed (with a higher address than ebp has in the context of the new function) when the new function is called. The local variables and their initialization happen after the call, hence they are farther down the stack and have a lower address than ebp. This link probably explains it better.
Note that some of your assembly, and what tabstop and I are saying, is specific to x86 assembly and Linux. The registers and assembly instructions are different for Motorola processors, ARM, SPARC, etc. The system calls are OS specific and the function calling specifics can change with a given compiler implementation.
Here is a fun little read on barebones assembly programming and the ELF format that Linux uses for executables.
For some reason I get the concept of the stack more than the label stuff.
What the book "Programming from the Ground Up" and other sites said was that a label represented the address of the first content.
so
Code:
intArray:
    .long 1
    .long 2
intArray is the address of the 1, or the memory location of 1.
But if I did this
Code:
intArray:
    .long 1
    .long 2
stringVar:
    .string "Hello"
the idea that intArray and stringVar are the same in memory is confusing.
How could the 2 labels be the same, unless they occupy the same memory location?
They're in the same memory section. The labels don't have the same value or represent the same piece of memory.
Let's put that into an assembly file and generate some code, then see what addresses each label refers to.
label.s:
Code:
.section .data
intArray:
    .long 1
    .long 2
stringVar:
    .string "Hello"
$ as label.s -o label.o
$ objdump -t label.o
label.o: file format elf32-i386
SYMBOL TABLE:
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l       .data  00000000 intArray
00000008 l       .data  00000000 stringVar
Notice that intArray and stringVar have different addresses (0 and 8 respectively).