Strcpy Assembly

**Eman** · 12-31-2010

Can I ask where you are getting the assembler keywords? Are you using a website or a book?
Thanks

**tabstop** · 12-31-2010

What "doesn't work" about it? If I do

Code:

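.section .data

stringVar:
        .string "Hello World"
        .comm copyString,12,1    # 12 bytes of common storage, 1-byte alignment

hello:
        .string "World"
.section .text

.global _start

_start:
        movl $stringVar, %esi    # source address
        movl $copyString, %edi   # destination address
        movl $0, %ebx            # byte counter

copyLoop:
        cmpl $12, %ebx           # copied all 12 bytes ("Hello World" + NUL)?
        je exit
        movb (%esi), %cl         # load one byte from the source
        movb %cl, (%edi)         # store it at the destination
        incl %esi
        incl %edi
        incl %ebx
        jmp copyLoop

exit:
        movl $5, %edx            # length of "World"
        movl $1, %ebx            # file descriptor 1 = stdout
        movl $hello, %ecx        # buffer to write
        movl $4, %eax            # sys_write
        int $0x80
        movl $0, %ebx            # exit status 0
        movl $1, %eax            # sys_exit
        int $0x80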

I see "World" like I would expect to see.
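For comparison, here is a rough C equivalent of copyLoop. It is a sketch, not what a compiler would actually emit. `copy_n_bytes` copies a fixed number of bytes, the way the loop above uses %ebx as a counter against $12. `copy_until_nul` is a hypothetical helper showing the strcpy-style alternative: stop after copying the terminating NUL, so the length doesn't have to be known in advance.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors copyLoop: copy exactly n bytes, one byte per iteration.
 * i plays the role of %ebx and n the role of the constant $12. */
static void copy_n_bytes(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)   /* cmpl $12, %ebx / je exit */
        dst[i] = src[i];             /* movb (%esi), %cl / movb %cl, (%edi) */
}

/* Hypothetical strcpy-style variant: copy up to and including the
 * terminating NUL, so no byte count is needed. */
static char *copy_until_nul(char *dst, const char *src)
{
    char *start = dst;
    assert(dst != NULL && src != NULL);
    while ((*dst++ = *src++) != '\0')
        ;                            /* the loop condition does the copying */
    return start;
}

static const char stringVar[] = "Hello World"; /* 11 chars + NUL = 12 bytes */
static char copyString[12];                    /* zero-filled, like .comm storage */
```

Calling `copy_n_bytes(copyString, stringVar, 12)` does what the assembly does. The fixed count of 12 only works because it happens to equal the string length plus the NUL.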

Alignment is important because, for multi-byte reads, the hardware generally prefers (and on some architectures requires) certain addresses. For instance, a 4-byte read is cheapest when the address ends in 0x0, 0x4, 0x8, or 0xc. If you try to read four bytes starting at an address like 0x11, the hardware usually has to do two reads and piece the result together (or, on stricter architectures, it faults). In this case, we'll only be reading one byte at a time, so we don't need to worry about the address being a multiple of anything. (So we specify that the address has to be a multiple of 1, because every address is a multiple of 1.)
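To make the address arithmetic concrete, here is a small C sketch. `is_aligned` and `reads_needed` are hypothetical helpers. The "two reads" count assumes a simplified bus that can only fetch whole, aligned words, and real hardware varies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align (align must be a power of two). */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* How many aligned `word`-byte fetches it takes to read `size` bytes
 * starting at addr, on a hypothetical bus that can only fetch whole,
 * aligned words. */
static unsigned reads_needed(uintptr_t addr, uintptr_t size, uintptr_t word)
{
    assert(size > 0 && word > 0);
    uintptr_t first = addr / word;             /* word holding the first byte */
    uintptr_t last = (addr + size - 1) / word; /* word holding the last byte */
    return (unsigned)(last - first + 1);
}
```

With 4-byte words, a 4-byte read at 0x10 takes one fetch and a 4-byte read at 0x11 takes two. A 1-byte read never straddles two words, which is why an alignment of 1 is enough for the string copy.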

**anduril462** · 12-31-2010

Originally Posted by Eman

I don't get what you mean by '1' as a byte alignment. Does it mean each byte of the 12 is exactly one byte?

No, it means that the block of memory will start at an address that's a multiple of 1, i.e. any address. That's fine for character data, since a char is guaranteed to be a single byte.

So if it was an integer type we could say

Code:

        .comm CopyString,12,4

You would probably say that for an integer type, but not strictly because ints are 4 bytes; it's because of the architecture's alignment rules. That 4 means the data has to start at an address that is a multiple of 4, like 12 or 16 (not 13, 14 or 15). Alignment restrictions come from the underlying architecture (x86, 68k, ARM, etc.), and generally call for an alignment boundary equal to the data size: a single byte is aligned on a 1-byte boundary (anywhere), 2-byte data on a 2-byte (even) boundary, 4-byte data on a 4-byte boundary, and so on. On architectures that enforce this, violating an alignment restriction will often produce what's known as a bus error, which will probably crash your program in a manner similar to a seg fault. x86 generally tolerates misaligned accesses, but they can be slower.
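The same rule shows up in C, where the compiler handles alignment for you. This is a minimal sketch and assumes a C11 compiler for `alignof`. The amount of padding depends on the platform: it is typically 3 bytes on i386 Linux, where an int is 4 bytes and 4-aligned, so the sketch only checks the alignment property itself. `intStorage` is a hypothetical name standing in for `.comm CopyString,12,4`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* A char followed by an int: the compiler inserts padding after `c`
 * so that `i` starts at an offset that is a multiple of alignof(int). */
struct mixed {
    char c;
    int i;
};

/* Hypothetical stand-in for .comm intStorage,12,4 (three 4-byte ints on i386). */
static int intStorage[3];

/* Number of padding bytes between `c` and `i`. */
static size_t padding_after_char(void)
{
    return offsetof(struct mixed, i) - sizeof(char);
}

/* The compiler places intStorage at an int-aligned address. */
static int int_storage_is_aligned(void)
{
    return (uintptr_t)intStorage % alignof(int) == 0;
}
```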

**Eman** · 12-31-2010

^^
@Tabstop: I was declaring a label

Code:

    copyString: 
           .comm copyString,12,1

That was why it wasn't working...
You had yours like

Code:

stringVar:
        .string "Hello World"
        .comm copyString,12,1

In your declaration, is copyString still a memory location, the same as if I did
copyString: ?
What is the difference (except that one works and the other doesn't)?

@Anduril, Tabstop
I get the point about the alignment. Thanks

**anduril462** · 12-31-2010

Originally Posted by Eman

Can I ask where you are getting the assembler keywords? Are you using a website or a book?
Thanks

Sorry, they're somewhere in my head. I will try to dig around for a good assembly reference.

**tabstop** · 12-31-2010

Comm - Using as

(EDIT to add: A label is simply a label for a location in your file. I.e. using a label is a little bit like a line number in BASIC or Fortran. That works for a string literal, since that string literal must also appear somewhere in your file. For a chunk of memory like a variable, not so much.)

(More edit: I suppose you could put something like

Code:

copyString:
    .byte 0,0,0,0,0,0,0,0,0,0,0,0

to set aside 12 bytes labeled with that label.)

**Eman** · 12-31-2010

Originally Posted by anduril462

Sorry, they're somewhere in my head. I will try to dig around for a good assembly reference.

Ah cool, ty

**Eman** · 12-31-2010

Originally Posted by tabstop

Comm - Using as

(EDIT to add: A label is simply a label for a location in your file. I.e. using a label is a little bit like a line number in BASIC or Fortran. That works for a string literal, since that string literal must also appear somewhere in your file. For a chunk of memory like a variable, not so much.)

The reason I am asking: if I wanted to print the contents of copyString the way I did with the label hello:, would it work? (I am going to try it now.)
Thanks for the link.
------
EDIT

Cool, everything works now. System calls are a bit clearer than before.
But to clarify about the label:
so the label is used to store the address of the first content, like a line number?

Is it fair to compare it to C, where calling a function pushes the return address (the "line number" to return to)?

So a label is just like that, and it's a bad idea to use one to declare memory unless the memory is explicitly declared?

To declare a proper variable, I have to use .comm, then?

**anduril462** · 12-31-2010

Originally Posted by Eman

^^
@Tabstop: I was declaring a label

Code:

    copyString: 
           .comm copyString,12,1

The assembler should have complained that copyString was already defined:
$ as eman.s
eman.s: Assembler messages:
eman.s:6: Error: symbol `copyString' is already defined

That was why it wasn't working...
You had yours like

Code:

stringVar:
        .string "Hello World"
        .comm copyString,12,1

In your declaration, is copyString still a memory location, the same as if I did
copyString: ?
What is the difference (except that one works and the other doesn't)?

Yes, in both cases copyString is a label that refers to an address (so the assembler turns $copyString into the corresponding address when generating the code). As long as it refers to a chunk of memory of sufficient size, your program will work. The .comm directive takes the symbol name as its first argument and defines that symbol itself, though, so you can't also define a label with the same name. You could have achieved the same result with something like:

Code:

copyString:
    .ascii "            "

and ended up with a 12-byte string initialized to all spaces, the start of which you refer to with $copyString. Then you would copy over the spaces instead of the zero bytes that .comm storage starts out with (common symbols end up in the zero-filled BSS).

**Eman** · 12-31-2010

Originally Posted by anduril462

The assembler should have complained that copyString was already defined:
$ as eman.s
eman.s: Assembler messages:
eman.s:6: Error: symbol `copyString' is already defined

Yes, in both cases copyString is a label that refers to an address (so the assembler turns $copyString into the corresponding address when generating the code). As long as it refers to a chunk of memory of sufficient size, your program will work. The .comm directive takes the symbol name as its first argument and defines that symbol itself, though, so you can't also define a label with the same name. You could have achieved the same result with something like:

Code:

copyString:
    .ascii "            "

and ended up with a 12-byte string initialized to all spaces, the start of which you refer to with $copyString. Then you would copy over the spaces instead of the zero bytes that .comm storage starts out with (common symbols end up in the zero-filled BSS).

Sweet, that is clear now.
Oddly enough, I didn't get an error. It linked fine and executed as well.
I think I know enough to pass the upcoming exam, but I will keep practicing!
Does the Insight debugger come with the compiler, by the way?

**tabstop** · 12-31-2010

A label is just a marker for a place in your object file (technically, once your program gets loaded into memory, it refers to that relocated spot, but the point is the same), as opposed to "normal" storage, like the stack. For instance, if you did the .byte 0,0,0,... approach, you would have 24 bytes at the beginning of your data section that read "Hello World0000000000000" (each 0 being a zero byte, the first of them the string's terminating NUL; with the blank-string approach the last 12 would be spaces instead). Those labels would refer to those spots in the file, just like copyLoop refers to the point in your object file where that cmp instruction lives. In fact, you could write "jmp stringVar" in place of "jmp copyLoop" and the assembler would be happy (until you try to run it). The two kinds of labels are really the same kind of label.
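One way to see that both kinds of labels are the same kind of thing is a toy model of what the assembler does with a data section. In this sketch, the structure and function names are hypothetical, and this is not how `as` is actually implemented. Defining a label only records the current offset and emits nothing, while directives like .string and .byte append bytes. Laying out stringVar followed by 12 zero bytes reproduces the 24-byte picture above, with copyString at offset 12.

```c
#include <assert.h>
#include <string.h>

/* Toy model: a section is a byte buffer, and a label is nothing more
 * than the offset at which it was defined. */
#define SECTION_MAX 64
#define LABEL_MAX 8

struct section {
    unsigned char bytes[SECTION_MAX];
    size_t size;                        /* current location counter */
    const char *label_names[LABEL_MAX];
    size_t label_offsets[LABEL_MAX];
    size_t label_count;
};

/* "name:" records the current location; it emits no bytes at all. */
static void define_label(struct section *s, const char *name)
{
    assert(s->label_count < LABEL_MAX);
    s->label_names[s->label_count] = name;
    s->label_offsets[s->label_count] = s->size;
    s->label_count++;
}

/* Directives such as .string or .byte append bytes and advance the counter. */
static void emit_bytes(struct section *s, const void *data, size_t n)
{
    assert(s->size + n <= SECTION_MAX);
    memcpy(s->bytes + s->size, data, n);
    s->size += n;
}

/* Looks up a label's offset; returns (size_t)-1 if it is not defined. */
static size_t label_offset(const struct section *s, const char *name)
{
    for (size_t i = 0; i < s->label_count; i++)
        if (strcmp(s->label_names[i], name) == 0)
            return s->label_offsets[i];
    return (size_t)-1;
}

/* stringVar: .string "Hello World"  followed by  copyString: .byte 0 (x12) */
static void build_example(struct section *s)
{
    static const unsigned char zeros[12] = {0};
    memset(s, 0, sizeof *s);
    define_label(s, "stringVar");
    emit_bytes(s, "Hello World", sizeof "Hello World"); /* 12 bytes incl. NUL */
    define_label(s, "copyString");
    emit_bytes(s, zeros, sizeof zeros);
}
```

A code label like copyLoop would work the same way in a .text section: it records the offset where the next instruction's bytes begin.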

Normally, when you declare local variables in a C function, those go on the stack: I don't have hundreds of bytes of zeroes in my source code when I declare an array of doubles. Instead, the compiler might translate the starting point of my array to, I don't know, ebp-80 or whatever (ebp usually points to the base of the current stack frame), and we go from there.
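Here is a quick C sketch of the idea that each call gets its own stack storage. `nest` is a hypothetical function that records the address of its automatic array in each of three nested calls. The actual addresses, and which direction they move, depend on the compiler and platform. The sketch therefore relies only on the fact that arrays belonging to calls that are active at the same time must be distinct.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 3

/* Address of the automatic array in each nested call, stored as an integer
 * so it can be compared after the call has returned. */
static uintptr_t local_addr[DEPTH];

static double nest(int depth)
{
    double automatic_array[4] = {0}; /* fresh stack storage for every call */
    assert(depth >= 0 && depth < DEPTH);

    local_addr[depth] = (uintptr_t)automatic_array;
    automatic_array[0] = depth;
    if (depth + 1 < DEPTH)
        automatic_array[0] += nest(depth + 1); /* this call's array is still live */
    return automatic_array[0];
}
```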

As to the debugger, if you've gotten this far you almost certainly have gdb hanging around. You'll have to assemble with the -g flag (and you might need to pick a debug format too, e.g. --gstabs for as) if you want to use the debugger with it.

**anduril462** · 12-31-2010

To expand on tabstop's post, local variables with automatic storage (the default for a variable local to a function, including main) go on the stack. Global variables (which are generally evil) and local variables with static storage generally go in the data or BSS section, depending on whether they're initialized (data) or not (BSS, which is zero-filled).
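Here is a minimal C sketch of those storage classes, using hypothetical names. Which section each object ends up in is the compiler's decision, and the comments give the typical ELF placement. With GCC's -fcommon (the default before GCC 10), an uninitialized global like `global_zeroed` is emitted using the same .comm directive discussed earlier in this thread.

```c
#include <assert.h>

int global_initialized = 42;  /* initialized global: typically .data */
int global_zeroed;            /* no initializer: zero-filled, typically .bss */

/* Hypothetical helper: hands out 1, 2, 3, ... across calls. */
static int next_id(void)
{
    static int counter;       /* static storage: zero-initialized once, typically .bss */
    int automatic = 0;        /* automatic storage: re-created on every call */

    automatic++;
    assert(automatic == 1);   /* never carries over from a previous call */
    return ++counter;         /* counter does carry over */
}
```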

ebp refers to the base of the current stack frame, which represents the function you are currently in. Every function call moves ebp down the stack (to a lower address) to give each function its own separate space to work in. The stack grows down from high memory addresses to low ones, so main's base pointer has a larger address than that of, e.g., foo, which is called from within main. When a function returns, ebp is restored to its previous value to put you back in the frame of reference of the calling function. In the usual 32-bit x86 calling convention, [ebp] holds the caller's saved ebp, ebp+4 holds the return address, and parameters are referred to as ebp+8, ebp+12, etc., while local variables are referred to as ebp minus some number. This is because parameters are pushed on the stack by the calling function, so they are there before ebp is changed (at a higher address than ebp has in the context of the new function). The local variables and their initialization happen after the call, hence they are farther down the stack and have a lower address than ebp. This link probably explains it better.
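The frame layout can be simulated in C. This is a toy model under several assumptions: a 32-bit cdecl-style call using the usual `push %ebp; movl %esp, %ebp` prologue, a hypothetical `struct machine` standing in for memory and registers, and a made-up return address. After the simulated call, [ebp] holds the caller's ebp, ebp+4 holds the return address, and the arguments start at ebp+8.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack (a simulation, not real memory): word index = address / 4. */
#define STACK_WORDS 64

struct machine {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;   /* byte "address" of the top of the stack */
    uint32_t ebp;   /* byte "address" of the current frame base */
};

static void push(struct machine *m, uint32_t value)
{
    assert(m->esp >= 4);
    m->esp -= 4;                    /* the stack grows toward lower addresses */
    m->mem[m->esp / 4] = value;
}

static uint32_t load(const struct machine *m, uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < STACK_WORDS);
    return m->mem[addr / 4];
}

/* Caller side of foo(a, b): push arguments right to left, then `call`
 * pushes the return address. Callee prologue: push %ebp; movl %esp, %ebp. */
static void call_foo(struct machine *m, uint32_t a, uint32_t b, uint32_t ret_addr)
{
    push(m, b);                     /* second argument */
    push(m, a);                     /* first argument */
    push(m, ret_addr);              /* done by the call instruction */
    push(m, m->ebp);                /* prologue: push %ebp */
    m->ebp = m->esp;                /* prologue: movl %esp, %ebp */
}
```

Locals would come next: a `subl $8, %esp` in the prologue would make room for them at ebp-4 and ebp-8, below everything the caller pushed.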

Note that some of your assembly, and some of what tabstop and I are saying, is specific to x86 and Linux. The registers and assembly instructions are different for Motorola processors, ARM, SPARC, etc. The system calls are OS-specific, and the function calling details can change with a given compiler implementation.

Here is a fun little read on barebones assembly programming and the ELF format that Linux uses for executables.

**Eman** · 12-31-2010

For some reason I get the concept of the stack more than the label stuff.
What the book "Programming from the Ground Up" and other sites said was that a label
represents the address of the first content that follows it.
So

Code:

intArray:
    .long 1
    .long 2

intArray is the address of the 1, i.e. the memory location of the 1.
But if I did this

Code:

intArray:
    .long 1
    .long 2
stringVar:
    .string "Hello"

The idea that intArray and stringVar are the same in memory is confusing.
How could the two labels be the same, unless they occupy the same memory location?

**tabstop** · 12-31-2010

They're in the same memory section. The labels don't have the same value or represent the same piece of memory.

**anduril462** · 12-31-2010

Let's put that into an assembly file and generate some code, then see what addresses each label refers to.
label.s:

Code:

.section .data
intArray:
    .long 1
    .long 2
stringVar:
    .string "Hello"

$ as label.s -o label.o
$ objdump -t label.o

label.o: file format elf32-i386

SYMBOL TABLE:
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l .data 00000000 intArray
00000008 l .data 00000000 stringVar

Notice that intArray and stringVar have different addresses (offsets 0 and 8 within .data, respectively), because the two .long values take up 8 bytes.
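The same layout can be reproduced with a C struct. In this sketch, `struct data_section` is a hypothetical stand-in for the .data section of label.s. `offsetof` reports where each member starts, much as objdump reports where each label points. C allows padding between members, but none is needed here because a char array has 1-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as label.s:
 *   intArray:  .long 1 / .long 2
 *   stringVar: .string "Hello" */
struct data_section {
    int32_t intArray[2];  /* 2 x 4 bytes, starting at offset 0 */
    char stringVar[6];    /* "Hello" + NUL, right after those 8 bytes */
};

static const struct data_section data = { {1, 2}, "Hello" };
```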