Thread: Strcpy Assembly

  1. #16
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Can I ask where you are getting the assembler keywords? Are you using a website or a book?
    Thanks
    You ended that sentence with a preposition...Bastard!

  2. #17
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    What "doesn't work" about it? If I do
    Code:
    .section .data
    
    stringVar:
            .string "Hello World"
            .comm copyString,12,1
    
    hello:
            .string "World"
    .section .text
    
    .global _start
    
    _start:
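            movl $stringVar, %esi   # source pointer
            movl $copyString, %edi  # destination pointer
            movl $0, %ebx           # byte counter
    
    copyLoop:
            cmpl $12, %ebx          # all 12 bytes ("Hello World" plus NUL) copied?
            je exit
            movb (%esi), %cl
            movb %cl, (%edi)
            incl %esi
            incl %edi
            incl %ebx
            jmp copyLoop
    
    exit:
            movl $5, %edx           # length of "World"
            movl $1, %ebx           # fd 1 is stdout (fd 0 is stdin)
            movl $hello, %ecx
            movl $4, %eax           # sys_write
            int $0x80
            movl $0, %ebx           # exit status 0
            movl $1, %eax           # sys_exit
            int $0x80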
    I see "World" like I would expect to see.
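    For comparison, here is roughly what copyLoop does, written as a C sketch. The function name copy_bytes is made up for illustration; the comments map each statement back to the instruction it stands in for.

```c
#include <stddef.h>

/* Hypothetical helper: copies exactly n bytes one at a time, like copyLoop.
 * src plays the role of %esi, dst of %edi, and i of the %ebx counter. */
void copy_bytes(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i != n; i++) {   /* cmpl $12, %ebx / je exit */
        char c = *src;                  /* movb (%esi), %cl */
        *dst = c;                       /* movb %cl, (%edi) */
        src++;                          /* incl %esi */
        dst++;                          /* incl %edi */
    }
}
```

    Calling copy_bytes(buf, "Hello World", 12) with a 12-byte buf copies the 11 characters and the terminating NUL, the same bytes the assembly moves into copyString.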

    Alignment is important because, for multi-byte reads, the hardware generally fetches memory in aligned chunks; for instance, a 4-byte read is aligned when the address ends in 0x0, 0x4, 0x8, or 0xc. If you try to read four bytes from an address that ends in, say, 0x1, that usually means the hardware has to do two reads and then piece the result together. In this case, we'll only be reading one byte at a time, so we don't need to worry about the address being a multiple of anything. (So we specify that the address has to be a multiple of 1, because every address is a multiple of 1.)
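    To make the "multiple of N" idea concrete, here is a small C sketch. The helper name is_aligned is my own assumption, not anything the assembler provides; it only tests whether an address is a multiple of the requested alignment.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr is a multiple of alignment, else 0.
 * alignment must be nonzero. */
int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    return addr % alignment == 0;
}
```

    is_aligned(0x1008, 4) is true because the address ends in 0x8, is_aligned(0x1001, 4) is false, and any address passes with an alignment of 1, which is why an alignment of 1 puts no restriction on copyString.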

  3. #18
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Quote Originally Posted by Eman View Post
    i don't get what you mean by '1' as a byte alignment.
    No, it means that that segment of memory will line up at an address that's a multiple of 1, i.e. any address, which is fine for character data, which is guaranteed to be a single byte.

    Does it mean each byte of the 12 is exactly one byte? So if it was an integer type we could say
    Code:
            .comm CopyString,12,4
    You would probably say that for an integer type, but not strictly because ints are 4 bytes. That 4 means the data has to start at an address that is a multiple of 4, like 12 or 16 (not 13, 14 or 15). Alignment restrictions are requirements of the underlying architecture (x86, 68k, ARM, etc.), and generally require an alignment boundary that is a multiple of the data size. A single byte is aligned on a 1-byte boundary (anywhere), a 2-byte piece of data on a 2-byte (even) boundary, a 4-byte piece of data on a 4-byte boundary, etc. On architectures that enforce these restrictions, violating them will often produce what's known as a bus error, which will probably crash your program in a manner similar to a seg fault; x86 usually tolerates misaligned accesses, but they can be slower.
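    As a rough illustration of the arithmetic involved, the sketch below rounds an address up to the next multiple of an alignment. The name align_up is hypothetical, and this is not a claim about how as or the linker is implemented; it assumes the alignment is a power of two.

```c
#include <stdint.h>

/* Hypothetical helper: rounds addr up to the next multiple of alignment.
 * Assumes alignment is a power of two (1, 2, 4, 8, ...). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

    With an alignment of 4, the addresses 13, 14 and 15 all round up to 16, while 12 stays at 12. With an alignment of 1 every address is left unchanged.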

  4. #19
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    ^^
    @Tabstop: I was declaring a label
    Code:
        copyString: 
               .comm copyString,12,1
    that was why it wasn't working...
    you had yours like
    Code:
    stringVar:
                 .string "Hello World"
                 .comm copyString,12,1
    In your declaration, is copyString still a memory location, the same as when I write
    copyString: ?
    What is the difference? (Except that one works and the other doesn't.)

    @Anduril, Tabstop
    I get the point about the alignment. Thanks
    Last edited by Eman; 12-31-2010 at 12:57 PM.

  5. #20
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Quote Originally Posted by Eman View Post
    can i ask where you are getting the assembler keywords? Are you using a website or book?
    Thanks
    Sorry, they're somewhere in my head. I will try to dig around for a good assembly reference.

  6. #21
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    Comm - Using as

    (EDIT to add: A label is simply a label for the contents of your file. I.e., using a label is a little bit like using a line number in BASIC or Fortran. That works for a string literal, since that string literal must also appear somewhere in your file. For a chunk of memory like a variable, not so much.)

    (More edit: I suppose you could put something like
    Code:
    CopyString:
        .byte 0,0,0,0,0,0,0,0,0,0,0,0
    to set aside 12 bytes labeled with that label.)
    Last edited by tabstop; 12-31-2010 at 01:05 PM.

  7. #22
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Quote Originally Posted by anduril462 View Post
    Sorry, they're somewhere in my head. I will try to dig around for a good assembly reference.
    Ah cool, ty

  8. #23
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Quote Originally Posted by tabstop View Post
    Comm - Using as

    (EDIT to add: A label is simply a label for the contents of your file. I.e., using a label is a little bit like using a line number in BASIC or Fortran. That works for a string literal, since that string literal must also appear somewhere in your file. For a chunk of memory like a variable, not so much.)
    the reason I am asking is if I wanted to print the contents of copyString as I have done with label hello: , would it work? (I am going to try it now)
    Thanks for the link.
    ------
    EDIT

    Cool, everything works now. System calls are a bit clearer than before.
    But to clarify about the label:
    so the label is used to store the address of the line number of the first content?

    Is it fair to compare it to C, where calling a function pushes the return address (the line number to return to)?

    So a label is just like that, and it is a bad idea to use one for declaring memory unless the memory is explicitly declared?

    To declare a proper variable I have to use .comm then?
    Last edited by Eman; 12-31-2010 at 01:10 PM.

  9. #24
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Quote Originally Posted by Eman View Post
    ^^
    @Tabstop: I was declaring a label
    Code:
        copyString: 
               .comm copyString,12,1
    The assembler should have complained that copyString was already defined:
    $ as eman.s
    eman.s: Assembler messages:
    eman.s:6: Error: symbol `copyString' is already defined

    that was why it wasn't working...
    you had yours like
    Code:
    stringVar:
                 .string "Hello World"
                 .comm copyString,12,1
    In your declaration, is copyString still a memory location, the same as when I write
    copyString: ?
    What is the difference? (Except that one works and the other doesn't.)
    Yes, in both cases copyString is a label that refers to an address (so the assembler turns $copyString into the corresponding address when generating the code). As long as it refers to a chunk of memory of sufficient size, your program will work. The .comm directive requires an identifier as the first item though, so you can't use a label with the same name as the identifier. You could have achieved the same result with something like:
    Code:
    copyString:
        .ascii "            "
    and ended up with a 12-byte string initialized to all spaces, the start of which you refer to with $copyString. Then you would copy over the spaces instead of the zero bytes that are probably put in by the .comm directive.
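    A rough C analogy of the two ways to reserve the buffer, in case it helps. The names zero_filled, space_filled and store_hello are made up for this sketch; the point is only that the initial contents differ and the copy overwrites them either way.

```c
#include <string.h>

/* Hypothetical stand-ins for the two ways of reserving copyString. */
char zero_filled[12];   /* no initializer: starts as 12 zero bytes, like .comm */
char space_filled[12] = {' ', ' ', ' ', ' ', ' ', ' ',
                         ' ', ' ', ' ', ' ', ' ', ' '};  /* like .ascii with 12 spaces */

/* Copies "Hello World" and its NUL (12 bytes) over whatever buf held. */
void store_hello(char buf[12])
{
    memcpy(buf, "Hello World", 12);
}
```

    After store_hello runs on each buffer, both hold the same 12 bytes, so the program behaves the same whichever way the space was set aside.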

  10. #25
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    Quote Originally Posted by anduril462 View Post
    The assembler should have complained that copyString was already defined:
    $ as eman.s
    eman.s: Assembler messages:
    eman.s:6: Error: symbol `copyString' is already defined


    Yes, in both cases copyString is a label that refers to an address (so the assembler turns $copyString into the corresponding address when generating the code). As long as it refers to a chunk of memory of sufficient size, your program will work. The .comm directive requires an identifier as the first item though, so you can't use a label with the same name as the identifier. You could have achieved the same result with something like:
    Code:
    copyString:
        .ascii "            "
    and ended up with a 12-byte string initialized to all spaces, the start of which you refer to with $copyString. Then you would copy over the spaces instead of the zero bytes that are probably put in by the .comm directive.
    sweet, that is clear now.
    Oddly enough I didn't get an error. It linked fine and executed as well.
    I think I know enough to pass the upcoming exam, but I will keep practicing!
    Does the insight debugger come with the compiler by the way?

  11. #26
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    A label is just a marker for a place in your object file (technically, once your program gets loaded into memory, it refers to that relocated spot, but the point is the same), as opposed to "normal" storage, like the stack. For instance, if you did the blank-string or 0,0,0,0,0 approach, you would have 24 bytes at the beginning of your data section: "Hello World", its terminating zero byte, and then the 12 reserved bytes (zeros or spaces). Those labels would refer to those spots in the file, just like copyLoop refers to the point in your object file where that cmp instruction lives. In fact, you could do "jmp stringVar" in place of "jmp copyLoop" and everything would be happy (until you try to run it). The two types of labels are actually one and the same type of label.

    Normally, when you declare local variables in a C program, those go on the stack -- I don't have hundreds of bytes of zeroes in my source code when I declare an array of doubles. But the compiler might translate the starting point of my array to, I don't know, ebp-8 or whatever (ebp usually points to the base of the current stack frame) and we go from there.

    As to the debugger, if you've gotten to here you almost certainly have gdb hanging around. You'll have to assemble with the -g flag (and you might need to specify the debugger format too, like -gstabs) if you want to use the debugger with it.

  12. #27
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    To expand on tabstop's post, local variables with automatic storage (which is the default for a local function variable including those local to main) go on the stack. Global variables (which are generally evil) and local variables with static storage generally go in the data or BSS section, depending on whether they're initialized (data) or not (BSS - all zeros).
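    Here is a small C sketch of those storage classes. The variable and function names are invented for the example, and the section placement in the comments is what a typical toolchain does rather than something the C standard promises.

```c
/* Initialized file-scope variable: typically placed in the data section. */
int initialized_global = 42;

/* File-scope variable with no initializer: typically placed in BSS, starts as 0. */
int uninitialized_global;

/* Hypothetical function: the static local has static storage, so it is
 * zero-initialized once and keeps its value between calls. */
int next_static(void)
{
    static int calls;
    return ++calls;
}

/* Hypothetical function: the automatic local is recreated (on the stack,
 * in practice) each time the function is called. */
int next_automatic(void)
{
    int calls = 0;
    return ++calls;
}
```

    Calling next_static twice gives 1 and then 2, while next_automatic gives 1 every time.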

    ebp refers to the start of the current stack frame, which represents the function you are currently in. Every function call bumps ebp down the stack (to a lower address) to give each function its own separate space to work in. The stack grows down from the high memory addresses to the low ones, thus main's base pointer has a larger address than that of, e.g., foo, which is called from within main. When a function returns, ebp is restored to its previous value to put you back in the frame of reference of the calling function. With the usual frame setup, parameters to a function in x86 are referred to as ebp+8, ebp+12, etc. (ebp+4 holds the return address), while local variables are referred to as ebp- some number. This is because parameters are typically pushed on the stack by the calling function, thus they are on there before ebp is changed (with a higher address than ebp has in the context of the new function) when the new function is called. The local variables and their initialization happen after the call, hence they are farther down the stack and have a lower address than ebp. This link probably explains it better.
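    If you want to see separate frames from C, the sketch below records the address of one local variable per nested call. record_frames and MAX_DEPTH are names I made up, and the caller is assumed to pass an array of MAX_DEPTH entries. The only thing C guarantees is that the locals of calls that are live at the same time have distinct addresses.

```c
#include <stdint.h>

#define MAX_DEPTH 4

/* Hypothetical helper: stores the address of this call's local in
 * addrs[depth], then recurses until MAX_DEPTH frames are live at once. */
void record_frames(uintptr_t addrs[], int depth)
{
    volatile int local = depth;               /* lives in this call's frame */
    addrs[depth] = (uintptr_t)(void *)&local;
    if (depth + 1 < MAX_DEPTH) {
        record_frames(addrs, depth + 1);      /* nested call gets a new frame */
    }
    local = 0;  /* use local after the call so it stays live across it */
}
```

    On x86 Linux you would normally see the recorded addresses decrease as depth increases, since the stack grows downward, but the language itself makes no promise about the direction.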

    Note that some of your assembly, and what tabstop and I are saying, is specific to x86 assembly and Linux. The registers and assembly instructions are different for Motorola processors, ARM, SPARC, etc. The system calls are OS-specific and the function calling specifics can change with a given compiler implementation.

    Here is a fun little read on barebones assembly programming and the ELF format that Linux uses for executables.

  13. #28
    The Dragon Reborn
    Join Date
    Nov 2009
    Location
    Dublin, Ireland
    Posts
    629
    For some reason I get the concept of the stack more than the label stuff.
    What the book "Programming from the Ground Up" and other sites said was that a label represents the address of the first content.
    so
    Code:
                       intArray: 
                                  .long 1
                                  .long 2
    intArray is the address of the 1 or the memory location of 1.
    But if i did this

    Code:
                       intArray: 
                                  .long 1
                                  .long 2
                      stringVar:
                                 .string "Hello"
    the idea that intArray and stringVar are the same in memory is confusing.
    How could the two labels be the same, unless they occupy the same memory location?

  14. #29
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    They're in the same memory section. The labels don't have the same value or represent the same piece of memory.

  15. #30
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Let's put that into an assembly file and generate some code, then see what addresses each label refers to.
    label.s:
    Code:
    .section .data
    intArray:
        .long 1
        .long 2
    stringVar:
        .string "Hello"
    $ as label.s -o label.o
    $ objdump -t label.o

    label.o: file format elf32-i386

    SYMBOL TABLE:
    00000000 l    d  .text  00000000 .text
    00000000 l    d  .data  00000000 .data
    00000000 l    d  .bss   00000000 .bss
    00000000 l       .data  00000000 intArray
    00000008 l       .data  00000000 stringVar
    Notice that intArray and stringVar have different addresses (0 and 8 respectively).
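    The same picture can be sketched in C with a struct whose members stand in for the labels. The struct and function names are hypothetical; the member offsets play the role of the label addresses within the .data section.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C picture of the .data section in label.s: each member
 * mirrors one label, and its offset mirrors that label's address. */
struct data_section {
    int32_t intArray[2];   /* .long 1, .long 2: 8 bytes */
    char stringVar[6];     /* .string "Hello": 5 characters plus NUL */
};

/* Offsets of the two "labels" from the start of the section. */
size_t int_array_offset(void)
{
    return offsetof(struct data_section, intArray);
}

size_t string_var_offset(void)
{
    return offsetof(struct data_section, stringVar);
}
```

    int_array_offset() is 0 and string_var_offset() is 8: the two longs take up the first 8 bytes, so the string starts right after them.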
