Thread: C being translated into x86 assembly

  1. #1
    Registered User
    Join Date
    Jun 2012
    Posts
    39

    C being translated into x86 assembly

    So I am trying to learn how to step through memory by writing simple programs and examining them in GDB. However, there is a section in the assembly dump of main that I do not understand. Can anyone help me? Is it something specific to how Linux works?

    (gdb) disass main
    Dump of assembler code for function main:
    0x08048414 <+0>: push ebp
    0x08048415 <+1>: mov ebp,esp
    0x08048417 <+3>: and esp,0xfffffff0
    0x0804841a <+6>: add esp,0xffffff80
    0x0804841d <+9>: mov DWORD PTR [esp+0x7c],0x0
    0x08048425 <+17>: mov DWORD PTR [esp+0x74],0x0
    0x0804842d <+25>: jmp 0x8048498 <main+132>
    0x0804842f <+27>: mov DWORD PTR [esp+0x78],0x0
    0x08048437 <+35>: jmp 0x8048480 <main+108>
    /*begin section that I have no clue how it works*/
    0x08048439 <+37>: mov edx,DWORD PTR [esp+0x74]
    0x0804843d <+41>: mov eax,edx
    0x0804843f <+43>: add eax,eax
    0x08048441 <+45>: add eax,edx
    0x08048443 <+47>: add eax,eax
    0x08048445 <+49>: add eax,DWORD PTR [esp+0x78]
    /*end section*/
    0x08048449 <+53>: mov edx,DWORD PTR [esp+0x7c]
    0x0804844d <+57>: mov DWORD PTR [esp+eax*4+0x14],edx
    0x08048451 <+61>: add DWORD PTR [esp+0x7c],0x1
    0x08048456 <+66>: mov edx,DWORD PTR [esp+0x74]
    0x0804845a <+70>: mov eax,edx
    0x0804845c <+72>: add eax,eax
    0x0804845e <+74>: add eax,edx
    0x08048460 <+76>: add eax,eax
    0x08048462 <+78>: add eax,DWORD PTR [esp+0x78]
    0x08048466 <+82>: mov edx,DWORD PTR [esp+eax*4+0x14]
    0x0804846a <+86>: mov eax,0x8048580
    0x0804846f <+91>: mov DWORD PTR [esp+0x4],edx
    0x08048473 <+95>: mov DWORD PTR [esp],eax
    0x08048476 <+98>: call 0x8048320 <printf@plt>
    0x0804847b <+103>: add DWORD PTR [esp+0x78],0x1
    0x08048480 <+108>: cmp DWORD PTR [esp+0x78],0x5
    0x08048485 <+113>: jle 0x8048439 <main+37>
    => 0x08048487 <+115>: mov DWORD PTR [esp],0xa
    0x0804848e <+122>: call 0x8048350 <putchar@plt>
    0x08048493 <+127>: add DWORD PTR [esp+0x74],0x1
    0x08048498 <+132>: cmp DWORD PTR [esp+0x74],0x3
    0x0804849d <+137>: jle 0x804842f <main+27>
    0x0804849f <+139>: mov eax,0x0
    0x080484a4 <+144>: leave
    0x080484a5 <+145>: ret
    And here is the source

    Code:
    #include <stdio.h>
    
    int main(void){
    
      int a[4][6],i,j,sum=0;
    
      for(i=0;i<4;i++){
        for(j=0;j<6;j++){
          a[i][j]=sum++;
          printf("%d",a[i][j]);
        }
        printf("\n");
      }
    
      return 0;
    }

  2. #2
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    You can rewrite
    Code:
        mov edx,DWORD PTR [esp+0x74]
        mov eax,edx
        add eax,eax
        add eax,edx
        add eax,eax
        add eax,DWORD PTR [esp+0x78]
    to
    Code:
        int i; /* &i == (int *)(esp + 0x74); */
        int j; /* &j == (int *)(esp + 0x78); */
    
        edx = i;
        eax = edx;
        eax = eax + eax;
        eax = eax + edx;
        eax = eax + eax;
        eax = eax + j;
    to
    Code:
        int i; /* &i == (int *)(esp + 0x74); */
        int j; /* &j == (int *)(esp + 0x78); */
    
        eax = 6 * i + j;
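    One way to convince yourself of that last step is to replay the six instructions in C, with local variables standing in for the registers. This is only a sketch: the names index_via_adds and check_index_via_adds are made up for illustration, and it assumes plain C ints behave like the 32-bit registers for these small values.

```c
#include <assert.h>

/* Replays the six instructions one at a time; edx and eax are ordinary
   variables standing in for the registers of the same name. */
int index_via_adds(int i, int j)
{
    int edx, eax;

    edx = i;          /* mov edx,DWORD PTR [esp+0x74]              */
    eax = edx;        /* mov eax,edx                   eax = i     */
    eax = eax + eax;  /* add eax,eax                   eax = 2*i   */
    eax = eax + edx;  /* add eax,edx                   eax = 3*i   */
    eax = eax + eax;  /* add eax,eax                   eax = 6*i   */
    eax = eax + j;    /* add eax,DWORD PTR [esp+0x78]  eax = 6*i+j */
    return eax;
}

/* Checks the add sequence against 6*i + j for every index pair
   that the loops in the original program visit. */
void check_index_via_adds(void)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 6; j++)
            assert(index_via_adds(i, j) == 6 * i + j);
}
```

    For the last element, index_via_adds(3, 5) works out to 23, the same as 6 * 3 + 5.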
    Quote Originally Posted by Codegeek892 View Post
    Is it something specific to how Linux works?
    No, it's just the way GCC optimizes i*6 + j on 32-bit x86 architecture.
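    As for why the compiler needs 6*i + j at all: a is declared as int a[4][6], and C stores it row by row, so a[i][j] sits 6*i + j ints past the start of the array. In the listing that index is scaled by 4 and added to esp+0x14, which appears to be where a begins (24 ints of 4 bytes would end exactly at esp+0x74, where i is stored). A small sketch that checks the row-major rule with whatever compiler you have; byte_offset and check_row_major are illustrative names, not anything from the OP's program:

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 6 };

/* Distance in bytes from the start of a local int[ROWS][COLS] array
   to element [i][j], measured from the actual addresses. */
size_t byte_offset(int i, int j)
{
    int a[ROWS][COLS];
    return (size_t)((char *)&a[i][j] - (char *)a);
}

/* Row-major layout: element [i][j] is COLS*i + j ints from the start. */
void check_row_major(void)
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            assert(byte_offset(i, j) == sizeof(int) * (size_t)(COLS * i + j));
}
```

    Assuming a 4-byte int, as on the OP's 32-bit x86 target, byte_offset(3, 5) comes to 4 * 23 bytes, which matches the eax*4 scaling in the disassembly.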
    Last edited by Nominal Animal; 11-03-2012 at 06:37 PM.

  3. #3
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Nominal, you've got those operands backwards. gdb outputs disassembly in at&t syntax, where the destination is second, and the source is first.

    it's more like:
    Code:
    int i; /* &i == (int *)(esp + 0x74); */
    int j; /* &j == (int *)(esp + 0x78); */
     
    i = edx;
    edx = eax;
    eax = eax + eax;
    edx = eax + edx;
    eax = eax + eax;
    j = eax + j;

  4. #4
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Elkvis View Post
    Nominal, you've got those operands backwards. gdb outputs disassembly in at&t syntax
    Nope, I don't think so.

    You see, current GDB supports both AT&T syntax and Intel syntax. Just use
    Code:
    set disassembly-flavor intel
    to get the disassembly using the Intel syntax. See the gdb documentation for details.

    In particular, in AT&T syntax the registers are always preceded by a %, i.e. %eax instead of eax. I trust the OP did not do something as silly as modify the output of gdb before posting.

  5. #5
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by Nominal Animal View Post
    In particular, in AT&T syntax the registers are always preceded by a %, i.e. %eax instead of eax. I trust the OP did not do something as silly as modify the output of gdb before posting.
    I wasn't aware they had added intel syntax support. that's actually pretty cool. I wonder if gcc supports intel syntax inline assembly. I must google that...

  6. #6
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Elkvis View Post
    I wasn't aware they had added intel syntax support. that's actually pretty cool. I wonder if gcc supports intel syntax inline assembly. I must google that...
    GNU as does, via the .intel_syntax noprefix (and .att_syntax) directives.

    It is a bit fragile, though, in inline assembly; you must assume GCC uses and will use AT&T syntax in the future, too. I'd say it's safer to just learn to live with the AT&T syntax, even though it feels weird. (And it does feel very weird to me, too: the source,target operand order never made any sense to me.)
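    For anyone curious what that looks like in practice, here is a hedged sketch of the OP's add sequence as GCC inline assembly written in Intel syntax. It assumes a GCC-compatible compiler targeting x86 that emits AT&T syntax by default (that is, no -masm=intel), which is exactly the assumption that makes this fragile; on any other target it falls back to plain C. The function names are made up for the example.

```c
#include <assert.h>

/* Computes 6*i + j. On GCC-compatible compilers targeting x86 this uses
   Intel-syntax inline assembly; everywhere else it is plain C. */
int six_i_plus_j(int i, int j)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result;
    __asm__ (
        ".intel_syntax noprefix\n\t"  /* switch the assembler to Intel syntax */
        "mov eax, edx\n\t"            /* eax = i       */
        "add eax, eax\n\t"            /* eax = 2*i     */
        "add eax, edx\n\t"            /* eax = 3*i     */
        "add eax, eax\n\t"            /* eax = 6*i     */
        "add eax, ecx\n\t"            /* eax = 6*i + j */
        ".att_syntax prefix\n\t"      /* ASSUMPTION: the compiler's own output is AT&T */
        : "=&a"(result)               /* result comes back in eax */
        : "d"(i), "c"(j)              /* i in edx, j in ecx */
        : "cc");                      /* add modifies the flags */
    return result;
#else
    return 6 * i + j;
#endif
}

/* Compares the function with the C expression for the OP's loop bounds. */
void check_six_i_plus_j(void)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 6; j++)
            assert(six_i_plus_j(i, j) == 6 * i + j);
}
```

    Pinning the operands to fixed registers through the "a", "d" and "c" constraints keeps compiler-substituted operands (%0, %1, ...) out of the template; the compiler formats those in its own syntax, not in the one selected by the directive.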

  7. #7
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    It is a bit fragile, though, in inline assembly; you must assume GCC uses and will use AT&T syntax in the future, too.
    O_o

    That's almost certainly never going to change, but I rather doubt they will stop supporting the directive for Intel syntax.

    the source,target operand order never made any sense to me.
    ^_^;

    Back when I still did assembler, I moved between tools enough that I somehow found both orders awkward.

    Soma

  8. #8
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by phantomotap View Post
    That's almost certainly never going to change, but I rather doubt they will stop supporting the directive for Intel syntax.
    I agree, neither syntax is going away, but switching back with .att_syntax afterwards assumes that AT&T is what GCC uses internally. I don't think GCC will ever switch to Intel syntax, but there might be cases where the inline assembly is inserted into code generated in some other way (two static inline functions using Intel syntax inline assembly, using each other in input arguments?) where that might not be the case. I just don't want to make an assumption I don't have any way of verifying.

    Also, the way you list e.g. clobbered registers for inline assembly is a bit counter-intuitive when the assembly itself is written using Intel syntax.

    Quote Originally Posted by phantomotap View Post
    Back when I still did assembler, I moved between tools enough that I somehow found both orders awkward.
    As a youngling, I only had access first to 6502, then 80286 architectures. Before I got my hands on TASM, I had to use DOS DEBUG to write my assembly code; I did two projects using it, in fact. I also didn't have a C compiler, but did have Turbo Pascal for my '286, which I got from a teacher. When I started with Linux in the mid-nineties, I loved the explicitness of the AT&T syntax (compared to BYTE/WORD/DWORD specifiers in Intel assembler); only the operand order felt completely wrong to me. Perhaps I read code more like logic or math or algorithms, rather than English? I definitely do not think in any spoken language when thinking about code and algorithms, that's for sure. I wouldn't have so much difficulty in expressing them in words if I did.

    So, yeah, I must confess I find both Intel and AT&T syntaxes awkward, too.

  9. #9
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by Nominal Animal View Post
    So, yeah, I must confess I find both Intel and AT&T syntaxes awkward, too.
    Can you give an example of an assembly language syntax that you think isn't awkward?
