Thread: calling 32-bit printf from 64-bit binary?

  1. #1
    Registered User
    Join Date: Dec 2006
    Location: Canada
    Posts: 3,229

    calling 32-bit printf from 64-bit binary?

    I have this simple C code -

    Code:
    #include <stdio.h>
    
    int main() {
    	printf("asdf");
    }
    I decided to look at the assembly output by gcc and found something strange.

    The relevant section -
    Code:
    main:
    	pushq	%rbp
    	movq	%rsp, %rbp
    	movl	$.LC0, %edi
    	movl	$0, %eax
    	call	printf
    	leave
    	ret
    The thing is, this is on 64-bit Linux, and this is apparently a 64-bit binary (since it's using all those r registers), but the call to printf is 32-bit (address of the format string via edi, and "number of SSE registers used" via eax, both 32-bit registers).

    cyberfish@cyberfish-desktop:/tmp$ gcc -v
    Using built-in specs.
    Target: x86_64-linux-gnu
    Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
    Thread model: posix
    gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
    Any clue what is going on?

    Many thanks

  2. #2
    master5001 (Banned)
    Join Date: Aug 2001
    Location: Visalia, CA, USA
    Posts: 3,685
    I haven't worked with much 64-bit assembler outside of Alpha, but I don't believe I am seeing a problem here, unless I am reading your code incorrectly.

  3. #3
    Registered User
    Join Date: Dec 2006
    Location: Canada
    Posts: 3,229
    I just find it strange that it is not calling the 64-bit version of printf. Also, if the address of the format string is 64-bit, like I thought it should be on a 64-bit system, how can it be passed via edi, a 32-bit register?

  4. #4
    master5001 (Banned)
    Join Date: Aug 2001
    Location: Visalia, CA, USA
    Posts: 3,685
    Again, I am not 100% sure, since I am not the foremost expert on 64-bit assembler, but is that not more a matter of calling convention than anything else?

  5. #5
    Registered User
    Join Date: Dec 2006
    Location: Canada
    Posts: 3,229
    The question is not about calling conventions; it's using the correct calling convention for 32-bit. My question is why it is using the 32-bit version instead of the 64-bit one.

  6. #6
    foxman (Chinese pâté)
    Join Date: Jul 2007
    Location: Canada
    Posts: 404
    Well, even if I can't explain why it uses EDI and EAX instead of RDI and RAX, I would say it's calling "64-bit" code, since passing the first argument in RDI and the number of SSE registers used in AL (the low byte of RAX) is an x86-64 calling convention (namely the System V AMD64 ABI). If it was calling "32-bit" code, I guess it would use something like cdecl, no?
    I hate real numbers.

  7. #7
    Registered User
    Join Date: Dec 2006
    Location: Canada
    Posts: 3,229
    I have just started learning assembly, so I have no idea what cdecl is. It makes sense that it should use rdi to pass the first argument, but it's using edi. If printf is 64-bit and reads rdi, will it only get half of the address (the least significant dword would be garbage)? Same goes for eax/rax. If printf reads rax, will it get a garbage value in the least significant dword?

  8. #8
    master5001 (Banned)
    Join Date: Aug 2001
    Location: Visalia, CA, USA
    Posts: 3,685
    Correct me if I am wrong (this is addressed more to matsp than anyone else), but is it not implied by the C standard that the above would be the only valid way to even call a standard library function? For example, if you look at some of the libraries that are packaged with your assembler, you will see that they document how to use them by stating which registers each routine alters and which it does not. In C the same holds true, but in a more uniform way.

  9. #9
    Registered User
    Join Date: Dec 2006
    Location: Canada
    Posts: 3,229
    Quote (originally posted by master5001):
    But is it not implied by the C standard that the above would be the only valid way to even call a standard library function?
    How so?

    You mean the C stdlib only exists in 32-bit? How can the standard say that? What about pure 8-, 16-, and 64-bit architectures? I don't see how the C standard can specify what registers to use to pass arguments to stdlib functions. The calling convention should be implementation-defined, no?

    Quote (originally posted by master5001):
    If you look at some of the libraries that are packaged with your assembler, you will see that they document how to use them by stating which registers each routine alters and which it does not
    Sure, but what does that have to do with the C standard?

    [edit]
    The calling convention on x86-64 is to use six registers (rdi, rsi, rdx, rcx, r8, r9) to pass the first six integer or pointer arguments. The assembly gcc spits out apparently defies that rule by using edi. Shouldn't an implementation on 64-bit follow 64-bit conventions?
    [/edit]
    Last edited by cyberfish; 10-07-2008 at 12:32 AM.

  10. #10
    maxorator (Reverse Engineer)
    Join Date: Aug 2005
    Location: Estonia
    Posts: 2,318
    Quote (originally posted by cyberfish):
    I have just started learning assembly, so I have no idea what cdecl is. It makes sense that it should use rdi to pass the first argument, but it's using edi. If printf is 64-bit and reads rdi, will it only get half of the address (the least significant dword would be garbage)? Same goes for eax/rax. If printf reads rax, will it get a garbage value in the least significant dword?
    It's the other way around. r** registers are 64-bit, e** are 32-bit.

    The way it is done here will fail if the buffer is located at address 0x0000000100000000 or higher.
    "The Internet treats censorship as damage and routes around it." - John Gilmore

  11. #11
    Kernel hacker
    Join Date: Jul 2007
    Location: Farncombe, Surrey, England
    Posts: 15,677
    Let's make one thing clear here: the code is not calling the 32-bit version of printf. It can't do that; it wouldn't work.

    Quote (originally posted by maxorator):
    It's the other way around. r** registers are 64-bit, e** are 32-bit.

    The way it is done here will fail if the buffer is located at address 0x0000000100000000 or higher.
    To be precise: a move to %EDI (or any other 32-bit E register) is zero-extended up to the 64-bit size, so this works for any address of $.LC0 below 0x100000000 (2^32), as you say. The "first or last 2GB of virtual address" limit applies instead to the sign-extended form, movq $imm32, %rdi. In either case, the linker will tell you if the address does not fit when it resolves the relocation, and it is very unlikely that your static data will actually reach 2GB: the compiler assumes that this never happens.

    EAX is the 32-bit version of RAX, and since the number of SSE registers that may hold arguments is at most 8 (%xmm0 to %xmm7), it fits nicely in the lower 32 bits with 28 bits to spare, so there is no need to use the 64-bit version of the register. Strictly, the callee only looks at AL, but a mov to AL alone would leave the rest of RAX holding stale bits that the CPU may have to merge with, whereas a 32-bit mov sets the whole register (the upper half is zeroed automatically) in a single instruction, so using AL would not really help.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  12. #12
    CornedBee (Cat without Hat)
    Join Date: Apr 2003
    Posts: 8,895
    On the other hand, by not moving to the r** registers, but using the e** registers instead, the compiler saves a byte for the REX.W ("64-bit operand size") instruction prefix that is required for many 64-bit opcodes.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  13. #13
    Kernel hacker
    Join Date: Jul 2007
    Location: Farncombe, Surrey, England
    Posts: 15,677
    Quote (originally posted by CornedBee):
    On the other hand, by not moving to the r** registers, but using the e** registers instead, the compiler saves a byte for the REX.W ("64-bit operand size") instruction prefix that is required for many 64-bit opcodes.
    Not to mention the four bytes saved by not using a 64-bit immediate for the %RDI constant load, whose upper four bytes would be all zero anyway.


  14. #14
    CornedBee (Cat without Hat)
    Join Date: Apr 2003
    Posts: 8,895
    Try compiling with -mcmodel=medium. This forces GCC not to assume that symbols are in the lower 2 GB. You should find that it then moves the string pointer to the full rdi.
    It will still move the 0 constant to eax, of course.

  15. #15
    abachler (Malum in se)
    Join Date: Apr 2007
    Posts: 3,195
    The problem is more likely with your disassembler, since the binary encodings of 32-bit and 64-bit instructions are (mostly) identical. Whether the processor executes the 32-bit or the 64-bit version of an opcode just depends on what mode it is in.
