Thread: Two basic questions about generated assembly

  1. #1
    Registered User
    Join Date
    May 2006

    Hello everyone,

    Two questions after reading this article:


    Why is using LEA to do multiplication faster than using MUL?

    "Using "LEA EAX,[EAX*4+EAX]" turns out to be faster than the MUL instruction."


    "The TEB's linear address can be found at offset 0x18 in the TEB." -- What does "linear address" mean? Is it something like an array, whose elements are placed next to each other? And what would a non-linear address be?

    thanks in advance,

  2. #2
    int x = *((int *) NULL); Cactus_Hugger
    Join Date
    Jul 2003
    Banks of the River Styx
    for #1:
    LEA can only be used for some multiplications. It can only multiply by a constant (the scale factor), and that constant must be 1, 2, 4, or 8. These are all powers of two, so the multiply can be done by bit shifting, which is much cheaper than a general multiply. Additionally, you can add a base register (the same register, as in your example), so you get:
    *1, *2, *3, *4, *5, *8, and *9 with LEA.

    Straight from the manual (AMD's version, the page(s) on the LEA instruction):
    "The LEA instruction has a limited capability to perform multiplication of operands in general-purpose registers using scaled-index addressing. For example:
        lea eax, [ebx+ebx*8]
    loads the value of the EBX register, multiplied by 9, into the EAX register. Possible values of multipliers are 2, 4, 8, 3, 5, and 9."
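
    To make the multiplier list concrete, here is a minimal C sketch of the arithmetic that a scaled-index LEA performs. The names lea_model, times3, times5 and times9 are made up for illustration; this models the address calculation only and is not how a compiler or the CPU represents it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the LEA address computation:
   result = base + index * scale, with scale limited to 1, 2, 4 or 8. */
uint32_t lea_model(uint32_t base, uint32_t index, unsigned scale)
{
    unsigned shift;

    switch (scale) {
    case 1: shift = 0; break;
    case 2: shift = 1; break;
    case 4: shift = 2; break;
    case 8: shift = 3; break;
    default:
        assert(!"scale must be 1, 2, 4 or 8");
        return 0;
    }
    /* One shift plus one add: that is the whole "multiply". */
    return base + (index << shift);
}

/* Using the same register as base and index gives 3, 5 and 9. */
uint32_t times3(uint32_t x) { return lea_model(x, x, 2); } /* lea eax, [eax+eax*2] */
uint32_t times5(uint32_t x) { return lea_model(x, x, 4); } /* lea eax, [eax+eax*4] */
uint32_t times9(uint32_t x) { return lea_model(x, x, 8); } /* lea eax, [eax+eax*8] */
```

    Leaving the base out (passing 0 here) gives the plain *2, *4 and *8 cases.
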
    long time; /* know C? */
    Unprecedented performance: Nothing ever ran this slow before.
    Any sufficiently advanced bug is indistinguishable from a feature.
    Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
    The best way to accelerate an IBM is at 9.8 m/s/s.
    recursion (re - cur' - zhun) n. 1. (see recursion)

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Farncombe, Surrey, England
    And the fact is that LEA is fast because its scale factors are powers of 2, so it's a simple shift of 0, 1, 2 or 3 bits on one of the inputs, plus an optional base register that adds the original value back in (leading to the 3, 5, 9 multipliers). This can be done by the hardware in a single cycle, or perhaps 2 cycles on older (486-Pentium I generation) processors.

    Multiply, on the other hand, will take a few cycles (without looking it up, I'd say about 5-10 on a modern processor, and around 20-30 on a 486 or Pentium) to perform a 32 by 32 bit multiply, because a general multiply has to handle any pair of operands, using generic methods built on repeated shifting and addition.
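
    As a rough illustration of why a general multiply is more work, here is a textbook shift-and-add multiply in C. This is only a sketch of the idea; real hardware multipliers are built differently and do much of this in parallel. The names shift_add_mul and check_mul are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: multiply by looking at the multiplier one bit at a time. */
uint32_t shift_add_mul(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    while (b != 0) {
        if (b & 1u)
            result += a;   /* this bit of b is set: add the shifted a */
        a <<= 1;           /* the next bit of b is worth twice as much */
        b >>= 1;
    }
    return result;         /* low 32 bits of the product */
}

/* Compare the sketch against the compiler's own 32-bit multiply. */
void check_mul(uint32_t a, uint32_t b)
{
    assert(shift_add_mul(a, b) == a * b);
}
```

    The loop can run up to 32 times for arbitrary operands, whereas the LEA case is always exactly one shift and at most one add.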
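
    "Linear address" is the same as a virtual address in most cases: it is the address after segmentation has been applied but before paging translates it. The opposite of a linear address is the physical address, the actual location in memory (where it may well be that the next 4KB section in linear space is physically located 3.999GB away from the current 4KB section). So "linear" does mean that consecutive addresses sit next to each other, but only in the address space the program sees, not necessarily in RAM.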

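    To show where the 4KB sections come from, here is a small C sketch that splits a 32-bit linear address the way classic x86 paging does. It assumes plain 32-bit paging with 4KB pages (no PAE, no large pages); the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Pieces of a 32-bit linear address, assuming 4KB pages without PAE. */
struct linear_parts {
    uint32_t dir_index;   /* bits 31..22: entry in the page directory */
    uint32_t table_index; /* bits 21..12: entry in the page table */
    uint32_t offset;      /* bits 11..0:  byte within the 4KB page */
};

struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;

    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FFu;
    p.offset      = linear & 0xFFFu;
    return p;
}

/* Rebuild the linear address from its pieces (inverse of split_linear). */
uint32_t join_linear(struct linear_parts p)
{
    assert(p.dir_index < 1024 && p.table_index < 1024 && p.offset < 4096);
    return (p.dir_index << 22) | (p.table_index << 12) | p.offset;
}
```

    Two linear addresses that differ only in the offset land in the same physical page. Once the table index changes, the page table entry decides where the next page lives physically, and it can be anywhere.
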
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.
