Thread: Inline asm - I love it!

  1. #1
    Registered User wavering
    Join Date
    Dec 2001
    Posts
    26

    Inline asm - I love it!

    Having derived great benefit from this board (mostly via the search facility) while I learn C, I thought I would write something on inline asm. I started using C because I needed to rewrite an existing routine written in QuickBasic to run a lot quicker. It now works and runs some 50 times faster. The deepest loop runs to dozens of lines and is repeated in excess of 100 billion times, so I have had to use assembly a lot, and I thought the following fragment may be of interest for a number of reasons. It is a version of "switch()" but a lot quicker, and it shows how to access a C variable ("bits" in this case), how to jump out of an asm routine, and how to do indirect jumps in inline asm (not in the books, so I have had to "invent" it). It is a bit "dodgy" in the sense that I have been presumptuous in assuming that the code will be laid down in a straight line, but it works and is very fast.
    If you think about it, all a C compiler does is turn the code into assembly, so if you make it as tight as possible then it cannot fail to be faster than C .... On my slow old machine (266 MHz) the average asm line runs at about 300M per sec.

    Finally, you may note that this could all be replaced by one line if everything was held in an array - tried that and it is MUCH slower ( asm enthusiasts often spend 50 lines doing what can be done in other languages with one line )

    int bits; //etc etc

    bits = code[liner][1]; //bits takes the values 0 to 7 inclusive
    _asm{
    mov ax,bits ;Multiply bits by three
    mov cx,ax ;as the memory location
    shl ax,1 ;of each label jumps
    add cx,ax ;by three!
    lea di,[int10] ;get address of first label
    add di,cx
    jmp di ;indirect jump
    int10: jmp asm10
    int11: jmp asm11
    int12: jmp asm12
    int13: jmp asm13
    int14: jmp asm14
    int15: jmp asm15
    int16: jmp asm16
    int17: jmp asm17
    }
    asm10:r1 = a;
    goto ext2;
    asm11:r1 = b;
    goto ext2;
    asm12:r1 = c;
    goto ext2;
    asm13:r1 = d;
    goto ext2;
    asm14:r1 = x;
    goto ext2;
    asm15:r1 = 1;
    goto ext2;
    asm16:r1 = 2;
    goto ext2;
    asm17:r1 = 3;

    I would appreciate comments on this - especially suggestions relating to speed of execution.
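
    For timing comparisons, the same dispatch can be sketched in plain C. This is only a baseline under stated assumptions: the function names `select_operand` and `select_operand_table` are hypothetical, and the operands are assumed to be `int` because the fragment above does not show their types.

```c
#include <assert.h>

/* Hypothetical helper: returns the operand selected by bits (0..7),
   mirroring labels asm10..asm17 in the fragment. Types are assumed. */
int select_operand(int bits, int a, int b, int c, int d, int x)
{
    assert(bits >= 0 && bits <= 7);
    switch (bits) {
    case 0: return a;
    case 1: return b;
    case 2: return c;
    case 3: return d;
    case 4: return x;
    case 5: return 1;
    case 6: return 2;
    default: return 3;   /* bits == 7 */
    }
}

/* Table variant: the "everything held in an array" idea.
   The caller must keep operands[] up to date with a, b, c, d, x. */
int select_operand_table(int bits, const int operands[8])
{
    assert(bits >= 0 && bits <= 7);
    return operands[bits];
}
```

    With the fragment's variables this would be called as `r1 = select_operand(bits, a, b, c, d, x);`. A compiler may turn a dense switch like this into its own jump table, so it is worth timing both variants against the asm version on the target machine.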

  2. #2
    Sayeh
    Guest
    > Finally, you may note that this could all be
    > replaced by one line if everything was held
    > in an array - tried that and it is MUCH slower
    > ( asm enthusiasts often spend 50 lines doing
    > what can be done in other languages with one line )

    Actually, this is like comparing apples and oranges. In 'high-level' languages, the number of lines in code really has nothing to do with how fast code runs. A 'high-level' language gets converted to assembly.

    The only reason line count has anything at all to do with program speed is because it is directly related to the amount of work the processor has to do. Since everything is reduced to assembly language, then the number of lines counts only in assembler.

    And yes, in many cases, it is faster to do 50 of the same instruction, rather than a higher-level instruction or loop-type algorithm-- this is because small instructions pipeline better and run faster. No thinking, just execute. Very fast.
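
    To make the "repeat the instruction rather than loop" point concrete, here is a minimal C sketch of loop unrolling. The function names are made up for illustration. Whether the unrolled form is actually faster depends on the compiler and the CPU (optimizing compilers often unroll loops on their own), so treat it as something to time rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one add per iteration plus the counter and branch. */
long sum_loop(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Unrolled by four: same result, with the counter update and branch
   paid once per four elements instead of once per element. */
long sum_unrolled4(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)          /* leftover 0..3 elements */
        total += v[i];
    return total;
}
```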

    ---

    As for your 'indirect jumps' the reason you don't see anything in the documentation about this is because this used to be a common practice 20+ years ago-- This is called

    Self Modifying Code

    It modifies itself on the fly and is actually easier to do than you've done. However, the reason you DON'T DO IT now, is because it breaks the processor cache. Your CPU prefetches assembly code in chunks and 'predicts' how to execute future code based on what it caches. By modifying your code on the fly, the processor has to flush the cache, reload it, and repredict.

    The cache misdirection causes performance hits. Your code can be faster.

    Also, DON'T modify your base pointer ('bits'). This is a no-no. This is how you accidentally create memory leaks and dangling pointers. Always leave your base address alone and use another register to contain an 'offset'. Then you can always calculate address+offset.

    ---

    One last note:

    You might try using the 'code' tags. You should actually format assembler more like what is below:

    Code:
    _asm
       { 
       mov ax,bits      ;Multiply bits by three
       mov cx,ax        ;as the memory location
       shl ax,1         ;of each label jumps
       add cx,ax        ;by three!
       lea di,[int10]   ;get address of first label
       add di,cx
       jmp di           ;indirect jump
    
       int10:
          jmp asm10 
       int11:
          jmp asm11 
       int12:
          jmp asm12 
       int13:
          jmp asm13 
       int14:
          jmp asm14 
       int15:
          jmp asm15 
       int16:
          jmp asm16 
       int17:
          jmp asm17 
       } 
    
    asm10:
       r1 = a; 
       goto ext2; 
    asm11:
       r1 = b; 
       goto ext2; 
    asm12:
       r1 = c; 
       goto ext2; 
    asm13:
       r1 = d; 
       goto ext2; 
    asm14:
       r1 = x; 
       goto ext2; 
    asm15:
       r1 = 1; 
       goto ext2; 
    asm16:
       r1 = 2; 
       goto ext2; 
    asm17:
       r1 = 3;
    That and the C labels above are much easier to read this way. Don't try to cram everything onto one line; that has nothing to do with performance in your compiler.

    ---

    Glad you are enjoying assembler. It will serve you well. If you learn what's going on in assembler, you will run rings around most OOPs programmers without much effort when you code in C or C and assembler.

    Low level understanding is worth a lot!

    enjoy.

  3. #3
    Registered User wavering
    Join Date
    Dec 2001
    Posts
    26
    Many thanks Sayeh,
    Yes - I use 50 lines of asm because it is quicker not because I have nothing else to do!

    I take your comments re the cache but I am just going on timings and what I have done is the fastest so far - if somebody can make a faster "switch()" that would be great .... any offers?

    Concerning the layout I lost a lot of clarity by not using the code tag on my post here - I religiously indent to avoid lost brackets! The reason I put the labels on the same line is that this stuff runs to well over a thousand lines as it is ...

    "Glad you are enjoying assembler. It will serve you well. If you learn what's going on in assembler, you will run rings around most
    OOPs programmers without much effort when you code in C or C and assembler. "

    Yes. I had rather assumed that, which is why I am writing in C and asm. There has to be an overhead with OOP. Ultimately all code reduces to a series of if tests (cmp actually) and goto statements - rather like badly written BASIC.

    Actually, there is a certain irony here because the program I have written is a Genetic Programming environment, i.e. you start with a set of random computer instructions and the whole lot evolves into a program over millions (billions) of generations via mutation and breeding. If you are interested I could write more, but as an example here is a program which evolved to find cube roots. I am not totally sure how it works, but it does, to 16 places of decimals, and is more accurate than math.h. The 8 lines with labels are what evolved - the other stuff is padding. I love the symmetry of lines 6 and 8. If you try it with other numbers it may work or it may just loop .... evolution is not perfect!

    Code:
    //"x" is the number for which we require the cube root
    //the answer appears in "a" The format here is BASIC 
    
      1: c = x + 3
      2: a = c / 3
      3: c = a * a
      4: c = x / c
      5: c = a + c
      6: if c > 3 * a goto 2
      7: c = a + c
      8: if c < 3 * a goto 2
    
    A working version in C follows:
    
    // found on 6th Jan 2002 by newbld.c
    //"x" is the number for which we require the cube root
    //the answer appears in "a"
    
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>	//for system()
    
    int main(void){
      double a,c,sys;
      double x = 99.9;
      long j=0;
    
      system("cls");	//clear screen (DOS/Windows only)
    
      lab1: c = x + 3;
      lab2: a = c / 3;
      j++;
      printf("%ld solution so far - root is %16.16f \n",j, a);
      lab3: c = a * a;
      lab4: c = x / c;
      lab5: c = a + c;
      lab6: if ( c > 3 * a) goto lab2;
      lab7: c = a + c;
      lab8: if (c < 3 * a) goto lab2;
    
      printf("Cube root of %5.3f is %16.16f and that cubed is %16.16f \n", x, a, a*a*a);
      sys= pow(x,1.0/3);	//library result for comparison
      printf("Math root of %5.3f is %16.16f and that cubed is %16.16f \n", x,sys,sys*sys*sys );
      return 0;
    }
    I guess this is not really the right forum for this but what the heck ...
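
    One way to read the evolved program: when the test on line 8 sends control back to line 2, the new value is a = (2a + x/(a*a)) / 3, which is the Newton-Raphson update for a*a*a = x. The sketch below restates the eight lines as a structured loop. The name `evolved_cbrt` and the `max_iter` cap are additions for illustration, not part of the evolved program; the cap is there because the original may loop for some inputs, and the sketch assumes x > 0.

```c
#include <assert.h>

/* Hypothetical restatement of the evolved lines 1-8 as a loop.
   max_iter is an added safety cap, not part of the original. */
double evolved_cbrt(double x, int max_iter)
{
    assert(x > 0.0);
    double a = (x + 3.0) / 3.0;              /* lines 1-2 */
    for (int i = 0; i < max_iter; i++) {
        double c = a + x / (a * a);          /* lines 3-5 */
        if (c > 3.0 * a) {                   /* line 6: goto 2 */
            a = c / 3.0;
            continue;
        }
        c = a + c;                           /* line 7: c = 2a + x/(a*a) */
        if (c < 3.0 * a) {                   /* line 8: goto 2 */
            a = c / 3.0;                     /* Newton-Raphson step */
            continue;
        }
        break;                               /* neither test fired: done */
    }
    return a;
}
```

    For example, `evolved_cbrt(99.9, 200)` follows the same sequence of values as the program above, stopping early if the two tests stop firing.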
