Inline asm - I love it!

**wavering** · 01-08-2002

Having derived great benefit from this board (mostly via the search facility) while I learn C, I thought I would write something on inline asm. I started using C because I needed to rewrite an existing routine, written in QuickBasic, to run a lot quicker. It now works and runs some 50 times faster. The deepest loop runs to dozens of lines and is repeated in excess of 100 billion times, so I have had to use assembly a lot, and I thought the following fragment may be of interest for a number of reasons. It is a version of "switch()", but a lot quicker, and it shows how to access a C variable ("bits" in this case), how to jump out of an asm routine, and how to do indirect jumps in inline asm (not in the books, so I have had to "invent" it). It is a bit "dodgy" in the sense that I have presumed the compiler will lay the code down in a straight line, with every jump in the table the same size, but it works and is very fast.
If you think about it, all a C compiler does is turn the code into assembly, so if you make the assembly as tight as possible it cannot fail to be faster than C .... On my slow old machine (266 MHz) asm lines execute at about 300 million per second on average.

Finally, you may note that this could all be replaced by one line if everything was held in an array - tried that and it is MUCH slower ( asm enthusiasts often spend 50 lines doing what can be done in other languages with one line )
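A minimal sketch of that array version, assuming the eight possible values of r1 (a, b, c, d, x and the constants 1, 2 and 3) are gathered into one array indexed by bits. The function name and the double types are assumptions; the fragment does not show how these variables are declared.

```c
/* Hypothetical array-based replacement for the asm jump table.
 * Assumes r1, a, b, c, d and x are doubles (their real types are not shown). */
double select_by_array(int bits, double a, double b, double c,
                       double d, double x)
{
    /* One slot per value of bits (0..7), in the same order as asm10..asm17. */
    const double choices[8] = { a, b, c, d, x, 1.0, 2.0, 3.0 };
    return choices[bits & 7];   /* mask keeps the index inside the table */
}
```

Rebuilding the table on every call costs eight stores, so in a hot loop one would normally keep the array up to date as a, b, c, d and x change; rebuilding it each time may be part of why an array version can measure slower.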

```c
int bits; //etc etc

bits = code[liner][1]; //bits takes the values 0 to 7 inclusive
_asm{
  mov ax,bits    ;multiply bits by three, because each
  mov cx,ax      ;jmp in the table below is assumed to
  shl ax,1       ;assemble to 3 bytes (16-bit near jmp)
  add cx,ax      ;cx = 2*bits + bits = 3*bits
  lea di,[int10] ;get address of first label
  add di,cx      ;di = address of table entry number bits
  jmp di         ;indirect jump into the table
int10: jmp asm10
int11: jmp asm11
int12: jmp asm12
int13: jmp asm13
int14: jmp asm14
int15: jmp asm15
int16: jmp asm16
int17: jmp asm17
}
asm10:r1 = a;
goto ext2;
asm11:r1 = b;
goto ext2;
asm12:r1 = c;
goto ext2;
asm13:r1 = d;
goto ext2;
asm14:r1 = x;
goto ext2;
asm15:r1 = 1;
goto ext2;
asm16:r1 = 2;
goto ext2;
asm17:r1 = 3;
//ext2 is a label later in the routine (not shown)
```

I would appreciate comments on this - especially suggestions relating to speed of execution.
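For comparison, here is the same selection as an ordinary C switch. Compilers commonly translate a dense run of case labels such as 0 to 7 into an indirect jump through a table of addresses, which is essentially what the asm above does by hand, so it may be worth timing this before committing to the asm. The function name, the parameter list and the double types are assumptions, since the fragment does not show how r1, a, b, c, d and x are declared.

```c
/* Hypothetical plain-C version of the asm10..asm17 dispatch:
 * case 0 corresponds to asm10, case 7 to asm17. */
double select_by_switch(int bits, double a, double b, double c,
                        double d, double x)
{
    switch (bits) {
    case 0: return a;
    case 1: return b;
    case 2: return c;
    case 3: return d;
    case 4: return x;
    case 5: return 1.0;
    case 6: return 2.0;
    case 7: return 3.0;
    default: return 0.0;   /* bits is documented as 0..7; not expected */
    }
}
```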

**Sayeh** · 01-08-2002

> Finally, you may note that this could all be
> replaced by one line if everything was held
> in an array - tried that and it is MUCH slower
> ( asm enthusiasts often spend 50 lines doing
> what can be done in other languages with one line )

Actually, this is like comparing apples and oranges. In 'high-level' languages, the number of lines of code really has nothing to do with how fast the code runs. A 'high-level' language gets converted to assembly.

The only reason line count has anything at all to do with program speed is that it is directly related to the amount of work the processor has to do. Since everything is reduced to assembly language, the number of lines counts only in assembler.

And yes, in many cases it is faster to do 50 of the same instruction rather than a higher-level instruction or a loop-type algorithm; this is because small instructions pipeline better and run faster. No thinking, just execute. Very fast.
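Doing many simple instructions in a row instead of looping is usually called loop unrolling. A minimal C sketch, summing an int array (an illustrative task, not from the thread); whether it actually helps depends on the processor, and modern compilers can often unroll loops themselves at higher optimisation levels.

```c
#include <stddef.h>

/* Plain loop: one add, one compare and one branch per element. */
long sum_loop(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unrolled by four: one compare and branch per four elements. */
long sum_unrolled(const int *v, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += v[i];
        s += v[i + 1];
        s += v[i + 2];
        s += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        s += v[i];
    return s;
}
```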

---

As for your 'indirect jumps', the reason you don't see anything about them in the documentation is that this used to be a common practice 20+ years ago. This is called

Self-Modifying Code

It modifies itself on the fly and is actually easier to do than what you've done. However, the reason you DON'T DO IT now is that it breaks the processor cache. Your CPU prefetches machine code in chunks and 'predicts' how to execute future code based on what it caches. When you modify your code on the fly, the processor has to flush the cache, reload it and re-predict.

The cache misdirection causes performance hits. Your code can be faster.
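For reference, GCC and Clang (but not ISO C) provide a "labels as values" extension: `&&label` yields the address of a label and `goto *p` jumps to it. With it, a table-driven indirect jump can be written without inline asm and without relying on how many bytes each `jmp` assembles to; the table holds addresses as data and the instruction bytes are not changed. The function name and types are assumptions, and this will not compile on compilers without the extension (MSVC, for example).

```c
/* Requires the GCC/Clang "labels as values" extension (not ISO C). */
double select_by_computed_goto(int bits, double a, double b, double c,
                               double d, double x)
{
    /* Table of label addresses, indexed by bits (0..7). */
    static void *const targets[8] = {
        &&l0, &&l1, &&l2, &&l3, &&l4, &&l5, &&l6, &&l7
    };
    double r1;

    goto *targets[bits & 7];   /* indirect jump, like "jmp di" */

l0: r1 = a;   goto done;
l1: r1 = b;   goto done;
l2: r1 = c;   goto done;
l3: r1 = d;   goto done;
l4: r1 = x;   goto done;
l5: r1 = 1.0; goto done;
l6: r1 = 2.0; goto done;
l7: r1 = 3.0;
done:
    return r1;
}
```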

Also, DON'T modify your base pointer ('bits'). This is a no-no. This is how you accidentally create memory leaks and dangling pointers. Always leave your base address alone and use another register to hold an 'offset'. Then you can always calculate address + offset.
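In C terms the advice amounts to this: keep the pointer you started with (for example, the one malloc returned) unchanged and index from it with a separate offset, so the original address is always available for free() or reuse. A small sketch; both function names are illustrative.

```c
#include <stdlib.h>

/* Fill n ints starting at base with 0, 1, 2, ... without moving base:
 * the offset lives in its own variable, and each address is base + offset. */
void fill_from_base(int *base, size_t n)
{
    for (size_t offset = 0; offset < n; offset++)
        base[offset] = (int)offset;   /* same as *(base + offset) */
}

/* Because base is never advanced, the caller can still free() it. */
int *make_sequence(size_t n)
{
    int *base = malloc(n * sizeof *base);
    if (base != NULL)
        fill_from_base(base, n);
    return base;   /* still the exact pointer malloc returned */
}
```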

---

One last note:

You might try using the 'code' tags. You should actually format assembler more like this:

Code:

```c
_asm
   {
   mov ax,bits        ;Multiply bits by three
   mov cx,ax          ;as the memory location
   shl ax,1           ;of each label jumps
   add cx,ax          ;by three!
   lea di,[int10]     ;get address of first label
   add di,cx
   jmp di             ;indirect jump

   int10:
      jmp asm10
   int11:
      jmp asm11
   int12:
      jmp asm12
   int13:
      jmp asm13
   int14:
      jmp asm14
   int15:
      jmp asm15
   int16:
      jmp asm16
   int17:
      jmp asm17
   }

asm10:
   r1 = a;
   goto ext2;
asm11:
   r1 = b;
   goto ext2;
asm12:
   r1 = c;
   goto ext2;
asm13:
   r1 = d;
   goto ext2;
asm14:
   r1 = x;
   goto ext2;
asm15:
   r1 = 1;
   goto ext2;
asm16:
   r1 = 2;
   goto ext2;
asm17:
   r1 = 3;
```

The asm and the C labels above are much easier to read this way. Don't try to cram everything onto one line; that has nothing to do with performance in your compiler.

---

Glad you are enjoying assembler. It will serve you well. If you learn what's going on in assembler, you will run rings around most OOPs programmers without much effort when you code in C or C and assembler.

Low level understanding is worth a lot!

enjoy.

**wavering** · 01-08-2002

Many thanks Sayeh,
Yes, I use 50 lines of asm because it is quicker, not because I have nothing else to do!

I take your point about the cache, but I am going on timings, and what I have done is the fastest so far. If somebody can make a faster "switch()", that would be great .... any offers?
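One way to compare candidate "switch()" replacements is to time each over the same inputs and check that they agree. A minimal sketch using the standard `clock()` function; `pick_switch` and `pick_table` are stand-ins (assumptions) for whatever versions are being compared, and the results are summed into a checksum so the compiler cannot discard the work.

```c
#include <time.h>

/* A selector maps bits (0..7) to a value. */
typedef double (*selector)(int bits);

/* Stand-in selectors; replace with the real candidates. */
static double pick_switch(int bits)
{
    switch (bits & 7) {
    case 0: return 0.5;  case 1: return 1.5;
    case 2: return 2.5;  case 3: return 3.5;
    case 4: return 4.5;  case 5: return 1.0;
    case 6: return 2.0;  default: return 3.0;
    }
}

static double pick_table(int bits)
{
    static const double table[8] = { 0.5, 1.5, 2.5, 3.5, 4.5, 1.0, 2.0, 3.0 };
    return table[bits & 7];
}

/* Call sel for bits = 0..7, rounds times; return the sum of the results
 * and store the elapsed CPU time in *seconds. */
double time_selector(selector sel, long rounds, double *seconds)
{
    double checksum = 0.0;
    clock_t start = clock();
    for (long r = 0; r < rounds; r++)
        for (int bits = 0; bits < 8; bits++)
            checksum += sel(bits);
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return checksum;
}
```

Calling through a function pointer adds the same overhead to every candidate, so this only ranks them roughly; for final numbers it is better to time the real inner loop with real data.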

Concerning the layout, I lost a lot of clarity by not using the code tag in my post here; I religiously indent to avoid lost brackets! The reason I put the labels on the same line is that this stuff already runs to well over a thousand lines as it is ...

> Glad you are enjoying assembler. It will serve you well. If you learn what's going on in assembler, you will run rings around most OOPs programmers without much effort when you code in C or C and assembler.

Yes. I had rather assumed that, which is why I am writing in C and asm. There has to be an overhead with OOP. Ultimately all code reduces to a series of if tests (cmp, actually) and goto statements, rather like badly written BASIC.
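To make the point concrete, here is a while loop alongside the same loop written with nothing but an if test and goto statements, which is roughly the compare-and-conditional-jump shape a compiler produces. The function names are illustrative, and the asm in the comments is indicative only.

```c
/* Structured version. */
int sum_to_while(int n)
{
    int total = 0, i = 1;
    while (i <= n) {
        total += i;
        i++;
    }
    return total;
}

/* The same loop reduced to a test and jumps. */
int sum_to_goto(int n)
{
    int total = 0, i = 1;
top:
    if (i > n) goto done;   /* roughly: cmp i,n / jg done */
    total += i;
    i++;
    goto top;               /* roughly: jmp top */
done:
    return total;
}
```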

Actually, there is a certain irony here, because the program I have written is a Genetic Programming environment, i.e. you start with a set of random computer instructions and the whole lot evolves into a program over millions (billions) of generations via mutation and breeding. If you are interested I could write more, but as an example here is a program which evolved to find cube roots. I am not totally sure how it works, but it does, to 16 decimal places, and is more accurate than math.h .... The 8 lines with labels are what evolved; the other stuff is padding. I love the symmetry of lines 6 and 8 .... If you try it with other numbers it may work or it may just loop .... evolution is not perfect!
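As a rough illustration of the mutation-and-breeding step, here is a sketch of one common linear-GP representation: a program is a fixed-length array of instructions, mutation replaces one randomly chosen field, and breeding (one-point crossover) splices the front of one parent onto the back of another. This is a generic textbook scheme, not how newbld.c works; the instruction format, the sizes and the use of `rand()` are all assumptions.

```c
#include <stdlib.h>

#define PROG_LEN 8   /* assumed program length, e.g. the 8 evolved lines */
#define N_OPS    4   /* assumed opcodes: 0 '+', 1 '-', 2 '*', 3 '/' */
#define N_REGS   4   /* assumed registers, e.g. a, c, x and a constant */

/* One instruction: regs[dst] = regs[src1] op regs[src2]. */
typedef struct {
    int op, dst, src1, src2;
} instr;

/* Fill a program with random instructions. */
void random_program(instr prog[PROG_LEN])
{
    for (int i = 0; i < PROG_LEN; i++) {
        prog[i].op   = rand() % N_OPS;
        prog[i].dst  = rand() % N_REGS;
        prog[i].src1 = rand() % N_REGS;
        prog[i].src2 = rand() % N_REGS;
    }
}

/* Point mutation: change one field of one randomly chosen instruction. */
void mutate(instr prog[PROG_LEN])
{
    instr *in = &prog[rand() % PROG_LEN];
    switch (rand() % 4) {
    case 0:  in->op   = rand() % N_OPS;  break;
    case 1:  in->dst  = rand() % N_REGS; break;
    case 2:  in->src1 = rand() % N_REGS; break;
    default: in->src2 = rand() % N_REGS; break;
    }
}

/* One-point crossover: the child takes instructions [0, cut) from mum
 * and [cut, PROG_LEN) from dad. */
void crossover(const instr mum[PROG_LEN], const instr dad[PROG_LEN],
               instr child[PROG_LEN])
{
    int cut = rand() % (PROG_LEN + 1);
    for (int i = 0; i < PROG_LEN; i++)
        child[i] = (i < cut) ? mum[i] : dad[i];
}
```

A real environment would also need an interpreter for these instructions and a fitness test (for example, how close the result is to a known cube root) to decide which programs get to breed.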

Code:

```
//"x" is the number for which we require the cube root
//the answer appears in "a". The format here is BASIC.

  1: c = x + 3
  2: a = c / 3
  3: c = a * a
  4: c = x / c
  5: c = a + c
  6: if c > 3 * a goto 2
  7: c = a + c
  8: if c < 3 * a goto 2
```
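Tracing the eight lines algebraically, for x > 0 and a > 0, suggests what evolution found:

```latex
\begin{aligned}
&\text{lines 1--2 (first pass):} && a = \frac{x+3}{3}\\
&\text{lines 3--5:} && c = a + \frac{x}{a^{2}}\\
&\text{line 6 jumps to 2 when:} && c > 3a \iff x > 2a^{3},
  \quad\text{giving } a \leftarrow \tfrac{1}{3}\Bigl(a + \frac{x}{a^{2}}\Bigr)\\
&\text{line 7:} && c = 2a + \frac{x}{a^{2}}\\
&\text{line 8 jumps to 2 when:} && c < 3a \iff x < a^{3},
  \quad\text{giving } a \leftarrow \tfrac{1}{3}\Bigl(2a + \frac{x}{a^{2}}\Bigr)\\
&\text{Newton step for } f(a) = a^{3} - x: && a - \frac{a^{3}-x}{3a^{2}} = \tfrac{1}{3}\Bigl(2a + \frac{x}{a^{2}}\Bigr)
\end{aligned}
```

So the jump at line 8 performs exactly Newton's method for the cube root, whose error roughly squares on each pass once it is close; that is consistent with the 16-place result. By the AM-GM inequality, (a + a + x/a²)/3 ≥ ∛x, so in exact arithmetic the Newton iterates approach the root from above and the line 8 test x < a³ keeps succeeding until a³ = x exactly. In practice the loop therefore stops only when floating-point rounding makes 2a + x/a² no smaller than 3a, which may be why some other inputs loop forever.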

A working version in C follows:

```c
// found on 6th Jan 2002 by newbld.c
//"x" is the number for which we require the cube root
//the answer appears in "a"

#include <math.h>
#include <stdio.h>
#include <stdlib.h>   // system()

int main(void){
  double a, c, sys;
  double x = 99.9;
  unsigned long j = 0;   // unsigned long to match %lu below

  system("cls");   //clear screen (DOS/Windows only)

  // labels mirror the BASIC line numbers; only lab2 is a jump target
  lab1: c = x + 3;
  lab2: a = c / 3;
  j++;
  printf("%lu solution so far - root is %16.16f \n", j, a);
  lab3: c = a * a;
  lab4: c = x / c;
  lab5: c = a + c;
  lab6: if (c > 3 * a) goto lab2;
  lab7: c = a + c;
  lab8: if (c < 3 * a) goto lab2;

  printf("Cube root of %5.3f is %16.16f and that cubed is %16.16f \n", x, a, a*a*a);
  sys = pow(x, 1.0 / 3.0);
  printf("Math root of %5.3f is %16.16f and that cubed is %16.16f \n", x, sys, sys*sys*sys);
  return 0;
}
```
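If the aim is simply a cube root, the Newton step hidden in lines 7, 8 and 2 can be written with an explicit stopping rule and an iteration cap, so it cannot loop forever. A minimal sketch for moderate positive x, keeping the evolved starting value (x + 3)/3; the function name, the cap of 200 passes and the stop-when-no-longer-decreasing test are assumptions, not part of the evolved program.

```c
/* Newton's method for f(a) = a*a*a - x, for x > 0.
 * Each pass applies a <- (2a + x/a^2) / 3, the update reached via lines 7, 8 and 2. */
double newton_cbrt(double x)
{
    double a = (x + 3.0) / 3.0;          /* same starting value as lines 1-2 */
    for (int pass = 0; pass < 200; pass++) {
        double next = (2.0 * a + x / (a * a)) / 3.0;
        if (next >= a)   /* iterates fall toward the root from above; */
            break;       /* stop once rounding stops them decreasing */
        a = next;
    }
    return a;
}
```

From (x + 3)/3 the early passes shrink a by only about a third each time, so very large x would need a higher cap or a better starting guess.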

I guess this is not really the right forum for this, but what the heck ...
