Arrays In Inline x86 Assembly

  1. #1
    Registered User
    Join Date
    Jun 2004
    Posts
    7

    Arrays In Inline x86 Assembly

    I have tried to figure out how I can use arrays in inline assembly, but have failed. My question is, is there a way and if so, how do you make it work? Like say I have Array[4] just as a made-up example.... how would I read and/or write there in inline x86 assembly?

  2. #2
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,344
    1. write it in C
    2. test the performance - if it's adequate, stop
    3. figure out how to make your compiler/IDE show you the asm for the 'C' code
    4. study that asm to see how it's done
    5. try and beat the compiler.

    If you're looking for some big improvement in performance, you're looking in the wrong place.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  3. #3
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,586
    You access an array in inline assembly via a pointer to the array. Load the full pointer into one of the segment/source index register pairs.

    Code:
    unsigned char *test=(unsigned char *)malloc(500);
     
    //C++
    //unsigned char *test=new unsigned char[500];
     
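    asm {
      les	 esi,byte ptr [test]   // loads ES:ESI from test (STOSB itself writes through ES:EDI)
      cld                        // clear the direction flag so the pointer counts upward
      mov   ecx,500d             // number of bytes to store
      mov   eax,0                // fill value; STOSB stores AL
      rep	stosb
    }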

  4. #4
    #include<xErath.h> xErath's Avatar
    Join Date
    Jun 2004
    Posts
    722
    Let Array be an array of ints.
    In C:
    //the compiler always multiplies these constants by the element size for you
    Array[0] = *(address+0);
    Array[1] = *(address+1);
    Array[n] = *(address+n);
    But the real byte addresses are (the compiler does this automatically; you mustn't multiply yourself in C):
    &Array[0] = address+4*0;
    &Array[1] = address+4*1;
    &Array[n] = address+4*n;
    where 4 is sizeof(int).
    In assembly you'd have:
    //the array (here array holds the pointer; for a true array, load its address with lea instead)
    mov edi, array
    //Array[0] is:
    mov eax, dword ptr [edi]
    //Array[1] is:
    mov eax, dword ptr [edi+4]
    //Array[n] is:
    mov eax, dword ptr [edi+4*n]

    If you're using chars, use 1 instead of 4 and byte ptr instead of dword ptr.
    For short ints, use 2 and word ptr.
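
    The address arithmetic above can be sketched in C. This is only an illustration under stated assumptions: the helper names elem_address and read_elem are made up for this example, sizeof(int) is assumed to be 4, and the inline assembly uses GCC/Clang extended asm in AT&T syntax rather than the asm { } style used elsewhere in this thread. On other compilers or CPUs it falls back to plain C.

```c
#include <assert.h>
#include <stddef.h>

/* Byte address of a[n], computed by hand: base address plus n * sizeof(int).
   (Hypothetical helper, for illustration only.) */
static const void *elem_address(const int *a, size_t n)
{
    assert(a != NULL);
    return (const char *)a + n * sizeof(int);
}

/* Read a[n]. On x86 with a GCC-compatible compiler this is a single load
   using base + index*4 addressing; elsewhere it is ordinary C indexing.
   Assumes sizeof(int) == 4. */
static int read_elem(const int *a, size_t n)
{
    assert(a != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int value;
    __asm__ volatile ("movl (%1,%2,4), %0"   /* value = *(a + n*4 bytes) */
                      : "=r"(value)
                      : "r"(a), "r"(n)
                      : "memory");
    return value;
#else
    return a[n];
#endif
}
```

    Calling read_elem(Array, 4) should give the same value as Array[4]; the scale factor 4 in the addressing mode plays the role of the 4*n in the post above.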

  5. #5
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,586
    Why move the data separately when you can do a block write using a REP???

    Totally lost.

    And yes mov edi,array works fine as long as the selector is correct. However I make it a point to load the full pointer to be safe.

    LDS and LES still have their place.


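    If you wanted to fill with dwords just do this:

    Code:
    unsigned char *test=(unsigned char *)malloc(500);
     
    //C++
    //unsigned char *test=new unsigned char[500];
     
    asm {
      les	 edi,dword ptr [test]   // as posted; in flat 32-bit code mov edi,test is enough
      cld
      mov   ecx,500d
      mov   edx,ecx        // keep the byte count for the tail
      shr   ecx,2          // 500 bytes = 125 dwords
      mov   eax,00000h
      rep	stosd
     
      mov   ecx,edx
      and   ecx,00003h     // leftover bytes (0 here, since 500 is a multiple of 4)
      rep	stosb
    }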
    The first example is wrong in that you should load EDI instead of ESI if you are using STOS(x). For MOVS(x) you must load both ESI and EDI.
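
    For comparison, the same idea (store whole dwords first, then the leftover bytes) can be written in portable C. This is a rough sketch rather than a replacement for the assembly; fill_bytes is a hypothetical name, and in real code memset already does this job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill `count` bytes at `dst` with `value`: whole 32-bit words first
   (the REP STOSD part), then the 0-3 leftover bytes (the REP STOSB part). */
static void fill_bytes(unsigned char *dst, unsigned char value, size_t count)
{
    assert(dst != NULL);
    uint32_t word = value * 0x01010101u;  /* replicate the byte into all 4 bytes */
    size_t dwords = count / 4;            /* number of whole dwords */
    size_t tail = count % 4;              /* bytes left over after the dwords */

    for (size_t i = 0; i < dwords; i++) {
        memcpy(dst, &word, sizeof word);  /* memcpy sidesteps alignment issues */
        dst += 4;
    }
    for (size_t i = 0; i < tail; i++)
        *dst++ = value;
}
```

    With count = 500 this does 125 dword stores and no tail bytes; with count = 10 it does 2 dword stores and 2 byte stores.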

    Haven't done assembly in some time and I'm a bit rusty at it. In recent times I have no need for it because D3D and MSVC are both fast enough without it.

    Salem is right. Follow his advice, I just thought I'd show you how to do it.
    Last edited by VirtualAce; 07-06-2004 at 11:59 PM.

  6. #6
    Registered User
    Join Date
    Jun 2004
    Posts
    7
    Ok thanks guys, this helps.


    To answer a couple questions, this is definitely not a bad optimization question because my program is using a loop which goes on for a couple thousand times at the most (around 5996 at the maximum). I have tested it and it shaved several milliseconds off of that section of the code (tested on PII-400MHz). On slower machines, this would mean much more.

    I'm very picky about my coding. Some people have suggested in the past that I'm a little loony with the way I code -- main example would be the several DOS programs that I wrote 100% in assembly. The source code was obviously very large and too difficult for anyone to understand. But it sure as heck made my program compact which is what I was aiming for -- size over speed. Now I'm getting more involved with C -- I like the language, but some things just aren't quite fast enough to my expectations, so I'll use optimizations where I think things should be fast. I don't intend on optimizing areas that will only benefit from a 2% speed increase or anything, but I will optimize areas that are noticeably slow on slow machines.

    Finally, REP isn't used because it's much quicker to process a list of instructions rather than a loop. Loops are good for making things easier to program and keeping the size compact, but that isn't the case if you're looking towards making your program as fast as you can make it. Obviously, with several thousand statements, that would be insane and therefore a loop would be understandable. However, with a small number of statements, it's just so much more beneficial (speed-wise) to type it all out. I'm guessing you knew all of this of course, but simply didn't understand what the intentions behind the code were (if 'I' even understand them correctly!).

  7. #7
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,344
    So you've used a profiler, and determined that this is the real hot spot?

    Until you've read this
    http://vision.eng.shu.ac.uk/bala/c/c...imization.html
    You're wasting your time turning short bits of C into slightly less short bits of ASM.
    There are vastly bigger fish to catch if you do it right.

  8. #8
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,586
    Salem is right on target and you sir are so off that it isn't funny.

    A REP is not slower than writing all the mov's yourself. If it was...they wouldn't have a REP. Add up all the cycles it takes to do 500 movs, and then add up the cycles it takes to do the same with a REP. The REP is faster because STOSx and MOVSx are optimized to be faster than the equivalent number of MOVs.

    Baffled.

  9. #9
    Registered User
    Join Date
    Oct 2001
    Posts
    2,934
    Quote Originally Posted by Bubba
    Haven't done assembly in some time and I'm a bit rusty at it. In recent times I have no need for it because D3D and MSVC are both fast enough without it.
    Bubba, you feeling all right?

  10. #10
    Registered User
    Join Date
    Oct 2001
    Posts
    2,934
    >Finally, REP isn't used because it's much quicker to process a list of instructions rather than a loop.

    I don't think rep would be considered a loop.

  11. #11
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,586
    Actually REP is a loop but it is an internal one and it is blazing fast. You can copy BYTEs, WORDs, or DWORDs (and soon QWORDs) at a time with it.

    If what this guy is saying were true...then EDI and ESI would be useless as would REP, STOSx and MOVSx. Check the old 486 docs and the timing info....the equivalent 32-bit mov is slower than STOSx or MOVSx so you know that is true on the newer systems. If you really want to know then code it and use a performance counter to determine speed or have MSVC profile it.
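
    A minimal way to "code it and measure" is sketched below in C. The names fill_loop, fill_memset and time_fill are invented for this example, and clock() is only a coarse CPU-time counter, so treat it as a starting point; a real profiler or a high-resolution performance counter will give better numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*fill_fn)(unsigned char *buf, size_t n);

/* Candidate 1: a plain byte-by-byte loop. */
static void fill_loop(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;
}

/* Candidate 2: the library block fill. */
static void fill_memset(unsigned char *buf, size_t n)
{
    memset(buf, 0, n);
}

/* Run fill(buf, n) `reps` times and return the elapsed CPU time in seconds. */
static double time_fill(fill_fn fill, unsigned char *buf, size_t n, int reps)
{
    assert(fill != NULL && buf != NULL && reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fill(buf, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

    Call time_fill once per candidate with the same buffer and a large reps value, then compare the two results. Run it several times, since a single measurement on a busy machine is noisy.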

    Also to put this into perspective where I work we have a system that outputs data from an inspection system to ports. The programmer coded a function in asm that did a block read from the port regardless of the size of the data....and he used REP along with the port opcode - simple and efficient. Again, I don't remember the port read opcode...but you can look it up.

    Again, I'm totally baffled.

    Loop optimization is important but this is a single loop.

    For instance you wouldn't want to do this:

    Code:
    for (int i=0;i<500;i++)
    {
    	for (int j=0;j<500;j++)
    	{
    		for (int k=0;k<500;k++)
    		{
    			//...do something here
    		}
    	}
    }
    This loop is slower than crap. Unroll the loops j and k. I did not provide an example calculation inside of the k loop to illustrate this...but you get the idea. However all compilers including MSVC have loop optimization options that will automatically unroll certain loops and also eliminate redundancy.
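
    To show what unrolling means in practice, here is a small sketch in C of a loop unrolled by four with a cleanup loop for the remainder. The function sum_unrolled is a made-up example, not something from this thread, and an optimizing compiler will often produce something similar on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints with the loop body unrolled four times, followed by a
   cleanup loop for the 0-3 elements that are left over. */
static long sum_unrolled(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4)          /* four elements per iteration */
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    for (; i < n; i++)                  /* remainder */
        total += a[i];

    return total;
}
```

    The unrolled version does one loop test per four elements instead of one per element, which is the whole saving. Whether that matters is something only a measurement can tell you.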

    I have no idea where we are going with this, really.

    So what this guy is saying is this:

    Code:
    int *array=(int *)malloc(10*sizeof(int));   //room for 10 ints, not 10 bytes
     
    //Slower
    for (int i=0;i<10;i++)
    {
    	array[i]=0;
    }
     
    //faster?
    array[0]=0;
    array[1]=0;
    array[2]=0;
    array[3]=0;
    array[4]=0;
    array[5]=0;
    array[6]=0;
    array[7]=0;
    array[8]=0;
    array[9]=0;
    Ridiculous. Even if it is...who cares. Perfect example of optimizing what doesn't need to be optimized. Go ahead and optimize stupid stuff like this and I'll concentrate on the stuff that the profiler tells me is slow, not what I think is slow.

    For very small arrays this might be true as in the case of clearing a 4x4 matrix. But even then most matrix classes don't use arrays so that argument doesn't even hold water.

    OMG I'm so done with this thread.

    Last edited by VirtualAce; 07-07-2004 at 01:15 AM.

  12. #12
    Registered User
    Join Date
    Jun 2004
    Posts
    7
    Sorry, I might be thinking of something else -- I thought REP was loop-related. I'm rusty with x86 because I took about a year away from it and started working with Motorola code for a while. Now I'm relearning part of what I learned before about x86.

    Anyway, I understand the optimization things you refer to. I read part of that link and it's not anything new to me so far. The places I apply my optimizations are crucial, I assure you. I have tested the program and it performs its action about 120ms faster on a machine that has no real trouble doing its job in the first place. If you don't know, this section of my program converts encoded sound data into a WAV file. It processes a lot of information, obviously, so I feel x86 code is important for that area. It has helped according to the benchmarks. I am having people test it on old 386 and 486 machines as we speak to see how big of a difference the performance will be.

  13. #13
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,586
    I just don't see the point bud, sorry.

    >I have tried to figure out how I can use arrays in inline assembly, but have failed. My question is, is there a way and if so, how do you make it work? Like say I have Array[4] just as a made-up example.... how would I read and/or write there in inline x86 assembly?

    To me this sounds like you need help in the area of asm and yet you come here as all knowing and won't listen to any of us? Something's amiss. If you knew the answer then why did you ask?

    Inline assembly is exactly what it sounds like. Assembler code pasted into your code as is..the compiler won't touch it.
    Last edited by VirtualAce; 07-07-2004 at 01:20 AM.

  14. #14
    Registered User
    Join Date
    Jun 2004
    Posts
    7
    Well gosh I'm sorry. You know I just came here hoping I could get some polite advice from people who know more than me. I have listened and you're right about everything, but what I'm saying is that my specific optimizations are worth it. It used to take about 740ms on average to export the WAV, and I have accomplished making it around 600ms on average. I'm probably wrong, but I would think that would be a 'good' improvement on an old 133MHz Pentium. But you know more than me, so I guess it's not. 10ms is stupid to me, but 140ms isn't in my personal opinion. And this is just with a 2 second recording. It would be an even bigger difference with longer recordings.

    Sorry to have wasted your time. =( I'm just trying to learn.

  15. #15
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,586
    It's no problem but I fail to understand how you can ask a question and then disagree with the answer....if you didn't already know the answer?

    But no big deal man. We are all here to help.
