Thread: msvc assembly template

  1. #1
    Dump Truck Internet valis's Avatar
    Join Date
    Jul 2005
    Posts
    357

    msvc assembly template

    I was wondering if anyone knew if assembly templates exist in the cl world.

    To clarify I mean the ability to use assembly but pass a C variable and let the compiler use whatever register it's already using for that variable.

    I know gcc supports it and I already wrote it to work under gcc; however I want to keep this cross compatible.

    Thanks.

    edit:
    hmm, should this have gone in the windows forum?
    Last edited by valis; 08-09-2006 at 07:42 PM.

  2. #2
    vae victus! skorman00's Avatar
    Join Date
    Nov 2003
    Posts
    594
    Could you show me how you would write that using gcc? It sounds odd to me...why would you use assembly and have the compiler choose the registers? That sounds a lot like just writing C code, and letting the compiler deal with the asm.

  3. #3
    Dump Truck Internet valis's Avatar
    Join Date
    Jul 2005
    Posts
    357
    Code:
    #ifdef ENV_x86
    #    define ibswap(x) __asm__ __volatile__ ("bswap %0" : "=r" (x) : "0" (x))
    #endif
    That way I can use ibswap(theVarThatNeedsSwappage) and have the actual instruction (although I haven't checked if it's faster than bitwise operators) and on architectures without such an instruction use bitwise.

    edit: I figure I should note what that does. The %0 stands for the macro's parameter, and the "0" constraint tells the compiler to use the same register for the input as for the output (because bswap works in place).
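
    A minimal sketch of how the macro above might be wrapped so that the same call compiles on any target. The function name swap_bytes32 and the use of GCC's predefined __i386__ / __x86_64__ macros in place of ENV_x86 are assumptions for illustration, not part of the original code.

```c
#include <stdint.h>

/* Hypothetical wrapper: bswap through a GCC asm template on x86,
   plain shifts and ORs everywhere else. */
static inline uint32_t swap_bytes32(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r" marks x as both read and written in one register of the
       compiler's choosing, the same effect as "=r"(x) : "0"(x). */
    __asm__ ("bswap %0" : "+r" (x));
    return x;
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}
```

    Called as y = swap_bytes32(y), this leaves register allocation to the compiler on x86 and falls back to portable C on every other target.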
    Last edited by valis; 08-09-2006 at 10:06 PM.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > however I want to keep this cross compatible.
    Then don't use assembler - period.

    > although I haven't checked if it's faster than bitwise operators
    Now it's getting pointless. You don't even know if it is better.
    Just one asm instruction in a function can seriously affect how the compiler optimises that function.

    Code it in C, without trying to be too clever about it, and let the optimiser do its stuff.
    If you're that interested in the result, do
    gcc -S prog.c
    to view the generated assembler code.

    Finish writing and debugging the code in only 'C', then profile it with some real-world data.
    Study the profile to work out where the real hot-spots in the code are, then make careful adjustments to the code until you have the performance you want.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Dump Truck Internet valis's Avatar
    Join Date
    Jul 2005
    Posts
    357
    With an assembly template the compiler can still do all the optimization it wants, because it's only a single instruction (with no side effects) and the compiler is free to use whatever register it wants. The speed is going to be somewhat important. I'm sure shifts and ORs or the native instruction will be plenty fast, but I use this several hundred thousand times in a number of loops loading media. Even though that won't be a bottleneck in the running application, I feel it's important to squeeze out all the performance I can get from everything I do in those loops.
    Also I simply enjoy optimizing my code.

    I don't think it really matters that I use the instruction because I provide a C alternative (although I doubt this will ever be running on a non-x86, maybe a ppc but it's unlikely it will ever get out of my environment).

  6. #6
    vae victus! skorman00's Avatar
    Join Date
    Nov 2003
    Posts
    594
    Does it have to be in a register? You can use variable names in asm blocks, and if I'm not mistaken, the compiler will decide if it stays on the stack or if it's moved around in a register.
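
    A rough sketch of what naming a C variable inside an MSVC asm block could look like. The helper name swap_bytes32_msvc is hypothetical, the __asm path only applies to 32-bit x86 builds (MSVC does not accept inline assembly for x64), and the sketch assumes a 32-bit unsigned int. Other compilers take the plain C branch.

```c
/* Hypothetical helper: on 32-bit MSVC the __asm block names the C
   variable directly; every other compiler uses the C fallback. */
static unsigned int swap_bytes32_msvc(unsigned int x)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    __asm {
        mov   eax, x      ; load the C variable by name
        bswap eax         ; reverse its four bytes
        mov   x, eax      ; store the result back into x
    }
    return x;
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}
```

    Note that eax is still picked by hand here. The variable name saves working out a stack offset, but it does not hand register selection to the compiler the way a gcc template does.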

    edit: Just because you write asm code in an asm block doesn't mean the compiler won't still muck with it =). Sometimes it might do things like turn a JE into a JZ. That might just be a disassembler/debugger thing though, because I've seen windbg and visual studio's debugger show different disassembly in similar situations (which is odd, because I thought they were the same).
    Last edited by skorman00; 08-11-2006 at 07:36 PM.

  7. #7
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    MSVC inline assembler does NOT support clobber lists for registers like GCC, DJGPP, and others do.

    But what you are saying does not make sense. If you want to use a particular register then do so in the assembly code. MSVC will use your assembly code 'as-is' and will simply insert it into the code stream. This is good and bad at the same time.

    Good
    • You have full control over the registers, stack, etc. MSVC won't interfere.
    • You can write highly optimized code (yes, at times, that is faster than the C equivalent).
    • You can make use of SSE1/2, MMX1/2, 3DNow, and other specialized instructions as MSVC inline assembler supports all of these.


    Bad
    • Because MSVC sticks your code in "as-is", you have no idea what state the system and registers are in at the time. MSVC accomplishes inline asm compatibility by saving the register states prior to entering an assembly block. So you are incurring a hit every time you use the inline asm. There would be no way for MSVC to avoid doing this because it has no idea what code you may write and what registers you may use. So to allow you to use all of them, it must save the state and restore it when the block is exited.
    • Inline assembly while very useful is also very abused. Assembly is NOT always faster and part of the reason for it is what I just mentioned. Inline assembly can be used to do extremely fast memory to memory writes without the overhead of memcpy, memmove, etc.
    • Even though your code may be a bit faster, you sacrifice a lot. For instance, memcpy has code in it to accommodate low-to-high and high-to-low memory copies and uses different algorithms for both. It also has code to deal with copy propagation and other issues related to copying from one address to another. Your routine may be fast in your code; however, if the OS allocates memory at different places the next time, your algo may not be so fast b/c it does not check for these conditions.
    • Memcpy uses rep movsd. This means that when it can use a rep movsd, it will. It also has code in it to use rep movsd as often as possible, even on memory buffers that are not declared with the DWORD data type. So if you have two byte-type arrays and you memcpy, it will try as hard as it can to use rep movsd.
    • The speed difference between optimized asm and optimized C/C++ is so small, I cannot fathom using asm just to gain a few cycles - unless it must be 100% pure optimized to run right.


    NOTE: You must specifically tell MSVC to leave your asm blocks alone in the options.
    Last edited by VirtualAce; 08-11-2006 at 11:19 PM.

  8. #8
    Dump Truck Internet valis's Avatar
    Join Date
    Jul 2005
    Posts
    357
    @skorman00
    I will try giving bswap a memory reference and see if msvc understands that I want it to just use a register as it sees fit. If it works, I'll make sure it doesn't produce bloat or unwanted performance drops.

    @bubba
    That's the thing: I don't want to specify a register, and I didn't specify clobbered registers in the template. I simply told the compiler to execute the instruction with a variable, and it figures out what register to use (this also means there will be literally no performance impact, since the instruction has no side effects, assuming the compiler is intelligent).
    What I told the compiler to do is execute bswap with the variable x. No bswap exists that uses a 32bit memory reference, only a 32bit register, so if x isn't already in a register a mov will be generated before the bswap (and possibly another to save the register, or a push), but chances are it will already be in one so it will generate just the bswap, no bloat and no side effects to performance.

    Does the standard memcpy really still use string instructions? That's saddening. I know that using the fpu or mmx is faster for chunks so large the compiler doesn't inline the memory move.

    So far it looks like I won't be using bswap under msvc and instead the bloated (but fast enough) 4 shifts and 3 ors (unless someone knows a magical swap trick) most of which can operate in parallel on a modern cpu.
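
    For what it's worth, a hedged sketch of one way to keep this portable without inline asm. The compiler intrinsics used here (_byteswap_ulong on MSVC, __builtin_bswap32 on GCC 4.3 or later and on clang) were not mentioned in this thread, and whether they are available depends on the compiler version. The names bswap32 and bswap32_portable are made up for the example. The fallback is a two-step variant of the shift-and-OR approach: swap the 16-bit halves, then swap the bytes inside each half.

```c
#include <stdint.h>
#include <stdlib.h>   /* declares _byteswap_ulong on MSVC */

/* Portable fallback: swap the 16-bit halves, then the bytes in each half. */
static uint32_t bswap32_portable(uint32_t x)
{
    x = (x >> 16) | (x << 16);
    return ((x & 0xFF00FF00u) >> 8) | ((x & 0x00FF00FFu) << 8);
}

/* Hypothetical dispatcher: use a compiler intrinsic when one is known
   to exist, otherwise fall back to plain C. */
static uint32_t bswap32(uint32_t x)
{
#if defined(_MSC_VER)
    return (uint32_t)_byteswap_ulong(x);
#elif defined(__clang__) || \
      (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
    return __builtin_bswap32(x);
#else
    return bswap32_portable(x);
#endif
}
```

    With an intrinsic the compiler is free to pick the register itself and, on x86, to emit a single bswap, which is the effect the gcc template was after.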

    Thanks for your input all.

  9. #9
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    You mean like

    A^=B;
    B^=A;
    A^=B;

    That will swap variables A and B.
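
    As a small sketch, the three XOR steps can be wrapped in a function. The name xor_swap and the pointer-based signature are assumptions made for the example. The guard matters: if both pointers refer to the same object, the first XOR would set it to zero.

```c
/* Hypothetical helper: swaps *a and *b with three XORs, no temporary. */
static void xor_swap(unsigned int *a, unsigned int *b)
{
    if (a == b)
        return;      /* x ^= x would zero the value, so skip self-swaps */
    *a ^= *b;        /* *a now holds a ^ b */
    *b ^= *a;        /* *b now holds the original *a */
    *a ^= *b;        /* *a now holds the original *b */
}
```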
    Last edited by King Mir; 08-13-2006 at 08:15 PM.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  10. #10
    vae victus! skorman00's Avatar
    Join Date
    Nov 2003
    Posts
    594
    that nifty trick needs ^=, not |=

  11. #11
    Dump Truck Internet valis's Avatar
    Join Date
    Jul 2005
    Posts
    357
    That won't swap them. Also, I want to swap the bytes within a single var, not swap two vars.

    edit: skorman00 beat me to it.

  12. #12
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by skorman00
    that nifty trick needs ^=, not |=
    My bad, I wasn't thinking. I was thinking XOR, and forgot what symbol that was. I'll fix it now.
