Thread: DirectXMath & SIMD Registers Question(s)

  1. #1
    Registered User
    Join Date
    Jan 2010
    Posts
    233

    DirectXMath & SIMD Registers Question(s)

    Hi there,

    Don't know where to start with this one! It might be a bit rambly! I've recently come across DirectX's use of the SIMD registers in my work. So far I've gleaned that each register holds 128 bits of data, represented by the __m128 type.

    There's a vector type in DirectX math named XMVECTOR. It has four floats in it and is thus 128 bits/16 bytes wide. Nice and handy size for the SIMD registers.
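As a rough illustration of that size arithmetic, here is a minimal sketch using a hypothetical stand-in struct, `Vec4`. It is not the real XMVECTOR; the name and layout are assumptions chosen so the snippet compiles without the DirectXMath headers.

```cpp
#include <cstddef>

// Hypothetical stand-in for XMVECTOR: four floats, 16-byte aligned.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Four 4-byte floats fill exactly one 16-byte slot.
static_assert(sizeof(Vec4) == 16, "four 4-byte floats should occupy 16 bytes");
static_assert(alignof(Vec4) == 16, "stand-in is declared with 16-byte alignment");

// Width in bits, for comparison against a 128-bit SIMD register.
constexpr std::size_t vec4_bits() { return sizeof(Vec4) * 8; }
```

The two `static_assert` lines fail at compile time if the layout ever stops matching the 128-bit register width.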

    All seems ok until you come across the issue of compiler and platform dependency. DirectX math is obviously written to accommodate many types of machine, so an issue occurs whenever you want to pass these data types (such as XMVECTOR) into a function. Namely, a function will most likely take more parameters of that type than the SIMD registers can handle.

    Microsoft appear to have resolved this using the following "thing" (macro?):

    XM_CALLCONV

    This seems to run a series of preprocessor #if/#else checks that ultimately resolve to the calling convention appropriate for the current machine and compiler. Depending upon the compiler and processor architecture (x86 or x64), it resolves to one of two things, either:

    __fastcall

    or

    __vectorcall

    These decide how many of the parameters in the function call can be passed in registers rather than on the stack, with __vectorcall being the superior version.
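A minimal sketch of how a macro of this kind could be put together. `DEMO_CALLCONV`, `DEMO_CALLCONV_NAME`, `Vec4` and both functions are hypothetical names, and the preprocessor conditions are simplified assumptions rather than a copy of what the real DirectXMath header checks. On non-Microsoft compilers the macro expands to nothing, so the compiler's default convention is used.

```cpp
// Hypothetical macro, loosely modelled on the idea behind XM_CALLCONV.
// Assumption: __vectorcall is chosen whenever an MSVC-style compiler targets
// x64, or x86 with SSE2 enabled; otherwise fall back to __fastcall on x86.
#if defined(_MSC_VER) && !defined(_MANAGED) && \
    (defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2))
#  define DEMO_CALLCONV __vectorcall
#  define DEMO_CALLCONV_NAME "__vectorcall"
#elif defined(_MSC_VER) && defined(_M_IX86)
#  define DEMO_CALLCONV __fastcall
#  define DEMO_CALLCONV_NAME "__fastcall"
#else
#  define DEMO_CALLCONV
#  define DEMO_CALLCONV_NAME "(compiler default)"
#endif

// Plain stand-in for a four-float vector; alignment is left out so the
// struct can be passed by value on every compiler.
struct Vec4 { float x, y, z, w; };

// The macro sits between the return type and the function name.
inline float DEMO_CALLCONV sum_components(Vec4 v) {
    return v.x + v.y + v.z + v.w;
}

// Reports which branch of the #if chain was taken.
inline const char* active_convention() { return DEMO_CALLCONV_NAME; }
```

Callers never mention the convention themselves; they just call `sum_components(v)` and the compiler applies whatever the macro expanded to.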

    Is that all that's needed? No, it isn't. It seems you also have to pass parameters into a function in a manually correct way, which relies on you, the programmer, to specify the correct variant of the XMVECTOR type. There are several variants, namely:

    Code:
    FXMVECTOR
    GXMVECTOR
    HXMVECTOR
    CXMVECTOR
    These are based on the standard XMVECTOR type.

    An example of such a function call (which follows DirectX's parameter passing rules):

    Code:
    inline XMMATRIX XM_CALLCONV XMMatrixTransformation(
    FXMVECTOR ScalingOrigin,
    FXMVECTOR ScalingOrientationQuaternion,
    FXMVECTOR Scaling,
    GXMVECTOR RotationOrigin,
    HXMVECTOR RotationQuaternion,
    HXMVECTOR Translation);
    So if I've got this right, there are almost two types of checking going on here? The first being the XM_CALLCONV and the other relying on the programmer putting the right type of XMVECTOR variant into the function parameter list.

    I'm kinda confused as to why this is. Why not just use one?

    Things get even more confusing when you browse the following link:

    Library Internals - Win32 apps | Microsoft Learn

    And note the following quote from it:

    For 32-bit Windows

    For 32-bit Windows, there are two calling conventions available for efficient passing of __m128 values (which implements XMVECTOR on that platform). The standard is __fastcall, which can pass the first three __m128 values (XMVECTOR instances) as arguments to a function in a SSE/SSE2 register. __fastcall passes remaining arguments via the stack.
    and now this one:

    For 64-bit editions of Windows

    For 64-bit Windows, there are two calling conventions available for efficient passing of __m128 values. The standard is __fastcall, which passes all __m128 values on the stack.
    So for a more sophisticated version it doesn't use the SIMD registers and just puts everything on the stack? I don't understand why. Surely a more advanced version of Windows would put more stuff in the SIMD registers not less?

    It's all left me a bit confused. It gets even more bizarre when you read up on the details of __fastcall for an x86 platform:

    __fastcall | Microsoft Learn

    Which is quoted as stating this:

    Argument-passing order: The first two DWORD or smaller arguments that are found in the argument list from left to right are passed in ECX and EDX registers; all other arguments are passed on the stack from right to left.
    Two DWORDS? That's only 8 bytes. An XMVECTOR type is 16 bytes wide and yet the documentation states (I'll re-post the quote above):

    For 32-bit Windows

    For 32-bit Windows, there are two calling conventions available for efficient passing of __m128 values (which implements XMVECTOR on that platform). The standard is __fastcall, which can pass the first three __m128 values (XMVECTOR instances) as arguments to a function in a SSE/SSE2 register. __fastcall passes remaining arguments via the stack.
    So how on earth can one thing be saying __fastcall can handle two DWORDS for a total of 8 bytes, and the next saying it can accommodate three __m128 data types which amounts to 48 bytes?

    I'm very confused. Any insight would be most helpful (even if it's just to tell me I don't need to worry about this), thanks.

  2. #2
    Registered User
    Join Date
    Dec 2017
    Posts
    1,664
    you also have to pass parameters into a function in a manually correct way
    You have to do that with all functions, like passing an int followed by a long.

    I'm kinda confused as to why this is. Why not just use one?
    I don't see what you're getting at here. They are two totally different things. One is about the calling convention (whether arguments go in registers or on the stack, who cleans up the stack, etc), the other is simply about passing the correct argument types as with any function.
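To make that distinction concrete: the calling convention is a property of the function, while aliases in the style of FXMVECTOR and CXMVECTOR are ordinary parameter types that resolve to either pass-by-value or pass-by-reference. A minimal sketch, where `Vec4`, `FVec`, `CVec`, `DEMO_PASS_IN_REGISTER`, `dot4` and `sum4` are all hypothetical names and the selection logic is reduced to a single switch as an assumption:

```cpp
#include <type_traits>

// Plain stand-in for a four-float vector (not the real XMVECTOR).
struct Vec4 { float x, y, z, w; };

// Hypothetical switch: 1 means "this platform can pass the early vector
// arguments in registers", 0 means "it cannot".
#ifndef DEMO_PASS_IN_REGISTER
#  define DEMO_PASS_IN_REGISTER 1
#endif

#if DEMO_PASS_IN_REGISTER
using FVec = const Vec4;    // by value, so it is eligible for a register
#else
using FVec = const Vec4&;   // by reference, so only an address is passed
#endif

// Alias for the later parameters: always by reference in this sketch.
using CVec = const Vec4&;

// The caller writes the same call either way; only the signature changes.
inline float dot4(FVec a, FVec b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

inline float sum4(CVec v) { return v.x + v.y + v.z + v.w; }

static_assert(std::is_reference<CVec>::value, "CVec is always a reference");
```

As far as I understand it, the real header makes this by-value or by-reference choice separately for each alias depending on platform and calling convention, which is why the position of a parameter decides which alias to use.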

    So for a more sophisticated version it doesn't use the SIMD registers and just puts everything on the stack? I don't understand why. Surely a more advanced version of Windows would put more stuff in the SIMD registers not less?
    Obviously there's a reason, although I have no idea what that may be since this depends on extreme details that are probably only of interest to compiler writers. And you have the other calling convention, which presumably passes some in registers.

    So how on earth can one thing be saying __fastcall can handle two DWORDS for a total of 8 bytes, and the next saying it can accommodate three __m128 data types which amounts to 48 bytes?
    The quote mentioning DWORDs is apparently just about regular arguments, passed in the regular registers (and on the stack). The other quote is about the SIMD vector arguments, which are passed in SSE(2) registers, not regular registers. So they don't conflict but are instead just talking about different things.
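The byte counts from the two quotes can be written out side by side. This is only an arithmetic sketch: `Vec4` is a hypothetical 16-byte stand-in for __m128, and the register counts are taken from the quoted passages, not measured.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte stand-in for __m128.
struct Vec4 { float x, y, z, w; };

// Figures from the two quoted passages.
constexpr std::size_t kIntegerArgRegisters = 2;  // ECX and EDX
constexpr std::size_t kVectorArgRegisters  = 3;  // three SSE/SSE2 registers

// Capacity of the general-purpose argument registers (DWORD = 4 bytes).
constexpr std::size_t integer_arg_bytes() {
    return kIntegerArgRegisters * sizeof(std::uint32_t);
}

// Capacity of the vector argument registers (16 bytes each).
constexpr std::size_t vector_arg_bytes() {
    return kVectorArgRegisters * sizeof(Vec4);
}

static_assert(integer_arg_bytes() == 8,  "two DWORD registers hold 8 bytes");
static_assert(vector_arg_bytes()  == 48, "three 16-byte registers hold 48 bytes");
```

The 8 bytes and the 48 bytes are two separate budgets, so a function can use both at once.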
    All truths are half-truths. - A.N. Whitehead

  3. #3
    Registered User
    Join Date
    Jan 2010
    Posts
    233
    That's helped a great deal John, thanks very much
