Hi there,
I don't know where to start with this one, so it might be a bit rambly! I've recently come across DirectX's use of the SIMD registers in my work. So far I've gleaned that each register holds 128 bits of data, represented by the __m128 type.
There's a vector type in DirectXMath named XMVECTOR. It holds four floats and is thus 128 bits (16 bytes) wide, a handy size for the SIMD registers.
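To convince myself of the size, here is a minimal sketch of the arithmetic. Vec4 is a hypothetical stand-in I made up for XMVECTOR (four floats, 16-byte aligned) so that it compiles without the DirectXMath headers; it is not the real type, and it assumes 32-bit floats.

```cpp
#include <cstddef>

// Hypothetical stand-in for XMVECTOR: four 32-bit floats, 16-byte aligned.
// It only mirrors the layout described above; it is not the DirectXMath type.
struct alignas(16) Vec4 {
    float f[4];
};

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");
static_assert(sizeof(Vec4) == 16, "4 floats * 4 bytes = 16 bytes");
static_assert(alignof(Vec4) == 16, "aligned like a SIMD register value");

// Size in bits, for comparison with the 128-bit register width.
constexpr std::size_t kVec4Bits = sizeof(Vec4) * 8;
static_assert(kVec4Bits == 128, "16 bytes * 8 = 128 bits");
```

If the static_asserts compile, the stand-in is exactly one register wide.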
All seems OK until you come across the issue of compiler and platform dependency. DirectXMath is obviously written to accommodate many types of machine, so an issue arises whenever you want to pass these data types (such as XMVECTOR) into a function: most likely, a function will take more parameters of that type than the SIMD registers available for argument passing can handle.
Microsoft appear to have resolved this using the following "thing" (macro?):
XM_CALLCONV
This seems to run through a load of preprocessor #if/#else checks that ultimately resolve to the calling convention that suits the current machine and compiler. As far as I can tell, depending upon the processor architecture (x86 or x64) and the compiler, it resolves to one of two things, either:
__fastcall
or
__vectorcall
These decide how many of the parameters in a function call can be passed in registers rather than on the stack, with __vectorcall being the superior version.
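From reading the header, the selection seems to boil down to something like the sketch below. MY_USE_VECTORCALL, MY_CALLCONV and AddScalars are names I made up, and the real header's conditions are more involved. I'm also assuming that only a Microsoft-compatible compiler targeting x86/x64 understands __vectorcall, so every other compiler gets an empty macro.

```cpp
// Hypothetical simplification of how a header could pick a calling convention.
// Assumption: __vectorcall is only usable with MSVC-style compilers on x86/x64.
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  define MY_USE_VECTORCALL 1
#else
#  define MY_USE_VECTORCALL 0
#endif

#if MY_USE_VECTORCALL
#  define MY_CALLCONV __vectorcall   // preferred where available
#elif defined(_MSC_VER)
#  define MY_CALLCONV __fastcall     // fallback on the Microsoft compiler
#else
#  define MY_CALLCONV                // other compilers: default convention
#endif

// The macro sits between the return type and the function name,
// the same position XM_CALLCONV occupies in DirectXMath declarations.
inline float MY_CALLCONV AddScalars(float a, float b) {
    return a + b;
}
```

The caller never sees the difference: AddScalars(1.5f, 2.0f) is written the same way whichever branch was taken.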
Is that all that's needed? No, it isn't. It seems you also have to declare each parameter in a manually correct way, which relies on you, the programmer, to specify the correct variant of the XMVECTOR type. There are several variants, namely:
Code:
FXMVECTOR
GXMVECTOR
HXMVECTOR
CXMVECTOR
These are based on the standard XMVECTOR type.
An example of such a function declaration (which follows DirectX's parameter passing rules):
Code:
inline XMMATRIX XM_CALLCONV XMMatrixTransformation(
FXMVECTOR ScalingOrigin,
FXMVECTOR ScalingOrientationQuaternion,
FXMVECTOR Scaling,
GXMVECTOR RotationOrigin,
HXMVECTOR RotationQuaternion,
HXMVECTOR Translation);
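As far as I can tell from the Library Internals page, the variants are typedefs that choose between passing by value and passing by const reference, and the rule for picking one is positional: FXMVECTOR for the 1st to 3rd vector parameters, GXMVECTOR for the 4th, HXMVECTOR for the 5th and 6th, and CXMVECTOR for any after that. Below is a rough sketch of that idea. Vec4, the FVec4/GVec4/HVec4/CVec4 typedefs, MY_REGISTER_ARGS and SumX are all hypothetical names of mine, and the single "how many vectors fit in registers" number is my simplification, not how the real header decides.

```cpp
// Hypothetical stand-in for XMVECTOR so this compiles without DirectXMath.
// (No alignas here, so by-value parameters stay legal on every compiler.)
struct Vec4 { float f[4]; };

// Assumption for this sketch: how many vector parameters the active calling
// convention can keep in registers. 6 mimics __vectorcall, 3 mimics 32-bit
// __fastcall, 0 mimics 64-bit __fastcall (my reading of the docs).
#ifndef MY_REGISTER_ARGS
#  define MY_REGISTER_ARGS 6
#endif

#if MY_REGISTER_ARGS >= 3
typedef const Vec4  FVec4;   // 1st-3rd parameters: by value
#else
typedef const Vec4& FVec4;   // otherwise by reference
#endif

#if MY_REGISTER_ARGS >= 4
typedef const Vec4  GVec4;   // 4th parameter: by value
#else
typedef const Vec4& GVec4;
#endif

#if MY_REGISTER_ARGS >= 6
typedef const Vec4  HVec4;   // 5th-6th parameters: by value
#else
typedef const Vec4& HVec4;
#endif

typedef const Vec4& CVec4;   // 7th parameter onward: always by reference

// Parameter types follow the positional rule: F, F, F, G, H, H, C.
inline float SumX(FVec4 a, FVec4 b, FVec4 c, GVec4 d,
                  HVec4 e, HVec4 f, CVec4 g) {
    return a.f[0] + b.f[0] + c.f[0] + d.f[0] + e.f[0] + f.f[0] + g.f[0];
}
```

If I'm reading this right, the typedef only decides value versus reference for each position, and the calling convention then decides where a by-value argument actually ends up.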
So if I've got this right, there are almost two kinds of checking going on here? The first is XM_CALLCONV, and the other relies on the programmer putting the right XMVECTOR variant in the function's parameter list.
I'm kinda confused as to why this is. Why not just use one?
Things get even more confusing when you browse the following link:
Library Internals - Win32 apps | Microsoft Learn
And note the following quote from it:
For 32-bit Windows
For 32-bit Windows, there are two calling conventions available for efficient passing of __m128 values (which implements XMVECTOR on that platform). The standard is __fastcall, which can pass the first three __m128 values (XMVECTOR instances) as arguments to a function in a SSE/SSE2 register. __fastcall passes remaining arguments via the stack.
and now this one:
For 64-bit editions of Windows
For 64-bit Windows, there are two calling conventions available for efficient passing of __m128 values. The standard is __fastcall, which passes all __m128 values on the stack.
So the more sophisticated 64-bit version doesn't use the SIMD registers and just puts everything on the stack? I don't understand why. Surely a more advanced version of Windows would put more stuff in the SIMD registers, not less?
It's all left me a bit confused. It gets even more bizarre when you read up on the details of __fastcall for an x86 platform:
__fastcall | Microsoft Learn
That page states this:
Argument-passing order:
The first two DWORD or smaller arguments that are found in the argument list from left to right are passed in ECX and EDX registers; all other arguments are passed on the stack from right to left.
Two DWORDs? That's only 8 bytes. An XMVECTOR is 16 bytes wide, and yet the documentation states (I'll re-post the quote above):
For 32-bit Windows
For 32-bit Windows, there are two calling conventions available for efficient passing of __m128 values (which implements XMVECTOR on that platform).
The standard is __fastcall, which can pass the first three __m128 values (XMVECTOR instances) as arguments to a function in a SSE/SSE2 register. __fastcall passes remaining arguments via the stack.
So how on earth can one page say __fastcall can handle two DWORDs, for a total of 8 bytes, while the other says it can accommodate three __m128 values, which amounts to 48 bytes?
I'm very confused. Any insight would be most helpful (even if it's just to tell me I don't need to worry about this). Thanks!