Am I correct in the use of -
Code:
shufps xmm0 , xmm0 , 0x39
to move each of the floats in xmm0 one position to the right?
I wrote a little program that would generate the shuffle values - I find the documentation a bit unclear [both AMD and Intel's - probably because AMD read Intel's and "copied" it]. I suspect you find the same thing, otherwise you wouldn't post this question...
I think you can use a register for the "39" part, so you can easily write something that takes an input of "what shuffle value", and sticks 0x0102030405060708 in one register, and 0x1112131415161718 in the other side, then shuffles, and prints the result.
--
Mats
Compilers can produce warnings - make the compiler programmers happy: Use them!
Please don't PM me for help - and no, I don't do help over instant messengers.
Well, I suppose I could just write a short routine that iterates through every possible value until it finds the one that produces the desired results, except that the opcode requires an immediate value, not an r/m, which means hand writing 256 lines, or just writing one line and using trial and error until it gives the results I want.
Turns out it does what I want, but instead of shifting to the right, it shifts to the left.
Last edited by abachler; 12-16-2007 at 09:50 AM.
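The "iterate through every possible value" idea can still be tried without hand-writing 256 instructions, by modelling the shuffle in plain C instead of executing the real opcode. What follows is only a sketch under that assumption: `shufps_model` and `find_shuffle_imm` are made-up names, not intrinsics, and index 0 stands for the float at the lowest address.

```c
#include <assert.h>

/* Software model of SHUFPS (hypothetical helper, not a real intrinsic).
   Index 0 is the lowest-order float, i.e. the one at the lowest address.
   The two low result floats come from dst, the two high ones from src. */
void shufps_model(const float dst[4], const float src[4],
                  unsigned imm, float out[4])
{
    float r[4];                      /* temporary, so out may alias an input */
    int i;

    assert(imm <= 0xFF);             /* the real operand is an 8-bit immediate */
    r[0] = dst[imm & 3];
    r[1] = dst[(imm >> 2) & 3];
    r[2] = src[(imm >> 4) & 3];
    r[3] = src[(imm >> 6) & 3];
    for (i = 0; i < 4; ++i)
        out[i] = r[i];
}

/* Try all 256 immediates for "shufps x, x, imm" and return the first one
   that turns `in` into `want`, or -1 if none does. */
int find_shuffle_imm(const float in[4], const float want[4])
{
    unsigned imm;

    for (imm = 0; imm < 256; ++imm) {
        float out[4];
        int i, match = 1;

        shufps_model(in, in, imm, out);
        for (i = 0; i < 4; ++i)
            if (out[i] != want[i])
                match = 0;
        if (match)
            return (int)imm;
    }
    return -1;
}
```

With the input {0, 1, 2, 3} in memory order, asking for {3, 0, 1, 2} gives 0x93 and asking for {1, 2, 3, 0} gives 0x39. Using four distinct input values matters, otherwise several immediates can match.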
Ok, so it must be that I wrote a piece of code that took the constants I wanted to try, so I updated the code according to what constant I _THOUGHT_ was right.
To effect a shift to the left or right, the values to use are either 0x39 or 0x93, depending on which direction you want them to go.
The immediate operand of SHUFPS is of the form aa bb cc dd, where each of those is a two-bit field selecting one 32-bit float. The 128-bit target register is divided into four 32-bit areas a, b, c, d. a and b (the high-order singles) are filled with 32-bit areas from the source operand (xmm2/m128), c and d with 32-bit areas from the destination operand (xmm1). Which area gets copied is decided by the values of aa, bb, cc and dd: 00 is fd, 01 is fc, 10 is fb and 11 is fa, where fa is the highest-order float of the operand being read and fd the lowest.
For rotating to the right, you want to choose fa for b, fb for c, fc for d and fd for a. Thus, the bit pattern is 00 11 10 01. That's indeed 0x39.
At least, that's how I understand the documentation.
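To make the aa bb cc dd layout concrete, here is a small sketch that packs four two-bit selectors into the immediate. `shuffle_imm` is a hypothetical helper name; `xmmintrin.h` ships the `_MM_SHUFFLE(z, y, x, w)` macro, which packs its arguments in the same order.

```c
#include <assert.h>

/* Hypothetical helper: pack four 2-bit selectors into the 8-bit immediate.
   aa and bb select the two high-order result floats (read from the source
   operand), cc and dd the two low-order ones (read from the destination).
   A selector value of 0 picks the lowest-order float, 3 the highest. */
unsigned shuffle_imm(unsigned aa, unsigned bb, unsigned cc, unsigned dd)
{
    assert(aa < 4 && bb < 4 && cc < 4 && dd < 4);
    return (aa << 6) | (bb << 4) | (cc << 2) | dd;
}
```

The pattern 00 11 10 01 from above is `shuffle_imm(0, 3, 2, 1)`, which evaluates to 0x39. The opposite rotation, 10 01 00 11, is `shuffle_imm(2, 1, 0, 3)`, which evaluates to 0x93.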
All the buzzt!
CornedBee
"There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
- Flon's Law
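Code:
#include <stdio.h>
#include <xmmintrin.h>

// Info was not declared in the posted snippet; assumed here to be
// a global array of four floats.
float Info[4];

int main(int argc, char* argv[]){
    //fclose(fopen("test.txt" , "w+b"));
    float* temp = &Info[0];
    for(int i=0;i<4;++i) temp[i] = (float)i;
    printf("Before:\n%f\n%f\n%f\n%f\n\n" , temp[0] , temp[1] , temp[2] , temp[3]);
    // MSVC-style inline assembly (32-bit builds only)
    __asm {
        mov edi , temp
        movups xmm0 , [edi]          // unaligned load of the four floats
        shufps xmm0 , xmm0 , 0x93    // rotate one position
        movups [edi] , xmm0          // store the result back
    }
    printf("After:\n%f\n%f\n%f\n%f\n" , temp[0] , temp[1] , temp[2] , temp[3]);
    return 0;
}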
This one ought to work on GCC, Intel and MSVC.
Code:
#include <stdio.h>
#include <xmmintrin.h>

#if defined(__GNUC__)
#define SSE_ALIGNED(type) type __attribute__((aligned(16)))
#else
#define SSE_ALIGNED(type) __declspec(align(16)) type
#endif

SSE_ALIGNED(float) data[] = { 1.0f, 2.0f, 3.0f, 4.0f };

int main(int argc, char* argv[])
{
    printf("Before:\n%f\n%f\n%f\n%f\n\n" , data[0] , data[1] , data[2] , data[3]);
    __m128 x = _mm_load_ps(data);
    x = _mm_shuffle_ps(x, x, 0x93);
    _mm_store_ps(data, x);
    printf("After:\n%f\n%f\n%f\n%f\n" , data[0] , data[1] , data[2] , data[3]);
    return 0;
}
Still confusing me, though.
Hmm ... I think I get it. My analysis of the opcode itself is correct - it's the move that does the flip. The registers are drawn in big-endian order in all references, but the actual memory is little-endian. The lowest-address float (what would be leftmost when using ascending addresses from left to right) goes to the rightmost position in the register.
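The two views can be put side by side with a small software model. This is only a sketch: `shufps_same_reg` and `as_drawn` are made-up names, the shuffle is emulated in plain C rather than executed, and index 0 is taken to be the float at the lowest address.

```c
#include <assert.h>

/* Model of "shufps x, x, imm" on an array held in memory order
   (index 0 = lowest address = lowest-order float). Hypothetical helper. */
void shufps_same_reg(const float in[4], unsigned imm, float out[4])
{
    float r[4];                      /* temporary, so out may alias in */
    int i;

    assert(imm <= 0xFF);
    for (i = 0; i < 4; ++i)
        r[i] = in[(imm >> (2 * i)) & 3];
    for (i = 0; i < 4; ++i)
        out[i] = r[i];
}

/* Reorder memory-order floats the way the manuals draw a register:
   highest-order float on the left, lowest-order float on the right. */
void as_drawn(const float mem[4], float drawn[4])
{
    int i;

    for (i = 0; i < 4; ++i)
        drawn[i] = mem[3 - i];
}
```

Starting from {0, 1, 2, 3} in memory, 0x39 produces {1, 2, 3, 0}: every float moves one slot towards the lower address, which looks like a rotate to the left. Drawn as a register, the same data goes from 3 2 1 0 to 0 3 2 1, which is a rotate to the right. Same operation, opposite apparent direction.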