shufps


  1. #1
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,189

    shufps

    am I correct in the use of -

    Code:
     
    shufps xmm0 , xmm0 , 0x39
    to move each of the floats in xmm0 one position to the right?
    Until you can build a working general purpose reprogrammable computer out of basic components from radio shack, you are not fit to call yourself a programmer in my presence. This is cwhizard, signing off.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    I wrote a little program that would generate the shuffle values - I find the documentation a bit unclear [both AMD and Intel's - probably because AMD read Intel's and "copied" it]. I suspect you find the same thing, otherwise you wouldn't post this question...

    I think you can use a register for the "39" part, so you can easily write something that takes an input of "what shuffle value", and sticks 0x0102030405060708 in one register, and 0x1112131415161718 in the other side, then shuffles, and prints the result.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,189
    Well, I suppose I could just write a short routine that iterates through every possible value until it finds the one that produces the desired results, except that the opcode requires an immediate value, not an r/m, which means hand-writing 256 lines, or just writing one line and using trial and error until it gives the results I want.

    Turns out it does what I want, but instead of shifting to the right, it shifts to the left.
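
Because the instruction only accepts an immediate, one way to run the search described in this post without 256 hand-written lines is to model the lane selection in plain C and loop over every value. This is only a minimal sketch, not the real instruction: `shufps_same` and `find_imm` are hypothetical names, the model assumes both operands are the same register (as in `shufps xmm0, xmm0, imm`), and lane 0 is taken to be the float at the lowest memory address.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of "shufps x, x, imm" with both operands the same register.
   Assumption: lane 0 is the float stored at the lowest memory address. */
static void shufps_same(const float in[4], uint8_t imm, float out[4])
{
    for (int lane = 0; lane < 4; ++lane)
        out[lane] = in[(imm >> (2 * lane)) & 3];  /* 2 selector bits per lane */
}

/* Hypothetical helper: return the first immediate that turns `in` into
   `want`, or -1 if no immediate does. */
static int find_imm(const float in[4], const float want[4])
{
    for (int imm = 0; imm < 256; ++imm) {
        float out[4];
        shufps_same(in, (uint8_t)imm, out);
        if (memcmp(out, want, sizeof out) == 0)
            return imm;
    }
    return -1;
}
```

With {0, 1, 2, 3} in memory, asking for {1, 2, 3, 0} gives 0x39 and asking for {3, 0, 1, 2} gives 0x93, which agrees with 0x39 moving the floats to the left in memory order.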
    Last edited by abachler; 12-16-2007 at 08:50 AM.

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Ok, so it must be that I wrote a piece of code that took the constants I wanted to try, so I updated the code according to what constant I _THOUGHT_ was right.

    --
    Mats

  5. #5
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,189
    to effect a shift to the left or right, the values to use are either 0x39 or 0x93, depending on which direction you want them to go.

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by abachler View Post
    to effect a shift to the left or right, the values to use are either 0x39 or 0x93, depending on which direction you want them to go.
    Yes, I seem to remember something like that [but I also had some case where I was shuffling pd type data, which has a different mask value].

    --
    Mats

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,893
    The immediate operand of SHUFPS is of the form aa bb cc dd, where each of those is a two-bit field selecting one 32-bit single. The 128-bit target register is divided into four 32-bit areas a, b, c, d, with a the highest-order. a and b (the high-order singles) are filled with 32-bit areas from the source operand (xmm2/m128), c and d with 32-bit areas from the destination operand (xmm1). Which area that is is decided by the values of aa, bb, cc and dd. Calling the four areas of the operand being read fa, fb, fc and fd (fa the highest-order): 00 is fd, 01 is fc, 10 is fb and 11 is fa.

    For rotating to the right, you want to choose fa for b, fb for c, fc for d and fd for a. Thus, the bit pattern is 00 11 10 01. That's indeed 0x39.

    At least, that's how I understand the documentation.
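
The aa bb cc dd layout can be written out as a small helper. This is a sketch under the reading given above; `shuf_imm` is a hypothetical name, and its argument order is chosen to mirror the `_MM_SHUFFLE` macro from `xmmintrin.h` (highest result lane first).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack four 2-bit lane selectors into a SHUFPS immediate.
   sel3 chooses the input lane for result lane 3 (highest-order, "a"),
   sel0 chooses the input lane for result lane 0 (lowest-order, "d"). */
static uint8_t shuf_imm(unsigned sel3, unsigned sel2, unsigned sel1, unsigned sel0)
{
    assert(sel3 < 4 && sel2 < 4 && sel1 < 4 && sel0 < 4);
    return (uint8_t)((sel3 << 6) | (sel2 << 4) | (sel1 << 2) | sel0);
}
```

`shuf_imm(0, 3, 2, 1)` is the bit pattern 00 11 10 01, i.e. 0x39, and `shuf_imm(2, 1, 0, 3)` is 10 01 00 11, i.e. 0x93.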
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  8. #8
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,189
    Code:
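    #include <stdio.h>
    #include <xmmintrin.h>
    
    // Four floats to shuffle (declaration assumed; it was not shown in the post).
    static float Info[4];
    
    int main(int argc, char* argv[]){
    
        //fclose(fopen("test.txt" , "w+b"));
        float* temp = &Info[0];
        for(unsigned i=0;i<4;++i) temp[i] = (float)i;
        printf("Before:\n%f\n%f\n%f\n%f\n\n" , temp[0] , temp[1] , temp[2] , temp[3]);
        // MSVC-style inline assembly (32-bit x86 builds only)
        __asm {
            mov         edi , temp
            movups      xmm0 , [edi]        // unaligned load of the four floats
            shufps      xmm0 , xmm0 , 0x93 
            movups      [edi] , xmm0        // store the shuffled floats back
            }
        printf("After:\n%f\n%f\n%f\n%f\n" , temp[0] , temp[1] , temp[2] , temp[3]);
        return 0;
        }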

  9. #9
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,893
    This one ought to work on GCC, Intel and MSVC.
    Code:
    #include <stdio.h>
    #include <xmmintrin.h>
    
    #if defined(__GNUC__)
    #define SSE_ALIGNED(type) type __attribute__((aligned(16)))
    #else
    #define SSE_ALIGNED(type) __declspec(align(16)) type
    #endif
    
    SSE_ALIGNED(float) data[] = { 1.0f, 2.0f, 3.0f, 4.0f };
    
    int main(int argc, char* argv[])
    {
        printf("Before:\n%f\n%f\n%f\n%f\n\n" , data[0] , data[1] , data[2] , data[3]);
        __m128 x = _mm_load_ps(data);
        x = _mm_shuffle_ps(x, x, 0x93);
        _mm_store_ps(data, x);
        printf("After:\n%f\n%f\n%f\n%f\n" , data[0] , data[1] , data[2] , data[3]);
        return 0;
    }
    Still confusing me, though.


    Hmm ... I think I get it. My analysis of the opcode itself is correct - it's the move that does the flip. The registers are drawn in big-endian order in all references, but the actual memory is little-endian. The lowest-address float (what would be leftmost when using ascending addresses from left to right) goes to the rightmost position in the register.
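
The two viewpoints can be put side by side in plain C. This is a rough sketch under the assumptions in the explanation above: lane 0 is the lowest-address float, and the reference manuals draw lane 3 on the left. `shuffle_lanes` and `as_register_diagram` are hypothetical names, and the shuffle is a scalar model of `shufps x, x, imm`, not the instruction itself.

```c
#include <stdint.h>

/* Scalar model of "shufps x, x, imm"; index 0 is the lowest memory address. */
static void shuffle_lanes(const float mem[4], uint8_t imm, float out[4])
{
    for (int lane = 0; lane < 4; ++lane)
        out[lane] = mem[(imm >> (2 * lane)) & 3];
}

/* Reorder the lanes the way the manuals draw a register:
   highest lane first (leftmost), lowest lane last (rightmost). */
static void as_register_diagram(const float mem[4], float drawn[4])
{
    for (int i = 0; i < 4; ++i)
        drawn[i] = mem[3 - i];
}
```

Starting from {1, 2, 3, 4} in memory, 0x93 produces {4, 1, 2, 3} in memory order, which looks like a rotation to the right. Drawn as a register, the same data goes from 4 3 2 1 to 3 2 1 4, which looks like a rotation to the left.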
    All the buzzt!
    CornedBee

