gcc: vector instructions - but how?


  1. #1
    Registered User Sargnagel
    Join Date
    Aug 2002
    Posts
    166

    Hi,

    I've recently discovered that gcc enables the C programmer to use multimedia extensions of desktop processors (MMX, SSE, SSE2, 3DNOW) without having to switch to assembly.
    On the following two sites only the headers of the built-in functions are listed:

    http://gcc.gnu.org/onlinedocs/gcc-3....r%20Extensions

    http://gcc.gnu.org/onlinedocs/gcc-3....in%20Functions

    But I am looking for information on how to put int/float values into the vectors described on the sites above, and how to extract single values from them.
    I've only found some source code for AltiVec, but I guess that is not what I am looking for.

    Thank you for your help.

  2. #2
    Registered User rafe
    Join Date
    Oct 2002
    Posts
    46
    Useful info! After some playing around...
    compiled with "gcc -mmmx myprog.c"

    Code:
    #include <stdio.h>   /* printf */
    #include <stdlib.h>  /* exit */
    
    typedef int di __attribute__ ((mode(DI)));
    union {
        di xx;
        int ii[2];
    } zz;
    
    int main( int argc, char** argv ) {
        /* ULL suffix: these constants do not fit in a 32-bit long */
        di aa = 0xffeeddcc99887766ULL;
        di bb = 0x445566778899aabbULL;
        zz.xx = aa;
        printf("%x %x\n", zz.ii[0], zz.ii[1]);
        zz.xx = bb;
        printf("%x %x\n", zz.ii[0], zz.ii[1]);
        zz.xx = __builtin_ia32_pxor (aa, bb);
        printf("%x %x\n", zz.ii[0], zz.ii[1]);
        exit( 0 );
    }
    
    ==== sample run ====
    > ./a.out 
    99887766 ffeeddcc
    8899aabb 44556677
    1111dddd bbbbbbbb
    ==== Notice the little endian ====
    Last edited by rafe; 12-04-2002 at 05:08 PM.

  3. #3
    Banned master5001
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    Those sites seem to explain enough. Make sure to turn on MMX (or whatever you are using) on the command line. Just keep hitting the search engines if you are still having trouble.

  4. #4
    Registered User Sargnagel
    Join Date
    Aug 2002
    Posts
    166
    @rafe:
    Thanks a lot for the source code!
    Took me quite some time (5 min, but it felt like an eternity) to grasp the meaning of it - but now I got it.

    ... or not?
    I still don't know how to compose a vector.
    e.g.: v4hi - a vector of four 16 bit integers

    Any help is appreciated!

    @master5001:
    It is quite frustrating if after hours of searching nothing useful turns up ... but maybe I am just using the wrong keywords.

  5. #5
    Registered User Sargnagel
    Join Date
    Aug 2002
    Posts
    166
    AH! Now I got it!

    Thank you very much again for your source code, rafe!

  6. #6
    Registered User rafe
    Join Date
    Oct 2002
    Posts
    46
    You're welcome. && Just to drive the point home to others not as motivated as you.

    Now I'm going to look at how to get the mmx registers to work the way I want. I'm not allowed to use assembly directly but this is... well, a way to cheat that rule. Bury this in some lib && I'll only have to change one module **IF** the code migrates.
    Code:
    #include <stdio.h>   /* printf */
    #include <stdlib.h>  /* exit */
    
    typedef int di   __attribute__ ((mode(DI)));
    typedef int v4hi __attribute__ ((mode(V4HI)));
    typedef int   dword;
    typedef short word;
    
    union ZZ {
        di    dd;
        v4hi  hh;
        dword ii[2];
        word  ss[4];
    } zz;
    
    int main( int argc, char** argv ) {
        union ZZ aa;
        union ZZ bb;
        aa.dd = 0x1111222233334444;
        bb.dd = 0x1111111111111111;
        zz.dd = aa.dd;
        printf("%08x %08x\n", zz.ii[0], zz.ii[1]);
        zz.dd = bb.dd;
        printf("%08x %08x\n", zz.ii[0], zz.ii[1]);
        zz.hh = __builtin_ia32_psubw (aa.hh, bb.hh);
        printf("%08x %08x\n", zz.ii[0], zz.ii[1]);
        exit( 0 );
    }
    === Another run ===
    > ./a.out 
    33334444 11112222
    11111111 11111111
    22223333 00001111
    === Little endian again ===
    Last edited by rafe; 12-05-2002 at 10:31 AM.

  7. #7
    Registered User Sargnagel
    Join Date
    Aug 2002
    Posts
    166
    Thank you for the 2nd example, rafe.

    But I still got a problem with vectors:

    I've just tried to use the SSE instructions of my Celeron Tualatin with this little program:
    Code:
    #include <stdio.h>   /* printf */
    #include <stdlib.h>
    
    typedef float v4sf __attribute__ ((mode(V4SF)));
    
    typedef union 
    {
        v4sf v;
        float p[4];
    }TEST;
    
    int main()
    {
       int i;
       TEST test, test2;
       	
       test.p[0] = 11111.12349;
       test.p[1] = 12378.4357;
       test.p[2] = 12343.2387;
       test.p[3] = 23498.23489;
       
       test2.p[0] = 2.5;
       test2.p[1] = 3.2;
       test2.p[2] = 4.9;
       test2.p[3] = 7.3;
       	
       test.v = __builtin_ia32_divps(test.v, test2.v);
       	
       for(i = 0; i < 4; i++)
       	printf("%f ", test.p[i]);
       	
       printf("\n");
    
       return 0;
    }
    compiled with "-msse" option

    But the program crashes with "Illegal instruction (core dumped)":
    Code:
    Exception: STATUS_PRIVILEGED_INSTRUCTION at eip=004010EC
    eax=00000000 ebx=00000000 ecx=610C819C edx=00000002 esi=00000000 edi=00402910
    ebp=0022FEF0 esp=0022FE90 program=C:\test\vector.exe
    cs=001B ds=0023 es=0023 fs=0038 gs=0000 ss=0023
    Stack trace:
    Frame     Function  Args
    0022FEF0  004010EC  (00000001, 615F0740, 0A040330, 0022FF24)
    0022FF40  610072E8  (610CBAA8, FFFFFFFE, 0000002C, 610CB9CC)
    0022FF90  610075CD  (00000000, 00000000, 80430F47, 00000000)
    0022FFB0  00402702  (00401096, 037F0009, 0022FFF0, 77E8CA90)
    0022FFC0  0040103C  (0022E7A4, 0022F844, 7FFDF000, 0022E8CC)
    0022FFF0  77E8CA90  (00401000, 00000000, 000000C8, 00000100)
    End of stack trace
    I am using Cygwin with gcc 3.2 on W2k.
    I hope someone can help me, because I don't know what I have done wrong in the source code ...
    Last edited by Sargnagel; 12-05-2002 at 12:47 PM.

  8. #8
    Registered User rafe
    Join Date
    Oct 2002
    Posts
    46
    Grrr! this was a bear to figure out! Short answer: YOU didn't do anything wrong, this is a gcc snafu.

    Apparently GCC doesn't always align variables in main, BUT it does align them in other functions. I'll bet there's a switch for this but I haven't gotten that far yet. Solution: move things to a function & all is OK with the world again.

    I stumbled on this dialog:
    http://gcc.gnu.org/ml/gcc/2002-07/msg00450.html
    http://gcc.gnu.org/ml/gcc/2002-07/msg00453.html

    Code:
    #include <stdio.h>   /* printf */
    #include <stdlib.h>
    
    typedef float v4sf __attribute__ ((mode(V4SF)));
    
    typedef union {
        v4sf v;
        float p[4];
    } TEST;
    
    void myadd( ) {
       int i;
       TEST test, test2;
    
       test.p[0] = 1.12;
       test.p[1] = 7.43;
       test.p[2] = 4.23;
       test.p[3] = 9.23;
    
       test2.p[0] = 2.5;
       test2.p[1] = 3.2;
       test2.p[2] = 4.9;
       test2.p[3] = 7.3;
    
       printf("before\n");
       for(i = 0; i < 4; i++) printf("%f ", test2.p[i]);
       printf("\n");
       for(i = 0; i < 4; i++) printf("%f ", test.p[i]);
       printf("\n");
    
    //   __builtin_ia32_addps (test.v, test2.v);
       test.v = __builtin_ia32_addps(test.v, test2.v);
    
       printf("after\n");
       for(i = 0; i < 4; i++) printf("%f ", test.p[i]);
       printf("\n");
       for(i = 0; i < 4; i++) printf("%f ", test2.p[i]);
       printf("\n");
    
    }
    
    int main() {
        myadd();
       return 0;
    }
    
    ==== Sample output ====
    > gcc -msse test.c && a.out 
    before
    2.500000 3.200000 4.900000 7.300000 
    1.120000 7.430000 4.230000 9.230000 
    after
    3.620000 10.630000 9.130000 16.529999 
    2.500000 3.200000 4.900000 7.300000
    ==== End ====
    PS: It works with divps too... I was just trying to set things up so that I could check the math in my head.

  9. #9
    Registered User Sargnagel
    Join Date
    Aug 2002
    Posts
    166
    Good to hear that I did everything right.

    I've been searching, too, but I did not find the answer to this gcc problem.

    Thank you very much again.

    I hope I will not come across more gcc flaws when I modify my source code to use sse instructions ... I only hope the work will result in a nice speedup ...

    ..... AH!!! One question left:
    Is there a predefined vector function that takes a vector of float/int and returns the product of the numbers in the vector? Or do I have to multiply the single values "by hand"?

  10. #10
    Registered User rafe
    Join Date
    Oct 2002
    Posts
    46
    Sargnagel,

    Would you please do me the favor of letting me know if you get any speedup using these functions? Or how far into register manipulation you had to go to realize the speedup?

    This is a rather curious experiment you're performing & I'm quite interested in your results.

    thanks

  11. #11
    Registered User Sargnagel
    Join Date
    Aug 2002
    Posts
    166
    @rafe:
    No problem. I will inform you of the speedup!
    I hope I will find some time to finish my scientific application this year and after that I want to add the SSE instructions as a little experiment.

    Quote: "register manipulations"
    Well, if you could tell me what they are and how to manipulate the registers at the source code level.
    I would still call myself a beginner, since I only started programming in C in the first quarter of 2002. But as you've noticed already, I am very motivated and I am working on my skills nearly every day.

  12. #12
    Registered User rafe
    Join Date
    Oct 2002
    Posts
    46
    Quote: "Well, if you could tell me what they are and how to manipulate the registers at the source code level."
    The section in the doc just before the one we were using:
    http://gcc.gnu.org/onlinedocs/gcc-3....it%20Reg%20Var

    Maybe we shouldn't go down this rabbit hole just yet & see if you can get away without them. HLL has got to have *some* advantages over assembly.
    The project seems like fun... enjoy.

    Quote: "I hope I will find some time to finish my scientific application this year and after that I want to add the SSE instructions as a little experiment."
    'til then..

  13. #13
    Registered User Sargnagel
    Join Date
    Aug 2002
    Posts
    166
    Quote: "Variables in Specified Registers"

    That sounds exciting! I will have a look at it.

    Quote: "HLL has got to have *some* advantages over assembly"
    hehe ... at least I hope so

    The project is great. I have written a program to calculate special protein properties (molecular biology). 100% portability was the main target. I finished it a few months ago, but only now have I found some time to clean up and optimize the source code and merge several small programs into one application.
    After that I will mess around with SSE.

    I will notify you when I am finished.
    Last edited by Sargnagel; 12-06-2002 at 01:57 PM.
