If you are on the Intel/AMD platform, have a look at the following web pages:
http://gcc.gnu.org/onlinedocs/gcc/X8...Functions.html
http://gcc.gnu.org/onlinedocs/gcc/Ve...r%20Extensions
Example source code for SSE instructions:
Code:
typedef float v4sf __attribute__ ((mode(V4SF)));
v4sf v1, v2;
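/* loadaps/storeaps need 16-byte aligned addresses */
float data[8] __attribute__ ((aligned (16)));
float tmp[4] __attribute__ ((aligned (16)));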
/* fill array data */
...
v1 = __builtin_ia32_loadaps(data);
v2 = __builtin_ia32_loadaps(&data[4]);
v1 = __builtin_ia32_mulps(v1, v2);
__builtin_ia32_storeaps(tmp, v1);
Now, the content of the array tmp is as follows:
tmp[0] = data[0] * data[4];
tmp[1] = data[1] * data[5];
tmp[2] = data[2] * data[6];
tmp[3] = data[3] * data[7];
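The `__builtin_ia32_*` names are compiler-internal and may not be available in every GCC release, so here is a rough sketch of the same element-wise multiply written with the generic vector extensions from the second link instead. The helper name `mul_halves` is made up for this example, and I am assuming a GCC or Clang compiler that supports `vector_size`. Using `memcpy` for the loads and stores sidesteps the 16-byte alignment requirement.

```c
#include <string.h>

/* Four floats packed into one 16-byte vector (GCC/Clang vector extension). */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: out[i] = data[i] * data[i + 4] for i = 0..3. */
void mul_halves(const float data[8], float out[4])
{
    v4sf v1, v2;

    /* memcpy works for any alignment of data and out. */
    memcpy(&v1, data, sizeof v1);      /* data[0..3] */
    memcpy(&v2, data + 4, sizeof v2);  /* data[4..7] */

    v1 = v1 * v2;                      /* element-wise multiply */

    memcpy(out, &v1, sizeof v1);
}
```

With data = {1, 2, 3, 4, 5, 6, 7, 8} this should leave out = {5, 12, 21, 32}. On an x86 target with SSE enabled, the compiler will typically turn the multiply into a single mulps, but that depends on the compiler and flags.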