I am using GCC 4.6.2 on Mac OS X and am experimenting with vector extensions. The idea is to perform basic adjustments to still images for the purpose of evaluating the speed advantage of using vectors. However, I'm running into some unexpected behavior.
I have two functions. The first processes a buffer of still image data without vectors (a simple for loop). The second treats four RGBA pixels as a single 128-bit vector, using a for loop with 1/4 of the iterations of the first loop. The idea is to process four pixels at a time using the XMM SSE registers.
Here are the loops:
Code:
int i,j;
/*adjustment struct used to adjust color*/
struct {
uint8_t b;
uint8_t g;
uint8_t r;
uint8_t a;
} adjust;
/*image dimensions*/
int width,height;
/*calculate size of image data*/
long imgdata_size = (width*height)*sizeof(uint32_t);
/***scalar version***/
uint32_t *buf = (uint32_t*)malloc(imgdata_size);
/*
read image from disk, etc...
*/
for(i=0;i<height;i++){
for(j=0;j<width;j++){
buf[(i*width)+j] += *((uint32_t*)&adjust);
}
}
free(buf);
/***vector version***/
typedef uint32_t v4si __attribute__ ((vector_size (16)));
/*put four copies of adjustment struct into 128 bit vector*/
v4si adjust_vec = {*(uint32_t*)&adjust,*(uint32_t*)&adjust,*(uint32_t*)&adjust,*(uint32_t*)&adjust};
v4si *vecbuf = (v4si*)malloc(imgdata_size);
/*
read image from disk, etc...
*/
for(i=0;i<imgdata_size/sizeof(v4si);i++){
vecbuf[i] += adjust_vec;
}
free(vecbuf);
Now, both of these loops run OK. However, they both run at the same speed! How's that? One is using scalar values, one is using vectors. Shouldn't the vector version blow the scalar version away?
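For anyone who wants to reproduce this, here is a minimal self-contained sketch of the two loops as functions, plus a check that they produce the same output. The names (`adjust_bgra`, `pack_adjust`, `adjust_scalar`, `adjust_vector`, `loops_agree`) and the fill pattern are made up for this post and are not from my actual project. I've also used memcpy instead of the pointer casts above, on the assumption that the compiler folds it into a plain load/store; the loop bodies are otherwise the same.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Four 32-bit lanes in one 128-bit vector (GCC vector extension). */
typedef uint32_t v4su __attribute__ ((vector_size (16)));

/* Hypothetical name for the adjustment struct from the post. */
struct adjust_bgra { uint8_t b, g, r, a; };

/* Pack the four channel adjustments into one 32-bit word. */
static uint32_t pack_adjust(struct adjust_bgra adj)
{
    uint32_t packed;
    memcpy(&packed, &adj, sizeof packed);
    return packed;
}

/* Scalar version: one pixel per iteration. */
static void adjust_scalar(uint32_t *buf, size_t npixels, uint32_t adj)
{
    size_t i;
    for (i = 0; i < npixels; i++)
        buf[i] += adj;
}

/* Vector version: four pixels per iteration. */
static void adjust_vector(uint32_t *buf, size_t npixels, uint32_t adj)
{
    v4su adj_vec = {adj, adj, adj, adj};
    size_t i;
    assert(npixels % 4 == 0);   /* this sketch has no tail handling */
    for (i = 0; i < npixels; i += 4) {
        v4su px;
        memcpy(&px, buf + i, sizeof px);
        px += adj_vec;
        memcpy(buf + i, &px, sizeof px);
    }
}

/* Returns 1 if both versions give identical results, 0 otherwise. */
static int loops_agree(size_t npixels, struct adjust_bgra adj)
{
    uint32_t packed = pack_adjust(adj);
    uint32_t *a = (uint32_t*)malloc(npixels * sizeof *a);
    uint32_t *b = (uint32_t*)malloc(npixels * sizeof *b);
    size_t i;
    int same;
    if (!a || !b) { free(a); free(b); return 0; }
    for (i = 0; i < npixels; i++)   /* arbitrary fill pattern */
        a[i] = b[i] = (uint32_t)i * 2654435761u;
    adjust_scalar(a, npixels, packed);
    adjust_vector(b, npixels, packed);
    same = memcmp(a, b, npixels * sizeof *a) == 0;
    free(a);
    free(b);
    return same;
}
```

Calling `loops_agree` with a pixel count that is a multiple of four confirms that the two versions compute the same thing, so any timing difference between them comes from code generation and not from different work being done.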
I checked the asm generated by GCC, and in fact GCC is vectorizing both loops. There is heavy use of XMM registers in both versions. The only way I can force GCC not to vectorize is to compile as 32-bit. The 64-bit build blows away the 32-bit build even when -O3 is used for both. Are 128-bit registers not available in 32-bit mode? If I try to turn off optimization in 64-bit mode, GCC still uses XMM registers. I can't keep it from using XMM no matter what I do, even if I explicitly set -O0 and -mno-sse (GCC complains "SSE register return with SSE disabled").
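One thing I have not fully explored is switching off only the auto-vectorizer for the scalar loop, so that I get a true scalar baseline at -O3. Below is a sketch of what I think that would look like, assuming I'm reading the GCC manual correctly: the `optimize` function attribute with `"no-tree-vectorize"` is supposed to be the per-function equivalent of passing `-fno-tree-vectorize` on the command line. The function name `adjust_scalar_novec` is made up for this example, and I have not confirmed from the asm that this removes the packed XMM instructions on my setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC honors optimize("no-tree-vectorize") for this function.
   noinline keeps the loop from being inlined into a caller that is
   compiled with vectorization enabled. */
__attribute__((noinline, optimize("no-tree-vectorize")))
void adjust_scalar_novec(uint32_t *buf, size_t npixels, uint32_t adj)
{
    size_t i;
    assert(buf != NULL || npixels == 0);
    for (i = 0; i < npixels; i++)
        buf[i] += adj;   /* same body as the scalar loop in the post */
}
```

The alternative would be to compile the whole file with `gcc -O3 -fno-tree-vectorize`, which should leave the explicit vector-extension loop alone while keeping the plain for loop scalar.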
I guess the issue is that I don't see much point in using vector extensions when GCC is vectorizing the code anyway. If that is the case, what are vector extensions for? Is GCC just so good that it doesn't need vectors to be defined explicitly?