anyone tried avx?
i'm trying to port some 128b simd sse2 code to 256b AVX, but the AVX version is several times slower. i removed all 128b operations and datatypes to avoid SSE/AVX state switching, installed win7 sp1 and vs2010 sp1, and compiled with /arch:AVX. still, it's about 4x slower than my previous 128b simd code.
not finding anything new on the interwebs now, so i thought i'd ask here.
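for reference, here's a rough sketch of the kind of port being described: the same float add written once with 128b intrinsics and once with 256b intrinsics. the kernel and the names `add_128` / `add_256` are made up for illustration, not the actual code from the post. it assumes the compiler defines `__AVX__` when AVX code generation is on (GCC and Clang do with -mavx; not verified for vs2010 sp1) and falls back to a scalar loop otherwise. the `_mm256_zeroupper()` call is the usual guard against the state-switching penalty when legacy-encoded SSE code runs afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__AVX__)
#include <immintrin.h>  // 256b AVX intrinsics
#endif
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // 128b SSE/SSE2 intrinsics
#define AVXDEMO_HAVE_SSE2 1
#endif

// hypothetical kernel: out[i] = a[i] + b[i], 4 floats per iteration
std::vector<float> add_128(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
#ifdef AVXDEMO_HAVE_SSE2
    for (; i + 4 <= a.size(); i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
    }
#endif
    for (; i < a.size(); ++i) out[i] = a[i] + b[i];  // scalar tail (or full fallback)
    return out;
}

// same kernel, 8 floats per iteration when AVX is enabled at compile time
std::vector<float> add_256(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
#if defined(__AVX__)
    for (; i + 8 <= a.size(); i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        _mm256_storeu_ps(&out[i], _mm256_add_ps(va, vb));
    }
    // clear the upper ymm halves before any legacy SSE code runs
    _mm256_zeroupper();
#endif
    for (; i < a.size(); ++i) out[i] = a[i] + b[i];  // scalar tail (or full fallback)
    return out;
}
```

both functions should return identical results for the same inputs, so any difference between them is purely in timing.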
looks like it may be a visual studio bug.
i made a test bed application, and if i put AVX code into a new project created in a "fresh" instance of visual studio, then i get the expected performance.
if i then replace the AVX code with the corresponding SSE2 code, i get what i expect from SSE2.
here's where it gets interesting: undoing those changes (back to AVX), rebuilding, and rerunning the application still gives the SSE2 execution times, not the AVX ones.
opening (another) clean project with (another) clean instance of VS yields the original AVX times again.
*sigh* looks like microfail.
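one way to narrow this down is to have the test bed report what it was actually built as, next to the timing. below is a minimal sketch, not the original test bed: `compiled_simd_level` and `best_of_ms` are hypothetical helper names, and the `__AVX__` check assumes the compiler sets that macro when AVX code generation is on (true for GCC/Clang with -mavx, not verified for vs2010 sp1). timing uses `std::clock` from `<ctime>` so it doesn't need anything newer than vs2010, but its resolution is coarse, so the timed callable should do a decent amount of work.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// reports which SIMD level this translation unit was compiled for
inline std::string compiled_simd_level() {
#if defined(__AVX__)
    return "avx";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "sse2";
#else
    return "scalar";
#endif
}

// runs f() reps times and returns the fastest run in milliseconds,
// which is less sensitive to one-off hiccups than a single measurement
template <typename F>
double best_of_ms(F f, int reps) {
    assert(reps > 0);
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        std::clock_t t0 = std::clock();
        f();
        std::clock_t t1 = std::clock();
        double ms = 1000.0 * static_cast<double>(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}
```

if the reported level says "sse2" after switching back to the AVX code and rebuilding, the project settings didn't take; if it says "avx" and the times are still slow, the problem is somewhere else.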
Does performing a "clean" make any difference?
negative. clean & rebuild doesn't change anything.