I'm trying to port some 128-bit SIMD SSE2 code to 256-bit AVX, but the AVX version is several times slower. I removed all 128-bit operations and data types to avoid state switching, installed Win7 SP1 and VS2010 SP1, and compiled with /arch:AVX. Still, it's about 4x slower than my previous 128-bit SIMD code.
I'm not finding anything new on the interwebs, so I thought I'd ask here.
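For reference, here is a minimal sketch of the kind of 128-bit to 256-bit port described above. It assumes the data is `float` and the kernel is a simple element-wise add. The function names `add_sse2`, `add_avx`, and `add_floats` are hypothetical, not the poster's code. First-generation AVX only widens floating-point operations to 256 bits; 256-bit integer arithmetic arrived with AVX2, so integer SSE2 code can't be widened this way on AVX-only hardware. The `TARGET_AVX` macro and `__builtin_cpu_supports` are GCC/Clang-specific assumptions; under MSVC the sketch simply calls the AVX path, as if built with /arch:AVX.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 (also pulls in SSE float intrinsics) */
#include <immintrin.h>  /* AVX */

/* GCC/Clang: let this one function use AVX without compiling the
   whole file with -mavx. MSVC accepts AVX intrinsics without it. */
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_AVX __attribute__((target("avx")))
#else
#define TARGET_AVX
#endif

/* 128-bit SSE2 version: 4 floats per iteration, scalar tail. */
void add_sse2(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* 256-bit AVX version: 8 floats per iteration, scalar tail. */
TARGET_AVX
void add_avx(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
    /* Clear the upper halves of the YMM registers before returning to
       code that may use legacy (non-VEX) SSE encodings; this avoids the
       AVX/SSE transition penalty. Compilers often emit this themselves. */
    _mm256_zeroupper();
}

/* Hypothetical dispatcher: use AVX when the CPU reports it. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
#if defined(__GNUC__) || defined(__clang__)
    if (__builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
        return;
    }
    add_sse2(a, b, out, n);
#else
    add_avx(a, b, out, n);  /* assumes an AVX build, e.g. /arch:AVX */
#endif
}
```

A length like 11 exercises both the vector loop and the scalar tail in each version.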





