I reworked the assembly so that the whole loop now runs inside the asm block.
Code:
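#include <stdio.h>
#include <math.h>
#include <windows.h>

int main(int argc, char* argv[]){
    double Input = 1.000;
    __declspec(align(16)) double SIMD_Input[] = {1.0 , 1.0};  /* not used in this version */
    DWORD Start, Stop, temp;  /* Stop is set but not used in this version */

    /* C/C++ version: 16777216 iterations of sin(atan(x)) */
    Input = 1.0;
    Start = GetTickCount();
    Stop = Start + 100000;
    temp = 0;
    while(temp < 16777216){
        Input = sin(atan(Input));
        temp++;
    }
    printf("%lu\t%f\n" , (GetTickCount() - Start) , Input);

    /* x87 inline assembly version: same iteration count (0x01000000 = 16777216) */
    Input = 1.0;
    Start = GetTickCount();
    Stop = Start + 100000;
    __asm push ecx              /* save ecx, it is the loop counter */
    __asm mov ecx , 0x01000000
    begin: __asm fld Input      /* st0 = Input */
    __asm fld1                  /* st0 = 1.0, st1 = Input */
    __asm fpatan                /* st0 = atan(st1 / st0) = atan(Input) */
    __asm fsin                  /* st0 = sin(atan(Input)) */
    __asm fstp Input            /* store the result and pop the FPU stack */
    __asm loop begin            /* ecx--, jump back while ecx != 0 */
    __asm pop ecx
    printf("%lu\t%f\n" , (GetTickCount() - Start) , Input);
    return 0;
}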
The new results are:
4015 ms for the C/C++ loop
3094 ms for the assembly loop
Would anyone care to suggest an optimized loop in C/C++, if you think that's the problem?