This is my first attempt at GCC inline assembly. Following an example from this paper (page 3), I wrote a small test program:
Code:
#include <stdio.h>

int main()
{
	int i;
	float a[4] = {2.0, 2.0, 2.0, 2.0}, b[4] = {3.0, 3.0, 3.0, 3.0}, c[4] = {-1.0, -1.0, -1.0, -1.0};
	
	__asm__("movaps %1, %%xmm0 \n\t"	/* copy vector a[] to SSE register xmm0 */
		"movaps %2, %%xmm1 \n\t"	/* copy vector b[] to SSE register xmm1 */
		"divps %%xmm0, %%xmm1 \n\t" /* divide xmm1 by xmm0 and write result to xmm1 */
		"movaps %%xmm1, %0"			/* copy xmm1 to vector c[]	*/
		: "=m"	(c[0])	/*	output %0	*/
		: "m"	(a[0]),	/*	input %1	*/
		  "m"	(b[0]));/*	input %2	*/
	
	for(i = 0; i < 4; ++i)
		printf("%f ", c[i]);
	printf("\n");
	
	return 0;	
}
It compiles without errors with GCC 3.3.1 (Cygwin):
gcc -pedantic -W -Wall -masm=intel -o test.exe divtest.c

But the program crashes with an "illegal instruction" error.
I suspect there is an obvious mistake in my source code, but as I said, this is my first attempt.
Yes, my CPU supports SSE (Intel Celeron Tualatin, Pentium III core).
Thank you for your help.

The stack trace:
Code:
Exception: STATUS_PRIVILEGED_INSTRUCTION at eip=004010DF
eax=BF800000 ebx=00000004 ecx=610CB16C edx=00000002 esi=00000000 edi=00000000
ebp=0022FEF0 esp=0022FE80 program=C:\test.exe
cs=001B ds=0023 es=0023 fs=0038 gs=0000 ss=0023
Stack trace:
Frame     Function  Args
0022FEF0  004010DF  (00000001, 616020CC, 0A040330, 0022FF24)
0022FF40  61005018  (610CFEE0, FFFFFFFE, 000003D4, 610CFE04)
0022FF90  610052ED  (00000000, 00000000, 8043138F, 00000000)
0022FFB0  004014C1  (00401055, 037F0009, 0022FFF0, 77E787F5)
0022FFC0  0040103C  (00000000, 00000000, 7FFDF000, 00000000)
0022FFF0  77E787F5  (00401000, 00000000, 000000C8, 00000100)
End of stack trace
Here is the example from the paper I mentioned above:
Code:
for (i = 0; i < 100; i += 4)
    __asm__ __volatile__ (
        "movaps %1, %%xmm0 \n\t"
        "movaps %2, %%xmm1 \n\t"
        "addps %%xmm0, %%xmm1 \n\t"
        "movaps %%xmm1, %0"
        : "=m" (a[i])
        : "m" (b[i]),
          "m" (c[i]));