Thread: SSE programming - Segmentation fault and how to print m128 variables

  1. #1
    Registered User
    Join Date
    Jul 2008
    Posts
    8

    SSE programming - Segmentation fault and how to print m128 variables

    Hi,
    I have questions and issues specific to SSE optimization in C. My code is as follows:

    Code:
    #include <stdlib.h>
    #include "xmmintrin.h"
    #include <stdio.h>
    #define NUM_ELEMS (32*1024)
    #define NUM_ITERS 10000
    
    /* Note: xmmintrin.h is a standard header file available under the GNU Open
             GPL license.  It contains definitions for C functions that wrap the SSE
             instruction set
    
    */
    
    float a ;
    float* x ;
    float* y ;
    
    
    int main(int argc, char **argv) {
    
        srand(1);  
        x = (float*) malloc(NUM_ELEMS * sizeof(float));
        y = (float*) malloc(NUM_ELEMS * sizeof(float));
        
        __m128 m1, m2, m3, m4;
    
        // type cast x, y and a to efficient intrinsic __m128 data type
        __m128* x = (__m128*) x ;
        __m128* y = (__m128*) y ;
        __m128 a = (__m128) a ;        
        
        for (int i = 0; i < NUM_ELEMS; i++) {   
    	
            *x = _mm_set_ps1((float)rand()/100000) ;
            *y = _mm_set_ps1((float)rand()/100000) ;
    
        }   
    
        for (int k = 0; k < NUM_ITERS; k++) {
            
            a = _mm_set_ps1(0.0) ;
    
            for ( int i = 0; i < NUM_ELEMS; i++ )
            {
                //a += (x[i] + y[i]) * (x[i] - y[i]);
            
                m1 = _mm_add_ps(*x,*y);        // m1 = x[i] + y[i] 
    
                m2 = _mm_sub_ps(*x,*y);        // m2 = x[i] - y[i]
    
                m3 = _mm_add_ps(m1, m2);        // m3 = (x[i] + y[i]) * (x[i] - y[i])
    
                a = _mm_add_ps(a, m3) ;  	    // a+= x[i] + y[i] * (x[i] - y[i])         
    
                x++ ;
                y++ ;
                
            }
         }
    
         //a = (float) a ;
    
         //fprintf(stderr, "a = %f\n", a);  
    
         return 0;
    
    }


    Q 1. I am getting a Segmentation fault at:

    *x = _mm_set_ps1((float)rand()/100000) ;

    at the very first loop iteration.


    The GDB output is:

    Program received signal SIGSEGV, Segmentation fault.
    0x00000000004006df in main (argc=2, argv=0x7fff303d8548) at prog-sse.c:45
    45 *x = _mm_set_ps1((float)rand()/100000) ;

    Why is it giving me a segmentation fault at memory that I have allocated?




    Q 2. How do you print the values held in an __m128 variable, i.e. what is the printf format specifier for it?

    printf("value of my m128 variable is %?", My_m128_Variable) ;

    Or do we have to copy it to a float buffer and print the buffer?


    Thanks.


    Saad

  2. #2
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    You could always check whether malloc returned NULL.

    Once you declare the new variable x, it immediately hides the global x. That means the initialization you're doing doesn't use the old value of x, but the (garbage) value held by the new one. Use different names.

  3. #3
    iMalc (Algorithm Dissector)
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    malloc doesn't necessarily return memory aligned to the 16 bytes that __m128 loads and stores require. You may need to use _aligned_malloc instead, or the gcc equivalent.
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    Registered User
    Join Date
    Jul 2008
    Posts
    8

    Segmentation fault - Progress

    Hi,
    I have implemented both of your suggestions.

    I have changed the names of the __m128 variables from x and y to m128x and m128y (per tabstop's suggestion).

    I am also now using the equivalent of _aligned_malloc available with gcc (posix_memalign), per iMalc's input. This time, the program gets past the first segmentation fault. It now segfaults at the 8886th iteration of the first loop (the loop that initializes the array elements to rand / 100000).

    I wonder whether this is because the program has run out of memory, or whether there is a limit on the amount of memory that can be assigned to an array in C. In either case, what is the workaround or fix?

    Here is the revised code.

    Thanks.

    Code:
    #include <stdlib.h>
    #include "xmmintrin.h"
    #include <stdio.h>
    #define NUM_ELEMS (32*1024)
    #define NUM_ITERS 10000
    
    /* Note: xmmintrin.h is a standard header file available under the GNU Open
             GPL license.  It contains definitions for C functions that wrap the SSE
             instruction set
    
    */
    
    float a ;
    float* x ;
    float* y ;
    
    
    int main(int argc, char **argv) {
    
        int err = 0 ;
    
        srand(1);  
        //x = (float*) (NUM_ELEMS * sizeof(float));
        //y = (float*) malloc(NUM_ELEMS * sizeof(float));
    
        if (posix_memalign ((void **) &x, 16, NUM_ELEMS * sizeof(float))) {
            fprintf(stderr, "Error in aligned memory allocation");
            exit(-1) ;
        }
        
        if (posix_memalign ((void **) &y, 16, NUM_ELEMS * sizeof(float))) {
            fprintf(stderr, "Error in aligned memory allocation");
            exit(-1) ;
        }
    
    
    
        __m128 m1, m2, m3, m4;
    
        // type cast x, y and a to efficient intrinsic __m128 data type
        __m128* m128x = (__m128*) x ;
        __m128* m128y = (__m128*) y ;
        __m128 a = (__m128) a ;        
    
    
        printf("Here2") ;
        
        for (int i = 0; i < NUM_ELEMS; i++) {   
    	
            *m128x = _mm_set_ps1((float)rand()/100000) ;
            *m128y = _mm_set_ps1((float)rand()/100000) ;
    	m128x++ ;
    	m128y++ ;
        }   
    
        printf("Here3") ;
    
        for (int k = 0; k < NUM_ITERS; k++) {
            
            a = _mm_set_ps1(0.0) ;
    
            for ( int i = 0; i < NUM_ELEMS; i++ )
            {
                //a += (x[i] + y[i]) * (x[i] - y[i]);
            
                m1 = _mm_add_ps(*m128x,*m128y);        // m1 = x[i] + y[i] 
    
                m2 = _mm_sub_ps(*m128x,*m128y);        // m2 = x[i] - y[i]
    
                m3 = _mm_add_ps(m1, m2);        // m3 = (x[i] + y[i]) * (x[i] - y[i])
    
                a = _mm_add_ps(a, m3) ;  	    // a+= x[i] + y[i] * (x[i] - y[i])         
    
                m128x++ ;
                m128y++ ;
                
            }
         }
    
         /*
         fprintf(stderr, "a = %f\n", a.m128_f32[0]);  
         fprintf(stderr, "a = %f\n", a.m128_f32[1]);  
         fprintf(stderr, "a = %f\n", a.m128_f32[2]);  
         fprintf(stderr, "a = %f\n", a.m128_f32[3]);  
         */
    
         return 0;
    
    }

  5. #5
    Salem (and the hat of int overfl)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
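    > if (posix_memalign ((void **) &x, 16, NUM_ELEMS * sizeof(float)))
    No, you need sizeof your __m128's, not sizeof float. Your loops step an __m128 pointer NUM_ELEMS times, and each __m128 is 16 bytes (four floats), so you touch four times the memory you allocated.

    You ran off the end of your array a long time ago, then started wandering through a minefield.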
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
