Thread: I can't figure out why this C code is running so slow...

  1. #1
    Registered User
    Join Date
    Jan 2010
    Posts
    4

    I can't figure out why this C code is running so slow...

    Hi,

    I recently learned a fair amount of inline assembly and decided to do a little test. I wrote a selection sort in C that sorts 100,000 integers. I then wrote a inline assembly version of the same thing. I figured that the assembly version was going to be a little slower (I was pretty sloppy and didn't worry about optimization too much), but when I ran the tests I was shocked to find out that the assembly version was running twice as fast!

    The C code is
    Code:
       for (i = 0; i < n-1; ++i) {
          min = i;
          
          for (j = i+1; j < n; ++j)
             if (data[j] < data[min])
                min = j;
          
          if (i != min) {
             j = data[i];
             data[i] = data[min];
             data[min] = j;
          }
       }
    I wrote the assembly code by looking at the C code and working my way through it. It's not heavily optimized, but it's not written extremely poorly either. My compiler is gcc 4.2.1, and I'm compiling both the C and inline assembly versions with -Wall -arch i386 -O2.

    time ./test_asm < randfile.txt > outfile.txt
    real 0m6.431s
    user 0m6.381s
    sys 0m0.012s

    time ./test_c < randfile.txt > outfile.txt
    real 0m13.518s
    user 0m13.477s
    sys 0m0.016s

    Is assembly really THAT much faster??

    EDIT: I am not attempting to start a flame war...
    Last edited by Stachelsk; 01-27-2010 at 11:51 PM.

  2. #2
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    If you want provide both codes and we can cross those results.

    EDIT: C produces assembly code, so you can check that as well and compare it with yours
    Last edited by C_ntua; 01-28-2010 at 12:00 AM.

  3. #3
    Registered User
    Join Date
    Jan 2010
    Posts
    4
    Code:
    /*******************************************************************************
     * Use this file to compare the amount of time it takes to run selection sort  *
     * code written in pure C to that of pure assembly. Use the linux command time *
     * in combination with this program to produce results.                        *
     ******************************************************************************/
    #include <stdio.h>
    #define SIZE 100000
    
    /* Uncomment the following line to use the assembly code. */
    #define USE_ASSEMBLY_CODE
    
    /*******************************************************************************
     * main - Read data from standard input, sort it, write it to standard output. *
     ******************************************************************************/
    int main() {
       int n, i, j, min;
       int data[SIZE];
    
       /* Read in the data */
       n = 0;
       while (scanf("%d", &data[n]) == 1)
          ++n;
       
    #ifndef USE_ASSEMBLY_CODE  
       for (i = 0; i < n-1; ++i) {
          min = i;
          
          for (j = i+1; j < n; ++j)
             if (data[j] < data[min])
                min = j;
          
          if (i != min) {
             j = data[i];
             data[i] = data[min];
             data[min] = j;
          }
       }
       
    #else
       __asm__ ("pushl   %%ebx\n\t"
                "xorl    %%eax, %%eax\n\t"
                "movl    %1, %%edx\n\t"
                "decl    %%edx\n\t"
                
                "check_outer:\n\t"
                "cmpl    %%edx, %%eax\n\t"
                "jge     done\n\t"
                
                "outer_loop:\n\t"
                "movl    %%eax, %%ecx\n\t"
    
                      "pushl   %%eax\n\t"
                      "incl    %%eax\n\t"
                      "check_inner:\n\t"
                      "cmpl    %1, %%eax\n\t"
                      "jge     resume_outer\n\t"
                
                      "inner_loop:\n\t"
                      "movl    (%0, %%eax, 4), %%ebx\n\t"
                      "cmpl    (%0, %%ecx, 4), %%ebx\n\t"
                      "jge     skip_inner_fix\n\t"
                      "movl    %%eax, %%ecx\n\t"
                
                      "skip_inner_fix:\n\t"
                      "incl    %%eax\n\t"
                      "jmp     check_inner\n\t"
                
                "resume_outer:\n\t"
                "popl    %%eax\n\t"
                
                "cmpl    %%eax, %%ecx\n\t"
                "je      skip_swap\n\t"
                "movl    (%0, %%eax, 4), %%ebx\n\t"
                "xchgl   %%ebx, (%0, %%ecx, 4)\n\t"
                "movl    %%ebx, (%0, %%eax, 4)\n\t"
                
                "skip_swap:\n\t"
                "incl    %%eax\n\t"
                "jmp     check_outer\n\t"
                
                "done:\n\t"
                "popl    %%ebx"
                :/* nothing to output */
                :"r" (&data), "r" (n)
                :"eax", "edx", "ecx"
               );
    #endif
       
       /* Print out the data */
       for (i = 0; i < n; ++i)
          printf("%d\n", data[i]);
       
       return 0;
    }
    I generated the numbers using
    Code:
    #include <cstdlib> 
    #include <iostream>
    
    using namespace std;
    
    int main() 
    { 
       for (int i = 0; i < 100000; i++)
          cout << rand() << endl;
       
       cout << "a";
    }
    Thanks for the help! I'm wondering if it's just my old GNU toolchain?

    EDIT: I noticed that the C code is hammering the memory. They're about the same number of instructions, though.
    Last edited by Stachelsk; 01-28-2010 at 12:27 AM.

  4. #4
    Registered User jeffcobb's Avatar
    Join Date
    Dec 2009
    Location
    Henderson, NV
    Posts
    875
    I would also like to see your compiler switches...
    C/C++ Environment: GNU CC/Emacs
    Make system: CMake
    Debuggers: Valgrind/GDB

  5. #5
    Registered User
    Join Date
    Jan 2010
    Posts
    412
    Quote Originally Posted by jeffcobb View Post
    I would also like to see your compiler switches...
    So would I. I'm guessing he is compiling the c version without optimizations.
    Compiling his code under mvsc with full optimizations enabled and the c version is almost twice as fast as the inline assembler.

    Edit:
    Reran the tests with gcc under cygwin
    Code:
    $ time ./asm_sort_gcc3 < in.txt > out.txt
    
    real    0m5.082s
    user    0m5.015s
    sys     0m0.030s
    
    $ time ./asm_sort_gcc4 < in.txt > out.txt
    
    real    0m5.088s
    user    0m4.984s
    sys     0m0.046s
    
    $ time ./c_sort_gcc3 < in.txt > out.txt
    
    real    0m3.660s
    user    0m3.578s
    sys     0m0.015s
    
    $ time ./c_sort_gcc4 < in.txt > out.txt
    
    real    0m9.596s
    user    0m9.530s
    sys     0m0.015s
    
    $ gcc-3 --version
    gcc-3 (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
    
    
    $ gcc-4 --version
    gcc-4 (GCC) 4.3.4 20090804 (release) 1
    All versions of the binary compiled with "-Wall -O3"
    I don't know much about gcc so it might be that I'm using the wrong switches, but if not then there seems to be a bug in gcc4's optimization because the gcc3 version is almost 3 times faster.
    Last edited by _Mike; 01-28-2010 at 02:03 AM. Reason: Fixing typos..

  6. #6
    Registered User
    Join Date
    Jan 2010
    Posts
    4
    I'm using -Wall -arch i386 -O2 for my compiler switches.

    I think the problem is that for some reason, the gcc that ships with Snow Leopard is horrendous. I looked at the assembly code output from an Ubuntu box and it was MUCH more efficient. For some reason, the gcc I have on this machine really, really likes memory accesses.

    Here's the assembly that my gcc produced (I'm on Snow Leopard):
    Code:
    	decl	%edx
    	movl	%edx, -400028(%ebp)
    	movl	$0, -400032(%ebp)
    	movl	-400028(%ebp), %eax
    	cmpl	%eax, -400032(%ebp)
    	jge	L23
    L6:
    	movl	-400032(%ebp), %edi
    	incl	%edi
    	cmpl	-400036(%ebp), %edi
    	jge	L7
    	movl	-400032(%ebp), %eax
    	movl	-400040(%ebp), %edx
    	leal	(%edx,%eax,4), %ecx
    	movl	%edi, %edx
    	movl	%eax, %esi
    	.align 4,0x90
    L9:
    	movl	4(%ecx), %eax
    	cmpl	-400024(%ebp,%esi,4), %eax
    	cmovl	%edx, %esi
    	incl	%edx
    	addl	$4, %ecx
    	cmpl	-400036(%ebp), %edx
    	jl	L9
    	cmpl	%esi, -400032(%ebp)
    	je	L7
    	movl	-400040(%ebp), %ecx
    	movl	-4(%ecx,%edi,4), %edx
    	movl	-400024(%ebp,%esi,4), %eax
    	movl	%eax, -4(%ecx,%edi,4)
    	movl	%edx, -400024(%ebp,%esi,4)
    L7:
    	movl	%edi, -400032(%ebp)
    	movl	-400028(%ebp), %eax
    	cmpl	%eax, -400032(%ebp)
    	jl	L6
    So I think my problem is that my toolchain sucks. Thanks for the help!

  7. #7
    Registered User
    Join Date
    Jan 2010
    Posts
    412
    Quote Originally Posted by Stachelsk View Post
    I'm using -Wall -arch i386 -O2 for my compiler switches.

    I think the problem is that for some reason, the gcc that ships with Snow Leopard is horrendous. I looked at the assembly code output from an Ubuntu box and it was MUCH more efficient. For some reason, the gcc I have on this machine really, really likes memory accesses.
    Same version of gcc on both os-x and ubuntu? Theoretically the compilers (assuming they are the same version) should produce the exact same code as long as the cpu architecture stays the same, shouldn't it?

  8. #8
    Registered User
    Join Date
    Jan 2010
    Posts
    4
    Quote Originally Posted by _Mike View Post
    Same version of gcc on both os-x and ubuntu? Theoretically the compilers (assuming they are the same version) should produce the exact same code as long as the cpu architecture stays the same, shouldn't it?
    Nope, sorry, should have mentioned. OSX is gcc 4.2.1 and Ubuntu is gcc 4.3.3

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. can't figure out what's with this code
    By silverstein101 in forum C++ Programming
    Replies: 8
    Last Post: 04-16-2008, 12:45 AM
  2. Replies: 27
    Last Post: 10-25-2007, 10:47 AM
  3. Running Exe in C++ Code.
    By Ti22 in forum C++ Programming
    Replies: 5
    Last Post: 03-23-2006, 01:29 PM
  4. Obfuscated Code Contest
    By Stack Overflow in forum Contests Board
    Replies: 51
    Last Post: 01-21-2005, 04:17 PM
  5. Interface Question
    By smog890 in forum C Programming
    Replies: 11
    Last Post: 06-03-2002, 05:06 PM