Thread: code execution speed - which is fastest?

  1. #1
    Registered User
    Join Date
    Apr 2015

    code execution speed - which is fastest?

    i have 2 different methods of writing to a buffer:

    Method 1:
    source = " ";
    while ((c = *source++) != '\0') {
        if (!space--) { i = 1; break; }
        *dest++ = c;
    if (i) break;
    Method 2:
    if ((space -= 6) < 0) { i = 1; break; }
    *dest++ = '&';
    *dest++ = 'n';
    *dest++ = 'b';
    *dest++ = 's';
    *dest++ = 'p';
    *dest++ = ';';
    variable space is the remaining bytes in the buffer. if its at zero then the buffer is full.
    My question is which of the 2 above methods is fastest?
    Last edited by onelinecode; 04-07-2015 at 11:22 AM.

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Measure by testing on typical data.

    That said, it looks like the two methods are not identical in net effect: method 1 will write to the destination until there is no space left whereas method 2 will write to the destination only if there is space for the entire source string.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Ticked and off
    Join Date
    Oct 2011
    La-la land
    Generally, when you write loops that profiling shows to be bottlenecks, try to see if you can write them in a simpler form, accessing fewer variables.

    Consider the following example code:
    #include <stdlib.h>
    #include <errno.h>
    typedef struct {
        size_t  size;
        size_t  used;
        char    data[];
    } mystr;
    int mystr_cat1(mystr *const s, const char *z)
        if (s == NULL)
            return EINVAL;
        if (z != NULL) {
            size_t n = s->size - s->used;
            while (*z)
                if (n-->0)
                    s->data[s->used++] = *(z++);
                    return ENOMEM;
        return 0;
    int mystr_cat2(mystr *const s, const char *z)
        if (s == NULL)
            return EINVAL;
        if (z == NULL)
            return 0;
        while (*z)
            if (s->size >= s->used)
                return ENOMEM;
                s->data[s->used++] = *(z++);
        return 0;
    int mystr_cat3(mystr *const s, const char *z)
        if (s == NULL)
            return EINVAL;
        if (z != NULL) {
            char *p = s->data + s->used;
            char *const q = s->data + s->size;
            while (*z)
                if (p >= q)
                    return ENOMEM;
                    *(p++) = *(z++);
        return 0;
    It is a simplified version of a C99 mutable string/data type I sometimes use.

    Using GCC-4.8.2 and heavy optimization for generic x86 (-O3 -fomit-frame-pointer -ftree-vectorize -mtune=i686 -m32), the three functions compile to the following assembly code:
    mystr_cat1:                        |   mystr_cat2:                        |   mystr_cat3:
        pushl   %esi                   |       pushl   %esi                   |       pushl   %ebx
        movl    $22, %eax              |       movl    $22, %eax              |       movl    8(%esp), %ebx
        pushl   %ebx                   |       pushl   %ebx                   |       movl    $22, %eax
        movl    12(%esp), %ebx         |       movl    12(%esp), %ecx         |       movl    12(%esp), %edx
        movl    16(%esp), %edx         |       movl    16(%esp), %edx         |       testl   %ebx, %ebx
        testl   %ebx, %ebx             |       testl   %ecx, %ecx             |       je      .L28
        je      .L2                    |       je      .L18                   |       testl   %edx, %edx
        testl   %edx, %edx             |       testl   %edx, %edx             |       je      .L30
        je      .L4                    |       je      .L20                   |       movl    4(%ebx), %eax
        movl    4(%ebx), %eax          |       cmpb    $0, (%edx)             |       movzbl  (%edx), %ecx
        movl    (%ebx), %ecx           |       je      .L20                   |       addl    %ebx, %eax
        subl    %eax, %ecx             |       movl    (%ecx), %esi           |       addl    (%ebx), %ebx
        cmpb    $0, (%edx)             |       movl    4(%ecx), %eax          |       addl    $8, %eax
        je      .L4                    |       cmpl    %eax, %esi             |       addl    $8, %ebx
        testl   %ecx, %ecx             |       jb      .L22                   |       testb   %cl, %cl
        movl    %ecx, %esi             |       jmp     .L21                   |       je      .L30
        jne     .L12                   |   .L23:                              |       cmpl    %ebx, %eax
        jmp     .L7                    |       cmpl    %esi, %eax             |       jb      .L32
    .L8:                               |       jbe     .L21                   |       jmp     .L31
        subl    $1, %esi               |   .L22:                              |   .L33:
        je      .L7                    |       addl    $1, %eax               |       cmpl    %ebx, %eax
    .L12:                              |       addl    $1, %edx               |       je      .L31
        addl    $1, %eax               |       movl    %eax, 4(%ecx)          |   .L32:
        addl    $1, %edx               |       movzbl  -1(%edx), %ebx         |       addl    $1, %eax
        movl    %eax, 4(%ebx)          |       movb    %bl, 7(%ecx,%eax)      |       addl    $1, %edx
        movzbl  -1(%edx), %ecx         |       cmpb    $0, (%edx)             |       movb    %cl, -1(%eax)
        movb    %cl, 7(%ebx,%eax)      |       jne     .L23                   |       movzbl  (%edx), %ecx
        cmpb    $0, (%edx)             |   .L20:                              |       testb   %cl, %cl
        jne     .L8                    |       xorl    %eax, %eax             |       jne     .L33
    .L4:                               |   .L18:                              |   .L30:
        xorl    %eax, %eax             |       popl    %ebx                   |       xorl    %eax, %eax
    .L2:                               |       popl    %esi                   |   .L28:
        popl    %ebx                   |       ret                            |       popl    %ebx
        popl    %esi                   |   .L21:                              |       ret
        ret                            |       movl    $12, %eax              |   .L31:
    .L7:                               |       popl    %ebx                   |       movl    $12, %eax
        movl    $12, %eax              |       popl    %esi                   |       popl    %ebx
        popl    %ebx                   |       ret                            |       ret
        popl    %esi                   |                                      |   
        ret                            |                                      |
    On generic x86-64 (-O3 -fomit-frame-pointer -ftree-vectorize -mtune=generic -m64), the assembly is
    mystr_cat1:                        |   mystr_cat2:                        |   mystr_cat3:
        testq   %rdi, %rdi             |       testq   %rdi, %rdi             |       testq   %rdi, %rdi
        movl    $22, %eax              |       movl    $22, %eax              |       movl    $22, %eax
        je      .L2                    |       je      .L17                   |       je      .L26
        testq   %rsi, %rsi             |       testq   %rsi, %rsi             |       testq   %rsi, %rsi
        je      .L4                    |       je      .L19                   |       je      .L28
        movq    8(%rdi), %rdx          |       cmpb    $0, (%rsi)             |       movq    (%rdi), %rdx
        movq    (%rdi), %rcx           |       je      .L19                   |       movq    8(%rdi), %rax
        subq    %rdx, %rcx             |       movq    (%rdi), %rcx           |       leaq    16(%rdi,%rdx), %rcx
        cmpb    $0, (%rsi)             |       movq    8(%rdi), %rax          |       movzbl  (%rsi), %edx
        je      .L4                    |       cmpq    %rax, %rcx             |       leaq    16(%rdi,%rax), %rax
        subq    %rdx, %rsi             |       jae     .L20                   |       testb   %dl, %dl
        testq   %rcx, %rcx             |       subq    %rax, %rsi             |       je      .L28
        jne     .L14                   |       jmp     .L21                   |       cmpq    %rcx, %rax
        jmp     .L7                    |   .L22:                              |       jb      .L30
    .L8:                               |       cmpq    %rcx, %rax             |       jmp     .L29
        subq    $1, %rcx               |       jbe     .L20                   |   .L31:
        je      .L7                    |   .L21:                              |       cmpq    %rcx, %rax
        movq    %rax, %rdx             |       addq    $1, %rax               |       je      .L29
    .L14:                              |       movq    %rax, 8(%rdi)          |   .L30:
        leaq    1(%rdx), %rax          |       movzbl  -1(%rsi,%rax), %edx    |       addq    $1, %rax
        movq    %rax, 8(%rdi)          |       movb    %dl, 15(%rdi,%rax)     |       addq    $1, %rsi
        movzbl  -1(%rsi,%rax), %r8d    |       cmpb    $0, (%rsi,%rax)        |       movb    %dl, -1(%rax)
        movb    %r8b, 15(%rdi,%rax)    |       jne     .L22                   |       movzbl  (%rsi), %edx
        cmpb    $0, 1(%rsi,%rdx)       |   .L19:                              |       testb   %dl, %dl
        jne     .L8                    |       xorl    %eax, %eax             |       jne     .L31
    .L4:                               |       ret                            |   .L28:
        xorl    %eax, %eax             |   .L17:                              |       xorl    %eax, %eax
        ret                            |       rep ret                        |       ret
    .L2:                               |   .L20:                              |   .L26:
        rep ret                        |       movl    $12, %eax              |       rep ret
    .L7:                               |       ret                            |   .L29:
        movl    $12, %eax              |                                      |       movl    $12, %eax
        ret                            |                                      |       ret
    We cannot tell much from just the length of the assembly -- especially x86-64 instruction timing is so complicated that a difference of a couple of instructions does not tell us anything at all.

    We can tell that on i686, mystr_cat3() used only one word of stack, whereas the others used two. You'd need to run some microbenchmarks to say anything about the speed. The object code lengths for the functions are 96, 80, and 79 bytes, respectively.

    On x86-64, no stack is used, because there are more registers available. mystr_cat1() and mystr_cat2() use funky addressing modes (%rsi+%rax+constant and %rdi+%rax+constant),whereas mystr_cat3() uses straight pointers. The object code lengths of the functions are 128, 112, and 97 bytes.

    Although I have not benchmarked the exact functions above, prior experience with GCC (and also ICC, Portland Group, and Pathscale compilers, although it's been a few years since my last extensive checks) indicates the latter form of these is most likely to yield good compiled code: it has the simplest form, using the fewest variables within the loop (including the loop condition).

    All this is to say: strive for simplicity. Do clever tricks only when profiling shows they really help with the run time; they tend to be difficult to maintain.

    Don't worry about whether code is fast when you write it; worry about whether it is as simple (and robust) as you can make it. Error checking is good, it makes code more robust (especially if you write sensible error messages), but otherwise, if you can make some code simpler, do so. For loops, that means thinking about whether you can avoid using one of the variables altogether.

    Most speedups in real life are achieved by switching to a faster algorithm. It is rarely useful to tweak such details as here (how best to iterate over some loop), in real-life code. Whenever I find my programs spending most of their time in some sort of copy loop, it seems more than likely that there is some way to avoid that copy altogether.
    Last edited by Nominal Animal; 04-07-2015 at 01:05 PM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Execution speed if statement
    By Satya in forum C Programming
    Replies: 3
    Last Post: 05-03-2014, 10:41 AM
  2. Replies: 3
    Last Post: 11-15-2012, 02:50 AM
  3. Execution speed slow initially, then speeds up
    By megafiddle in forum Windows Programming
    Replies: 22
    Last Post: 12-08-2011, 11:16 PM
  4. Execution speed test in c++
    By nick2price in forum C++ Programming
    Replies: 1
    Last Post: 03-12-2009, 04:23 PM
  5. Questions on Speed of Execution
    By wavering in forum C Programming
    Replies: 22
    Last Post: 01-20-2002, 02:04 PM