Thread: Why the compiler creates a label and jumps for the code in the "else" block?

  1. #1
    Registered User
    Join Date
    Oct 2021
    Posts
    138

    Why the compiler creates a label and jumps for the code in the "else" block?

    Ok, the title may sound a little bit confusing so I will explain. So first of all let's see the example code I'm going to demonstrate:

    Code:
    #include "stdio.h"
    
    int main() {
      int val;
      scanf("%d", &val);
      int val2 = 3;
    
      if (val > val2) {
        printf("val is bigger than val2!\n");
      }
    
      else {
        printf("val is smaller than val2!\n");
      }
    }

    Now that's a pretty simple code snippet. Gets a value form the user, compares two values and print a message from one of the two branches.

    Well, I'm a n00b when it comes to assembly (hence this question) and I wasn't able to make "scanf" work but I run a version where instead of "scanf", it uses another integer and it compares them. Here is the code (GAS syntax):

    Code:
    .global _start, printf
    
    .data
      SCANF_PROMPT: .string "%d"
      PROMPT1: .string "val is bigger than val2!\n"
      PROMPT2: .string "val is smaller than val2!\n"
    
    .text
      _start:
        mov $50, %r8
        mov $1,  %r9
    
        cmp %r9, %r8
        jg _lab
    
        mov $0, %rax
        mov $PROMPT2, %rdi
        call printf
    
        _back:
        mov $0, %rax
        mov $0, %rdi
        call exit
    
      _lab:
        mov $0, %rax
        mov $PROMPT1, %rdi
        call printf
    
        jmp _back
    Compile this using
    Code:
    as test.asm -o test.o -g && ld test.o -o test  -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -g
    on Linux and run the program to get the first branch, as expected!

    Now what's the problem you will ask. So, If you see my code, I created a label called "_lab" for the first branch (if <condition>). So if the condition is true, it will jump to the "_lab", execute the code and then jump back. However! For the second branch (else), I simply let the code keep executing and then created another label called "_back" in the end of the "else" branch. This is where the "if" branch will return when the code there is over.

    So "else" is not a true branch by definition. It makes sense to me as branching and jumping is slow and I don't see a reason for "else" to be a branch as there is no condition to check in "else" so it can just be used this way and avoid the unnecessary overhead.

    However, I tried to generate assembly using "GCC" and "Clang" and they seem to make two comparisons and treat both of them as labels and jump. Unless I don't understand the code (cause like I said I'm a newbie and I couldn't even make the "scanf" work), why do they do it this way? Of course I'm not implying that I'm smarter than the thousands of more experienced and hard working programmers that have contributed in these projects so I wonder where I'm wrong and what I can't see.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,666
    > However, I tried to generate assembly using "GCC" and "Clang" and they seem to make two comparisons and treat both of them as labels and jump.
    Well if you turn on optimisation, you end up with assembler that looks a lot more like what you crafted by hand.

    The primary purpose of unoptimised code is to
    a) make compilation quick
    b) make debugging sane

    Trying to debug optimised code can be a weird experience.

    gcc -S foo.c
    Code:
    main:
    .LFB0:
            .cfi_startproc
            endbr64
            pushq   %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset 6, -16
            movq    %rsp, %rbp
            .cfi_def_cfa_register 6
            subq    $16, %rsp
            movq    %fs:40, %rax
            movq    %rax, -8(%rbp)
            xorl    %eax, %eax
            leaq    -16(%rbp), %rax
            movq    %rax, %rsi
            leaq    .LC0(%rip), %rdi
            movl    $0, %eax
            call    __isoc99_scanf@PLT
            movl    $3, -12(%rbp)
            movl    -16(%rbp), %eax
            cmpl    %eax, -12(%rbp)
            jge     .L2
            leaq    .LC1(%rip), %rdi
            call    puts@PLT
            jmp     .L3
    .L2:
            leaq    .LC2(%rip), %rdi
            call    puts@PLT
    .L3:
            movl    $0, %eax
            movq    -8(%rbp), %rdx
            xorq    %fs:40, %rdx
            je      .L5
            call    __stack_chk_fail@PLT
    .L5:
            leave
            .cfi_def_cfa 7, 8
            ret
    vs gcc -S -O2 foo.c
    Code:
    main:
    .LFB23:
            .cfi_startproc
            endbr64
            subq    $24, %rsp
            .cfi_def_cfa_offset 32
            leaq    .LC0(%rip), %rdi
            movq    %fs:40, %rax
            movq    %rax, 8(%rsp)
            xorl    %eax, %eax
            leaq    4(%rsp), %rsi
            call    __isoc99_scanf@PLT
            cmpl    $3, 4(%rsp)
            jle     .L2
            leaq    .LC1(%rip), %rdi
            call    puts@PLT
    .L3:
            movq    8(%rsp), %rax
            xorq    %fs:40, %rax
            jne     .L7
            xorl    %eax, %eax
            addq    $24, %rsp
            .cfi_remember_state
            .cfi_def_cfa_offset 8
            ret
    .L2:
            .cfi_restore_state
            leaq    .LC2(%rip), %rdi
            call    puts@PLT
            jmp     .L3
    .L7:
            call    __stack_chk_fail@PLT
            .cfi_endproc
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    The reason why compilers like GCC and CLANG (I don't deal with MSVC. It is a ........ty compiler) chooses to do conditional jumps forward to ELSE block is because it is assumed the THEN block will happen more often. Conditional jumps forward implies on a branch predictor penalty (static branch prediction algorithm, inside your processor assumes, in advance, that all forward conditional jumps are **NOT** taken), or about extra 5 clock cycles (at least, there is a L1I cache miss problem to be evaluated, depending on the code).

    So...
    Code:
    if (x < 0) f(); else g();
    This code assumes x < 0 most of the time and, necessarily, it is encoded as
    Code:
    # assuming eax=x:
      testl %eax,%eax
      jns  1f      # Branch predictor penalty if x >= 0!
                   # The processor assumes x < 0 most of the time.
      call f
      jmp 2f     # no penalty here.
    1:
      call g
    2:
    With GCC and CLANG you can force the kind of conditional jump using the builtin function __builtin_expect(), as in:
    Code:
    if ( __builtin_expect( x < 0, 0 ) ) f(); else g();
    Now x >= 0 is expected most of the time...

    Sometimes you can see macros like likely() and unlikely() in certain codes:
    Code:
    #define likely(x) __builtin_expect( !!(x), 1 )
    #define unlikely(x) __builtin_expect( !!(x), 0 )
    As, in, for example, Linux kernel source code.

    PS: This is totally unecessary with loops, since GCC/CLANG tends to put the condition later in code... for example:
    Code:
    while ( x >= 0 ) { f(); x--; }
    Will create something like this:
    Code:
    ...
      # Assuming x in ECX.
      jmp   2f
    1:
      call  f
      # Notice: the test is at the bottom 'cause conditional jmps backwards are assumed
      # always taken, in advance.
      decl   %ecx
    2:
      test %ecx,%ecx
      jns  1b
    Last edited by flp1969; 03-09-2022 at 07:35 AM.

  4. #4
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Salem View Post
    > However, I tried to generate assembly using "GCC" and "Clang" and they seem to make two comparisons and treat both of them as labels and jump.
    Well if you turn on optimisation, you end up with assembler that looks a lot more like what you crafted by hand.

    The primary purpose of unoptimised code is to
    a) make compilation quick
    b) make debugging sane

    Trying to debug optimised code can be a weird experience.

    gcc -S foo.c
    Code:
    main:
    .LFB0:
            .cfi_startproc
            endbr64
            pushq   %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset 6, -16
            movq    %rsp, %rbp
            .cfi_def_cfa_register 6
            subq    $16, %rsp
            movq    %fs:40, %rax
            movq    %rax, -8(%rbp)
            xorl    %eax, %eax
            leaq    -16(%rbp), %rax
            movq    %rax, %rsi
            leaq    .LC0(%rip), %rdi
            movl    $0, %eax
            call    __isoc99_scanf@PLT
            movl    $3, -12(%rbp)
            movl    -16(%rbp), %eax
            cmpl    %eax, -12(%rbp)
            jge     .L2
            leaq    .LC1(%rip), %rdi
            call    puts@PLT
            jmp     .L3
    .L2:
            leaq    .LC2(%rip), %rdi
            call    puts@PLT
    .L3:
            movl    $0, %eax
            movq    -8(%rbp), %rdx
            xorq    %fs:40, %rdx
            je      .L5
            call    __stack_chk_fail@PLT
    .L5:
            leave
            .cfi_def_cfa 7, 8
            ret
    vs gcc -S -O2 foo.c
    Code:
    main:
    .LFB23:
            .cfi_startproc
            endbr64
            subq    $24, %rsp
            .cfi_def_cfa_offset 32
            leaq    .LC0(%rip), %rdi
            movq    %fs:40, %rax
            movq    %rax, 8(%rsp)
            xorl    %eax, %eax
            leaq    4(%rsp), %rsi
            call    __isoc99_scanf@PLT
            cmpl    $3, 4(%rsp)
            jle     .L2
            leaq    .LC1(%rip), %rdi
            call    puts@PLT
    .L3:
            movq    8(%rsp), %rax
            xorq    %fs:40, %rax
            jne     .L7
            xorl    %eax, %eax
            addq    $24, %rsp
            .cfi_remember_state
            .cfi_def_cfa_offset 8
            ret
    .L2:
            .cfi_restore_state
            leaq    .LC2(%rip), %rdi
            call    puts@PLT
            jmp     .L3
    .L7:
            call    __stack_chk_fail@PLT
            .cfi_endproc
    Thanks for the info. Tbh, I don't understand the code but still it looks like a lot of labels and jumps for me. Unless it does some startup stuff that I don't get but if all that is for the "if" statement that it still doesn't make sense. Of course I'm talking about the optimized version all this time.

    I will learn and I will probably be able to understand it in the future. Thanks a lot for the great info!

  5. #5
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Quote Originally Posted by rempas View Post
    Thanks for the info. Tbh, I don't understand the code but still it looks like a lot of labels and jumps for me. Unless it does some startup stuff that I don't get but if all that is for the "if" statement that it still doesn't make sense. Of course I'm talking about the optimized version all this time.
    Maybe some comments would help:
    Code:
      .section  .rodata
    
    .LC0:
      .string "%d"
    .LC1:
      .string "val is bigger than val2!"
    .LC2:
      .string "val is smaller than val2!"
    
      .text
    
      .globl main
    main:
      endbr64                     # Modern compilers put this by default
                                  # trying to prevent branch prediction side efects.
    
      subq    $24, %rsp           # reserve space for 'val' (aligning the stack pointer).
    
      leaq    .LC0(%rip), %rdi    # puts "%d" pointer in RDI (used by __isoc99_scanf@PLT).
    
      movq    %fs:40, %rax        # used by the stack protector...
      movq    %rax, 8(%rsp)
    
      xorl    %eax, %eax          # __isoc99 needs to know if there are any floating
                                  # point in use... (no? EAX=0).
    
      leaq    4(%rsp), %rsi       # load 'val' pointer in RSI.
    
      call    __isoc99_scanf@PLT  # scanf( "%d", &val );
    
      cmpl    $3, 4(%rsp)         # val <= 3 ?
      jle     .L2                 # yes, jump to .L2
    
      leaq    .LC1(%rip), %rdi    # no, load pointer to "val is bigger..." to RDI
      call    puts@PLT            # and call puts@PLT
    
      # Here is the end of the routine...
    .L3:
      movq    8(%rsp), %rax       # used by the stack protector.
      xorq    %fs:40, %rax
      jne     .L7
    
      xorl    %eax, %eax          # main returns 0 (by default with GCC).
    
      addq    $24, %rsp           # dealloc the space previously allocated on stack.
      ret                         # return to the caller.
    
      # if val > 3, gets here.
    .L2:
      leaq    .LC2(%rip), %rdi    # load pointer to "val is smaller..." to RDI
      call    puts@PLT            # and call puts@PLT
      jmp     .L3                 # and jumps to exit (.L3)
    
      # Used by the stack protector.
    .L7:
      call    __stack_chk_fail@PLT
    But if you compile with:
    Code:
    $ gcc -O2 -S -mmanual-endbr -fno-stack-protector foo.c
    Probably will get a simplier code:
    Code:
      .section  .rodata
    
    .LC0:
      .string "%d"
    .LC1:
      .string "val is bigger than val2!"
    .LC2:
      .string "val is smaller than val2!"
    
      .text
    
      .globl main
    main:
      subq  $24, %rsp
      xorl  %eax, %eax
      leaq  .LC0(%rip), %rdi
      leaq  12(%rsp), %rsi
      call  __isoc99_scanf@PLT
      cmpl  $3, 12(%rsp)
      jle .L2
      leaq  .LC1(%rip), %rdi
      call  puts@PLT
    .L3:
      xorl  %eax, %eax
      addq  $24, %rsp
      ret
    .L2:
      leaq  .LC2(%rip), %rdi
      call  puts@PLT
      jmp .L3
    But, of course, sometimes an optimized code is harder to debug than a non optimized one, as Salem already told you...

    []s
    Fred
    Last edited by flp1969; 03-09-2022 at 11:03 AM.

  6. #6
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by flp1969 View Post
    Maybe some comments would help ... But, of course, sometimes an optimized code is harder to debug than a non optimized one, as Salem already told you...

    []s
    Fred
    Thanks! I've also seen your first reply and it took me some minutes to analyze it but I got it! Still, I think that not jumping is faster even when we jump without the branching penalty.

    Well, I guess, not having an else statement in your code will do. It also makes you code shorter when you would have to return/break/continue in the end of every branch. So for example:

    Code:
    int test_fn() {
      int x = 10;
     
      if (x < 100)
        return x;
      
      // If the code reaches here, the condition was not true
      // so there is no reason to have an "else" here.
      return 100;
    }
    I guess the above code will create similar code to the one I've written by hand. I don't know man, I have a lot to learn and I hope it will be a fun trip...

  7. #7
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    It depends... if x < 100 happens most of the time this code is faster than if doesn't happen most of the time...
    Code:
    if ( x < 100 )
      return x;
    return 100;
    Will loosely be compiled as
    Code:
    # assuming x in eax
      cmpl $100,%eax
      jge .skip
      ret
    .skip:
      movl $100,%eax
      ret
    Here if x >= 100 there is a branch prediction penalty.

    General rule: your condition in an if block should happen most of the time to avoid penalties.

    If, in your case, x >= 100 most of the time, the code
    Code:
    if ( x >= 100 )
      return 100;
    return x;
    Is better. Of course, it is not what will happen here (because you are initializing x with 10), but think in a case where x is 'unknown', like:
    Code:
    int f( int x )
    {
      // if x < 100 most of the time this is a faster function.
      if ( x < 100 )
        return x;
      return 100;
    }
    Your initial example, of course, is compiled as
    Code:
    test_fn:
      movl %edi,%eax
      ret
    Last edited by flp1969; 03-09-2022 at 03:53 PM.

  8. #8
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Yep, I know, I know... the compiler can use cmovcc instruction to avoid conditional jumps...

  9. #9
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by flp1969 View Post
    It depends... if x < 100 happens most of the time this code is faster than if doesn't happen most of the time...
    Code:
    if ( x < 100 )
      return x;
    return 100;
    Will loosely be compiled as
    Code:
    # assuming x in eax
      cmpl $100,%eax
      jge .skip
      ret
    .skip:
      movl $100,%eax
      ret
    Here if x >= 100 there is a branch prediction penalty.

    General rule: your condition in an if block should happen most of the time to avoid penalties.

    If, in your case, x >= 100 most of the time, the code
    Code:
    if ( x >= 100 )
      return 100;
    return x;
    Is better. Of course, it is not what will happen here (because you are initializing x with 10), but think in a case where x is 'unknown', like:
    Code:
    int f( int x )
    {
      // if x < 100 most of the time this is a faster function.
      if ( x < 100 )
        return x;
      return 100;
    }
    Your initial example, of course, is compiled as
    Code:
    test_fn:
      movl %edi,%eax
      ret
    Wait! So If I get this correctly, you are telling me that if the most cases, the code inside the "if" block gets executed, it will run faster than if it didn't jumped and kept going (which in this case it will just return). So wait a minute! The case that makes the jump is faster than the case that doesn't jump at all. How is this possible???

    I also didn't knew about the "cmovcc" instruction. I'll take a look on it!

  10. #10
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    The page OSDev wiki Implementing Conditional Statements and Loops may help here.

  11. #11
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Quote Originally Posted by rempas View Post
    Wait! So If I get this correctly, you are telling me that if the most cases, the code inside the "if" block gets executed, it will run faster than if it didn't jumped and kept going (which in this case it will just return). So wait a minute! The case that makes the jump is faster than the case that doesn't jump at all. How is this possible???
    It helps if you paid attention... The forward jump (with penalty if taken) is made inverting the condition:
    Code:
    if ( x < 100 ) f();
    Code:
    ...
      cmp eax,100
      jge .skip     ; JUMPS IF GREATER OR EQUAL THAN 100 (PENALTY)!
      call f
    .skip:

  12. #12
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Schol-R-LEA-2 View Post
    The page OSDev wiki Implementing Conditional Statements and Loops may help here.
    Thank you! I will read it!

  13. #13
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by flp1969 View Post
    It helps if you paid attention... The forward jump (with penalty if taken) is made inverting the condition:
    Code:
    if ( x < 100 ) f();
    Code:
    ...
      cmp eax,100
      jge .skip     ; JUMPS IF GREATER OR EQUAL THAN 100 (PENALTY)!
      call f
    .skip:
    That's what I'm saying! Why put the code in the "else" branch in the skip and have to jump and not do the opposite and have to jump in the "if" branch like I did in my hand written assembly? Then, like I said, there will be a label in the end of the "else" branch code where the code in "if" will return. So this is just doing things opposite that what I would logically expect them. It doesn't make much sense to "jump" to a "skip" in the "else" branch cause like I said before, the "else" branch doesn't have a condition to check so the one that would jump should be the "if" branch. So while I do understand what compilers do, It's really confusing as it is the opposite of what should happen.

  14. #14
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    The important thing to see here is that, by reversing the comparison, you can fall through to the consequent. Note that a jump isn't fully avoided; if there is an alternate clause (the else part), then the consequent needs to jump past the alternate code after the consequent finishes.

    Perhaps this would make it clearer: the naive approach is to compile the conditional exactly, which will jump to the consequent:

    Code:
        if (x < 100) goto consequent;
        goto conditional_end;
    consequent:
         f();
    conditional_end:
    If you optimize the code by reversing the condition, you get:


    Code:
        if (x >= 100) goto conditional_end;
         f();
    conditional_end:
    This eliminates the jump for the consequent case, which (assuming that the consequent is the expected case) will usually make the code faster. Why? because the conditional is compiled as the first jump (jl or jge as the case may be), rather than an unconditional jmp. These compile as

    Code:
            cmp rax, rbx
            jl main.if0.true
            jmp main.if0.end
    main.if0.true:
            call f
    main.if0.end:
    versus
    Code:
            cmp rax, rbx
            jge main.if0.end
            call f
    main.if0.end:

    For comparison, here is the equivalent in MIPS assembly (assuming that the appropriate pseudo-instructions are supported):

    Code:
            blt $t0, $t1, main.if0.true
            j main.if0.end
    main.if0.true:
            jal f
    main.if0.end:
    versus
    Code:
            bge $t0, $t1, main.if0.end
            jal f
    main.if0.end:
    Last edited by Schol-R-LEA-2; 03-11-2022 at 02:21 PM.

  15. #15
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Schol-R-LEA-2 View Post
    The important thing to see here is that, by reversing the comparison, you can fall through to the consequent. Note that a jump isn't fully avoided; if there is an alternate clause (the else part), then the consequent needs to jump past the alternate code after the consequent finishes.

    Perhaps this would make it clearer: the naive approach is to compile the conditional exactly, which will jump to the consequent:

    Code:
        if (x < 100) goto consequent;
        goto conditional_end;
    consequent:
         f();
    conditional_end:
    If you optimize the code by reversing the condition, you get:


    Code:
        if (x >= 100) goto conditional_end;
         f();
    conditional_end:
    This eliminates the jump for the consequent case, which (assuming that the consequent is the expected case) will usually make the code faster. Why? because the conditional is compiled as the first jump (jl or jge as the case may be), rather than an unconditional jmp. These compile as

    Code:
            cmp rax, rbx
            jl main.if0.true
            jmp main.if0.end
    main.if0.true:
            call f
    main.if0.end:
    versus
    Code:
            cmp rax, rbx
            jge main.if0.end
            call f
    main.if0.end:

    For comparison, here is the equivalent in MIPS assembly (assuming that the appropriate pseudo-instructions are supported):

    Code:
            blt $t0, $t1, main.if0.true
            j main.if0.end
    main.if0.true:
            jal f
    main.if0.end:
    versus
    Code:
            bge $t0, $t1, main.if0.end
            jal f
    main.if0.end:
    This makes it very clear! I also see where I was wrong with my implementation! Thanks a lot!

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Updating properties on "active x control pad" created label
    By zainka in forum Windows Programming
    Replies: 1
    Last Post: 05-07-2015, 06:41 AM
  2. Replies: 6
    Last Post: 01-15-2013, 10:31 AM
  3. Replies: 2
    Last Post: 04-05-2008, 01:28 AM
  4. Replies: 17
    Last Post: 12-15-2006, 11:02 AM
  5. "CWnd"-"HWnd","CBitmap"-"HBitmap"...., What is mean by "
    By L.O.K. in forum Windows Programming
    Replies: 2
    Last Post: 12-04-2002, 07:59 AM

Tags for this Thread