Why the compiler creates a label and jumps for the code in the "else" block?

**rempas** · 03-09-2022

Ok, the title may sound a little bit confusing so I will explain. So first of all let's see the example code I'm going to demonstrate:

Code:

#include "stdio.h"

int main() {
  int val;
  scanf("%d", &val);
  int val2 = 3;

  if (val > val2) {
    printf("val is bigger than val2!\n");
  }

  else {
    printf("val is smaller than val2!\n");
  }
}

Now that's a pretty simple code snippet. Gets a value form the user, compares two values and print a message from one of the two branches.

Well, I'm a n00b when it comes to assembly (hence this question) and I wasn't able to make "scanf" work but I run a version where instead of "scanf", it uses another integer and it compares them. Here is the code (GAS syntax):

Code:

.global _start, printf

.data
  SCANF_PROMPT: .string "%d"
  PROMPT1: .string "val is bigger than val2!\n"
  PROMPT2: .string "val is smaller than val2!\n"

.text
  _start:
    mov $50, %r8
    mov $1,  %r9

    cmp %r9, %r8
    jg _lab

    mov $0, %rax
    mov $PROMPT2, %rdi
    call printf

    _back:
    mov $0, %rax
    mov $0, %rdi
    call exit

  _lab:
    mov $0, %rax
    mov $PROMPT1, %rdi
    call printf

    jmp _back

Compile this using

Code:

as test.asm -o test.o -g && ld test.o -o test  -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -g

on Linux and run the program to get the first branch, as expected!

Now what's the problem you will ask. So, If you see my code, I created a label called "_lab" for the first branch (if <condition>). So if the condition is true, it will jump to the "_lab", execute the code and then jump back. However! For the second branch (else), I simply let the code keep executing and then created another label called "_back" in the end of the "else" branch. This is where the "if" branch will return when the code there is over.

So "else" is not a true branch by definition. It makes sense to me as branching and jumping is slow and I don't see a reason for "else" to be a branch as there is no condition to check in "else" so it can just be used this way and avoid the unnecessary overhead.

However, I tried to generate assembly using "GCC" and "Clang" and they seem to make two comparisons and treat both of them as labels and jump. Unless I don't understand the code (cause like I said I'm a newbie and I couldn't even make the "scanf" work), why do they do it this way? Of course I'm not implying that I'm smarter than the thousands of more experienced and hard working programmers that have contributed in these projects so I wonder where I'm wrong and what I can't see.

**Salem** · 03-09-2022

> However, I tried to generate assembly using "GCC" and "Clang" and they seem to make two comparisons and treat both of them as labels and jump.
Well if you turn on optimisation, you end up with assembler that looks a lot more like what you crafted by hand.

The primary purpose of unoptimised code is to
a) make compilation quick
b) make debugging sane

Trying to debug optimised code can be a weird experience.

gcc -S foo.c

Code:

main:
.LFB0:
        .cfi_startproc
        endbr64
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movq    %fs:40, %rax
        movq    %rax, -8(%rbp)
        xorl    %eax, %eax
        leaq    -16(%rbp), %rax
        movq    %rax, %rsi
        leaq    .LC0(%rip), %rdi
        movl    $0, %eax
        call    __isoc99_scanf@PLT
        movl    $3, -12(%rbp)
        movl    -16(%rbp), %eax
        cmpl    %eax, -12(%rbp)
        jge     .L2
        leaq    .LC1(%rip), %rdi
        call    puts@PLT
        jmp     .L3
.L2:
        leaq    .LC2(%rip), %rdi
        call    puts@PLT
.L3:
        movl    $0, %eax
        movq    -8(%rbp), %rdx
        xorq    %fs:40, %rdx
        je      .L5
        call    __stack_chk_fail@PLT
.L5:
        leave
        .cfi_def_cfa 7, 8
        ret

vs gcc -S -O2 foo.c

Code:

main:
.LFB23:
        .cfi_startproc
        endbr64
        subq    $24, %rsp
        .cfi_def_cfa_offset 32
        leaq    .LC0(%rip), %rdi
        movq    %fs:40, %rax
        movq    %rax, 8(%rsp)
        xorl    %eax, %eax
        leaq    4(%rsp), %rsi
        call    __isoc99_scanf@PLT
        cmpl    $3, 4(%rsp)
        jle     .L2
        leaq    .LC1(%rip), %rdi
        call    puts@PLT
.L3:
        movq    8(%rsp), %rax
        xorq    %fs:40, %rax
        jne     .L7
        xorl    %eax, %eax
        addq    $24, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
.L2:
        .cfi_restore_state
        leaq    .LC2(%rip), %rdi
        call    puts@PLT
        jmp     .L3
.L7:
        call    __stack_chk_fail@PLT
        .cfi_endproc

**flp1969** · 03-09-2022

The reason why compilers like GCC and CLANG (I don't deal with MSVC. It is a ........ty compiler) chooses to do conditional jumps forward to ELSE block is because it is assumed the THEN block will happen more often. Conditional jumps forward implies on a branch predictor penalty (static branch prediction algorithm, inside your processor assumes, in advance, that all forward conditional jumps are **NOT** taken), or about extra 5 clock cycles (at least, there is a L1I cache miss problem to be evaluated, depending on the code).

So...

Code:

if (x < 0) f(); else g();

This code assumes x < 0 most of the time and, necessarily, it is encoded as

Code:

# assuming eax=x:
  testl %eax,%eax
  jns  1f      # Branch predictor penalty if x >= 0!
               # The processor assumes x < 0 most of the time.
  call f
  jmp 2f     # no penalty here.
1:
  call g
2:

With GCC and CLANG you can force the kind of conditional jump using the builtin function __builtin_expect(), as in:

Code:

if ( __builtin_expect( x < 0, 0 ) ) f(); else g();

Now x >= 0 is expected most of the time...

Sometimes you can see macros like likely() and unlikely() in certain codes:

Code:

#define likely(x) __builtin_expect( !!(x), 1 )
#define unlikely(x) __builtin_expect( !!(x), 0 )

As, in, for example, Linux kernel source code.

PS: This is totally unecessary with loops, since GCC/CLANG tends to put the condition later in code... for example:

Code:

while ( x >= 0 ) { f(); x--; }

Will create something like this:

Code:

...
  # Assuming x in ECX.
  jmp   2f
1:
  call  f
  # Notice: the test is at the bottom 'cause conditional jmps backwards are assumed
  # always taken, in advance.
  decl   %ecx
2:
  test %ecx,%ecx
  jns  1b

**rempas** · 03-09-2022

Originally Posted by Salem

> However, I tried to generate assembly using "GCC" and "Clang" and they seem to make two comparisons and treat both of them as labels and jump.
Well if you turn on optimisation, you end up with assembler that looks a lot more like what you crafted by hand.

The primary purpose of unoptimised code is to
a) make compilation quick
b) make debugging sane

Trying to debug optimised code can be a weird experience.

gcc -S foo.c

Code:

main:
.LFB0:
        .cfi_startproc
        endbr64
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movq    %fs:40, %rax
        movq    %rax, -8(%rbp)
        xorl    %eax, %eax
        leaq    -16(%rbp), %rax
        movq    %rax, %rsi
        leaq    .LC0(%rip), %rdi
        movl    $0, %eax
        call    __isoc99_scanf@PLT
        movl    $3, -12(%rbp)
        movl    -16(%rbp), %eax
        cmpl    %eax, -12(%rbp)
        jge     .L2
        leaq    .LC1(%rip), %rdi
        call    puts@PLT
        jmp     .L3
.L2:
        leaq    .LC2(%rip), %rdi
        call    puts@PLT
.L3:
        movl    $0, %eax
        movq    -8(%rbp), %rdx
        xorq    %fs:40, %rdx
        je      .L5
        call    __stack_chk_fail@PLT
.L5:
        leave
        .cfi_def_cfa 7, 8
        ret

vs gcc -S -O2 foo.c

Code:

main:
.LFB23:
        .cfi_startproc
        endbr64
        subq    $24, %rsp
        .cfi_def_cfa_offset 32
        leaq    .LC0(%rip), %rdi
        movq    %fs:40, %rax
        movq    %rax, 8(%rsp)
        xorl    %eax, %eax
        leaq    4(%rsp), %rsi
        call    __isoc99_scanf@PLT
        cmpl    $3, 4(%rsp)
        jle     .L2
        leaq    .LC1(%rip), %rdi
        call    puts@PLT
.L3:
        movq    8(%rsp), %rax
        xorq    %fs:40, %rax
        jne     .L7
        xorl    %eax, %eax
        addq    $24, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
.L2:
        .cfi_restore_state
        leaq    .LC2(%rip), %rdi
        call    puts@PLT
        jmp     .L3
.L7:
        call    __stack_chk_fail@PLT
        .cfi_endproc

Thanks for the info. Tbh, I don't understand the code but still it looks like a lot of labels and jumps for me. Unless it does some startup stuff that I don't get but if all that is for the "if" statement that it still doesn't make sense. Of course I'm talking about the optimized version all this time.

I will learn and I will probably be able to understand it in the future. Thanks a lot for the great info!

**flp1969** · 03-09-2022

Originally Posted by rempas

Thanks for the info. Tbh, I don't understand the code but still it looks like a lot of labels and jumps for me. Unless it does some startup stuff that I don't get but if all that is for the "if" statement that it still doesn't make sense. Of course I'm talking about the optimized version all this time.

Maybe some comments would help:

Code:

  .section  .rodata

.LC0:
  .string "%d"
.LC1:
  .string "val is bigger than val2!"
.LC2:
  .string "val is smaller than val2!"

  .text

  .globl main
main:
  endbr64                     # Modern compilers put this by default
                              # trying to prevent branch prediction side efects.

  subq    $24, %rsp           # reserve space for 'val' (aligning the stack pointer).

  leaq    .LC0(%rip), %rdi    # puts "%d" pointer in RDI (used by __isoc99_scanf@PLT).

  movq    %fs:40, %rax        # used by the stack protector...
  movq    %rax, 8(%rsp)

  xorl    %eax, %eax          # __isoc99 needs to know if there are any floating
                              # point in use... (no? EAX=0).

  leaq    4(%rsp), %rsi       # load 'val' pointer in RSI.

  call    __isoc99_scanf@PLT  # scanf( "%d", &val );

  cmpl    $3, 4(%rsp)         # val <= 3 ?
  jle     .L2                 # yes, jump to .L2

  leaq    .LC1(%rip), %rdi    # no, load pointer to "val is bigger..." to RDI
  call    puts@PLT            # and call puts@PLT

  # Here is the end of the routine...
.L3:
  movq    8(%rsp), %rax       # used by the stack protector.
  xorq    %fs:40, %rax
  jne     .L7

  xorl    %eax, %eax          # main returns 0 (by default with GCC).

  addq    $24, %rsp           # dealloc the space previously allocated on stack.
  ret                         # return to the caller.

  # if val > 3, gets here.
.L2:
  leaq    .LC2(%rip), %rdi    # load pointer to "val is smaller..." to RDI
  call    puts@PLT            # and call puts@PLT
  jmp     .L3                 # and jumps to exit (.L3)

  # Used by the stack protector.
.L7:
  call    __stack_chk_fail@PLT

But if you compile with:

Code:

$ gcc -O2 -S -mmanual-endbr -fno-stack-protector foo.c

Probably will get a simplier code:

Code:

  .section  .rodata

.LC0:
  .string "%d"
.LC1:
  .string "val is bigger than val2!"
.LC2:
  .string "val is smaller than val2!"

  .text

  .globl main
main:
  subq  $24, %rsp
  xorl  %eax, %eax
  leaq  .LC0(%rip), %rdi
  leaq  12(%rsp), %rsi
  call  __isoc99_scanf@PLT
  cmpl  $3, 12(%rsp)
  jle .L2
  leaq  .LC1(%rip), %rdi
  call  puts@PLT
.L3:
  xorl  %eax, %eax
  addq  $24, %rsp
  ret
.L2:
  leaq  .LC2(%rip), %rdi
  call  puts@PLT
  jmp .L3

But, of course, sometimes an optimized code is harder to debug than a non optimized one, as Salem already told you...

[]s
Fred

**rempas** · 03-09-2022

Originally Posted by flp1969

Maybe some comments would help ... But, of course, sometimes an optimized code is harder to debug than a non optimized one, as Salem already told you...

[]s
Fred

Thanks! I've also seen your first reply and it took me some minutes to analyze it but I got it! Still, I think that not jumping is faster even when we jump without the branching penalty.

Well, I guess, not having an else statement in your code will do. It also makes you code shorter when you would have to return/break/continue in the end of every branch. So for example:

Code:

int test_fn() {
  int x = 10;
 
  if (x < 100)
    return x;
  
  // If the code reaches here, the condition was not true
  // so there is no reason to have an "else" here.
  return 100;
}

I guess the above code will create similar code to the one I've written by hand. I don't know man, I have a lot to learn and I hope it will be a fun trip...

**flp1969** · 03-09-2022

It depends... if x < 100 happens most of the time this code is faster than if doesn't happen most of the time...

Code:

if ( x < 100 )
  return x;
return 100;

Will loosely be compiled as

Code:

# assuming x in eax
  cmpl $100,%eax
  jge .skip
  ret
.skip:
  movl $100,%eax
  ret

Here if x >= 100 there is a branch prediction penalty.

General rule: your condition in an if block should happen most of the time to avoid penalties.

If, in your case, x >= 100 most of the time, the code

Code:

if ( x >= 100 )
  return 100;
return x;

Is better. Of course, it is not what will happen here (because you are initializing x with 10), but think in a case where x is 'unknown', like:

Code:

int f( int x )
{
  // if x < 100 most of the time this is a faster function.
  if ( x < 100 )
    return x;
  return 100;
}

Your initial example, of course, is compiled as

Code:

test_fn:
  movl %edi,%eax
  ret

**flp1969** · 03-09-2022

Yep, I know, I know... the compiler can use cmovcc instruction to avoid conditional jumps...

**rempas** · 03-10-2022

Originally Posted by flp1969

It depends... if x < 100 happens most of the time this code is faster than if doesn't happen most of the time...

Code:

if ( x < 100 )
  return x;
return 100;

Will loosely be compiled as

Code:

# assuming x in eax
  cmpl $100,%eax
  jge .skip
  ret
.skip:
  movl $100,%eax
  ret

Here if x >= 100 there is a branch prediction penalty.

General rule: your condition in an if block should happen most of the time to avoid penalties.

If, in your case, x >= 100 most of the time, the code

Code:

if ( x >= 100 )
  return 100;
return x;

Is better. Of course, it is not what will happen here (because you are initializing x with 10), but think in a case where x is 'unknown', like:

Code:

int f( int x )
{
  // if x < 100 most of the time this is a faster function.
  if ( x < 100 )
    return x;
  return 100;
}

Your initial example, of course, is compiled as

Code:

test_fn:
  movl %edi,%eax
  ret

Wait! So If I get this correctly, you are telling me that if the most cases, the code inside the "if" block gets executed, it will run faster than if it didn't jumped and kept going (which in this case it will just return). So wait a minute! The case that makes the jump is faster than the case that doesn't jump at all. How is this possible???

I also didn't knew about the "cmovcc" instruction. I'll take a look on it!

**Schol-R-LEA-2** · 03-11-2022

The page OSDev wiki Implementing Conditional Statements and Loops may help here.

**flp1969** · 03-11-2022

Originally Posted by rempas

Wait! So If I get this correctly, you are telling me that if the most cases, the code inside the "if" block gets executed, it will run faster than if it didn't jumped and kept going (which in this case it will just return). So wait a minute! The case that makes the jump is faster than the case that doesn't jump at all. How is this possible???

It helps if you paid attention... The forward jump (with penalty if taken) is made inverting the condition:

Code:

if ( x < 100 ) f();

Code:

...
  cmp eax,100
  jge .skip     ; JUMPS IF GREATER OR EQUAL THAN 100 (PENALTY)!
  call f
.skip:

**rempas** · 03-11-2022

Originally Posted by Schol-R-LEA-2

The page OSDev wiki Implementing Conditional Statements and Loops may help here.

Thank you! I will read it!

**rempas** · 03-11-2022

Originally Posted by flp1969

It helps if you paid attention... The forward jump (with penalty if taken) is made inverting the condition:

Code:

if ( x < 100 ) f();

Code:

...
  cmp eax,100
  jge .skip     ; JUMPS IF GREATER OR EQUAL THAN 100 (PENALTY)!
  call f
.skip:

That's what I'm saying! Why put the code in the "else" branch in the skip and have to jump and not do the opposite and have to jump in the "if" branch like I did in my hand written assembly? Then, like I said, there will be a label in the end of the "else" branch code where the code in "if" will return. So this is just doing things opposite that what I would logically expect them. It doesn't make much sense to "jump" to a "skip" in the "else" branch cause like I said before, the "else" branch doesn't have a condition to check so the one that would jump should be the "if" branch. So while I do understand what compilers do, It's really confusing as it is the opposite of what should happen.

**Schol-R-LEA-2** · 03-11-2022

The important thing to see here is that, by reversing the comparison, you can fall through to the consequent. Note that a jump isn't fully avoided; if there is an alternate clause (the else part), then the consequent needs to jump past the alternate code after the consequent finishes.

Perhaps this would make it clearer: the naive approach is to compile the conditional exactly, which will jump to the consequent:

Code:

    if (x < 100) goto consequent;
    goto conditional_end;
consequent:
     f();
conditional_end:

If you optimize the code by reversing the condition, you get:

Code:

    if (x >= 100) goto conditional_end;
     f();
conditional_end:

This eliminates the jump for the consequent case, which (assuming that the consequent is the expected case) will usually make the code faster. Why? because the conditional is compiled as the first jump (jl or jge as the case may be), rather than an unconditional jmp. These compile as

Code:

        cmp rax, rbx
        jl main.if0.true
        jmp main.if0.end
main.if0.true:
        call f
main.if0.end:

versus

Code:

        cmp rax, rbx
        jge main.if0.end
        call f
main.if0.end:

For comparison, here is the equivalent in MIPS assembly (assuming that the appropriate pseudo-instructions are supported):

Code:

        blt $t0, $t1, main.if0.true
        j main.if0.end
main.if0.true:
        jal f
main.if0.end:

versus

Code:

        bge $t0, $t1, main.if0.end
        jal f
main.if0.end:

**rempas** · 03-12-2022

Originally Posted by Schol-R-LEA-2

The important thing to see here is that, by reversing the comparison, you can fall through to the consequent. Note that a jump isn't fully avoided; if there is an alternate clause (the else part), then the consequent needs to jump past the alternate code after the consequent finishes.

Perhaps this would make it clearer: the naive approach is to compile the conditional exactly, which will jump to the consequent:

Code:

    if (x < 100) goto consequent;
    goto conditional_end;
consequent:
     f();
conditional_end:

If you optimize the code by reversing the condition, you get:

Code:

    if (x >= 100) goto conditional_end;
     f();
conditional_end:

This eliminates the jump for the consequent case, which (assuming that the consequent is the expected case) will usually make the code faster. Why? because the conditional is compiled as the first jump (jl or jge as the case may be), rather than an unconditional jmp. These compile as

Code:

        cmp rax, rbx
        jl main.if0.true
        jmp main.if0.end
main.if0.true:
        call f
main.if0.end:

versus

Code:

        cmp rax, rbx
        jge main.if0.end
        call f
main.if0.end:

For comparison, here is the equivalent in MIPS assembly (assuming that the appropriate pseudo-instructions are supported):

Code:

        blt $t0, $t1, main.if0.true
        j main.if0.end
main.if0.true:
        jal f
main.if0.end:

versus

Code:

        bge $t0, $t1, main.if0.end
        jal f
main.if0.end:

This makes it very clear! I also see where I was wrong with my implementation! Thanks a lot!

Thread: Why the compiler creates a label and jumps for the code in the "else" block?

Thread Tools

Search Thread

Display

Why the compiler creates a label and jumps for the code in the "else" block?

Similar Threads

Updating properties on "active x control pad" created label

A Beginner's Inquiry: Missing output "Hello World!" in Code blocks compiler

Why create DLL with GCC/MinGW creates it with ".dll.so" extension?

"extern const int" creates linking errors that "extern int" doesn't

"CWnd"-"HWnd","CBitmap"-"HBitmap"...., What is mean by "

Tags for this Thread