Opened 6 years ago

Closed 3 years ago

#8321 closed feature request (fixed)

improve basic block layout on LLVM backend by predicting stack/heap checks

Reported by: rwbarton Owned by: rwbarton
Priority: normal Milestone: 8.2.1
Component: Compiler (LLVM) Version: 7.7
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Currently we don't give the LLVM optimizer any information about which branch of an if is likely to be taken. As a result, the optimizer is likely to produce a basic block layout which is not optimal. Improving the layout can improve performance through better instruction cache usage and better branch prediction by the hardware.

We can control LLVM's idea of what is likely with the llvm.expect intrinsic function. Some obvious branches which we can predict accurately are the stack and heap checks that appear near the entry of many functions.

Here's a small example of some Cmm code and the output of the LLVM optimizer/compiler.

 block_c2Lc_entry() //  [R1, R2]
         { info_tbl: [(c2Lc,
                       label: block_c2Lc_info
                       rep:StackRep [])]
           stack_info: arg_space: 0 updfr_space: Nothing
         }
     {offset
       c2Lc:
           Hp = Hp + 24;
           if (Hp > HpLim) goto c2Lm; else goto c2Ll;
       c2Lm:
           HpAlloc = 24;
           R2 = R2;
           R1 = R1;
           call stg_gc_pp(R2, R1) args: 8, res: 8, upd: 24;
       c2Ll:
           I64[Hp - 16] = :_con_info;
           P64[Hp - 8] = R1;
           P64[Hp] = R2;
           R1 = Hp - 14;
           Sp = Sp + 8;
           call (P64[Sp])(R1) args: 24, res: 0, upd: 24;
     }
 }]
00000000000002b8 <c2Lc_info>:
 2b8:   4c 89 e0                mov    %r12,%rax
 2bb:   4c 8d 60 18             lea    0x18(%rax),%r12
 2bf:   4d 3b a5 18 01 00 00    cmp    0x118(%r13),%r12
 2c6:   76 10                   jbe    2d8 <c2Lc_info+0x20>
 2c8:   49 c7 85 48 01 00 00    movq   $0x18,0x148(%r13)
 2cf:   18 00 00 00 
 2d3:   e9 00 00 00 00          jmpq   2d8 <c2Lc_info+0x20>
                        2d4: R_X86_64_PC32      stg_gc_pp+0xfffffffffffffffc
 2d8:   48 c7 40 08 00 00 00    movq   $0x0,0x8(%rax)
 2df:   00 
                        2dc: R_X86_64_32S       ghczmprim_GHCziTypes_ZC_con_info
 2e0:   48 89 58 10             mov    %rbx,0x10(%rax)
 2e4:   4c 89 70 18             mov    %r14,0x18(%rax)
 2e8:   48 8b 45 08             mov    0x8(%rbp),%rax
 2ec:   48 83 c5 08             add    $0x8,%rbp
 2f0:   49 8d 5c 24 f2          lea    -0xe(%r12),%rbx
 2f5:   ff e0                   jmpq   *%rax
 2f7:   90                      nop

It would likely be better to invert the branch at 2c6 to a ja, so that the common case is adjacent to the function entry, and when llvm.expect is used on the condition in the branch, the LLVM optimizer does produce this alternate layout.

Change History (8)

comment:1 Changed 6 years ago by rwbarton

Owner: set to rwbarton

I have a proof-of-concept patch that implements this. It gave about a 1.5% speedup overall on nofib, with around 10% speedup on a couple simple stack-heavy benchmarks and a worst-case slowdown of -1.5%.

I plan to revisit it after the 7.8 release.

comment:2 Changed 6 years ago by carter

Awesome work Reid!

this sort of perf boost is huge. (also is another bullet point for why a lot of 7.10 work should be giving the code gens some love.)

comment:3 Changed 5 years ago by thomie

Type of failure: None/UnknownRuntime performance bug

comment:4 Changed 5 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:5 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:6 Changed 4 years ago by thomie

Milestone: 8.0.1

comment:7 Changed 3 years ago by Ben Gamari <ben@…>

In 20fb781e/ghc:

LLVM generate llvm.expect for conditional branches

This patch adds likeliness annotations to heap and and stack checks and
modifies the llvm codegen to recognize those to help it generate better
code.

So with this patch

```
...
if ((Sp + 8) - 24 < SpLim) (likely: False) goto c23c; else goto c23d;
...
```

roughly generates:

```
  %ln23k = icmp ult i64 %ln23j, %SpLim_Arg
  %ln23m = call ccc i1 (i1, i1) @llvm.expect.i1( i1 %ln23k, i1 0 )
  br i1 %ln23m, label %c23c, label %c23d
```

Note the call to `llvm.expect` which denotes the expected result for
the comparison.

Test Plan: Look at assembler code with and without this patch. If the
heap-checks moved out of the way we are happy.

Reviewers: austin, simonmar, bgamari

Reviewed By: bgamari

Subscribers: michalt, thomie

Differential Revision: https://phabricator.haskell.org/D2688

GHC Trac Issues: #8321

comment:8 Changed 3 years ago by bgamari

Milestone: 8.2.1
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.