#16005 closed task (fixed)

Don't use a generic apply thunk for known calls

Reported by: sgraf
Owned by:
Priority: normal
Milestone: 8.8.1
Component: Compiler
Version: 8.6.2
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s): Phab:D5414
Wiki Page:

Description

Currently, an AP thunk like sat = f a b c will not have its own entry point and info pointer and will instead reuse a generic apply thunk like stg_ap_4_upd.

That's great from a code size perspective, but if f is a known function, a specialised entry point with a plain call can be much faster than figuring out the arity and doing dynamic dispatch.


I prepared a patch over at Phab:D5414 that fixes this by checking whether the arity of f is unknown (i.e. 0). If the arity is known, the thunk no longer delegates to a generic apply thunk; regular entry code and an info table are generated for it instead.
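To make the decision concrete, here is a minimal sketch of the arity check as a toy model. It is not the code from the patch: `ThunkCode`, `chooseThunkCode` and `genericApplyName` are hypothetical names made up for illustration, and the naming scheme for the generic apply thunk is only inferred from `f a b c` mapping to `stg_ap_4_upd`. Any other conditions the code generator may check are left out.

```haskell
module Main where

-- Toy model only: these types and names are illustrative assumptions,
-- not GHC's actual code generator API.

-- | Arity recorded for the function being applied; 0 stands for "unknown".
type Arity = Int

unknownArity :: Arity
unknownArity = 0

-- | How the code for an AP thunk such as @sat = f a b c@ is obtained.
data ThunkCode
  = GenericApply String  -- ^ reuse a shared apply thunk, e.g. "stg_ap_4_upd"
  | SpecialisedEntry     -- ^ emit a dedicated info table and entry code
  deriving (Eq, Show)

-- | Name of the generic apply thunk for a call with the given number of
-- arguments. The count in the name covers the function plus its arguments
-- (an assumption inferred from @f a b c@ mapping to @stg_ap_4_upd@).
genericApplyName :: Int -> String
genericApplyName nArgs = "stg_ap_" ++ show (nArgs + 1) ++ "_upd"

-- | Unknown arity: fall back to the shared generic apply thunk.
-- Known arity: give the thunk its own entry code so it can make a plain call.
chooseThunkCode :: Arity -> Int -> ThunkCode
chooseThunkCode funArity nArgs
  | funArity == unknownArity = GenericApply (genericApplyName nArgs)
  | otherwise                = SpecialisedEntry

main :: IO ()
main = do
  print (chooseThunkCode unknownArity 3)  -- callee of unknown arity
  print (chooseThunkCode 3 3)             -- known callee of arity 3
```

Running `main` prints `GenericApply "stg_ap_4_upd"` for the unknown callee and `SpecialisedEntry` for the known one. The trade-off is the one described above: the shared thunk keeps code size down, while the specialised entry avoids the arity lookup and dynamic dispatch.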

No impact on allocations, but counted instructions and code size do change. Significant changes to counted instructions:

Program          Instrs
cryptarithm1      -2.5%
lcss              -2.3%
paraffins         -3.8%
wheel-sieve2      -3.4%
Min               -3.8%
Max               +0.0%
Geometric Mean    -0.2%

And changes to binary size greater than 0.1% (all the other programs, basically):

Program           Size   Instrs
anna             +0.3%    -0.2%
expert           +0.2%    -0.0%
fluid            +0.2%    -0.1%
grep             +0.2%    -0.0%
infer            +0.2%    -0.4%
last-piece       +0.2%    -0.1%
lift             +0.2%    -0.0%
paraffins        +0.2%    -3.8%
prolog           +0.2%    -0.1%
scs              +0.3%    -0.0%
transform        +0.2%    -0.2%
veritas          +0.2%    +0.0%
Min              +0.1%    -3.8%
Max              +0.3%    +0.0%
Geometric Mean   +0.1%    -0.2%

Change History (3)

comment:1 Changed 10 months ago by sgraf

Status: new → patch

comment:2 Changed 10 months ago by Sebastian Graf <sebastian.graf@…>

In dc54c07/ghc:

Don't use a generic apply thunk for known calls

Summary:
Currently, an AP thunk like `sat = f a b c` will not have its own entry
point and info pointer and will instead reuse a generic apply thunk
like `stg_ap_4_upd`.

That's great from a code size perspective, but if `f` is a known
function, a specialised entry point with a plain call can be much faster
than figuring out the arity and doing dynamic dispatch.

This looks at `f`'s arity to figure out if it is a known function and if so, it
will not lower it to a generic apply function.

Benchmark results are encouraging: No changes to allocation, but 0.2% less
counted instructions.

Test Plan: Validates locally

Reviewers: simonmar, osa1, simonpj, bgamari

Reviewed By: simonpj

Subscribers: rwbarton, carter

GHC Trac Issues: #16005

Differential Revision: https://phabricator.haskell.org/D5414

comment:3 Changed 10 months ago by bgamari

Milestone: 8.6.3 → 8.8.1
Resolution: fixed
Status: patch → closed

Given this isn't strictly speaking a bug fix I'm going to defer this to 8.8.
