Opened 12 months ago

Last modified 12 months ago

#15573 new task

Make bindings with multiple occurrences a join point instead of duplicating code during inlining.

Reported by: AndreasK
Owned by:
Priority: normal
Milestone: 8.6.1
Component: Compiler
Version: 8.4.3
Keywords: JoinPoints
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets: #15560
Differential Rev(s):
Wiki Page:

Description

I have some intermediate Core of the form:

-- RHS size: {terms: 9, types: 2, coercions: 0, joins: 0/0}
cseAlts_s1dD [Occ=Once!T[1]] :: T -> Int#
[LclId, CallArity=1, Str=<S,1*U>]
cseAlts_s1dD
  = \ (lamVar_s1dw [Occ=Once!, Dmd=<S,1*U>, OS=OneShot] :: T) ->
      case lamVar_s1dw of wild_Xc [Dmd=<L,A>] {
        __DEFAULT -> 1#;
        B -> 2#;
        C -> 3#
      }

-- RHS size: {terms: 14, types: 3, coercions: 0, joins: 0/0}
$wfmerge_s1cZ [InlPrag=NOUSERINLINE[0]] :: T -> T -> Int#
[LclId, Arity=2, CallArity=2, Str=<S,1*U><L,1*U>]
$wfmerge_s1cZ
  = \ (w_s1cU [Occ=Once!, Dmd=<S,1*U>] :: T)
      (w_s1cV [Occ=Once*, Dmd=<L,1*U>] :: T) ->
      case w_s1cU of wild_XA [Dmd=<L,A>] {
        __DEFAULT -> -1#;
        A -> 2#;
        B -> cseAlts_s1dD w_s1cV;
        C -> cseAlts_s1dD w_s1cV
      }

After the simplifier ran, cseAlts_s1dD was inlined into both branches, giving us:

fmerge
  = \ (w_s1cU :: T) (w_s1cV :: T) ->
      case w_s1cU of {
        __DEFAULT -> GHC.Types.I# -1#;
        A -> GHC.Types.I# 2#;
        B ->
          case w_s1cV of {
            __DEFAULT -> GHC.Types.I# 1#;
            B -> GHC.Types.I# 2#;
            C -> GHC.Types.I# 3#
          };
        C ->
          case w_s1cV of {
            __DEFAULT -> GHC.Types.I# 1#;
            B -> GHC.Types.I# 2#;
            C -> GHC.Types.I# 3#
          }
      }

What I would really like GHC to do instead is to make cseAlts_s1dD a join point when possible. This would eliminate both the call overhead AND the code duplication.

The current behavior seems fine when we can't make it a join point. But when we can, we should take advantage of that opportunity.
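For comparison, here is a hand-written sketch of roughly what the join-point version could look like, written at the level of the worker $wfmerge_s1cZ. It is simplified Core (binder annotations are omitted), not actual GHC output: the body of cseAlts_s1dD is bound once as a local join point, and the B and C branches jump to it instead of calling a top-level function or carrying their own copy of the alternatives.

```
$wfmerge_s1cZ
  = \ (w_s1cU :: T) (w_s1cV :: T) ->
      -- a single shared copy of the alternatives, bound as a join point
      join {
        cseAlts_s1dD :: T -> Int#
        cseAlts_s1dD (lamVar_s1dw :: T)
          = case lamVar_s1dw of {
              __DEFAULT -> 1#;
              B -> 2#;
              C -> 3#
            } } in
      case w_s1cU of {
        __DEFAULT -> -1#;
        A -> 2#;
        -- both branches jump to the shared code in tail position
        B -> jump cseAlts_s1dD w_s1cV;
        C -> jump cseAlts_s1dD w_s1cV
      }
```

A join point is only entered by saturated tail calls, so no closure needs to be allocated for it and a jump needs no return frame. That is why this form would avoid both the call overhead of the top-level version and the duplicated alternatives of the inlined one.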

Change History (2)

comment:1 Changed 12 months ago by simonpj

This relates to #15560: in the current ticket you are proposing to use float-in (the opposite of float-out) to make cseAlts_s1dD a local binding again.

I'd prefer instead to make the top-level version (in the Description) just as efficient as the join-point version.

One reason I want to do that is that it improves inlining opportunities by making the function (fmerge in this case) smaller.

comment:2 in reply to: 1 Changed 12 months ago by AndreasK

Replying to simonpj:

> This relates to #15560: in the current ticket you are proposing to use float-in (the opposite of float-out) to make cseAlts_s1dD a local binding again.

> I'd prefer instead to make the top-level version (in the Description) just as efficient as the join-point version.

We can (and should!) reduce the current cost, but even with your suggestions there will likely still be some overhead from code layout and the calling convention, even if it is far smaller.

> One reason I want to do that is that it improves inlining opportunities by making the function (fmerge in this case) smaller.

Indeed, this sounds good. However, I'm not sure we can make calls to top-level functions cheap enough that inlining is never warranted. And assuming we still end up inlining in some cases, this optimization would be good to have.

But I agree that the ideas discussed in #15560 could remove the need for this.

This ticket is mostly a reminder so that I don't forget about this. It probably shouldn't be tackled before #15560.

Last edited 12 months ago by AndreasK