Opened 6 years ago

Last modified 6 years ago

#8032 new bug

Worker-wrapper transform and NOINLINE trigger bad reboxing behavior

Reported by: ezyang Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.7
Keywords: Cc: haskell@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by ezyang)

(Note: I've updated the ticket with a simpler test-case).

NOINLINE and the worker-wrapper transform sometimes interact poorly to cause unnecessary extra reboxing.

module Gnam where

data D = D Int

foo k d@(D e) =
  let loop i
        | i < e = loop (i + 1)
        | otherwise = baz k d i
  in loop 0
  where {-# NOINLINE baz #-}
        baz k d i = k (d, i)

This results in the following STG:

  :: forall t_alo.
     ((Gnam.D, GHC.Types.Int) -> t_alo) -> GHC.Prim.Int# -> t_alo
 Str=DmdType C(S)L,
 Unf=OtherCon []] =
    \r [w_sob ww_so3]
        let {
          e_so4 :: GHC.Types.Int
          [LclId, Unf=OtherCon []] =
              NO_CCS GHC.Types.I#! [ww_so3]; } in
        let {
          wild_so6 :: Gnam.D
          [LclId, Unf=OtherCon []] =
              NO_CCS Gnam.D! [e_so4];
        } in 

This worker function needs to box its arguments so that they can be passed to baz. However, the only invocation of wfoo already had these arguments available: [InlPrag=INLINE[0]]
  :: forall t_alo.
     ((Gnam.D, GHC.Types.Int) -> t_alo) -> Gnam.D -> t_alo
 Str=DmdType C(S)U(U(L)),
 Unf=OtherCon []] =
    \r [w_son w1_soh]
        case w1_soh of _ {
          Gnam.D ww_sok ->
              case ww_sok of _ {
                GHC.Types.I# ww2_soo -> Gnam.$wfoo w_son ww2_soo;

The problem seems to lie in how the worker wrapper transformation operates. Before, the STG is: =
  \ (@ t_alr)
    (k_aeM [Dmd=Just C(S)] :: (Gnam.D, GHC.Types.Int) -> t_alr)
    (d_aeN [Dmd=Just U(U(L))] :: Gnam.D) ->
    case d_aeN of wild_X5 { Gnam.D e_aeO [Dmd=Just U(L)] ->
    letrec {
      loop_smj [Occ=LoopBreaker] :: GHC.Types.Int -> t_alr
       Str=DmdType U(L) {aeM->C(S) aeO->U(L)},
       Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=1, Value=True,
               ConLike=True, WorkFree=True, Expandable=True,
               Guidance=IF_ARGS [20] 112 0}]
      loop_smj =
        \ (i_aeU [Dmd=Just U(L)] :: GHC.Types.Int) ->
          case i_aeU of wild_alU { GHC.Types.I# x_alW [Dmd=Just L] ->
          case e_aeO of _ { GHC.Types.I# y_am0 [Dmd=Just L] ->
          case GHC.Prim.<# x_alW y_am0 of _ {
            GHC.Types.False ->
              baz_smn @ t_alr @ Gnam.D @ GHC.Types.Int k_aeM wild_X5 wild_alU;
            GHC.Types.True -> loop_smj (GHC.Types.I# (GHC.Prim.+# x_alW 1))
          }; } in
    loop_smj lvl_smr

Notice that wild_alU is being properly used in the result. After the worker wrapper transformation, foo is now: =
  \ (@ t_alp)
    (w_sn7 [Dmd=Just C(S)] :: (Gnam.D, GHC.Types.Int) -> t_alp)
    (w_sn8 [Dmd=Just U(U(L))] :: Gnam.D) ->
    case w_sn8 of w_sn8 { Gnam.D ww_sna ->
    case ww_sna of ww_sna { GHC.Types.I# ww_snc ->
    $wfoo_sng @ t_alp w_sn7 ww_snc

So it seems that we should also pass along the evaluated variables, in case they are used. There is a tradeoff here, in that we will require more arguments to the function than if we just reconstructed it. However, if we smarten up worker-wrapper so that it drops unused arguments, this could be a win when not all of the fields are used, e.g. if we add another field to D: [InlPrag=INLINE[0]]
  :: forall t_alp.
     ((Gnam.D, GHC.Types.Int) -> t_alp) -> Gnam.D -> t_alp
 Str=DmdType C(S)U(U(L)L),
 Unf=OtherCon []] =
    \r [w_som w1_sof]
        case w1_sof of _ {
          Gnam.D ww_soj ww1_soo ->
              case ww_soj of _ {
                GHC.Types.I# ww3_son -> Gnam.$wfoo w_som ww3_son ww1_soo;

Now ww1_soo is passed, even though it is only ever used to rebox the value. I think there is a comment to this effect in the simplifier already. Passing boxed value would require exactly the same number of function arguments, but save on a heap allocation!

Change History (3)

comment:1 Changed 6 years ago by ezyang

Description: modified (diff)
difficulty: Unknown
Summary: Worker-wrapper transform triggers bad reboxing behaviorWorker-wrapper transform and NOINLINE trigger bad reboxing behavior

comment:2 Changed 6 years ago by ezyang

Description: modified (diff)

comment:3 Changed 6 years ago by ezyang

But it seems this story is complicated by the fact that, from the perspective of the simplifier (see the Thunk splitting note), the form:

      case x-rhs of 
	I# a ->	let x* = I# a 
	        in body

is desirable, because it allows the simplifier to continue doing optimizations inside body knowing the structure of x*. So this might be complicated to fix without making splitThunk more complicated.

Note: See TracTickets for help on using tickets.