Opened 3 years ago

Last modified 3 years ago

#13353 new bug

foldr/nil rule not applied consistently

Reported by: nomeata Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

I just had a project where it made a difference whether I add

{-# RULES "foldr/nil" forall k n . GHC.Base.foldr k n [] = n #-}

to my file or not, despite this rule already being present in the library.

I tried to minimize the problem and came up with this:

foo1 (f, fs) (x, xs) = (f x, map ($x) fs ++ map f xs)
foo2 f fs x xs = (f x, map ($x) fs ++ map f xs)

test1 x xs = foo1 (id, []) (x, xs)
test2 x xs = foo2 id [] x xs
test3 x xs = (id x, map ($x) [] ++ map id xs)

test2 and test3 nicely optimize the map … [] ++ away, but test does not.

(In this minimized example, adding the rule again locally does *not* help, but there is still something fishy.)

Also, in all cases, map id remains, which should not be the case.

Change History (6)

comment:1 Changed 3 years ago by nomeata

Hmm, maybe I minimized the problem too much. If I do not export foo1, things work fine. Sigh.

comment:2 Changed 3 years ago by nomeata

Ok, so here is the deal. I get this core in the end:

test1
test1
  = \ @ a_aYA x_s1dJ xs_s1dK ->
      let {
        sat_s1dT
        sat_s1dT
          = let {
              z_s1dL
              z_s1dL = map id xs_s1dK } in
            letrec {
              go_s1dM
              go_s1dM
                = \ ds_s1dN ->
                    case ds_s1dN of {
                      [] -> z_s1dL;
                      : y_s1dP ys_s1dQ ->
                        let {
                          sat_s1dS
                          sat_s1dS = go_s1dM ys_s1dQ } in
                        let {
                          sat_s1dR
                          sat_s1dR = y_s1dP x_s1dJ } in
                        : sat_s1dR sat_s1dS
                    }; } in
            go_s1dM [] } in
      (x_s1dJ, sat_s1dT)

Note the useless application of what used to be foldr to []. Also note that foo1 was obviously inlined.

If I explicitly INLINE foo1, then I get:

test1
test1
  = \ @ a_aYA x_s1dh xs_s1di ->
      let {
        sat_s1dj
        sat_s1dj = map id xs_s1di } in
      (x_s1dh, sat_s1dj)

So despite GHC deciding to inline foo1 (eventually), making it inline it early makes a big difference.

With the INLINE pragma, GHC first considers to inline foo1 in simplifier phase 2, after float out.

Considering inlining: foo1
  arg infos [ValueArg, ValueArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance ALWAYS_IF(arity=2,unsat_ok=False,boring_ok=False)
  ANSWER = YES

Without we make a difference decision at this point:

Considering inlining: foo1
  arg infos [ValueArg, ValueArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [20 20] 240 30
  discounted size = 150
  ANSWER = NO

but later, when foo1 has been w/w’ed, we inline it (i.e. the wrapper) in the post-w/w simplifer phase 0.

Considering inlining: foo1
  arg infos [ValueArg, ValueArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
  ANSWER = YES
Inlining done: foo1
    Inlined fn:  \ @ a @ a w_s12e w_s12f ->
                   case w_s12e of { (ww_s12i, ww_s12j) ->
                   case w_s12f of { (ww_s12n, ww_s12o) ->
                   case $wfoo1 ww_s12i ww_s12j ww_s12n ww_s12o of
                   { (# ww_s12u, ww_s12v #) ->
                   (ww_s12u, ww_s12v)
                   }
                   }
                   }
    Cont:   ApplyToTy a
            ApplyToTy a
            ApplyToVal nodup lvl_s11y
            ApplyToVal nodup (x, xs)
            Stop[BoringCtxt] (a, [a])

and shortly after, we inline the worker:

Considering inlining: $wfoo1_s12t
  arg infos [ValueArg, ValueArg, TrivArg, TrivArg]
  interesting continuation CaseCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [60 0 0 0] 180 30
  discounted size = -5
  ANSWER = YES
Inlining done: $wfoo1
    Inlined fn:  \ @ a @ a ww_s12i ww_s12j ww_s12n ww_s12o ->
                   let {
                     ww_s12v
                     ww_s12v
                       = let {
                           z
                           z = map ww_s12i ww_s12o } in
                         letrec {
                           go
                           go
                             = \ ds ->
                                 case ds of {
                                   [] -> z;
                                   : y ys -> : (y ww_s12n) (go ys)
                                 }; } in
                         go ww_s12j } in
                   (# ww_s12i ww_s12n, ww_s12v #)
    Cont:   ApplyToTy a
            ApplyToTy a
            ApplyToVal nodup ww_s12i
            ApplyToVal nodup ww_s12j
            ApplyToVal nodup ww_s12n
            ApplyToVal nodup ww_s12o
            Select nodup ww_s12s
            Stop[BoringCtxt] (a, [a])

So it seems that after splitting the function into two pieces, it is small enough(?) so that both pieces inline? But that seems to be suboptimal: If we are going to inline both pieces anyways, can we not do it earlier, and thus enable useful fusion?

comment:3 Changed 3 years ago by simonpj

So it seems that after splitting the function into two pieces, it is small enough(?) so that both pieces inline?

Yes that is galling I agree.

  • Part of the trouble is that strictness analysis does a deep semantic analysis, pulls all the evals to the top, inlines them unconditionally, leaving behind a worker that may now be a lot smaller. The sizeExpr code in CoreUnfold is necessarily much simpler.
  • The discount we award for a scrutinised argument is computed in size_up here:
                alts_size (SizeIs tot tot_disc tot_scrut)
                              -- Size of all alternatives
                          (SizeIs max _        _)
                              -- Size of biggest alternative
                      = SizeIs tot (unitBag (v, 20 + tot - max)
                          `unionBags` tot_disc) tot_scrut
    
    For a single-alternative case (and you have a tuple arg here) tot = max, so there's fixed discount of 20. You could make that into a controllable flag and try varying it.
  • Idea. If a function starts with case x of blah (even if wrapped in lets) we know that the strictness analyser will find it strict in x. So we know it'll generate a wrapper, and the wrapper will inline. So in the end it'll be as if that case cost nothing at all. It would not be hard to make sizeExpr simply count zero for the cost of such cases; including nested. (Certainly for single-alternative ones.)

Worth a try?

Simon

comment:4 Changed 3 years ago by simonpj

Also, in all cases, map id remains, which should not be the case.

I think we are missing

{-# RULES
"mapFB/id"     forall c. mapFB c (\x->x) = c
  #-}

Want to try adding that?

comment:5 Changed 3 years ago by simonpj

I just had a project where it made a difference whether I add {-# RULES "foldr/nil" forall k n . GHC.Base.foldr k n [] = n #-} to my file or not, despite this rule already being present in the library.

I have no explanation for that!!

comment:6 Changed 3 years ago by nomeata

I moved the exprSize idea to #13374.

I have no explanation for that!!

Might be a misunderstanding in how I use the GHC API. (It seems that the instance of map on the RHS of a rule has no rules in its IdInfo, and so it does not fire. I’ll ask on the mailing list about that once I get to it.)

Want to try adding that? [mapFB rule]

I’m trying that: Phab:D3275

Note: See TracTickets for help on using tickets.