Opened 19 months ago

Last modified 8 months ago

#14816 new bug

Missed Called Arity opportunity?

Reported by: dfeuer Owned by:
Priority: normal Milestone: 8.10.1
Component: Compiler Version: 8.2.2
Keywords: DemandAnalysis Cc: nomeata, sgraf
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

When I compile

test :: (a -> a -> a) -> Int -> a -> HashMap Int a -> HashMap Int a
test f k a m = insertModifying a (blink (f a)) k m

blink :: (a -> b) -> a -> (# b #)
-- Or blink g = \a -> (# g a #) ; it makes no difference.
blink g a = (# g a #)

I get

test
  = \ (@ a_a9o3)
      (f_a7iP :: a_a9o3 -> a_a9o3 -> a_a9o3)
      (k_a7iQ :: Int)
      (a1_a7iR :: a_a9o3)
      (m_a7iS :: HashMap Int a_a9o3) ->
      case k_a7iQ of { GHC.Types.I# ww1_sa1Z ->
      RULES.$w$sinsertModifying
        @ a_a9o3
        a1_a7iR
        (let {
           g_s9E1 [Dmd=<L,C(U)>] :: a_a9o3 -> a_a9o3
           [LclId]
           g_s9E1 = f_a7iP a1_a7iR } in
         \ (a2_a7iU :: a_a9o3) -> (# g_s9E1 a2_a7iU #))
        ww1_sa1Z
        m_a7iS
      }

We build g_s9E1 = f_a7iP a1_a7iR for no apparent reason. Trouble persists into STG:

RULES.test
  :: forall a.
     (a -> a -> a)
     -> GHC.Types.Int
     -> a
     -> Data.HashMap.Base.HashMap GHC.Types.Int a
     -> Data.HashMap.Base.HashMap GHC.Types.Int a
[GblId,
 Arity=4,
 Str=<L,1*C1(C(U))><S(S),1*U(U)><L,U><S,1*U>,
 Unf=OtherCon []] =
    [] \r [f_saiX k_saiY a1_saiZ m_saj0]
        case k_saiY of {
          GHC.Types.I# ww1_saj2 [Occ=Once] ->
              let {
                g_saj3 [Occ=OnceL!, Dmd=<L,C(U)>] :: a_a9o3 -> a_a9o3
                [LclId] =
                    [f_saiX a1_saiZ] \u [] f_saiX a1_saiZ; } in
              let {
                sat_saj6 [Occ=Once] :: a_a9o3 -> (# a_a9o3 #)
                [LclId] =
                    [g_saj3] \r [a2_saj4]
                        let {
                          sat_saj5 [Occ=Once] :: a_a9o3
                          [LclId] =
                              [g_saj3 a2_saj4] \u [] g_saj3 a2_saj4;
                        } in  Unit# [sat_saj5];
              } in  RULES.$w$sinsertModifying a1_saiZ sat_saj6 ww1_saj2 m_saj0;
        };

insertModifying uses its function argument at most once, so there is no possible benefit to this partial application.

Change History (16)

comment:1 Changed 19 months ago by nomeata

Sorry. I commented here, but my comment seems to have disappeared… probably did not send it.

insertModifying uses its function argument at most once, so there is no possible benefit to this partial application.

but does GHC know this (non-local) fact? What is the strictness signature of RULES.$w$sinsertModifying.

Also, what is the Core directly after Call Arity? (-ddump-call-arity)?

comment:2 Changed 19 months ago by nomeata

Also, if call arity is too stupid, you might avoid the problem by using oneShot in the right place.

comment:3 Changed 19 months ago by dfeuer

nomeata, I finally came up with a standalone test case that exhibits the same (apparent) peculiarity. I don't really understand what you're asking, so I'm hoping this will help.

{-# language UnboxedTuples #-}
module Fish where
import Data.Array.ST
import Control.Monad.ST.Strict
import Control.Monad

blink :: (a -> b) -> a -> (# b #)
blink g a = (# g a #)

test :: Int -> a -> (a -> a -> a) -> STArray s Int a -> ST s (STArray s Int a)
test k a f m = insertModifyingArr k (blink (f a)) m
{-# NOINLINE test #-}

insertModifyingArr :: Int -> (a -> (# a #))
                   -> STArray s Int a -> ST s (STArray s Int a)
insertModifyingArr i0 f arr0 = do
   rng <- range <$> getBounds arr0
   go i0 rng arr0
  where
    go i [] arr = pure arr
    go i (k : ks) arr
      | i == k = do
          old <- readArray arr i
          case f old of (# new #) -> writeArray arr i new
          return arr
      | otherwise = go i ks arr

comment:4 Changed 19 months ago by nomeata

It works if you don't export insertModifyingArr. Then it gets inlined into test, and CallArity, when looking at the binding of g_s9E in your original snippet, sees all the uses, sees that they are called at most once, and is happy to eta-expand it!

But without inlining insertModifyingArr, this is beyond the reach of Call Arity, because it is not a higher-order analysis.

Now, why does the demand analyser (which is higher-order, i.e. knows how functions call their arguments) not fix this? Because the demand analyser does not see that insertModifyingArr calls its argument only once, because the call is in a recursive loop.

Sebastian Graf (@sgraf812) has thoughts on combining the analyses to give us the best of both worlds, maybe he can comment.

For now, does

import GHC.Magic

blink :: (a -> b) -> a -> (# b #)
blink g = oneShot $ \a -> (# g a #)

do what you want?

comment:5 Changed 19 months ago by dfeuer

Your workaround does work. Inlining insertModifying (in the real-life case) is probably a bad idea. Your workaround does work. An alternative workaround does too:

test k a f m = insertModifyingArr k (oneShot $ blink (f a)) m

I'm glad to hear someone's been thinking about this issue; I'm much less glad that I've spent so much time coming up with a test case for something that's already known!

comment:6 Changed 19 months ago by nomeata

I'm much less glad that I've spent so much time coming up with a test case for something that's already known!

Sorry about that – at least we have a good test case now for when someone pick this up again!

comment:7 Changed 19 months ago by sgraf

Hey, it seems I'm lucky I subscribed to ghc-ticket just yesterday :)

I investigated a little.

So, it turns out that Demand Analysis doesn't recognize insertModifyingArrs argument f to be called once (has usage C(U(U)). That's really weird, actually the analysis should be able to figure that out. The fact that -ddump-stranal doesn't mention f in the demand type as a free variable to go makes me feel like there is some conservative approximation at play here. And in fact, I believe that's how fix-pointing works for functions in DmdAnal: Completely forget about free variables and assume the most pessimistic result.

You can circumvent that if you make f an argument to go (reverse static arg transform, essentially):

insertModifyingArr :: Int -> (a -> (# a #))
                   -> STArray s Int a -> ST s (STArray s Int a)
insertModifyingArr i0 f arr0 = do
   rng <- range <$> getBounds arr0
   go f i0 rng arr0
  where
    go f i [] arr = pure arr
    go f i (k : ks) arr
      | i == k = do
          old <- readArray arr i
          case f old of (# new #) -> writeArray arr i new
          return arr
      | otherwise = go f i ks arr

This makes DmdAnal propagate the usage information to the top-level: The demand signature for inserModifyingArr mentions f with usage 1*C1(U(U)), which in theory should be enough information for the simplifier to eta-expand the blink expression.

And sure enough, it does (somewhere inside test:

      Fish.insertModifyingArr_$spoly_go
        @ a_a3IX
        @ s_a3IY
        w_s4pP
        ww6_s4s2
        ww8_s4s5
        ww3_s4q2
        ww4_s4q3
        (GHC.Enum.eftInt ww6_s4s2 ww8_s4s5)
        k_a15T
        (\ (a2_a15S [OS=OneShot] :: a_a3IX) ->
           (# f_a15V a1_a15U a2_a15S #))

The interactions between free vars and static args are a little counter-intuitive, I suppose...

comment:8 Changed 19 months ago by nomeata

You can circumvent that if you make f an argument to go (reverse static arg transform, essentially):

That’s smart! By making it an argument, you essentially tell GHC to apply inductive reasoning, and then the Demand Analyzer finds out that f is used at most once! Cool.

And I presume if the recursion were non-linear, it would also do the right thing…

So – can we just include the free variables in the strictness signature during fixpointing and get that optimization without the code transformation?

comment:9 in reply to:  8 Changed 19 months ago by sgraf

Replying to nomeata:

So – can we just include the free variables in the strictness signature during fixpointing and get that optimization without the code transformation?

Well, actually I had expected DmdAnal to just work here. Not sure why f doesn't end up in gos signature, I suspected this has something to do with this Lazy and unleashable free variables hack, but I'm not so sure anymore. I'll investigate.

comment:10 Changed 19 months ago by sgraf

Cc: sgraf added

comment:11 Changed 19 months ago by sgraf

OK, this is due to the call to reuseEnv here (that function effectively duplicates every demand, so 1*C1(U(U)) -> C(U(U))) with the explanation in Aggregated demand for cardinality, the gist of which is that we have to treat free variables of let-bound thunks differently in usage analysis than we would like to for strictness analysis.

I think this should only be relevant for thunks, e.g. we should first splitFVs is_thunk rhs_fv and only then reuseEnv the lazy_fv. I'll give that a shot.

comment:12 Changed 19 months ago by sgraf

OK, that didn't work, but for reasons I didn't expect. If we apply that change, suddenly some bindings get *worse* *strictness* annotations, although it should only make for *less conservative* (possibly unsound) *usage* annotations, as reuseEnv will only affect usage information.

It turns out that this is due to the interaction between the lazy fv hack and the fix-pointing algorithm. An example is adapted from T876:

foo :: Int -> Int
foo n = sum [ length [i..n] | i <- [1..n] ]

main = print (foo 100)

The variant that does get rid of the call to reuseEnv altogether will produce something like this code:

foo
  = \ (n_aYV [Dmd=<L,U(U)>] :: Int) ->
      joinrec {
        go_a2c3 [Occ=LoopBreaker] :: [Int] -> Int -> Int
        [LclId[JoinId(2)],
         Arity=2,
         Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True,
                 WorkFree=True, Expandable=True, Guidance=IF_ARGS [30 20] 137 0}]
        go_a2c3 (ds_a2c4 [Dmd=<S,1*U>] :: [Int])
                (eta_B1 [Dmd=<L,1*U(U)>] :: Int)
          = case ds_a2c4 of {
              [] -> eta_B1;
              : y_a2c9 [Dmd=<L,1*U(U)>] ys_a2ca ->
                jump go_a2c3
                  ys_a2ca
                  (case eta_B1 of { GHC.Types.I# x_a3ii [Dmd=<S,U>] ->
                   case y_a2c9 of { GHC.Types.I# x_a3jq [Dmd=<S,U>] ->
                   case n_aYV of { GHC.Types.I# y_a3jz [Dmd=<S,U>] ->
                   case GHC.List.$wlenAcc @ Int (GHC.Enum.eftInt x_a3jq y_a3jz) 0#
                   of ww2_a4JP [Dmd=<S,U>]
                   { __DEFAULT ->
                   GHC.Types.I# (GHC.Prim.+# x_a3ii ww2_a4JP)
                   }
                   }
                   }
                   })
            }; } in
      jump go_a2c3
        (case n_aYV of { GHC.Types.I# y_a3jz [Dmd=<S,U>] ->
         GHC.Enum.eftInt 1# y_a3jz
         })
        lvl_s4Jd

Note that foo is clearly strict in n (that's what HEAD finds out), but this variant is too conservative. Some printfs revealed that's due to abortion of fix-pointing. This is a log for the lazy_fvs and the sig_fv envs:

dmdAnalRhsLetDown go_a2c3 [] []
dmdAnalRhsLetDown go_a2c3 [] [aYV :-> <L,1*U(U)>]
dmdAnalRhsLetDown go_a2c3 [aYV :-> <L,U(U)>] []
dmdAnalRhsLetDown go_a2c3 [] [aYV :-> <L,1*U(U)>]
dmdAnalRhsLetDown go_a2c3 [aYV :-> <L,U(U)>] []
dmdAnalRhsLetDown go_a2c3 [] [aYV :-> <L,1*U(U)>]
dmdAnalRhsLetDown go_a2c3 [aYV :-> <L,U(U)>] []
dmdAnalRhsLetDown go_a2c3 [] [aYV :-> <L,1*U(U)>]
dmdAnalRhsLetDown go_a2c3 [aYV :-> <L,U(U)>] []
dmdAnalRhsLetDown go_a2c3 [] [aYV :-> <L,1*U(U)>]
dmdAnalRhsLetDown go_a2c3 [] [aYV :-> <L,1*U(U)>]
dmdAnalRhsLetDown foo [] []

It flip flops between putting ns demand into the sig_fv and the lazy_fv. That's decided by isWeakDmd, which amounts to checking if the demand is equivalent to <L,U> and will thus no longer change during fix-pointing. After the initial iteration, we find that n is called once and gets tagged onto the strictness signature. The second iteration sees that n is called an additional time, demand <L,U(U)>. This means it no longer has an interesting demand and goes into lazy_fv. But here's the culprit: The fix-pointer only compares the strictness signature for changes! It will start a third iteration, completely forget about any lazy_fv and flop back to the state of the first iteration.

There's two ways out:

  1. Also check lazy_fvs for changes. This is the thing we wanted to avoid in the first place. Also this is LetUp in disguise, which purposefully isn't equipped to deal with recursive bindings.
  2. Don't check lazy_fvs. These are outer bindings only, so they don't actually need to play a role in fix-pointing. Also everything in lazy_fvs is already top-ish, so it suffices to check if a variable in a prior signature is now part of lazy_fvs and exclude them from the check.

I'll try 2. tomorrow.

Last edited 19 months ago by sgraf (previous) (diff)

comment:13 Changed 19 months ago by sgraf

I spent some time thinking about how to implement this, but given that lazy_fvs plays a largely different role for thunks (unleashes strictness through signatures and stores usage in lazy_fvs) vs. non-thunks (puts weak demands into lazy_fvs, e.g. <L,U> and <L,C(C1(U))> which won't change anymore during fixpointing), I don't think pursuing this is worth the trouble.

I'm afraid we just have to live with free variables getting a second-class treatment compared to parameters for the time being. This is another good argument for eventually splitting the demand analyser into its three sub-analyses.

comment:14 Changed 15 months ago by bgamari

Milestone: 8.6.18.8.1

These won't be addressed in GHC 8.6.

comment:15 Changed 9 months ago by osa1

Milestone: 8.8.18.10.1

Bumping milestones of low-priority tickets.

comment:16 Changed 8 months ago by simonpj

Keywords: DemandAnalysis added
Note: See TracTickets for help on using tickets.