Opened 3 years ago

Closed 3 years ago

#12368 closed bug (fixed)

Demand Analyzer: Cunning plan not adhered to when aborting fixpoint iteration

Reported by: nomeata
Owned by:
Priority: low
Milestone: 8.2.1
Component: Compiler
Version: 8.0.1
Keywords:
Cc: ilya, osa1
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case: stranal/should_run/{T12368,T12368a}
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s): Phab:D2392
Wiki Page:

Description

Not sure how relevant this is, but when reading both the paper and the code for long enough, one inevitably finds some code smell.

This is about nested recursion, as in this example

  f [] = []
  f (x:xs) = let g []     = f xs
                 g (y:ys) = y+1 : g ys
              in g (h x)

The “cunning plan” of fixpoint iteration (see Note [Initialising strictness]) says that the first time an inner recursive definition is looked at, its strictness is assumed to be b (botSig), and from then on the strictness recorded on the binder (idStrictness, presumably set by a previous iteration or by the outer recursive definition) is used. A flag (virgin) in the analysis environment is used to tell the two cases apart.
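To make the plan concrete, here is a small standalone sketch. The types and names (Sig, Binder, startingSig) are toy stand-ins invented for illustration, not the real StrictSig/Id from the compiler; only virgin, botSig and idStrictness mirror names in the actual code.

  -- Toy stand-ins for StrictSig and Id; only the names mirror the real code.
  data Sig = BotSig | NopSig | SomeSig String
    deriving (Eq, Show)

  data Binder = Binder { bndrName :: String, bndrSig :: Sig }
    deriving (Eq, Show)

  -- The "cunning plan": which signature a recursive binder starts from
  -- depends on whether this is the very first sweep over the binding (virgin).
  startingSig :: Bool -> Binder -> Sig
  startingSig virgin b
    | virgin    = BotSig      -- first visit: optimistically assume hyperstrict (b)
    | otherwise = bndrSig b   -- later visits: reuse the signature on the binder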

The problem is that the fixpoint iteration code in dmdFix aborts after more than 10 iterations:

    loop' n env pairs
      | found_fixpoint
      = (env', lazy_fv, pairs')
      | n >= 10
      = (env, lazy_fv, orig_pairs)      -- Safe output

This means that if the iteration does not terminate, we do _not_ attach a strictness signature to the inner binders (g in the example above), but rather leave them untouched.

Then, in the second iteration of finding a fixpoint for f, the virgin flag is False, and idStrictness is used, which in this case will simply be the default, namely nopSig.
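In terms of the toy sketch above (again purely illustrative), the sequence looks like this:

  -- Continuing the toy sketch from the description: the abort leaves g's
  -- signature at the default NopSig, and with virgin = False that default
  -- is exactly what the second outer iteration starts from.
  abortedExample :: (Sig, Sig)
  abortedExample = ( startingSig True  g    -- first outer iteration: BotSig
                   , startingSig False g )  -- second outer iteration: NopSig
    where
      g = Binder "g" NopSig                 -- the abort left g un-annotated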

I could not produce an example where it matters. And it might be that it simply does not matter, so I’m not sure what to do with this information.

Change History (18)

comment:1 Changed 3 years ago by simonpj

That's quite right. After thinking about it briefly (i.e. I might well be wrong) I don't see a good way to fix it.

The trouble is that the fixpoint iteration starts from an unsound assumption: that the function is hyperstrict. It then iterates to a sound conclusion. So if you have to abandon the process, the current approximation is unsound. So you can't just attach it.

I don't see how to take advantage of the work done so far.

Mind you, 10 iterations is a lot! I think it prints a warning: worth investigating. If it's rare enough it probably doesn't matter much.

Simon

comment:2 Changed 3 years ago by nomeata

The warning is disabled at the moment. It would be much less rare if we had nested demands on sum types; right now the easiest way to trigger it is a recursive function over a recursive product type (e.g. data Stream a = S a (Stream a) or data BinTree a = Node (BinTree a) a (BinTree a)).

Maybe there is a way to fix this by doing a final single iteration with sound, but conservative assumptions about the strictness signature, and then using pairs' instead of orig_pairs. This would also make me worry less that the lazy_fv returned in the n >= 10 case could be wrong (as they are produced from the unsound assumption).

comment:3 Changed 3 years ago by nomeata

I could trigger an unsound result this way:

module DmdFixBug where
-- Needs to be a product type
data Stream = S Int Stream
bar s = foo s
  where
    foo :: Stream -> Int
    foo (S n s) = n + foo s

The fixpoint iteration for this code terminates only because of the 10-iteration limit, and as you can see, the result is wrong (there is an “absent” value that is not absent):

bar :: Stream -> Int
[LclIdX,
 Arity=1,
 Str=<B,1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),A)))))))))>b,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
         Tmpl= \ (s_axX [Occ=Once] :: Stream) -> foo_sK7 s_axX}]
bar =
  \ (s_axX [Dmd=<B,1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),1*U(1*U(U),A)))))))))>]
       :: Stream) ->
    foo_sK7 s_axX

(I need to wrap foo in bar because foo itself does not get a strictness signature attached, since the analysis fails.)

I’ll see if I can actually make the program crash, and turn this into a proper test suite test case.
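For illustration only, here is a driver of the kind that might expose the bug when compiled with optimisations. This is a hypothetical sketch, not the actual test-suite test, and it assumes the bogus “absent” annotation feeds a worker/wrapper split; with a correct compiler the program merely loops.

-- Hypothetical driver for the DmdFixBug module above (not the actual
-- test-suite test): bar is applied to an infinite stream, so any
-- component that the analysis wrongly marked absent is eventually
-- demanded.
module Main where

import DmdFixBug (Stream (..), bar)

ones :: Stream
ones = S 1 ones

main :: IO ()
main = print (bar ones)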

comment:4 Changed 3 years ago by Joachim Breitner <mail@…>

In b9cea81d/ghc:

Show testcase where demand analysis abortion code fails

By making it believe that some deeply nested value is absent when it
really isn't. See #12368.

comment:5 Changed 3 years ago by nomeata

I could make it trigger a runtime failure, “T12368: Oops! Entered absent arg w Stream”; added to the test suite as T12368.

I think the fix is, if the number of iterations exceeds the limit, to do a final run with most pessimistic assumptions about the strictness signatures of the things in the recursive group.
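In terms of the toy Binder/Sig sketch from the description (a hedged sketch of the idea only, not the real dmdFix code), that strategy would look roughly like this:

  -- analyseOnce is a placeholder for one sweep of the demand analyser
  -- over the recursive group; here it is the identity so the control
  -- flow type-checks.
  analyseOnce :: [Binder] -> [Binder]
  analyseOnce = id

  loopWithSafeAbort :: Int -> [Binder] -> [Binder]
  loopWithSafeAbort n group
    | foundFixpoint = group'
    | n >= 10       = analyseOnce (map pessimise group)   -- one final, sound sweep
    | otherwise     = loopWithSafeAbort (n + 1) group'
    where
      group'        = analyseOnce group
      foundFixpoint = map bndrSig group' == map bndrSig group
      pessimise b   = b { bndrSig = NopSig }   -- most pessimistic, but sound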

comment:6 Changed 3 years ago by simonpj

Are you saying that the existing abortion mechanism, which falls back to orig_pairs, is unsound too? I didn't know that.

comment:7 Changed 3 years ago by nomeata

Differential Rev(s): Phab:D2392
Status: new → patch

That’s my impression. I propose a fix in https://phabricator.haskell.org/D2392

comment:8 Changed 3 years ago by simonpj

Sorry to be slow.

I've looked at this and I don't agree with it at all! In the current code, if we hit the iteration limit we do this:

    loop' n env pairs
      | n >= 10
      = (env, lazy_fv, orig_pairs)      -- Safe output

So we return an un-decorated binding orig_pairs but (wrongly) an extended environment env. We should just return the un-extended environment! (If a variable isn't in the environment it's treated as having topSig.) Simple.

Well, not totally simple. There is a nasty corner case, when we have nested recursive bindings:

f x = ...let g y = ...y... in ...

Suppose that

  • on the first f iteration we find a fixpoint for g which we attach to it.
  • But on the second f iteration we fail to find a fixpoint for g. Then we should revert to topSig, not to the result of the first iteration.

Conclusion: if the fixpoint limit is reached (a toy sketch follows the list):

  • return an environment that simply does not mention the new binders
  • set all the binders to no-strictness-at-all
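A minimal sketch of those two bullets, reusing the toy Binder/Sig stand-ins from the description (the list-based SigEnv here is a simplification, not the real ae_sigs/AnalEnv machinery):

  type SigEnv = [(String, Sig)]   -- toy stand-in for the ae_sigs field

  -- On abort: drop the recursive binders from the returned signature
  -- environment and zap their signatures, with no further sweep.
  abortGroup :: SigEnv -> [Binder] -> (SigEnv, [Binder])
  abortGroup sigEnv group =
    ( [ p | p@(v, _) <- sigEnv, v `notElem` map bndrName group ]  -- unextended env
    , [ b { bndrSig = NopSig } | b <- group ] )                   -- no strictness at all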

Finally, the loop function in dmdFix would be much better if it took a SigEnv rather than an AnalEnv, wouldn't it? Just a simple refactoring; but the ae_sigs field is the only bit that varies.

OK Joachim?

Simon

comment:9 Changed 3 years ago by nomeata

So we return an un-decorated binding orig_pairs

It is not necessarily un-decorated, but may have information from a previous pass of the Demand Analyzer. I was under the impression that we generally want to make sure that running the demand analyzer updates _all_ strictness signatures and demand information, and does not leave any around from a previous iteration.

I am never fully confident that I can keep track of all the assumptions in the code and their consequences; that's why I am much more comfortable with simple invariants such as “all code has been processed and annotated by the demand analyzer at least once.”

We should just return the un-extended environment! (If a variable isn't in the environment it's treated as having topSig.) Simple.

Is that safe? topSig (actually nopSig, since there is no top strictness signature) has an empty environment, so if the RHS of the definition makes use of a free variable, nopSig effectively records an absent demand on it.
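To spell that worry out, here is a toy model (hypothetical Sig'/Demand' types; only the idea that a variable missing from the demand environment counts as absent mirrors the real behaviour):

  -- A signature carries argument demands plus a demand environment for
  -- the free variables. nopSig's environment is empty, and a variable
  -- missing from the environment counts as absent.
  data Demand' = Absent | Lazy | Strict
    deriving (Eq, Show)

  data Sig' = Sig' { argDmds :: [Demand']
                   , fvDmds  :: [(String, Demand')] }

  nopSig' :: Sig'
  nopSig' = Sig' [] []   -- no information at all: empty free-variable environment

  fvDemand :: Sig' -> String -> Demand'
  fvDemand sig v = maybe Absent id (lookup v (fvDmds sig))
  -- fvDemand nopSig' "x" == Absent, even if the RHS actually forces x.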

I guess this is the reason for the lazy_fv fuss. I think in some other discussion you wondered whether that is really required. If the analysis falls back to topSig here, it definitely is.

Conclusion: if the fixpoint limit is reached:

  • return an environment that simply does not mention the new binders

Agreed, if lazy_fv works as advertised.

  • set all the binders to no-strictness-at-all

What about demand and strictness signatures attached to binders nested somewhere in the RHS of one of the equations? Should we zap them as well?

Finally, the loop function in dmdFix would be much better if it took a SigEnv rather than an AnalEnv, wouldn't it?

Or maybe even just a list of StrSigs, to make it clear that only the strictness signatures of the recursive binders vary.

All in all, I wonder if avoiding an extra iteration (after we already did 10) in this corner case is worth the extra complication of having to think about how to properly abort the analysis from an invalid state. Simply jumping to a definitely sound state, and being able to guarantee that a pass of the demand analyzer processes all code, appears simpler and more reliable to me.

comment:10 Changed 3 years ago by simonpj

Yes that's the reason for the lazy_fv fuss! But that fuss is needed even when we do find a fixpoint.

What about demand and strictness signatures attached to binders somewhere nested in the RHS of one of the equation. Should we zap them as well?

Ah, now that is an excellent point! Which should be carefully documented in a Note, eventually.

Yes, OK. (I was not fussed about an extra iteration, just about clarity.) Let's do one more iteration. But we can do that simply with the original, unextended environment. I think there is no need to do addPessimisticSigs.

comment:11 Changed 3 years ago by nomeata

I think there is no need to do addPessimisticSigs.

Ok, I will adjust the patch as requested, and add more ♫.

comment:12 Changed 3 years ago by nomeata

Working on this right now. I refactored the code quite a bit and removed some redundancies (such as passing around a strictness signature right next to an id, when that id is guaranteed to have been annotated with that strictness signature).

Code pushed to wip/12368; the push to Phabricator will happen once the automatic validators have validated the change.

I think there is no need to do addPessimisticSigs.

Are you sure? Any variable with useful information (strict or used-once) will not be included in lazy_fv (according to splitFVs). If we now also remove them from the strictness signatures, their uses are not recorded anywhere – and then probably considered absent. I’ll try to produce a test case to verify that theory.

comment:13 Changed 3 years ago by nomeata

I’ll try to produce a test case to verify that theory.

I have one that shows the problem. It does not reproduce with the previous code, where the (unsound, as shown in the test case above) strictness signatures were used, since those still included the demands on the strict free variables.

Anyways, I need to run now. I have updated Phab:D2392 with my current code, for easier review.

comment:14 Changed 3 years ago by nomeata

Fixed the problem by adding, to lazy_fvs, a topDmd demand on all free variables mentioned in the binders' strictness signatures, before throwing the signatures away. This fixes the test case. Code on Phab and the branch, currently awaiting validation.
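A toy sketch of that step, reusing the hypothetical Sig'/Demand' types from comment 9 (topDmd is modelled as Lazy here):

  -- Before an aborted binder's signature is thrown away, record every
  -- free variable it mentions in lazy_fvs with a conservative top
  -- demand, so their uses are not lost when the signature becomes nopSig.
  zapSigKeepFvs :: Sig' -> (Sig', [(String, Demand')])
  zapSigKeepFvs sig = ( nopSig'
                      , [ (v, Lazy) | (v, _) <- fvDmds sig ] )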

While I am at it: splitFVs in Demand.hs has some intricate special handling for thunks, without any comment or justification. Do you have any idea why that is there, and whether it is still required?

comment:15 Changed 3 years ago by nomeata

Code on Phab and the branch, currently awaiting validation.

It didn’t validate; there was a minor typo that was hard to track down. But now it is fixed and ready for review.

comment:16 Changed 3 years ago by Joachim Breitner <mail@…>

In 8d92b88/ghc:

DmdAnal: Add a final, safe iteration

this fixes #12368.

It also refactors dmdFix a bit and removes some redundancies (such as
passing around a strictness signature right next to an id, when that id
is guaranteed to have been annotated with that strictness signature).

Note that when the fixed-point iteration does not terminate, we
conservatively delete the binders' strictness signatures (set them to
nopSig). But this loses the information on how their strict free
variables are used!

Lazily used variables already escape via lazy_fvs. We ensure that in the
case of an aborted fixed-point iteration, the strict variables are put
there as well (with a conservative demand of topDmd).

Differential Revision: https://phabricator.haskell.org/D2392

comment:17 Changed 3 years ago by thomie

Test Case: stranal/should_run/{T12368,T12368a}

What should be the status of this ticket? (merge, close, new, patch?) And which milestone should it have? (8.0.2 or 8.2.1?)

comment:18 Changed 3 years ago by nomeata

Milestone: 8.2.1
Resolution: fixed
Status: patch → closed

Sorry for forgetting to close the trac tickets.

It should be closed. The bug is mostly theoretical and not observed in the wild, so this should not be backported, I believe.
