Changes between Version 6 and Version 7 of Commentary/Compiler/Demand/LetUp

Nov 9, 2017 11:43:38 AM (2 years ago)

Implementation concerns


  • Commentary/Compiler/Demand/LetUp

    v6 v7  
    183183`dnf` shows the biggest drawback of this: This is pretty much **exponential** in the number `Both` nodes.
    185 'Substitution' is pretty easy here: Because there are no `Both` nodes (or rather that the top-level is `Or` only), we can just substitute into each
     185'Substitution' is pretty easy here: Because there are no `Both` nodes (or rather that the top-level is `Or` only), we can just substitute into each other.
    187187''sgraf: Actually, I'm not sure what this buys us, other than simplicity of implementation.
    189189Effectively, we model **every possible path of execution** separately.
    190190It's the same as in logic, I guess: DNF is always possible, and easily handled, but conversion blows up exponentially in general.''
     192== `Termination` ==
     194The previous ideas greatly simplify by ignoring `Termation`/`DmdResult`s.
     195In fact, the bugs encountered while implementing the first proposal were all due to complex interplay with `Termination`.
     197Consider the two `DmdType`s `{}` (e.g. empty, terminating) and `{y-><S,1*U>}x` (uses `y` exactly once and throws, like in `error y`).
     198When these are `lub`ed, we want the `DmdType` `{y-><L,1*U>}` (uses `y` at most once, possibly terminating) as a result.
     200The status quo is to compute the new `DmdEnv` by `plusVarEnv_CD lubDmd env1 (defaultDmd res1) env2 (defaultDmd res2)`.
     202- Where *either* the left or right `env*` is defined, we compute a new entry by `lub`ing, using `defaultDmd res*` (where `res*` is the `DmdResult` of the corresponding `DmdType`) when there is no entry in one of the `env*`s.
     203- When there is no entry in either `env*`, then there won't be in the result!
     205Since this approach gets its precision from delaying `lub`s as long as possible, we have to store `Termination` information in the `DmdEnv'`/`DmdEnv''` data structure.
     206At the least, we have to tag the children of `Or` nodes, for the first two approaches (`DmdEnv'`) we also have to tag `Both` nodes, e.g.
     209data DmdEnv'
     210  = Empty
     211  | Var Id Demand
     212  | Or DmdEnv' (Termination ()) DmdEnv' (Termination ())
     213  | Both DmdEnv' (Termination ()) DmdEnv' (Termination ())
     215newtype DmdEnv''
     216  = DNF [(DmdEnv, Termination ())] -- Not quite, see below
     219Variables not mentioned in either `lub` operand are assumed to be used according to `defaultDmd res`, where `res` is the `DmdResult` currently part of the containing `DmdType`.
     220Note that due to `ExnStr`/`catch#`, the `defaultDmd res` can change after computing the actual `lub` above, so in general `defaultDmd res*` and `defaultDmd res` can vary independently!
     222That's also why [`findIdDemand`]( in `Demand.hs` takes a `DmdResult` as argument.
     224And that's also why the above definition for `DmdEnv''` is incorrect!
     225It assumes that a single `Termination` result per `lub` operand is enough, which isn't the case, as the following continued example shows:
     227We started with `lubDmdType {} {y-><S,1*U>}x = {y-><L,1*U>}` (in terms of the status quo).
     228Now, `lub` that again with `{z-><S,1*U>}` to get `{y-><L,1*U>,z-><L,1*U>}`.
     230So far so good.
     231The DNF would look something like `[({},Dunno ()), ({y-><S,1*U>},ThrowsExn), ({z-><S,1*U>},Dunno ())]`, which `flattenDmdEnv''`s to pretty much the same `DmdEnv` as in the `DmdType` above.
     233Now consider what happens when `Termination` changed in-between the two `lub`s, e.g. because the first result of `lubDmdType` was an argument to `error`:
     235  lubDmdType (markThrowsExn (lubDmdType {} {y-><S,1*U>}x)) {z-><S,1*U>}
     236= lubDmdType (markThrowsExn {y-><L,1*U>}                 ) {z-><S,1*U>}
     237= lubDmdType  {y-><L,1*U>}x                                {z-><S,1*U>}
     238= {y-><L,1*U>,z-><S,1*U>}
     241Note that, different to before, `z` is now evaluated strictly!
     242In our current DNF formulation, we wouldn't be able to express the difference.
     244The problem here is that `lub` is no longer associative, because it depends on the `Termination` at the time of the call.
     245So, we can't actually model `DmdEnv''` as a DNF, but have to resort to
     248data DmdEnv''
     249  = Lit DmdEnv
     250  | Or DmdEnv'' (Termination ()) DmdEnv'' (Termination ())
     253Which is only marginally less complex than a version of `DmdEnv'` where we subsume the `Empty` and `Var` cases with an equivalent `Lit` case:
     256data DmdEnv'
     257  = Lit DmdEnv
     258  | Or DmdEnv' (Termination ()) DmdEnv' (Termination ())
     259  | Both DmdEnv' (Termination ()) DmdEnv' (Termination ())
     261dnf :: DmdEnv' -> DmdEnv''
     262dnf (DmdEnv'.Lit env) = DmdEnv''.Lit env                               -- hypothetical disambiguating syntax
     263dnf (DmdEnv'.Or fv1 t1 fv2 t2) = DmdEnv''.Or (dnf fv1) t1 (dnf fv2) t2
     264dnf (DmdEnv'.Both fv1 t1 fv2 t2) = distr (dnf fv1) t1 (dnf fv2) t2 -- case in point
     265  where
     266    -- both env1 with defaultDmd t1 into every Lit of fv2
     267    distr (DmdEnv''.Lit env1) t1 fv2 t2 = undefined
     268    -- both env2 with defaultDmd t2 into every Lit of fv1
     269    distr fv1 t1 (DmdEnv''.Lit env2) t2 = undefined
     270    -- Not exactly sure how to compute this.
     271    -- This case leads to the exponential explosion,
     272    -- as we have to combine every Lit in fv1 with every Lit in fv2.
     273    -- This gets insane as we have to keep in mind which `Termination` to use for each point
     274    -- in each `Lit` separately.
     275    distr DmdEnv''.Or{} t1 DmdEnv''.Or{} t2 = undefined
     278`dnf` mirrors how `bothDmdEnv'` would have to change in the current implementation to get to the DNF-like version.
     280In fact, the currently working prototype of the first proposal had a complicated bug in the [ `lookupDmdTree`] implementation, which forced me to think about how the lookup of a single point can potentially access *all* `Termination ()` tags on the path to a `Lit` it is contained in.
     281This also applies to the implementation of `dnf`: I'm pretty sure the step that distributes `Both` over both sub trees needs to model all points separately to know with which `Termination` to `both`.
     282Sorry if this is a little incomprehensible without an example, but this is already pretty deep down the rabbit hole.
     284The bottom-line is that there still is some DNF-like structure possible, but in addition to the incurred exponential space complexity, there's also the issue of how to distribute `Both` nodes into leafs of the Or-tree without losing sanity to the implementation.
     287== Evaluation ==
     289=== Both/Or trees, grafted at MCRAs
     291The current [ working prototype of the first proposal] (progress tracked at [ `and-or` branch]) passes `./validate` and nofib.
     293GHC is built with 0.1% more allocations and 7.1% (!) more executed instructions (cachegrind).
     294There may still be potential in optimizing the `DmdTree` (called `DmdEnv'` on this page) representation in its smart constructors (`bothDmdTree`, `lubDmdTree`) to avoid big trees.
     296Running nofib revealed two benchmarks where allocation changed, `fft2` with -0.1% and `transform` with -0.0%.
     297Changes in counted instructions where all around +-0.0%, with the exception of `fft2` (-0.2%).
     298Seems like a net gain, but pretty much insignificant, especially compared to the significant increase in compilation time.
     300=== Both/Or trees, pushed into Or
     302The implementation of the second proposal can be found [ here] (progress tracked at [ `and-or-distr` branch].
     303This was a [ minimal change] compared to the single graft point solution, with great repercussions:
     305Compilation of modules with big records (looking at you, [ `Distribution.Types.BuildInfo`] eats up all resources on my machine without coming to an end.
     306Given the minimal invasive change, this can only be due to combinatorial explosion.
     307Some debugging pinned this down to derived `Eq` and `Show` instances, where the generated `DmdTree`s grow bigger than e.g. 10000 nodes.
     308I'm not entirely sure what causes the duplication that leads to the slowdown, so I tried falling back to the status quo for dictionary Ids (like we do for unlifted lets anyway) to no avail.
     310=== DNF ===
     312When trying to implement this, I stumbled on the problems wrt. `Termination` outlined above.
     313Also considering that the space complexity of this approach will be even less predictable, I've put the implementation of this is on ice for now.