Changes between Version 6 and Version 7 of Commentary/Compiler/Demand/LetUp


Ignore:
Timestamp:
Nov 9, 2017 11:43:38 AM (2 years ago)
Author:
sgraf
Comment:

Implementation concerns

Legend:

Unmodified
Added
Removed
Modified
  • Commentary/Compiler/Demand/LetUp

    v6 v7  
    183183`dnf` shows the biggest drawback of this: This is pretty much **exponential** in the number `Both` nodes.
    184184
    185 'Substitution' is pretty easy here: Because there are no `Both` nodes (or rather that the top-level is `Or` only), we can just substitute into each
     185'Substitution' is pretty easy here: Because there are no `Both` nodes (or rather that the top-level is `Or` only), we can just substitute into each other.
    186186
    187187''sgraf: Actually, I'm not sure what this buys us, other than simplicity of implementation.
     
    189189Effectively, we model **every possible path of execution** separately.
    190190It's the same as in logic, I guess: DNF is always possible, and easily handled, but conversion blows up exponentially in general.''
     191
     192== `Termination` ==
     193
     194The previous ideas greatly simplify by ignoring `Termation`/`DmdResult`s.
     195In fact, the bugs encountered while implementing the first proposal were all due to complex interplay with `Termination`.
     196
     197Consider the two `DmdType`s `{}` (e.g. empty, terminating) and `{y-><S,1*U>}x` (uses `y` exactly once and throws, like in `error y`).
     198When these are `lub`ed, we want the `DmdType` `{y-><L,1*U>}` (uses `y` at most once, possibly terminating) as a result.
     199
     200The status quo is to compute the new `DmdEnv` by `plusVarEnv_CD lubDmd env1 (defaultDmd res1) env2 (defaultDmd res2)`.
     201
     202- Where *either* the left or right `env*` is defined, we compute a new entry by `lub`ing, using `defaultDmd res*` (where `res*` is the `DmdResult` of the corresponding `DmdType`) when there is no entry in one of the `env*`s.
     203- When there is no entry in either `env*`, then there won't be in the result!
     204
     205Since this approach gets its precision from delaying `lub`s as long as possible, we have to store `Termination` information in the `DmdEnv'`/`DmdEnv''` data structure.
     206At the least, we have to tag the children of `Or` nodes, for the first two approaches (`DmdEnv'`) we also have to tag `Both` nodes, e.g.
     207
     208{{{
     209data DmdEnv'
     210  = Empty
     211  | Var Id Demand
     212  | Or DmdEnv' (Termination ()) DmdEnv' (Termination ())
     213  | Both DmdEnv' (Termination ()) DmdEnv' (Termination ())
     214
     215newtype DmdEnv''
     216  = DNF [(DmdEnv, Termination ())] -- Not quite, see below
     217}}}
     218
     219Variables not mentioned in either `lub` operand are assumed to be used according to `defaultDmd res`, where `res` is the `DmdResult` currently part of the containing `DmdType`.
     220Note that due to `ExnStr`/`catch#`, the `defaultDmd res` can change after computing the actual `lub` above, so in general `defaultDmd res*` and `defaultDmd res` can vary independently!
     221
     222That's also why [`findIdDemand`](https://github.com/sgraf812/ghc/blob/922db3dac896b8cf364c9ebaebf1a27c2468c709/compiler/basicTypes/Demand.hs#L1577) in `Demand.hs` takes a `DmdResult` as argument.
     223
     224And that's also why the above definition for `DmdEnv''` is incorrect!
     225It assumes that a single `Termination` result per `lub` operand is enough, which isn't the case, as the following continued example shows:
     226
     227We started with `lubDmdType {} {y-><S,1*U>}x = {y-><L,1*U>}` (in terms of the status quo).
     228Now, `lub` that again with `{z-><S,1*U>}` to get `{y-><L,1*U>,z-><L,1*U>}`.
     229
     230So far so good.
     231The DNF would look something like `[({},Dunno ()), ({y-><S,1*U>},ThrowsExn), ({z-><S,1*U>},Dunno ())]`, which `flattenDmdEnv''`s to pretty much the same `DmdEnv` as in the `DmdType` above.
     232
     233Now consider what happens when `Termination` changed in-between the two `lub`s, e.g. because the first result of `lubDmdType` was an argument to `error`:
     234{{{
     235  lubDmdType (markThrowsExn (lubDmdType {} {y-><S,1*U>}x)) {z-><S,1*U>}
     236= lubDmdType (markThrowsExn {y-><L,1*U>}                 ) {z-><S,1*U>}
     237= lubDmdType  {y-><L,1*U>}x                                {z-><S,1*U>}
     238= {y-><L,1*U>,z-><S,1*U>}
     239}}}
     240
     241Note that, different to before, `z` is now evaluated strictly!
     242In our current DNF formulation, we wouldn't be able to express the difference.
     243
     244The problem here is that `lub` is no longer associative, because it depends on the `Termination` at the time of the call.
     245So, we can't actually model `DmdEnv''` as a DNF, but have to resort to
     246
     247{{{
     248data DmdEnv''
     249  = Lit DmdEnv
     250  | Or DmdEnv'' (Termination ()) DmdEnv'' (Termination ())
     251}}}
     252
     253Which is only marginally less complex than a version of `DmdEnv'` where we subsume the `Empty` and `Var` cases with an equivalent `Lit` case:
     254
     255{{{
     256data DmdEnv'
     257  = Lit DmdEnv
     258  | Or DmdEnv' (Termination ()) DmdEnv' (Termination ())
     259  | Both DmdEnv' (Termination ()) DmdEnv' (Termination ())
     260
     261dnf :: DmdEnv' -> DmdEnv''
     262dnf (DmdEnv'.Lit env) = DmdEnv''.Lit env                               -- hypothetical disambiguating syntax
     263dnf (DmdEnv'.Or fv1 t1 fv2 t2) = DmdEnv''.Or (dnf fv1) t1 (dnf fv2) t2
     264dnf (DmdEnv'.Both fv1 t1 fv2 t2) = distr (dnf fv1) t1 (dnf fv2) t2 -- case in point
     265  where
     266    -- both env1 with defaultDmd t1 into every Lit of fv2
     267    distr (DmdEnv''.Lit env1) t1 fv2 t2 = undefined
     268    -- both env2 with defaultDmd t2 into every Lit of fv1
     269    distr fv1 t1 (DmdEnv''.Lit env2) t2 = undefined
     270    -- Not exactly sure how to compute this.
     271    -- This case leads to the exponential explosion,
     272    -- as we have to combine every Lit in fv1 with every Lit in fv2.
     273    -- This gets insane as we have to keep in mind which `Termination` to use for each point
     274    -- in each `Lit` separately.
     275    distr DmdEnv''.Or{} t1 DmdEnv''.Or{} t2 = undefined
     276}}}
     277
     278`dnf` mirrors how `bothDmdEnv'` would have to change in the current implementation to get to the DNF-like version.
     279
     280In fact, the currently working prototype of the first proposal had a complicated bug in the [https://github.com/sgraf812/ghc/blob/43c045249402319170fa421438a1f05887269a26/compiler/basicTypes/Demand.hs#L1287 `lookupDmdTree`] implementation, which forced me to think about how the lookup of a single point can potentially access *all* `Termination ()` tags on the path to a `Lit` it is contained in.
     281This also applies to the implementation of `dnf`: I'm pretty sure the step that distributes `Both` over both sub trees needs to model all points separately to know with which `Termination` to `both`.
     282Sorry if this is a little incomprehensible without an example, but this is already pretty deep down the rabbit hole.
     283
     284The bottom-line is that there still is some DNF-like structure possible, but in addition to the incurred exponential space complexity, there's also the issue of how to distribute `Both` nodes into leafs of the Or-tree without losing sanity to the implementation.
     285
     286
     287== Evaluation ==
     288
     289=== Both/Or trees, grafted at MCRAs
     290
     291The current [https://github.com/sgraf812/ghc/blob/43c045249402319170fa421438a1f05887269a26/compiler/basicTypes/Demand.hs working prototype of the first proposal] (progress tracked at [https://github.com/sgraf812/ghc/blob/and-or/compiler/basicTypes/Demand.hs `and-or` branch]) passes `./validate` and nofib.
     292
     293GHC is built with 0.1% more allocations and 7.1% (!) more executed instructions (cachegrind).
     294There may still be potential in optimizing the `DmdTree` (called `DmdEnv'` on this page) representation in its smart constructors (`bothDmdTree`, `lubDmdTree`) to avoid big trees.
     295
     296Running nofib revealed two benchmarks where allocation changed, `fft2` with -0.1% and `transform` with -0.0%.
     297Changes in counted instructions where all around +-0.0%, with the exception of `fft2` (-0.2%).
     298Seems like a net gain, but pretty much insignificant, especially compared to the significant increase in compilation time.
     299
     300=== Both/Or trees, pushed into Or
     301
     302The implementation of the second proposal can be found [https://github.com/sgraf812/ghc/blob/6bd6ab3c521de5dfa4042642c767b7906315ca28/compiler/basicTypes/Demand.hs here] (progress tracked at [https://github.com/sgraf812/ghc/blob/and-or-distr/compiler/basicTypes/Demand.hs `and-or-distr` branch].
     303This was a [https://github.com/sgraf812/ghc/commit/6bd6ab3c521de5dfa4042642c767b7906315ca28 minimal change] compared to the single graft point solution, with great repercussions:
     304
     305Compilation of modules with big records (looking at you, [https://github.com/haskell/cabal/blob/4726b2ada46a4f0757ec4fcaf5508c40faa98307/Cabal/Distribution/Types/BuildInfo.hs `Distribution.Types.BuildInfo`] eats up all resources on my machine without coming to an end.
     306Given the minimal invasive change, this can only be due to combinatorial explosion.
     307Some debugging pinned this down to derived `Eq` and `Show` instances, where the generated `DmdTree`s grow bigger than e.g. 10000 nodes.
     308I'm not entirely sure what causes the duplication that leads to the slowdown, so I tried falling back to the status quo for dictionary Ids (like we do for unlifted lets anyway) to no avail.
     309
     310=== DNF ===
     311
     312When trying to implement this, I stumbled on the problems wrt. `Termination` outlined above.
     313Also considering that the space complexity of this approach will be even less predictable, I've put the implementation of this is on ice for now.