
== `Termination` ==

The previous ideas simplify greatly by ignoring `Termination`/`DmdResult`s.
In fact, the bugs encountered while implementing the first proposal were all due to complex interplay with `Termination`.

Consider the two `DmdType`s `{}` (i.e. empty, terminating) and `{y-><S,1*U>}x` (uses `y` exactly once and throws, like in `error y`).
When these are `lub`ed, we want the `DmdType` `{y-><L,1*U>}` (uses `y` at most once, possibly terminating) as a result.

The status quo is to compute the new `DmdEnv` by `plusVarEnv_CD lubDmd env1 (defaultDmd res1) env2 (defaultDmd res2)`.

- Where *either* the left or right `env*` is defined, we compute a new entry by `lub`ing, using `defaultDmd res*` (where `res*` is the `DmdResult` of the corresponding `DmdType`) when there is no entry in one of the `env*`s.
- When there is no entry in either `env*`, there won't be one in the result!
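
To make this scheme concrete, here is a toy model (hypothetical `Str`/`Term` types and `Data.Map` environments standing in for GHC's `Demand`, `Termination` and `VarEnv` — this is a sketch, not GHC's actual code; `lubEnv` plays the role of the `plusVarEnv_CD` call above):

```haskell
import qualified Data.Map.Strict as M

-- Toy stand-ins (NOT GHC's real types): a demand is reduced to its
-- strictness, a DmdResult to whether the expression may terminate.
data Str  = Lazy | Strict     deriving (Eq, Show)
data Term = Dunno | ThrowsExn deriving (Eq, Show)

type Env = M.Map String Str

-- A variable absent from a possibly-terminating branch is absent (Lazy);
-- absent from a throwing branch it counts as hyperstrict (Strict).
defaultDmd :: Term -> Str
defaultDmd Dunno     = Lazy
defaultDmd ThrowsExn = Strict

lubStr :: Str -> Str -> Str
lubStr Strict Strict = Strict
lubStr _      _      = Lazy

-- Pointwise lub over the union of both key sets, padding missing
-- entries with the owning branch's default demand.
lubEnv :: (Env, Term) -> (Env, Term) -> Env
lubEnv (e1, t1) (e2, t2) = M.fromSet combine (M.keysSet e1 <> M.keysSet e2)
  where
    combine x = lubStr (M.findWithDefault (defaultDmd t1) x e1)
                       (M.findWithDefault (defaultDmd t2) x e2)
```

On the running example, `lubEnv (M.empty, Dunno) (M.fromList [("y", Strict)], ThrowsExn)` gives `fromList [("y",Lazy)]`: the throwing branch's strict use of `y` is weakened to lazy because the other branch may terminate without touching `y`.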

Since this approach gets its precision from delaying `lub`s as long as possible, we have to store `Termination` information in the `DmdEnv'`/`DmdEnv''` data structure.
At the least, we have to tag the children of `Or` nodes; for the first two approaches (`DmdEnv'`), we also have to tag `Both` nodes, e.g.

{{{
data DmdEnv'
  = Empty
  | Var Id Demand
  | Or DmdEnv' (Termination ()) DmdEnv' (Termination ())
  | Both DmdEnv' (Termination ()) DmdEnv' (Termination ())

newtype DmdEnv''
  = DNF [(DmdEnv, Termination ())] -- Not quite, see below
}}}

Variables not mentioned in either `lub` operand are assumed to be used according to `defaultDmd res`, where `res` is the `DmdResult` currently part of the containing `DmdType`.
Note that due to `ExnStr`/`catch#`, `defaultDmd res` can change after computing the actual `lub` above, so in general `defaultDmd res*` and `defaultDmd res` can vary independently!

That's also why [https://github.com/sgraf812/ghc/blob/922db3dac896b8cf364c9ebaebf1a27c2468c709/compiler/basicTypes/Demand.hs#L1577 `findIdDemand`] in `Demand.hs` takes a `DmdResult` as argument.
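
In a toy model (hypothetical `Str`/`Term` types standing in for `Demand`/`Termination`; `findDemand` is a hypothetical analogue of `findIdDemand`, not GHC's actual signature), that lookup amounts to falling back to `defaultDmd res` for variables the environment doesn't mention:

```haskell
import qualified Data.Map.Strict as M

data Str  = Lazy | Strict     deriving (Eq, Show)
data Term = Dunno | ThrowsExn deriving (Eq, Show)

defaultDmd :: Term -> Str
defaultDmd Dunno     = Lazy
defaultDmd ThrowsExn = Strict

-- The demand on a variable can only be read off together with the
-- DmdResult: unmentioned variables default to `defaultDmd res`.
findDemand :: M.Map String Str -> Term -> String -> Str
findDemand env res x = M.findWithDefault (defaultDmd res) x env
```

E.g. `findDemand (M.fromList [("y", Lazy)]) ThrowsExn "z"` is `Strict`, but with result `Dunno` the very same lookup is `Lazy`.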

And that's also why the above definition for `DmdEnv''` is incorrect!
It assumes that a single `Termination` result per `lub` operand is enough, which isn't the case, as the following continued example shows:

We started with `lubDmdType {} {y-><S,1*U>}x = {y-><L,1*U>}` (in terms of the status quo).
Now, `lub` that again with `{z-><S,1*U>}` to get `{y-><L,1*U>,z-><L,1*U>}`.

So far so good.
The DNF would look something like `[({},Dunno ()), ({y-><S,1*U>},ThrowsExn), ({z-><S,1*U>},Dunno ())]`, which `flattenDmdEnv''`s to pretty much the same `DmdEnv` as in the `DmdType` above.
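
That flattening can be sketched in a toy model (hypothetical `Str`/`Term` types standing in for `Demand`/`Termination`; `flattenDNF` is a hypothetical analogue of `flattenDmdEnv''`, folding the points left to right like the chained `lubDmdType`s):

```haskell
import qualified Data.Map.Strict as M

data Str  = Lazy | Strict     deriving (Eq, Show)
data Term = Dunno | ThrowsExn deriving (Eq, Show)

type Env = M.Map String Str

defaultDmd :: Term -> Str
defaultDmd Dunno     = Lazy
defaultDmd ThrowsExn = Strict

lubStr :: Str -> Str -> Str
lubStr Strict Strict = Strict
lubStr _      _      = Lazy

lubTerm :: Term -> Term -> Term
lubTerm ThrowsExn ThrowsExn = ThrowsExn
lubTerm _         _         = Dunno

-- lub of two points: pointwise over the union of the key sets,
-- padding with each point's own default demand.
lubPoint :: (Env, Term) -> (Env, Term) -> (Env, Term)
lubPoint (e1, t1) (e2, t2) =
  (M.fromSet combine (M.keysSet e1 <> M.keysSet e2), lubTerm t1 t2)
  where
    combine x = lubStr (M.findWithDefault (defaultDmd t1) x e1)
                       (M.findWithDefault (defaultDmd t2) x e2)

flattenDNF :: [(Env, Term)] -> (Env, Term)
flattenDNF = foldl1 lubPoint
```

On the DNF above, `flattenDNF [(M.empty, Dunno), (M.fromList [("y", Strict)], ThrowsExn), (M.fromList [("z", Strict)], Dunno)]` yields `(fromList [("y",Lazy),("z",Lazy)], Dunno)`, i.e. both `y` and `z` lazy, as in the `DmdType` above.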

Now consider what happens when the `Termination` changes in between the two `lub`s, e.g. because the first result of `lubDmdType` was an argument to `error`:
{{{
lubDmdType (markThrowsExn (lubDmdType {} {y-><S,1*U>}x)) {z-><S,1*U>}
  = lubDmdType (markThrowsExn {y-><L,1*U>}) {z-><S,1*U>}
  = lubDmdType {y-><L,1*U>}x {z-><S,1*U>}
  = {y-><L,1*U>,z-><S,1*U>}
}}}

Note that, unlike before, `z` is now evaluated strictly!
In our current DNF formulation, we wouldn't be able to express the difference.
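
The divergence can be reproduced in a toy model (hypothetical `Str`/`Term` stand-ins for `Demand`/`Termination`; `markThrowsExn` is modeled, as an assumption, as simply overwriting the operand's `Termination`):

```haskell
import qualified Data.Map.Strict as M

data Str  = Lazy | Strict     deriving (Eq, Show)
data Term = Dunno | ThrowsExn deriving (Eq, Show)

type Env = M.Map String Str

defaultDmd :: Term -> Str
defaultDmd Dunno     = Lazy
defaultDmd ThrowsExn = Strict

lubStr :: Str -> Str -> Str
lubStr Strict Strict = Strict
lubStr _      _      = Lazy

lubTerm :: Term -> Term -> Term
lubTerm ThrowsExn ThrowsExn = ThrowsExn
lubTerm _         _         = Dunno

lubPoint :: (Env, Term) -> (Env, Term) -> (Env, Term)
lubPoint (e1, t1) (e2, t2) =
  (M.fromSet combine (M.keysSet e1 <> M.keysSet e2), lubTerm t1 t2)
  where
    combine x = lubStr (M.findWithDefault (defaultDmd t1) x e1)
                       (M.findWithDefault (defaultDmd t2) x e2)

-- Toy assumption: marking just overwrites the Termination.
markThrowsExn :: (Env, Term) -> (Env, Term)
markThrowsExn (env, _) = (env, ThrowsExn)
```

`lubPoint (markThrowsExn (lubPoint (M.empty, Dunno) (M.fromList [("y", Strict)], ThrowsExn))) (M.fromList [("z", Strict)], Dunno)` now yields `z` `Strict`, because the mark between the two `lub`s changes the default demand padded in for `z` — a flat list of points can no longer tell at which point the mark struck.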

The problem here is that `lub` is no longer associative, because it depends on the `Termination` at the time of the call.
So we can't actually model `DmdEnv''` as a DNF, but have to resort to

{{{
data DmdEnv''
  = Lit DmdEnv
  | Or DmdEnv'' (Termination ()) DmdEnv'' (Termination ())
}}}

This is only marginally less complex than a version of `DmdEnv'` where we subsume the `Empty` and `Var` cases with an equivalent `Lit` case:

{{{
data DmdEnv'
  = Lit DmdEnv
  | Or DmdEnv' (Termination ()) DmdEnv' (Termination ())
  | Both DmdEnv' (Termination ()) DmdEnv' (Termination ())

dnf :: DmdEnv' -> DmdEnv''
dnf (DmdEnv'.Lit env) = DmdEnv''.Lit env -- hypothetical disambiguating syntax
dnf (DmdEnv'.Or fv1 t1 fv2 t2) = DmdEnv''.Or (dnf fv1) t1 (dnf fv2) t2
dnf (DmdEnv'.Both fv1 t1 fv2 t2) = distr (dnf fv1) t1 (dnf fv2) t2 -- case in point
  where
    -- both env1 with defaultDmd t1 into every Lit of fv2
    distr (DmdEnv''.Lit env1) t1 fv2 t2 = undefined
    -- both env2 with defaultDmd t2 into every Lit of fv1
    distr fv1 t1 (DmdEnv''.Lit env2) t2 = undefined
    -- Not exactly sure how to compute this.
    -- This case leads to the exponential explosion,
    -- as we have to combine every Lit in fv1 with every Lit in fv2.
    -- This gets insane as we have to keep in mind which `Termination` to use
    -- for each point in each `Lit` separately.
    distr DmdEnv''.Or{} t1 DmdEnv''.Or{} t2 = undefined
}}}

`dnf` mirrors how `bothDmdEnv'` would have to change in the current implementation to get to the DNF-like version.

In fact, the currently working prototype of the first proposal had a complicated bug in the [https://github.com/sgraf812/ghc/blob/43c045249402319170fa421438a1f05887269a26/compiler/basicTypes/Demand.hs#L1287 `lookupDmdTree`] implementation, which forced me to think about how the lookup of a single point can potentially access *all* `Termination ()` tags on the path to a `Lit` it is contained in.
This also applies to the implementation of `dnf`: I'm pretty sure the step that distributes `Both` over both subtrees needs to model all points separately to know with which `Termination` to `both`.
Sorry if this is a little incomprehensible without an example, but this is already pretty deep down the rabbit hole.

The bottom line is that a DNF-like structure is still possible, but in addition to the incurred exponential space complexity, there's also the issue of how to distribute `Both` nodes into the leaves of the Or-tree without losing sanity to the implementation.
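
To see the space blow-up in isolation: if `DmdEnv''` really were a flat list of points, `Both` would have to pair every point of one operand with every point of the other, so nested `Both`s of `Or`s multiply point counts. A sketch in a toy model (hypothetical `Str`/`Term` stand-ins; this deliberately ignores the per-point `Termination` bookkeeping argued to be necessary above):

```haskell
import qualified Data.Map.Strict as M

data Str  = Lazy | Strict     deriving (Eq, Show)
data Term = Dunno | ThrowsExn deriving (Eq, Show)

type Env = M.Map String Str

defaultDmd :: Term -> Str
defaultDmd Dunno     = Lazy
defaultDmd ThrowsExn = Strict

bothStr :: Str -> Str -> Str
bothStr Lazy Lazy = Lazy
bothStr _    _    = Strict

bothTerm :: Term -> Term -> Term
bothTerm Dunno Dunno = Dunno
bothTerm _     _     = ThrowsExn

-- both of two points: pointwise over the union of the key sets,
-- padding with each point's own default demand.
bothPoint :: (Env, Term) -> (Env, Term) -> (Env, Term)
bothPoint (e1, t1) (e2, t2) =
  (M.fromSet combine (M.keysSet e1 <> M.keysSet e2), bothTerm t1 t2)
  where
    combine x = bothStr (M.findWithDefault (defaultDmd t1) x e1)
                        (M.findWithDefault (defaultDmd t2) x e2)

-- Both of two DNFs is the cartesian product of their points:
-- m points `both` n points gives m*n points.
bothDNF :: [(Env, Term)] -> [(Env, Term)] -> [(Env, Term)]
bothDNF ps qs = [ bothPoint p q | p <- ps, q <- qs ]
```

Two 2-point operands produce 4 points, so a chain of k such combinations produces 2^k points — the exponential space complexity mentioned above, before even accounting for the `Termination` tags.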


== Evaluation ==

=== Both/Or trees, grafted at MCRAs ===

The current [https://github.com/sgraf812/ghc/blob/43c045249402319170fa421438a1f05887269a26/compiler/basicTypes/Demand.hs working prototype of the first proposal] (progress tracked on the [https://github.com/sgraf812/ghc/blob/and-or/compiler/basicTypes/Demand.hs `and-or` branch]) passes `./validate` and nofib.

GHC is built with 0.1% more allocations and 7.1% (!) more executed instructions (measured with cachegrind).
There may still be potential for optimizing the `DmdTree` (called `DmdEnv'` on this page) representation in its smart constructors (`bothDmdTree`, `lubDmdTree`) to avoid big trees.

Running nofib revealed two benchmarks where allocation changed: `fft2` with -0.1% and `transform` with -0.0%.
Changes in counted instructions were all around ±0.0%, with the exception of `fft2` (-0.2%).
That seems like a net gain, but a pretty much insignificant one, especially compared to the significant increase in compilation time.

=== Both/Or trees, pushed into Or ===

The implementation of the second proposal can be found [https://github.com/sgraf812/ghc/blob/6bd6ab3c521de5dfa4042642c767b7906315ca28/compiler/basicTypes/Demand.hs here] (progress tracked on the [https://github.com/sgraf812/ghc/blob/and-or-distr/compiler/basicTypes/Demand.hs `and-or-distr` branch]).
This was a [https://github.com/sgraf812/ghc/commit/6bd6ab3c521de5dfa4042642c767b7906315ca28 minimal change] compared to the single graft point solution, with great repercussions:

Compilation of modules with big records (looking at you, [https://github.com/haskell/cabal/blob/4726b2ada46a4f0757ec4fcaf5508c40faa98307/Cabal/Distribution/Types/BuildInfo.hs `Distribution.Types.BuildInfo`]) eats up all resources on my machine without coming to an end.
Given the minimally invasive change, this can only be due to combinatorial explosion.
Some debugging pinned this down to derived `Eq` and `Show` instances, where the generated `DmdTree`s grow bigger than e.g. 10000 nodes.
I'm not entirely sure what causes the duplication that leads to the slowdown, so I tried falling back to the status quo for dictionary Ids (as we do for unlifted lets anyway), to no avail.

=== DNF ===

When trying to implement this, I stumbled on the problems w.r.t. `Termination` outlined above.
Also considering that the space complexity of this approach would be even less predictable, I've put the implementation of this on ice for now.