Changes between Version 2 and Version 3 of SourceNotes

Oct 23, 2014 3:14:59 PM (5 years ago)

More fleshed out & updated


  • SourceNotes

    v2 v3  
    1111Each step is lossy in its own way, but the idea is to make the most of the incidental data we have.
    13 == Implementation ==
     13= Implementation =
    1515We implement this as follows:
    2323- Once all optimisations have run, we extract the source notes from the Cmm blocks, (re)constructing the tick scope tree in the process. This will give us a map of back-end blocks to source notes, which makes it straight-forward to generate, for example, DWARF information.
     25Detailed design notes in the following sections.
     27== Desugaring ==
     29 - SourceNotes get introduced just like other tickishs in the desugaring
     30  phase. Note we generate lots of them (maximum granularity), which
     31  will however naturally reduce later on as a side-effect of optimisations.
     33- We also generalise the whole pass so it can generate more than one
     34  tick type at the same time. This allows e.g. cost-centre profiling
     35  side-by-side with source notes.
     37== CoreToCore ==
     39- We rework the tick framework a bit, breaking everything down into three tick properties:
     40   1. Scoping: Does the tick encode a property of the whole expression
     41    subtree? Then we need to take care when changing code, such as
     42    re-annotating code when moved out.
     43   2. Counting: Do we care about entry counts? This means that we should
     44    take care when duplicating the ticked expression, as well as
     45    preventing a number of optimisations from happening.
     46   3. Placement: Are there certain expression types that we can look
     47    through? For example, cost-centres do not have any effect when
     48    placed on variables, so they can often be moved or removed. This
     49    decides where in the expression mkTick is going to "place" the
     50    tick.
     52- Source notes are relatively lax in this context - we care about scoping, but allow ticks to move upwards when optimisations require it (see above, this is the same as "merging" annotations). Furthermore, we do not care about entry counts.
     54- We can especially allow `let` floating. Now this is a remarkably tricky issue, but in the end we can show that "keep ticks on covered code" makes sense. Details:
     55   - With some heavy lifting we can show that causally speaking, we do not need to re-annotate the body while floating, and just need to update the annotation on the allocation cost.
     56   - This means that if we are willing to ignore the allocation cost, we could float completely without restrictions.
     57   - However, given that we are unlikely to be able to reconstruct the full call stack behind thunks, we end up annotating the floated-through ticks on the `let` binding anyway. As the semantics assume full control flow tracking, this was simply redundant there.
     59- For placement, we choose to float ticks through lambdas. This has mostly practical reasons: It frees us from having to change `collectBinders`. The only expense we have is that we are moving an allocation - but as argued above, we are already moving these all the time.
     61- If at some point we decide that we seriously want to track allocation cost, we would probably have to introduce a special tick field in `Let` and lambdas. Some details in the thesis.
     63- We strip out source notes from iface code when not enabled.
     65== CoreToStg ==
     67- Note that due to source notes having non-lambda placement,
     68  they will not violate the CorePrep invariant of lambdas only
     69  appearing as bindings.
     71- We replace the StgTick / StgSCC constructors with a unified StgTick
     72  that can carry any sort of tickish. Note this changes how ticks
     73  are pretty-printed (GHCJS beware).
     75- Detecting selector thunks becomes a bit more involved, as we can run
     76  into ticks at multiple points.
     78- Note that we introduce the extra CorePrep invariant that we never want to have ticks appear inside non-type applications. This makes sense, but makes eta reduction a bit tricky (it might have to pull ticks out of type applications). See the code comment.
     80== Cmm ==
     82- We want our annotations to survive through this stage, which we achieve using a
     83  new CmmTick pseudo-instruction
     85- We also generate such nodes for external Cmm code - it's cheap and in principle allows Cmm debugging/profiling.
     87- However, Cmm's flatness means that if we just put CmmTicks into
     88  blocks we have no idea what exactly they are meant to cover. We
     89  introduce nested scopes, represented as lists of uniques. The
     90  "nesting" relation is given by the subset relation. For example a
     91  tick declared in a block with, say, scope `[b,a]` now scopes over all
     92  blocks that have at least a tick scope of `[b,a]`, so for example also
     93  `[c,b,a]`.
     95- This makes it easy to express most optimisations: It is both easy to
     96  generate new blocks that share all ticks with existing blocks. It is
     97  especially possible to merge blocks to have combined contexts, simply
     98  by merging the scope lists.
     99  - So for example, if we want to merge two "common" blocks with scopes `[b,a]` and `[c,a]`, we would get scope `[c,b,a]`, which is still covered by all ticks from `[b,a]` and `[c,a]` respectively.
     100  - Note that this means that the scopes will not form a tree, but a (directed) graph. We will reduce that to a tree again later.
     101  - Also there's actually a problem here: If the "commoned" block with scope `[c,b,a]` contains source notes, they would stop scoping over /other/ blocks in `[b,a]` and `[c,a]`. I haven't been able to come up with an example where this happens (for CBE such blocks generally get jumped to, and therefore need to get merged first). To solve this, we would probably need to give `CmmTick` its own scope field that's independent of the block's scope.
     103- Given that the code often passes Cmm around without a `CmmEntry`, we have to
     104  make sure that its intended tick scope does not get lost. To keep the amount
     105  of boilerplate to a minimum we define a `CmmAGraphScoped` type synonym
     106  here that just bundles the scope with a portion of Cmm to be assembled
     107  later.
     109- We want to open a new scope every time we start a new block that we might have fresh ticks for. As it happens, this matches up well with where code generation uses `getCode`, so we have it automatically create a new scope. After all, having too many tick scopes won't hurt, and if we find that tick scopes end up being too coarse-grained, we can always add extra ones later.
     111== Unwinding ==
     113This is a side topic - unwinding isn't required for source note tracking, but allows DWARF-aware debuggers to unroll the Haskell stack in order to discover more code pointers. We cover it here anyway because we will pack all this information together under the "debug" umbrella term in a minute.
     115- We declare yet another new Cmm pseudo-instruction, `CmmUnwind`.
     117- Even though we only use it for `Sp` so far, we allow CmmUnwind to specify
     118  unwind information for any register. This is pretty cheap and could
     119  come in useful in future.
     121- We allow full `CmmExpr` expressions for specifying unwind values. The
     122  advantage here is that we don't have to make up new syntax, and can e.g.
     123  use the WDS macro directly. On the other hand, the back-end will now
     124  have to simplify the expression until it can sensibly be converted
     125  into DWARF byte code - a process which might fail, yielding NCG panics.
     126  On the other hand, when you're writing Cmm by hand you really ought to
     127  know what you're doing.
     129== Extraction ==
     131In the back-end we want to separate debug information from the
     132Cmm. We therefore collect all information gathered so far into a
     133tree of debug blocks (tree because of tick scopes).
     135- As tick scopes can form a directed graph, this means that we are removing some edges at this point. We replicate source notes suitably to ensure they still cover the appropriate blocks.
     137- This is also where we decide what source location we regard as
     138  representing a code block the "best". The heuristic is basically that
     139  we want the most specific source reference that comes from the same file
     140  we are currently compiling. This seems to be the most useful choice in
     141  my experience.
     143- We are careful to not be too lazy so we don't end up breaking streaming.
     144  Debug data will be kept alive until the end of codegen, after all.
     146- We change native assembler dumps to happen right away for every Cmm group.
     147  This simplifies the code somewhat and is consistent with how pretty much
     148  all of GHC handles dumps with  respect to streamed code.