Changes between Version 16 and Version 17 of HIEFiles


Timestamp:
Aug 13, 2018 12:34:43 PM (13 months ago)
Author:
wz1000

  • HIEFiles

    v16 v17  
    1111
    1212== File Contents
    13 * The data structure should be a simplified, source aware, annotated AST derived from the Renamed/Typechecked Source
     13* The data structure is a simplified, source aware, annotated AST derived from the Renamed/Typechecked Source
    1414* We traverse the Renamed and Typechecked AST to collect the following info about each SrcSpan
    15   * Its type, if it corresponds to a binding, pattern or expression
    16   * Details about any tokens in the original source corresponding to this span(keywords, symbols, etc.) 
     15  * Its assigned type(s), in increasing order of generality, if it corresponds to a binding, pattern or expression
     16    * For example, the `id` in `id 'a'` is assigned the types `[Char -> Char, forall a. a -> a]`
    1717  * The set of Constructor/Type pairs that correspond to this span in the GHC AST
    1818  * Details about all the identifiers that occur at this SrcSpan
     
    2727  7. Type variable binding, along with its scope (which takes `ScopedTypeVariables` into account)
    2828* It should be possible to exactly recover the source from the .hie file. This will probably be achieved by including the source verbatim in the .hie file, as recovering the source exactly from the AST might be tricky and duplicate the work on ghc-exactprint.
    29 * The actual representation on disk as well as serialisation/de-serialisation could be done through CBOR, using the package [https://hackage.haskell.org/package/serialise-0.2.0.0 serialise].
    3029* The first line of the .hie file should be a human-readable string containing the version of the format, the filename of the original file, and the version of GHC the file was compiled with. Example: (v1.0,GHC8.4.6,Foo.hs)
    3130* The format should be fairly stable across GHC versions, so we need to avoid capturing too much information. More detailed information about the exact Haskell syntactic structure that a part of the tree represents could be obtained by inspecting the tokens/keywords in that part.
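As a rough illustration of the header line described above, the following Haskell sketch renders and parses a header of the form `(v1.0,GHC8.4.6,Foo.hs)`. The `HieHeader` record and the functions `renderHeader`, `parseHeader` and `splitOn` are hypothetical names introduced only for this example and are not part of GHC. The field order is an assumption taken from the example string.

```haskell
import Data.List (intercalate)

-- Hypothetical record for the three header fields named in the text.
data HieHeader = HieHeader
  { formatVersion :: String  -- e.g. "v1.0"
  , ghcVersion    :: String  -- e.g. "GHC8.4.6"
  , sourceFile    :: String  -- e.g. "Foo.hs"
  } deriving (Eq, Show)

-- Render the header as a parenthesised, comma-separated line.
renderHeader :: HieHeader -> String
renderHeader h =
  "(" ++ intercalate "," [formatVersion h, ghcVersion h, sourceFile h] ++ ")"

-- Split a string on every occurrence of a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Parse a header line. Any commas after the second field are treated
-- as part of the file name (an assumption of this sketch).
parseHeader :: String -> Maybe HieHeader
parseHeader ('(' : rest@(_ : _))
  | last rest == ')' =
      case splitOn ',' (init rest) of
        (v : g : f@(_ : _)) -> Just (HieHeader v g (intercalate "," f))
        _                   -> Nothing
parseHeader _ = Nothing
```

Because the header is a single line of plain text, a tool can read it and reject an incompatible file before attempting to decode the rest of the .hie file.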
    3231
    3332The RichToken type used in haddock: https://github.com/haskell/haddock/blob/master/haddock-api/src/Haddock/Backends/Hyperlinker/Types.hs#L35
     33
     34== Efficient serialization of highly redundant type info
     35
     36The type information in .hie files is highly repetitive and redundant. For example, consider the expression
     37
     38{{{
     39const True 'a'
     40}}}
     41
     42The type of the overall expression is `Bool`, the type of `const True` is `Char -> Bool`, and the type of `const` (at this use site) is `Bool -> Char -> Bool`.
     43
     44All three of these types are stored in the .hie file, even though each one occurs as a subterm of the next.
     45
     46To avoid this duplication, we introduce a new data type that is a flattened version of `Type`:
     47
     48{{{
     49data HieType a = HAppTy a a  -- data Type = AppTy Type Type
     50               | HFunTy a a  --           | FunTy Type Type
     51               | ...
     52}}}
     53
     54`HieType` represents one layer of `Type`.
     55
     56All the types in the final AST are stored in an `Array Int (HieType Int)`, where the `Int`s in the `HieType` are references to other elements of the array. Types recovered from GHC are deduplicated and stored in this compressed form, with subtrees shared.
     57
     58`Fix HieType` is roughly isomorphic to the original GHC `Type`.
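The deduplication scheme described above can be sketched in a few lines of Haskell. This is a minimal illustration under stated assumptions, not the actual GHC implementation: `Ty` is a toy stand-in for GHC's `Type` with only three constructors, `HTyCon` is added to the `HieType` layers so that the example has leaves, and the names `internLayer`, `intern`, `internAll`, `expand` and `exampleTypes` are hypothetical.

```haskell
import qualified Data.Map.Strict as M
import Data.Array (Array, listArray, (!))

-- Toy stand-in for GHC's Type (assumption: three constructors only).
data Ty = TyCon String | AppTy Ty Ty | FunTy Ty Ty
  deriving (Eq, Show)

-- One layer of Ty; children are whatever 'a' is (here: array indices).
data HieType a = HTyCon String | HAppTy a a | HFunTy a a
  deriving (Eq, Ord, Show)

-- Interning table: each distinct layer is mapped to its array index.
type Table = M.Map (HieType Int) Int

-- Look up a layer, allocating the next free index if it is new.
internLayer :: HieType Int -> Table -> (Int, Table)
internLayer layer tbl = case M.lookup layer tbl of
  Just i  -> (i, tbl)
  Nothing -> let i = M.size tbl in (i, M.insert layer i tbl)

-- Intern a whole type bottom-up, so equal subtrees share one index.
intern :: Ty -> Table -> (Int, Table)
intern (TyCon n)   tbl = internLayer (HTyCon n) tbl
intern (AppTy f x) tbl =
  let (i, t1) = intern f tbl
      (j, t2) = intern x t1
  in  internLayer (HAppTy i j) t2
intern (FunTy a r) tbl =
  let (i, t1) = intern a tbl
      (j, t2) = intern r t1
  in  internLayer (HFunTy i j) t2

-- Intern several types into one shared array; returns each type's index.
internAll :: [Ty] -> ([Int], Array Int (HieType Int))
internAll tys =
  let step (acc, tbl) ty = let (i, tbl') = intern ty tbl in (i : acc, tbl')
      (revIxs, table)    = foldl step ([], M.empty) tys
      byIndex            = M.fromList [ (i, l) | (l, i) <- M.toList table ]
  in  (reverse revIxs, listArray (0, M.size table - 1) (M.elems byIndex))

-- Rebuild the full type from an index (the inverse direction).
expand :: Array Int (HieType Int) -> Int -> Ty
expand arr i = case arr ! i of
  HTyCon n   -> TyCon n
  HAppTy f x -> AppTy (expand arr f) (expand arr x)
  HFunTy a r -> FunTy (expand arr a) (expand arr r)

-- The three types from the `const True 'a'` example.
exampleTypes :: [Ty]
exampleTypes =
  [ bool, FunTy char bool, FunTy bool (FunTy char bool) ]
  where bool = TyCon "Bool"
        char = TyCon "Char"
```

With these definitions, `internAll exampleTypes` assigns the indices `[0,2,3]` and produces an array of four layers, whereas storing the three types separately would take nine nodes. Applying `expand` to each index recovers the original types, which is the sense in which the flattened form loses no information.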
    3459
    3560== Scope information about symbols
     
    122147
    123148[https://docs.google.com/document/d/1QP4tV-oSJd3X90JKVY4D__Dfr-ypVB57p1yDqyk2aQ8/edit?usp=sharing Original GSOC Proposal]
    124 
    125 Why CBOR over binary/cereal? http://code.haskell.org/~duncan/binary-experiment/binary.pdf