Changes between Version 4 and Version 5 of StaticValues


Ignore:
Timestamp:
Sep 3, 2014 9:48:53 AM (5 years ago)
Author:
mboes
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • StaticValues

    v4 v5  
    1 = Design notes for static values
    2 
    3 These notes discuss the design of the language extension for static values as proposed in [#c1 [1]]. We start with a brief contextualization of the extension with respect to Cloud Haskell, then run through the contents of the user guide and finally discuss the implementation.
    4 
    5 The corresponding ticket, to track progress, is #7015.
    6 
    7 == Cloud Haskell
    8 
    9 The distributed-process package [#c2 [2]] implements a framework for distributed computing. This in general requires sending computations remotely, i.e. closures. The distributed-static package [#c3 [3]] provides the facilities to represent closures in a way that can be shared remotely, under certain limitations. These facilities include means to represent closures, as well as combinators to build new closure representations from existing ones.
    10 
    11 A closure is a code pointer paired with an environment. The -XStaticValues language extension offers first class support in the compiler for creating a portable representation of the pointer part of the closure. In order to retain compatibility with previous representations of code pointers used in distributed-static (which have the advantage of not requiring a compiler extension), work has been put into generalizing distributed-static [ [#c4 4], [#c5 5] ] in order to make the combinators defined there generic in the pointer representation.
    12 
    13 == `Static a` values
    14 
    15 As defined in distributed-static, a Closure is a `Static a` (a code pointer, morally), paired with an environment, in serialized form:
    16 {{{
    17 data Closure a = Closure (Static (ByteString -> a)) ByteString
    18 }}}
    19 To a first approximation, a value of type `Static a` is a “label” (one could also say “reference”, “pointer”, or “name”) for a value of type `a`. Labels can be serialized, but the value they refer to need not be. Thus, a value of type `Static a` can be communicated between processes on disparate machines and dereferenced in any of them.
    20 
    21 The dereferencing operation is called `unstatic` which, for the sake of this discussion, can be given the type:
    22 {{{
    23 unstatic :: Static a -> IO (Maybe a)
    24 }}}
    25 The function `unstatic` either succeeds in dereferencing its argument or it returns `Nothing`. The idea is that if `unstatic` fails to find the actual value referred to, of the right type, then it returns `Nothing`.
    26 
    27 The notion of “label” in distributed-static is rather ample: a label of type `Static a` can be the composition of two other labels. The main composition operator is:
    28 {{{
    29 staticApply :: Static (a -> b) -> Static a -> Static b
    30 }}}
    31 
    32 Therefore, `Static` is not in fact the datatype of labels, but given some label type `l` one can lift a label to a `Static` and compose the result:
    33 {{{
    34 data Static l a where
    35   StaticLabel :: l -> StaticLabel l a
    36   StaticApply :: Static l (a -> b) -> Static l a -> Static b
    37 
    38 staticApply = StaticApply
    39 }}}
    40 Note: `Static` is not the free applicative functor over labels, because only labels can be lifted into a `Static`, not arbitrary types. That is, `StaticLabel` is not the unit of a free applicative functor.
    41 
    42 == `Ref a` values
    43 
    44 With `-XStaticValues` enabled, any expression of the form `static e` where `e :: a` produces a type-tagged compiler-generated label of type `Ref a`. We have
    45 {{{
    46 newtype Ref a = Ref GlobalName
    47 }}}
    48 so we can define a lifting function for such labels:
    49 {{{
    50 staticLabel :: Ref a -> Static GlobalName a
    51 }}}
    52 
    53 Like `Static GlobalName a`, a reference of type `Ref a` points to a value of type `a`. It also can be serialized and dereferenced in other processes with a function `deRef :: Ref a -> IO (Maybe a)`. The main difference with a `Static a` value is that a `Ref a` is a type tagged label, whereas `Static a` is the type of label compositions.
    54 
    55 Other label types are not type tagged. One might ask, why lift values of type `Ref a` rather than `GlobalName` directly? The reason is that given an expression of the form `static e`, user code might want to know the inferred type of `e`, say in order to get a `TypeRep`.
    56 
    57 == Using `-XStaticValues` to produce `Static a` values
    58 
    59 With `-XStaticValues`, GHC can generate a `Ref a` for any closed expression `e` of type `a`. This is denoted as `static e`. With a closed expression meaning that the free variables of `e` are only identifiers of top-level bindings. All of the following definitions are permissible:
    60 {{{
    61 inc :: Int -> Int
    62 inc x = x + 1
    63 
    64 ref1 = static 1
    65 ref2 = static inc
    66 ref3 = static (inc 1)
    67 ref4 = static ((\x -> x + 1) (1 :: Int))
    68 ref5 y = static (let x = 1 in x)
    69 }}}
    70 While the following definitions are rejected:
    71 {{{
    72 ref6 = let x = 1 in static x      -- the body of static is not closed
    73 ref7 y = static (let x = 1 in y)  -- again the body is not closed
    74 }}}
    75 
    76 With this extension turned on, `static` is no longer a valid identifier.
    77 
    78 The information contained in the reference is used by `deRef` to locate the values at runtime using the symbol tables in executables, libraries and object files. For this to work, symbol tables need to be made available at runtime. A simple way to ensure this is to pass the `-rdynamic` flag to GHC during linking.
    79 
    80 Note that, because the symbol table does not contain type information about its entries, it is not possible to check that the value returned by `deRef` really is of the expected type. In an adversarial network, this would mean that some adversary could easily crash the local program by sending it a `x :: Static b` where a `Static a` is expected, disguising the `x` as a `Static a` by sending along the `TypeRep` for some `y :: Static a` along with `x` instead of the `TypeRep` for `x`. It should be noted, however, in the context of today’s Cloud Haskell, short of any authentication, there are many other ways for a remote program to abuse the local program anyways.
    81 
    82 A `Closure` can be obtained from a `Ref` as in the following example:
    83 {{{
    84 client server = do
    85   spawn server $ staticClosure $ staticLabel $ static (say “hello”)
    86 }}}
    87 where `staticClosure` is a combinator from the package distributed-static:
    88 {{{
    89 staticClosure :: Typeable a => Static a -> Closure a
    90 }}}
    91 Or we can internationalize the example:
    92 {{{
    93 sayI18N translate s = say (translate “hello”)
    94 
    95 client :: Static (String -> String) -> NodeId -> Process ProcessId
    96 client staticTranslate server = do
    97   spawn server $ staticClosure $
    98     staticLabel (static sayI18N) `staticApply` staticTranslate
    99 }}}
    100 
    101 === Static semantics of `Ref a` values
    102 
    103 Informally, if we have a closed expression
    104 {{{
    105 e :: forall a_1 ... a_n. t
    106 }}}
    107 the static form is of type
    108 {{{
    109 static e :: forall a_1 ... a_n. Ref t
    110 }}}
    111 
    112 The following definitions are valid:
    113 {{{
    114 ref8 = static id :: Ref (a -> a)
    115 ref9 = static show :: Ref (Int -> String)
    116 ref10 = static return :: Ref (a -> IO a)
    117 }}}
    118 
    119 Currently, the type of the body of the `static` form is constrained to have an unqualified rank-1 type. The following are therefore illegal:
    120 {{{
    121 static show                    -- show has a constraint (Show a)
    122 static Control.Monad.ST.runST  -- runST has a higher-ranked type
    123 }}}
    124 
    125 That being said, with the appropriate use of wrapper data types, the
    126 above limitations induce no loss of generality:
    127 {{{
    128 {-# LANGUAGE ConstraintKinds #-}
    129 {-# LANGUAGE ExistentialQuantification #-}
    130 {-# LANGUAGE Rank2Types #-}
    131 {-# LANGUAGE StaticValues #-}
    132 
    133 import Control.Monad.ST
    134 import GHC.Ref
    135 
    136 data Dict c = c => Dict
    137 
    138 g1 :: Ref (Dict (Show a) -> a -> String)
    139 g1 = static (\Dict -> show)
    140 
    141 data Rank2Wrapper f = R2W (forall s. f s)
    142 newtype Flip f a s = Flip { unFlip :: f s a }
    143 
    144 g2 :: Ref (Rank2Wrapper (Flip ST a) -> a)
    145 g2 = static (\(R2W f) -> runST (unFlip f))
    146 }}}
    147 It is proposed that the `Dict` wrapper in particular, for reified dictionaries, is reexported by the distributed-static package.
    148 
    149 == Static values through the compiler phases
    150 
    151 The renamer checks that the free variables appearing in the body of the `static` forms are always identifiers of top-level bindings. This holds for both values and types.
    152 
    153 The type-checker treats the static form mostly as if `static` were a function:
    154 {{{
    155 static :: a -> Ref a
    156 }}}
    157 At the very end of the type-checking phase of a module, the types of the bodies of all found `static` forms are examined to determine if they are qualified, and if so they are rejected. This needs to be done at the end of type-checking because the monomorphism restriction may affect the inferred type.
    158 
    159 The desugarer replaces the `static` form with a `ref :: Ref a` value pointing to the body of it. When the body is a single identifier, the value `ref` points to it. When not, the body is floated to a freshly generated top-level definition and the `ref` value points to it instead.
    160 
    161 An expression like `static return :: a -> IO a` may appear to have a single identifier in the body of `static`, but in the desugarer, it is really `return` applied to the `Monad IO` dictionary. Therefore, it will be floated.
    162 
    163 Currently, floating of the bodies of `static` forms is implemented in the type-checker. It has been pointed out that it would be desirable to do it in the desugarer. Another task the type-checker is doing is making sure that referenced identifiers do appear in symbol tables in the produced object files via calls to the function `keepAlive :: Name -> TcM ()`. Probably, this is something that could be done in the desugarer as well.
    164 
    165 == In GHCi
    166 
    167 The `static` forms can be created in GHCi, but floating of expressions is not implemented there. This means that `deRef` has a chance to work in GHCi only when the body of the `static` form is a single identifier which does not depend on using any type class dictionaries. Any other kind of expression will require producing a new top-level binding and this is not done yet.
    168 
    169 == The `Ref` datatype
    170 
    171 The `Ref` datatype is implemented in the module base:GHC.Ref.
    172 {{{
    173 -- | A reference to a top-level value of type 'a'.
    174 data Ref a = Ref { unRef :: GlobalName }
    175   deriving (Read, Show, Typeable)
    176 
    177 -- | Global names identifying top-level values
    178 --
    179 -- > GlobalName package_id installed_package_id module_name value_name
    180 --
    181 data GlobalName = GlobalName String String String String
    182   deriving (Read, Show, Typeable)
    183 }}}
    184 
    185 A GlobalName holds the information about a top-level value that the desugarer fills in when replacing a `static` form. It augments the information provided by `Language.Haskell.TH.Syntax.Name` with the information in the `installed_package_id` field. This field is [[http://hackage.haskell.org/package/Cabal-1.20.0.2/docs/Distribution-Package.html#t:InstalledPackageId|Cabal:Distribution.Package.InstalledPackageId]] and it would be needed to identify the package when multiple variations of it are installed.
    186 
    187 The installed package id could be useful to locate the symbol if the dynamic dependencies of a program were not known when linking. However, the field is an empty string for all `static` forms requiring floating or pointing to identifiers in the current package. So its usefulness is very limited and we are considering to remove it.
    188 
    189 == References
    190 
    191 [=#c1 [1]] Jeff Epstein, Andrew P. Black, and Simon Peyton-Jones. Towards Haskell in the cloud. SIGPLAN Not., 46(12):118–129, September 2011. ISSN 0362-1340. [[http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf|pdf]]
    192 
    193 [=#c2 [2]] https://hackage.haskell.org/package/distributed-process
    194 
    195 [=#c3 [3]] https://hackage.haskell.org/package/distributed-static
    196 
    197 [=#c4 [4]] https://github.com/tweag/distributed-static/commits/globalnames
    198 
    199 [=#c5 [5]] https://github.com/tweag/distributed-process/commits/generic-static4
     1See StaticPointers.