Distributed closures
This page desribes a possible design for distributed-closure
, a library that implements
serialisable closures, building on top of Typeable and StaticPointers.
See the root page DistributedHaskell.
The distributed-process package implements a framework for distributed
programming à la Erlang. Support for static closures is implemented in a separate package called
distributed-static package. We propose to patch this library in the
following way, and rename it to distributed-closure
. Ultimately, distributed-closure should be the one-stop shop for all distributed frameworks that wish to allow users to program with static closures.
Proposed design
distributed-closure
will define the following datatype:
data Closure a
Closure
is the type of static closures. Morally, it contains some
pointer to a static expression, paired with an environment of only
serializable values.
Why do we need Closure
? Closure
is strictly more expressive than
StaticPtr
. StaticPtr
can only be constructed from closed expressions
(no free variables). Closure
is built on top of StaticPtr
. It allows
encoding serializable expressions. That is, expressions formed of
only top-level identifiers, literals, and serializable free variables.
for example, using Closure
, one can write:
f :: Int -> Int -> ... f x y = ... closure (static (+)) `closureAp` closurePure x `closureAp` closurePure y ...
We introduce the following library functions on Closure
. This is the full API of distributed-closure
:
data Closure a closure :: StaticPtr a -> Closure a unclosure :: Closure a -> a closurePure :: Serializable a => a -> Closure a closureAp :: Closure (a -> b) -> Closure a -> Closure b
The signature of closure
mentions Serializable
, which is a class
defined as follows:
data Dict c = c => Dict class (Binary a, Typeable a) => Serializable a where serializableDict :: Closure (Dict (Serializable a))
NOTE: this definition of Serializable
is different from the current one as found in distributed-process
, and also different from the one DistributedHaskell#Serialization.
In words, a serializable value is a value for which we have
a Binary
instance and a Typeable
instance, but moreover for which
we can obtain a static proof of serializability, in form of a dictionary.
One could make do with an empty class definition for Serializable
, as is currently the case, but without this augmented definition, lifting arbitrary serializable values to Closure
s becomes much less convenient (see closurePure
below).
One might argue that Serializable
is a tad more complex than it could be. The central issue is that the -XStaticPointers
extension alone does not offer any means to reify a constraint to static evidence in the form of a dictionary - a feature that is in general very useful indeed (see below). In principle this could be done, since dictionaries are morally either static, or simple combinations of static dictionaries, but it would require a further extension to the compiler, possibly to the type system itself. The above definition of Serializable
comes at the economy of such a compiler extension.
Below are some example instances:
instance Serializable Int where serializableDict = closure $ static Dict instance Serializable (Maybe Int) where serializableDict = closure $ static Dict instance Serializable b => Serializable (Either String b) where serializableDict = closure (static \Dict -> Dict) `closureApply` serializableDict data Foo = Foo deriving (Generic, Typeable) instance Binary Foo instance Serializable Foo where serializableDict = static Dict -- Datatype of state transitions, where the state s need not be serializable. data Command s = Transition (Closure (s -> s)) | Stop instance Binary Command instance (Typeable s) => Serializable (Command s) where -- NOTE: Requires the TTypeRep proposal. serializableDict = closure (static (`withTypeable` Dict)) `closureApply` closurePure tTypeRep
Notice the pay-as-you-go nature of the instances. Only in instances with polymorphic heads do you need to compose dictionaries by hand. Most users do not have type parameterized data types that they want to serialize. Those users never need to write more than simple one liner instances.
Implementation
Implementation in GHC
TODO See old proposal and blog post by Simon PJ.
Implementation of distributed-closure
The definition of Closure a
is as follows:
data Closure a where StaticPtr :: StaticPtr b -> Closure b Encoded :: ByteString -> Closure ByteString Ap :: Closure (b -> c) -> Closure b -> Closure c
This definition permits an efficient implementation: there is no need
to reserialize the environment everytime one composes two Closures
s.
The definition in the Cloud Haskell paper is as follows:
data Closure' a where Closure' :: StaticPtr (ByteString -> b) -> ByteString -> Closure b
Note that the Closure'
constructor can be simulated:
Closure cf env <=> Ap (StaticPtr cf) (Encoded env)
One can even add the following constructor for better efficiency:
data Closure a where ... Closure :: Closure a -> a -> Closure a
Any StaticPtr
can be lifted to a Closure
, and so can any
serializable value:
closure :: StaticPtr a -> Closure a closure x = StaticPtr closurePure :: Serializable a => a -> Closure a closurePure x = StaticPtr (static decodeD) `closureAp` serializableDict `closureAp` Encoded (encode x) where decodeD :: Dict (Serializable a) -> ByteString -> a decodeD Dict = decode
Side note:
What if we didn't have serializableDict
? What if we reverted to the current definition of Serializable
? Then it would be impossible to write closurePure
with the above signature. We would need to pass in explicitly some static serialization dictionary. This is quite a bother, since it forces us to introduce spurious top-level bindings for dictionaries, e.g.:
closurePure' :: Static (Dict (Serializable a)) -> a -> Closure a closurePure' sdict x = StaticPtr sdict `closureAp` Encoded (encode x) sdictInt :: Static (Dict (Serializable a)) sdictInt = static Dict thirtyClosure = plusClosure `closureAp` closurePure sdictInt 10 `closureAp` closurePure sdictInt 20
End side note.
Given any two Closure
s with compatible types, they can be combined
using closureAp
:
closureAp :: Closure (a -> b) -> Closure a -> Closure b closureAp = Ap
Notice how Closure
is nearly the free applicative functor, though not completely free, because we impose Serializable a
in closurePure
. It is, however, a semigroupoid.
Closure serialization is straightforward, but closure deserialization
is tricky. See
this blog post section from Simon PJ as to why. The issue is that
when deserializing from a bytestring to target type Closure b
, one
needs to ensure that the target type matches the type of the closure
before it was serialized, lest bad things happen. We need to impose
that Typeable b
when deserializing to Closure b
, but that doesn't
help us for all closures. Consider in particular the type of Ap
:
Ap :: Closure (b -> c) -> Closure b -> Closure c
Notice that the type b
is not mentioned in the return type of the
constructor. We need to know Typeable (b -> c)
and Typeable b
in
order to recursively deserialize the subclosures, but we can't infer
either from the context Typeable c
.
There are solutions to this problem.
Alternative 1:
The trick is to realize that in fact the type b
does not matter: it could be arbitrary. After all, that's why it appears existentially quantified in the type of the Ap
constructor. Type safety guarantees that the Ap
constructor always combines two Closure
s of compatible type. In other words, the type of the first argument of Ap
could as well be taken to be Closure (Any -> c)
, because any lifted type can be coerced to/from Any
, so that any value of type b
can reasonably also be ascribed type Any
. Since we don't care about the type b
, might as well take it to be Any
. If one trusts closure serialization, and indeed closure serialization must be part of the TCB (see DistributedHaskell#Thetrustedcodebase), then when deserializing a closure, we can reconstruct an Ap
node, taking b ~ Any
, or equivalently, always deserializing at the following type:
Ap :: Closure (Any -> c) -> Closure Any -> Closure c
Note: Any
must have a Typeable
instance. This is the case in GHC 7.8, but in GHC >= 7.9, Any
is now a type family with no instance, hence cannot be given a Typeable
instance (see tickets #9429). Deserialization can go something along the following lines (beware, highly idealized code):
decodeClosure :: forall a. Typeable a => ByteString -> (ByteString, Closure a) decodeClosure (B.uncons -> '0':bs) = (StaticPtr <$> decodeStaticPtr) bs decodeClosure (B.uncons -> '1':bs) = (Encoded <$> decodeByteString) bs decodeClosure (B.uncons -> '2':bs) = (do cf <- decodeClosure :: Any -> a cx <- decodeClosure :: Any return $ ClosureAp cf cx ) bs
Now it may often happen that decodeStaticPtr
will be called against type StaticPtr Any
, or StaticPtr (Any -> c)
. decodeStatic
internally compares TypeRep
s before producing a result: it compares the TypeRep
of the target type against the type found in the SPT. With the above definition of Ap
, the TypeRep
s in general will not match exactly: we will be comparing the Typerep
of Any -> c
to that of b -> c
for some type b
. For this to work, we need to carry out TypeRep
matching modulo Any
. More precisely, we require the following laws (for some Typeable b
):
isJust (cast :: Any -> Maybe b) == True isJust (cast :: b -> Maybe Any) == True
Insofar as Any
is an internal compiler type not normally accessible to the user, the above should not compromise type safety.
Alternative 2:
-- Like dynApply, but for things of type Closure a. dynClosureApply :: Dynamic -> Dynamic -> Dynamic dynClosureApply x y = error "TODO" decodeClosure :: Typeable a => ByteString -> (ByteString, Closure a) decodeClosure = fromDyn <$> dyn where dyn :: ByteString -> (ByteString, Dynamic) dyn ('0':bs) = (do DS (sptr :: StaticPtr a) <- decodeStatic bs return $ toDyn $ StaticPtr sptr) bs dyn ('1':bs) = (toDyn . Encoded <$> decodeByteString) bs dyn ('2':bs) = (dynClosureApply <$> dyn <*> dyn) bs
Assuming this proposal for Typeable, it should be possible to implement dynClosureApply
in a type safe way, without using unsafeCoerce
internally. Otherwise dynClosureApply
will have to be part of the TCB.
End of alternatives.
All that remains is to implement unclosure
:
unstatic :: StaticPtr a -> a unclosure :: Closure a -> a unclosure (StaticPtr sptr) = unstatic sptr unclosure (Encoded x) = x unclosure (Ap cf cx) = (unstatic cf) (unstatic cx) unclosure (Closure cx x) = x
Tradeoffs:
Alternative 1:
- requires a
Typeable
instance forAny
. - requires type casting modulo
Any
. - Potentially faster: dynamic type checks done only once per
StaticPtr
indecodeClosure
. - Makes
decodeClosure
part of the TCB.
Alternative 2:
- Can be be implemented outside the TCB, but requires the Typeable proposal to do so.
- Potentially slower: dynamic type checks at every
Ap
node when doingunDynClosure
.
About performance
In the case of monomorphic functions, there need be no TypeRep
sent over the wire in the scheme presented above, yet we still preserve type safety. TypeRep
s bloat the messages sent on the wire, so their absence can only be a good thing for performance. That said, we anticipate that even the dynamic type checks performed by decodeStatic
can be detrimental for performance. For this reason, there ought to be a
unsafeDecodeStaticPtr :: ByteString -> (ByteString, StaticPtr a) unsafeDecodeClosure :: ByteString -> (ByteString, Closure a) unsafeDecodeClosure = ... unsafeDecodeStaticPtr ...
that perform no dynamic type checks. This of course may come at the cost of some type safety, but only if the user somehow writes something equivalent to decodeClosure (encodeClosure (x :: Closure a)) :: Closure b
, or if the remote side crafts devious applications of two closures of incompatible type. In high performance settings, one can often make the simplifying assumption that peers can be trusted, so this should not be a problem in practice.
Conclusion
It appears possible to implement a language extension first proposed
in the original Cloud Haskell paper in a way that supports polymorphic
types - a feature that was not considered in the paper. Furthermore,
the proposal in the original Cloud Haskell paper compromised type
safety since it allowed deserializing Closure
s at arbitrary type,
while this proposal adds extra safety yet still making it possible to
use a backdoor for performance.
What's the trusted code base (TCB) that you need to trust in order to
guarantee type safety? GHC of course, but not only. This language
extension adds one new primitive function to GHC. But one needs to
also trust decodeStaticPtr
, and that's it. Note that if decodeStaticPtr
is implemented in terms of Binary
, then Ideally
one would only have to trust GHC and its standard libraries, and have
dynamic-closure
be part of the standard library. But in any case
dynamic-closure
depends on at least binary
in order to do its
work, which it would be undesirable to pull into base
, so is best
kept separate from GHC.