| Version 1 (modified by igloo, 3 years ago) |
|---|
DRAFT! Not yet submitted!
Proposal: Add binary 0.5.0.2 to the Haskell Platform
Proposal Author: Ian Lynagh
Maintainer: Lennart Kolmodin, Don Stewart
Introduction
This is a proposal for the binary package to be included in the next major release of the Haskell platform.
Everyone is invited to review this proposal, following the standard procedure for proposing and reviewing packages.
http://trac.haskell.org/haskell-platform/wiki/AddingPackages
Review comments should be sent to the libraries mailing list by January 31st.
Credits
The following individuals contributed to the review process: <no-one, yet!>
Abstract
The 'binary' package provides efficient, pure binary serialisation using lazy ByteStrings.
Haskell values may be encoded to and decoded from binary formats, written to disk as binary, or sent over the network.
The binary format can either be an externally defined format, or binary's internal default format may be used if you wish only to serialise and deserialise from a Haskell program.
Documentation and tarball from the hackage page:
Main development repo:
darcs get http://code.haskell.org/binary/
Active branches:
darcs get http://www.haskell.org/~kolmodin/code/binary-push-unpacked
All package requirements are met.
Rationale
binary provides basic functionality not yet available in the Haskell Platform.
binary has 193 direct reverse dependencies, including Agda, hxt, Pugs, SHA and tar. It is also used by GHC, although GHC currently ships a renamed copy because binary is not in the HP.
The API
The API is broken up into four pieces:
- The main interface, for serialising and deserialising values:
http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/Data-Binary.html
- Functions for implementing serialisation for datatypes:
http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/Data-Binary-Put.html
- Functions for implementing deserialisation for datatypes:
http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/Data-Binary-Get.html
- An internal type used for constructing ByteStrings incrementally:
http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/Data-Binary-Builder.html
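As a rough illustration of how the Put and Get modules fit together for an externally defined format, here is a minimal sketch. The record layout, the Record type, and the names putRecord, getRecord, sampleRecord and roundTrip are invented for this example and are not part of the package; only the Data.Binary.Put and Data.Binary.Get functions are real.

```haskell
import Data.Binary.Get (Get, getLazyByteString, getWord16be, getWord32be, runGet)
import Data.Binary.Put (Put, putLazyByteString, putWord16be, putWord32be, runPut)
import qualified Data.ByteString.Lazy as L
import Data.Word (Word32)

-- Hypothetical wire format, made up for illustration:
--   4 bytes  magic number   (big-endian)
--   2 bytes  payload length (big-endian)
--   n bytes  payload
data Record = Record
  { magic   :: Word32
  , payload :: L.ByteString
  } deriving (Eq, Show)

-- Serialise a Record field by field, in the order the format demands.
-- Assumes the payload is shorter than 65536 bytes; longer payloads
-- would have their length silently truncated by fromIntegral.
putRecord :: Record -> Put
putRecord (Record m p) = do
  putWord32be m
  putWord16be (fromIntegral (L.length p))
  putLazyByteString p

-- Parse a Record, using the length prefix to know how much to read.
getRecord :: Get Record
getRecord = do
  m <- getWord32be
  n <- getWord16be
  p <- getLazyByteString (fromIntegral n)
  return (Record m p)

sampleRecord :: Record
sampleRecord = Record 0xCAFEBABE (L.pack [1, 2, 3])

-- Serialise to a lazy ByteString and parse it straight back.
roundTrip :: Record -> Record
roundTrip = runGet getRecord . runPut putRecord
```

Loaded into GHCi, roundTrip sampleRecord should give back sampleRecord, and L.unpack (runPut (putRecord sampleRecord)) shows the nine bytes written. The Binary class itself is not involved here, which is the usual choice when the byte layout is dictated by someone else.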
Here is an example of the basic functionality, from the haddock docs:
To serialise a custom type, an instance of Binary for that type is required. For example, suppose we have a data structure:
> data Exp = IntE Int
>          | OpE String Exp Exp
>          deriving Show
We can encode values of this type into bytestrings using the following instance, which proceeds by recursively breaking down the structure to serialise:
> instance Binary Exp where
>       put (IntE i)      = do put (0 :: Word8)
>                              put i
>       put (OpE s e1 e2) = do put (1 :: Word8)
>                              put s
>                              put e1
>                              put e2
>
>       get = do t <- get :: Get Word8
>                case t of
>                     0 -> do i <- get
>                             return (IntE i)
>                     1 -> do s <- get
>                             e1 <- get
>                             e2 <- get
>                             return (OpE s e1 e2)
Note how we write an initial tag byte to indicate each variant of the data type.
We can simplify the writing of 'get' instances using monadic combinators:
>       get = do tag <- getWord8
>                case tag of
>                    0 -> liftM  IntE get
>                    1 -> liftM3 OpE  get get get
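Neither version of 'get' says what happens when the tag byte is something other than 0 or 1. The following self-contained sketch puts the pieces together with the imports they need and adds a fallback case. The fallback, the derived Eq instance and the name sampleExp are additions made for this example and are not taken from the haddock docs.

```haskell
import Control.Monad (liftM, liftM3)
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getWord8)
import Data.Binary.Put (putWord8)

data Exp = IntE Int
         | OpE String Exp Exp
         deriving (Eq, Show)  -- Eq added here so round trips can be compared

instance Binary Exp where
  -- One tag byte per constructor, followed by the constructor's fields.
  put (IntE i)      = putWord8 0 >> put i
  put (OpE s e1 e2) = putWord8 1 >> put s >> put e1 >> put e2

  get = do
    tag <- getWord8
    case tag of
      0 -> liftM IntE get
      1 -> liftM3 OpE get get get
      -- Assumption for this sketch: reject unknown tags explicitly
      -- instead of leaving the case expression incomplete.
      _ -> fail ("Exp: unexpected tag byte " ++ show tag)

-- Hypothetical sample value, the same expression used in the text.
sampleExp :: Exp
sampleExp = OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))
```

In GHCi, decode (encode sampleExp) :: Exp should reproduce sampleExp.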
To serialise this to a bytestring, we use 'encode', which packs the data structure into a binary format, in a lazy bytestring:
> > let e = OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))
> > let v = encode e
Here 'v' is the binary-encoded data structure. To reconstruct the original data, we use 'decode':
> > decode v :: Exp
> OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))
The lazy ByteString that results from 'encode' can be written to disk, and read from disk using Data.ByteString.Lazy IO functions, such as hPutStr or writeFile:
> > writeFile "/tmp/exp.txt" (encode e)
And read back with:
> > readFile "/tmp/exp.txt" >>= return . decode :: IO Exp
> OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))
We can also directly serialise a value to and from a Handle, or a file:
> > v <- decodeFile "/tmp/exp.txt" :: IO Exp
> OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))
And write a value to disk:
> > encodeFile "/tmp/a.txt" v
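For readers who want to try the file helpers outside GHCi, here is a small sketch that writes a value and then reads it back, in that order. It uses a type that already has a Binary instance; the names sampleTable and roundTripFile, and any path passed in, are made up for the example.

```haskell
import Data.Binary (decodeFile, encodeFile)

-- Hypothetical sample data; lists, tuples, String and Int all have
-- Binary instances provided by the package.
sampleTable :: [(String, Int)]
sampleTable = [("seven", 7), ("four", 4), ("two", 2)]

-- Write the table to the given path, then decode it again.
-- The result type fixes which Binary instance decodeFile uses.
roundTripFile :: FilePath -> [(String, Int)] -> IO [(String, Int)]
roundTripFile path table = do
  encodeFile path table
  decodeFile path
```

Calling roundTripFile "/tmp/table.bin" sampleTable should return a list equal to sampleTable. Note that decodeFile needs a type annotation or signature somewhere, since the file itself carries no type information.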
Design decisions and random facts
- The interface is pure, modulo IO helper functions for (de)serialising directly to files.
- Built on top of lazy ByteString
- Uses CPP extension
- When building with GHC, uses MagicHash and UnboxedTuples extensions
- Uses FlexibleContexts extension for this instance: instance (Binary i, Ix i, Binary e, IArray UArray e) => Binary (UArray i e) where
- The implementation is entirely Haskell (no additional C code or libraries).
- The package provides a QuickCheck testsuite and some benchmarks.
- The package adds no new dependencies to the HP.
- The package builds with Cabal's Simple build type.
- There is no existing functionality for binary serialisation in the HP.
- All but one of the exports have haddock docs, and many have complexity annotations.
- The code is -Wall clean.
Open issues
- There is currently work on redesigning the parsing interface to support incremental parsing. The work is taking place in the binary-push and binary-push-unpacked branches, and the changes are in the Data.Binary.Get module. We may wish to accept the package with this change, rather than adding it in its current form.
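The binary-push branches themselves are not reproduced here. As a rough idea of what incremental parsing looks like in practice, the sketch below uses the runGetIncremental interface found in later releases of binary; it is not available in 0.5.0.2 and may differ from the design in those branches. The helper name feedChunks is hypothetical.

```haskell
import Data.Binary.Get
  ( Decoder (..), Get, getWord32be, pushChunk, pushEndOfInput, runGetIncremental )
import qualified Data.ByteString as S
import Data.Word (Word32)

-- Feed strict chunks to a parser one at a time, as they might arrive
-- from a socket, then signal end of input and inspect the outcome.
feedChunks :: Get a -> [S.ByteString] -> Either String a
feedChunks parser chunks =
  case pushEndOfInput (foldl pushChunk (runGetIncremental parser) chunks) of
    Done _ _ x   -> Right x                       -- parsed a value
    Fail _ _ msg -> Left msg                      -- parse error or too little input
    Partial _    -> Left "decoder still expects input"

-- A four-byte big-endian word split across two chunks.
splitWord :: Either String Word32
splitWord = feedChunks getWord32be [S.pack [0, 0], S.pack [1, 2]]
```

Here splitWord evaluates to Right 258, even though no single chunk contains the whole word. With the 0.5.0.2 interface the parser instead consumes a lazy ByteString, so the caller cannot hand over input piece by piece in this way.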
Notes
The implementation consists of 4 modules. Together they come to under 2000 lines, fewer than 1000 of which are actual code.
