DRAFT! Not yet submitted!

Proposal: Add binary 0.5.0.2 to the Haskell Platform

Proposal Author: Ian Lynagh

Maintainer: Lennart Kolmodin, Don Stewart

Introduction

This is a proposal for the binary package to be included in the next major release of the Haskell Platform.

Everyone is invited to review this proposal, following the standard procedure for proposing and reviewing packages.

http://trac.haskell.org/haskell-platform/wiki/AddingPackages

Review comments should be sent to the libraries mailing list by January 31st.

Credits

The following individuals contributed to the review process: <no-one, yet!>

Abstract

The 'binary' package provides efficient, pure binary serialisation using lazy ByteStrings.

Haskell values may be encoded to and decoded from binary formats, written to disk as binary, or sent over the network.

The binary format can either be an externally defined format, or binary's internal default format may be used if you wish only to serialise and deserialise from a Haskell program.
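As an illustration of the externally defined case, here is a minimal sketch that writes and reads a small record with the Put and Get monads. The record layout, the Record type and the names putRecord and getRecord are invented for this example; only the Data.Binary.Put and Data.Binary.Get functions come from the package.

```haskell
import qualified Data.ByteString      as B
import qualified Data.ByteString.Lazy as L
import Data.Binary.Get (Get, getByteString, getWord16be, getWord32be, runGet)
import Data.Binary.Put (Put, putByteString, putWord16be, putWord32be, runPut)
import Data.Word (Word16, Word32)

-- Hypothetical wire format, made up for this example:
--   4 bytes  magic number      (big-endian)
--   2 bytes  version           (big-endian)
--   2 bytes  payload length n  (big-endian)
--   n bytes  payload
data Record = Record
  { recMagic   :: Word32
  , recVersion :: Word16
  , recPayload :: B.ByteString
  } deriving (Eq, Show)

-- Write the fields in exactly the order the format prescribes.
putRecord :: Record -> Put
putRecord (Record m v p) = do
  putWord32be m
  putWord16be v
  putWord16be (fromIntegral (B.length p))
  putByteString p

-- Read the fields back, using the length field to size the payload.
getRecord :: Get Record
getRecord = do
  m <- getWord32be
  v <- getWord16be
  n <- getWord16be
  p <- getByteString (fromIntegral n)
  return (Record m v p)

main :: IO ()
main = do
  let r     = Record 0xCAFEBABE 1 (B.pack [1, 2, 3])
      bytes = runPut (putRecord r)
  print (L.unpack bytes)              -- [202,254,186,190,0,1,0,3,1,2,3]
  print (runGet getRecord bytes == r) -- True
```

Because the byte order and field widths are chosen explicitly, the output does not depend on binary's internal default format.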

Documentation and tarball from the hackage page:

http://hackage.haskell.org/package/binary

Main development repo:

darcs get http://code.haskell.org/binary/

Active branches:

darcs get http://www.haskell.org/~kolmodin/code/binary-push

darcs get http://www.haskell.org/~kolmodin/code/binary-push-unpacked

All package requirements are met.

Rationale

binary provides basic functionality not yet available in the Haskell Platform.

binary has 193 direct reverse dependencies, including Agda, hxt, Pugs, SHA and tar. It is also used by GHC, although GHC currently ships its copy under a different name because binary is not in the HP.

The API

The API is broken up into four pieces:

  • The main interface, for serialising and deserialising values:

http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/Data-Binary.html

  • Functions for implementing serialisation for datatypes:

http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/Data-Binary-Put.html

  • Functions for implementing deserialisation for datatypes:

http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/Data-Binary-Get.html

  • An internal type used for constructing ByteStrings incrementally:

http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/Data-Binary-Builder.html
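The following sketch shows roughly how the four modules relate for a single Word16 value: 'encode' behaves like running the instance's Put action, 'decode' like running its Get action, and Builder can be used on its own to assemble bytes. The name 'pieces' and the sample values are made up for the example.

```haskell
import Data.Binary (decode, encode, get, put)
import Data.Binary.Builder (Builder, append, putWord16be, singleton, toLazyByteString)
import Data.Binary.Get (runGet)
import Data.Binary.Put (runPut)
import qualified Data.ByteString.Lazy as L
import Data.Word (Word16)

-- A Builder assembled from three pieces: one byte, a big-endian
-- 16-bit word, and one more byte.
pieces :: Builder
pieces = singleton 0x01 `append` putWord16be 0x0203 `append` singleton 0x04

main :: IO ()
main = do
  let x = 0x0102 :: Word16
  -- Data.Binary on top of Data.Binary.Put.
  print (encode x == runPut (put x))                              -- True
  -- Data.Binary on top of Data.Binary.Get.
  print ((decode (encode x) :: Word16) == runGet get (encode x))  -- True
  -- Data.Binary.Builder used directly.
  print (L.unpack (toLazyByteString pieces))                      -- [1,2,3,4]
```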

Here is an example of the basic functionality, from the haddock docs:

To serialise a custom type, an instance of Binary for that type is required. For example, suppose we have a data structure:

    > data Exp = IntE Int
    >          | OpE  String Exp Exp
    >    deriving Show

We can encode values of this type into bytestrings using the following instance, which proceeds by recursively breaking down the structure to serialise:

    > instance Binary Exp where
    >       put (IntE i)          = do put (0 :: Word8)
    >                                  put i
    >       put (OpE s e1 e2)     = do put (1 :: Word8)
    >                                  put s
    >                                  put e1
    >                                  put e2
    >
    >       get = do t <- get :: Get Word8
    >                case t of
    >                     0 -> do i <- get
    >                             return (IntE i)
    >                     1 -> do s  <- get
    >                             e1 <- get
    >                             e2 <- get
    >                             return (OpE s e1 e2)

Note how we write an initial tag byte to indicate each variant of the data type.

We can simplify the writing of 'get' instances using monadic combinators:

    >       get = do tag <- getWord8
    >                case tag of
    >                    0 -> liftM  IntE get
    >                    1 -> liftM3 OpE  get get get

To serialise this to a bytestring, we use 'encode', which packs the data structure into a binary format in a lazy bytestring:

    > > let e = OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))
    > > let v = encode e

Here 'v' is the binary-encoded data structure. To reconstruct the original data, we use 'decode':

    > > decode v :: Exp
    > OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))
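For readers who want to try this outside GHCi, the fragments above can be assembled into one module along the following lines. This is a sketch rather than the text of the haddock docs: the imports, the Eq instance, the use of putWord8 for the tag and the catch-all case that rejects unknown tag bytes are additions made for the example.

```haskell
import Control.Monad (liftM, liftM3)
import Data.Binary (Binary (..), decode, encode, getWord8, putWord8)

data Exp = IntE Int
         | OpE  String Exp Exp
  deriving (Eq, Show)

instance Binary Exp where
  -- One tag byte per constructor, followed by the constructor's fields.
  put (IntE i)      = putWord8 0 >> put i
  put (OpE s e1 e2) = putWord8 1 >> put s >> put e1 >> put e2

  get = do
    tag <- getWord8
    case tag of
      0 -> liftM  IntE get
      1 -> liftM3 OpE  get get get
      -- Added for the example: fail in the Get monad on an unknown tag
      -- instead of hitting an incomplete pattern match.
      _ -> fail ("Exp: unexpected tag byte " ++ show tag)

main :: IO ()
main = do
  let e = OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))
  print (decode (encode e) :: Exp)  -- OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))
  print (decode (encode e) == e)    -- True
```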

The lazy ByteString that results from 'encode' can be written to disk, and read back, using the Data.ByteString.Lazy IO functions, such as hPutStr or writeFile:

    > > writeFile "/tmp/exp.txt" (encode e)

And read back with:

    > > readFile "/tmp/exp.txt" >>= return . decode :: IO Exp
    > OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))

We can also directly serialise a value to and from a Handle, or a file:

    > > v <- decodeFile  "/tmp/exp.txt" :: IO Exp
    > OpE "*" (IntE 7) (OpE "/" (IntE 4) (IntE 2))

And write a value to disk:

    > > encodeFile "/tmp/a.txt" v
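The same file round trip can be written as a small standalone program. The sketch below uses a list of pairs, for which the package already provides Binary instances; the file name is made up, and System.Directory is assumed to be available only so that the example can clean up after itself.

```haskell
import Data.Binary (decodeFile, encodeFile)
import System.Directory (removeFile)

main :: IO ()
main = do
  let path = "binary-demo.bin"  -- hypothetical file name in the current directory
      xs   = [(1, "one"), (2, "two")] :: [(Int, String)]
  -- Serialise the value straight to a file.
  encodeFile path xs
  -- Read it back; the annotation selects the Binary instance to decode with.
  ys <- decodeFile path :: IO [(Int, String)]
  -- Comparing forces the decoded value before the file is removed.
  print (ys == xs)              -- True
  removeFile path
```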

Design decisions and random facts

  • The interface is pure, modulo IO helper functions for (de)serialising directly to files.
  • Built on top of lazy ByteString.
  • Uses the CPP extension.
  • When building with GHC, uses the MagicHash and UnboxedTuples extensions.
  • Uses the FlexibleContexts extension for this instance: instance (Binary i, Ix i, Binary e, IArray UArray e) => Binary (UArray i e) where
  • The implementation is entirely Haskell (no additional C code or libraries).
  • The package provides a QuickCheck testsuite and some benchmarks.
  • The package adds no new dependencies to the HP.
  • The package builds with Cabal's Simple build type.
  • There is no existing functionality for binary serialisation in the HP.
  • All exports but one have haddock docs, and many have complexity annotations.
  • The code is -Wall clean.

Open issues

  1. There is currently work on redesigning the parsing interface to support incremental parsing. The work is taking place in the binary-push and binary-push-unpacked branches, and the changes are in the Data.Binary.Get module. We may wish to accept the package with this change, rather than adding it in its current form.
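To make concrete what incremental parsing means here, the sketch below feeds a parser its input one chunk at a time. It assumes a later release of binary whose Data.Binary.Get exports runGetIncremental, Decoder, pushChunk and pushEndOfInput; it is not the API of the branches named above and does not compile against 0.5.0.2. The names 'pair' and 'feed' are made up for the example.

```haskell
import qualified Data.ByteString as B
import Data.Binary.Get
  ( Decoder (..), Get, getWord16be, pushChunk, pushEndOfInput, runGetIncremental )
import Data.Word (Word16)

-- A parser for two big-endian 16-bit words.
pair :: Get (Word16, Word16)
pair = do
  a <- getWord16be
  b <- getWord16be
  return (a, b)

-- Push the chunks into the decoder one by one, then signal end of input.
feed :: [B.ByteString] -> Either String (Word16, Word16)
feed chunks =
  case pushEndOfInput (foldl pushChunk (runGetIncremental pair) chunks) of
    Done _ _ x   -> Right x
    Fail _ _ msg -> Left msg
    Partial _    -> Left "decoder still expects input"

main :: IO ()
main = do
  -- The four input bytes arrive split across three chunks.
  print (feed [B.pack [0, 1], B.pack [0], B.pack [2]])  -- Right (1,2)
  -- Only three bytes arrive, so the parser fails at end of input.
  print (feed [B.pack [0, 1, 0]])
```

The parser itself is written exactly as it would be for runGet; only the way input is supplied changes.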

Notes

The implementation consists of 4 modules. Together the modules come to under 2000 lines, fewer than 1000 of which are actual code.