Version 4 (modified by tibbe, 2 years ago)

--

Proposal: Add the vector package to the Haskell Platform

This is a proposal for the 'vector' package to be included in the next major release of the Haskell platform.

Everyone is invited to review this proposal, following the standard procedure for proposing and reviewing packages.

Review comments should be sent to the libraries mailing list before 1 October 2012, which is the discussion deadline.

Credits

Proposal author: Johan Tibell

Package maintainer: Roman Leshchinskiy

The following individuals contributed to the review process:

  • Henning Thielemann
  • Bryan O'Sullivan
  • Bas Van Dijk
  • Ryan Newton
  • Simon Marlow
  • Simon Peyton-Jones
  • Thomas Schilling
  • Gábor Lehel
  • Gregory Collins
  • Yitzchak Gale
  • Brandon Allbery
  • Mark Lentczner
  • Antoine Latter

Abstract

The vector package is an efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop optimisation framework. It supports both boxed and unboxed data, the latter having a more efficient implementation.

Documentation and tarball from the hackage page:

http://hackage.haskell.org/package/vector

Development repo:

darcs get http://code.haskell.org/vector/

Rationale

Vectors are ubiquitous in programming. They have great cache locality and thus provide both very efficient bulk operations and indexing, outperforming regular Haskell lists in many cases. Vectors are also useful low-level building blocks. For example, the unordered-containers package uses vectors to implement the hash array mapped trie used internally to implement e.g. Data.HashMap.

The Haskell Platform already provides a vector package, array, but many users prefer the vector package, due to

  • having a more modern and comprehensive API (similar to bytestring and text), and
  • a more efficient implementation.

In addition, while the Array type's indexing mechanism is more general, it's slower and can be a nuisance as most algorithms on vectors assume zero-based, Int-indexed vectors.

The platform needs a common vector type that other packages can use in their APIs. I argue that Vector is a better candidate than Array and that we should therefor bless it by moving it into the platform.

Introduction to the API

The API is structured as follows:

  • Data.Vector - Boxed vectors of arbitrary types.
  • Data.Vector.Unboxed - Unboxed vectors with an adaptive representation based on data type families.
  • Data.Vector.Storable - Unboxed vectors of Storable types.
  • Data.Vector.Primitive - Unboxed vectors of primitive types as defined by the primitive package. Data.Vector.Unboxed is more flexible at no performance cost.
  • Data.Vector.Generic - Generic interface to the vector types.

The API has a traditional list-like API for immutable vectors and an imperative API for mutable vectors. The full reference documentation is available on Hackage. There's also an introductory tutorial on the Haskell wiki.

Immutable vectors are processed using bulk operations, such as maps, filters, and folds and not using the explicitly recursive approach often used by lists. This is due to the O(n) cost of cons. Incremental construction of immutable vectors is done by mutating a mutable vector and then freezing it, converting it into an immutable vector. Mutation is done either in ST or IO. The primitive package, which vector depends on, allows code to abstract over the underlying monad used in imperative code.

There are a number of higher level algorithms on vector defined in the vector-algorithms package (not proposed for inclusion at this point.)

Design decisions

  • Uses a list-like API, which should be very familiar to Haskell programmers.
  • Uses type families to get more efficient representations of unpackable data (e.g. primitive values, such as Int, and records thereof.)
  • Abstracts over the monad in which algorithms working on immutable data runs in. This allows users to e.g. use mutable vectors in IO without having to litter their code with stToIO calls.

Open issues

  • The vector package depends on the primitive package, which is a thin wrapper over the Array# and MutableArray# operations provided by GHC. The package is somewhat of an implementation detail, but it is is independently useful as it provides a monad that abstracts over IO and ST, which would be needed for e.g. an hash table implementation in the containers package. This package would also have to be included in the platform as well as there's currently no way to hide it. -- Johan

Decision: While the package name is maybe not ideal, we will include the package as is. We will not widely advertise its inclusion (it's mostly an implementation detail).

  • The package has a large number (11) of .Safe modules, for the benefit of the SafeHaskell? language feature. These modules cluttered the API and risk to confuse new users (who wouldn't want to use the API in a "safe" way?). I suggest they be moved into their own package that users of SafeHaskell? can rely on. -- Johan

Decision: We will include a version of vector without the .Safe modules. The platform as a whole will decide on SafeHaskell?'s place in the platform separately, some time in the future.