= pqueue package proposal =
Everyone is invited to review this proposal, following the standard
procedure for proposing and reviewing packages:
http://trac.haskell.org/haskell-platform/wiki/AddingPackages
If you could reply with your review comments by ?? then we should have time to resolve issues before the final deadline
on ??.
= Credits =
Proposal author and package maintainer: Louis Wasserman
The following individuals contributed to the design process:
* Milan Straka
* Jim Apple
* Ross Paterson
= Abstract =
The pqueue package is designed to provide a speedy, reliable, and trustworthy priority queue implementation for Haskell programmers. We provide queues that distinguish priority from value, queues that do not, and max and min versions of each.
Documentation and tarball from the hackage page:
http://hackage.haskell.org/package/pqueue
Development repo:
darcs get http://code.haskell.org/containers-pqueue/
= Rationale =
A number of packages, including primes, fgl, and monadiccp, preferred to roll their own implementations rather than using one of the many implementations now on Hackage. The goal of this package is to provide a standard, trustworthy implementation that meets the needs of the broadest possible range of users. A broadly accepted, ubiquitous library for priority queues would significantly reduce the necessity for package writers to roll their own implementations. Here are several reasons programmers might not be satisfied with existing Hackage packages:
* Unwillingness to add a dependency that isn't relatively standardized.
* Realtime performance constraints or other concerns about asymptotics, including concerns over persistence.
* Dissatisfaction with the API, either because the API is overcomplicated or because it is insufficiently generalized.
= Introduction to the API =
The package primarily consists of four modules:
* Data.PQueue.Min
* Data.PQueue.Max
* Data.PQueue.Prio.Min
* Data.PQueue.Prio.Max
Data.PQueue.Min and Data.PQueue.Max are modeled after Data.Set, while Data.PQueue.Prio.Min and Data.PQueue.Prio.Max are modeled after Data.Map.
First, let's look at the API for Data.PQueue.Prio.Min. We export the type `MinPQueue k a`, which is a minimum priority queue which associates ''elements'' of type `a` associated with ordered ''keys'' of type `k`. The primary extraction method on `MinPQueue k a` is `minViewWithKey`, which behaves exactly like the method of the same name in Data.Map.
For an example, consider the following method for FGL, which takes a root node and returns a graph containing the shortest-path tree starting at that root.
{{{
dijkstraTree :: (DynGraph gr, Ord b, Num b) => Node -> gr a b -> gr a b
dijkstraTree v g = dijkstraTree' (singleton 0 (v, v, 0)) empty g
-- we take as input a priority queue of edges indexed by total distance from the root node,
-- a partial shortest path tree, and the remainder of the graph. We label the returned tree
-- with distance from the root.
dijkstraTree' :: (DynGraph gr, Ord b, Num b) => MinPQueue b (LEdge b) -> gr a b -> gr b b -> gr a b
dijkstraTree' q tree g = case minViewWithKey q of
Nothing -> tree
Just ((dV, (vPre, v, dVE)), q') -> case match v g of
(Nothing, g') -> dijkstraTree' q' tree g'
(Just cxt, g') -> dijkstraTree' (q' `union` fromList [(dV + d, w) | (w, d) <- lsuc' cxt])
(([(dVE, vPre)], v, dV, []) & tree) g'
}}}
In general, the package attempts to match the interface of Data.Map and Data.Set in many ways, save that our priority queues allow multiple equal priorities or elements.
Here is a complete list of methods offered by Data.PQueue.Prio.Min that are analogous to the methods of the same name in Data.Map: `empty`, `singleton`, `insert`, `union`, `unions`, `null`, `size`, `findMin`, `deleteMin`, `deleteFindMin`, `updateMin`, `updateMinWithKey`, `minView`, `minViewWithKey`, `map`, `mapWithKey`, `mapKeys`, `mapKeysMonotonic`, `foldrWithKey`, `foldlWithKey`, `filter`, `filterWithKey`, `partition`, `partitionWithKey`, `mapMaybe`, `mapMaybeWithKey`, `mapEither`, `mapEitherWithKey`, `fromList`, `fromAscList`, `keys`, `elems`, `assocs`, `toAscList`, `toDescList`, `toList`. Furthermore, as of this writing Data.Map implements `Traversable`, but does not add a `traverseWithKey` method; we offer `traverseWithKey`.
In addition, we offer a few new methods designed to increase flexibility in manipulating the minimum element of the queue, including `getMin` (a Maybe version of `findMin`), `adjustMin` and `adjustMinWithKey` (with type analogous to `Data.Map.adjust`, except that they adjust the minimum element).
The biggest class of added methods is in methods modeled after the `take` and `takeWhile` family of Data.List methods. The most general examples are
{{{
splitAt :: Int -> MinPQueue k a -> ([(k, a)], MinPQueue k a)
spanWithKey :: Ord k => (k -> a -> Bool) -> MinPQueue k a -> ([(k, a)], MinPQueue k a)
}}}
In general, when we do a `take`-like method, we get an association list, and when we do a `drop`-like method, we get a queue.
Finally, we offer a number of unordered traversal methods, all of the form `xxxU` to indicate that they do not respect order. In most cases, they are asymptotically faster than the ordered traversal methods, and the speed advantage is the primary reason to allow this violation of the priority queue abstraction.
The situation for Data.PQueue.Min (and Data.PQueue.Max) is essentially analogous to Data.Set. The most prominent change is that `foldrAsc` and `foldlDesc` methods are offered instead of a `Foldable` instance, which would be impossible due to the `Ord` constraints.
= Important design decisions =
The biggest design decision was the choice of data structure. While this could certainly be changed in a future release, for the moment we have settled on a type-magical binomial heap implementation with a lazy spine and a global root. Here are some reasons for this decision:
* Offers optimal asymptotics on the most common usage styles, in particular, insertion and deletion in a single-threaded context. (This is the primary motivation for excluding leftist heaps, which were otherwise very closely competitive.)
* Offers good asymptotics on all operations, including union, even in a persistent context.
* [http://code.haskell.org/containers-pqueue/bench-chart.pdf Benchmarks] indicate superior performance against other priority queue implementations on a variety of operations.
* Exemplifies Haskell programming style and power of the type system.
The other big design decision was the choice to restrict the exposed interface to an H98-compatible API and to generally follow design choices made by the containers package. The primary motivation for these design decisions is my own personal inclination not to "rock the boat."