Version 3 (modified by LouisWasserman, 4 years ago)

--

pqueue package proposal

Everyone is invited to review this proposal, following the standard procedure for proposing and reviewing packages:

http://trac.haskell.org/haskell-platform/wiki/AddingPackages

If you could reply with your review comments by ?? then we should have time to resolve issues before the final deadline on ??.

Credits

Proposal author and package maintainer: Louis Wasserman <wasserman.louis@…>

The following individuals contributed to the design process:

  • Milan Straka
  • Jim Apple
  • Ross Paterson

Abstract

The pqueue package is designed to provide a speedy, reliable, and trustworthy priority queue implementation for Haskell programmers. We provide queues that distinguish priority from value, queues that do not, and max and min versions of each.

Documentation and tarball from the hackage page:

http://hackage.haskell.org/package/pqueue

Development repo:

darcs get http://code.haskell.org/containers-pqueue/

Rationale

A number of packages, including primes, fgl, and monadiccp, preferred to roll their own implementations rather than using one of the many implementations now on Hackage. The goal of this package is to provide a standard, trustworthy implementation that meets the needs of the broadest possible range of users. A broadly accepted, ubiquitous library for priority queues would significantly reduce the necessity for package writers to roll their own implementations. Here are several reasons programmers might not be satisfied with existing Hackage packages:

  • Unwillingness to add a dependency that isn't relatively standardized.
  • Realtime performance constraints or other concerns about asymptotics, including concerns over persistence.
  • Dissatisfaction with the API, either because the API is overcomplicated or because it is insufficiently generalized.

Introduction to the API

The package primarily consists of four modules:

  • Data.PQueue.Min
  • Data.PQueue.Max
  • Data.PQueue.Prio.Min
  • Data.PQueue.Prio.Max

Data.PQueue.Min and Data.PQueue.Max are modeled after Data.Set, while Data.PQueue.Prio.Min and Data.PQueue.Prio.Max are modeled after Data.Map.

First, let's look at the API for Data.PQueue.Prio.Min. We export the type MinPQueue k a, which is a minimum priority queue which associates elements of type a associated with ordered keys of type k. The primary extraction method on MinPQueue k a is minViewWithKey, which behaves exactly like the method of the same name in Data.Map.

For an example, consider the following method for FGL, which takes a root node and returns a graph containing the shortest-path tree starting at that root.

dijkstraTree :: (DynGraph gr, Ord b, Num b) => Node -> gr a b -> gr a b
dijkstraTree v g = dijkstraTree' (singleton 0 (v, v, 0)) empty g

-- we take as input a priority queue of edges indexed by total distance from the root node,
-- a partial shortest path tree, and the remainder of the graph.  We label the returned tree
-- with distance from the root.
dijkstraTree' :: (DynGraph gr, Ord b, Num b) => MinPQueue b (LEdge b) -> gr a b -> gr b b -> gr a b
dijkstraTree' q tree g = case minViewWithKey q of
   Nothing -> tree
   Just ((dV, (vPre, v, dVE)), q') -> case match v g of
      (Nothing, g')  -> dijkstraTree' q' tree g'
      (Just cxt, g') -> dijkstraTree' (q' `union` fromList [(dV + d, w) | (w, d) <- lsuc' cxt])
                            (([(dVE, vPre)], v, dV, []) & tree) g'

In general, the package attempts to match the interface of Data.Map and Data.Set in many ways, save that our priority queues allow multiple equal priorities or elements.

Here is a complete list of methods offered by Data.PQueue.Prio.Min that are analogous to the methods of the same name in Data.Map: empty, singleton, insert, union, unions, null, size, findMin, deleteMin, deleteFindMin, updateMin, updateMinWithKey, minView, minViewWithKey, map, mapWithKey, mapKeys, mapKeysMonotonic, foldrWithKey, foldlWithKey, filter, filterWithKey, partition, partitionWithKey, mapMaybe, mapMaybeWithKey, mapEither, mapEitherWithKey, fromList, fromAscList, keys, elems, assocs, toAscList, toDescList, toList. Furthermore, as of this writing Data.Map implements Traversable, but does not add a traverseWithKey method; we offer traverseWithKey.

In addition, we offer a few new methods designed to increase flexibility in manipulating the minimum element of the queue, including getMin (a Maybe version of findMin), adjustMin and adjustMinWithKey (with type analogous to Data.Map.adjust, except that they adjust the minimum element).

The biggest class of added methods is in methods modeled after the take and takeWhile family of Data.List methods. The most general examples are

splitAt :: Int -> MinPQueue k a -> ([(k, a)], MinPQueue k a)
spanWithKey :: Ord k => (k -> a -> Bool) -> MinPQueue k a -> ([(k, a)], MinPQueue k a)

In general, when we do a take-like method, we get an association list, and when we do a drop-like method, we get a queue.

Finally, we offer a number of unordered traversal methods, all of the form xxxU to indicate that they do not respect order. In most cases, they are asymptotically faster than the ordered traversal methods, and the speed advantage is the primary reason to allow this violation of the priority queue abstraction.

The situation for Data.PQueue.Min (and Data.PQueue.Max) is essentially analogous to Data.Set. The most prominent change is that foldrAsc and foldlDesc methods are offered instead of a Foldable instance, which would be impossible due to the Ord constraints.

Important design decisions

The biggest design decision was the choice of data structure. While this could certainly be changed in a future release, for the moment we have settled on a type-magical binomial heap implementation with a lazy spine and a global root. Here are some reasons for this decision:

  • Offers optimal asymptotics on the most common usage styles, in particular, insertion and deletion in a single-threaded context. (This is the primary motivation for excluding leftist heaps, which were otherwise very closely competitive.)
  • Offers good asymptotics on all operations, including union, even in a persistent context.
  • Benchmarks indicate superior performance against other priority queue implementations on a variety of operations.
  • Exemplifies Haskell programming style and power of the type system.

The other big design decision was the choice to restrict the exposed interface to an H98-compatible API and to generally follow design choices made by the containers package. The primary motivation for these design decisions is my own personal inclination not to "rock the boat."