Version 2 (modified by tibbe, 19 months ago)

--

Proposal: Add the unordered-containers package and the hashable package to the Haskell Platform

This is a proposal for the 'attoparsec' package version 0.2.3.0 and the 'hashable' package version 1.1.2.5 to be included in the next major release of the Haskell platform.

Everyone is invited to review this proposal, following the standard procedure for proposing and reviewing packages.

Review comments should be sent to the libraries mailing list before <TODO>, which is the discussion deadline.

Credits

Proposal author: Johan Tibell

Package maintainer: Johan Tibell

The following individuals contributed to the review process:

  • ...

Abstract

The 'unordered-containers' package implements hashing-based containers that trade some functionality (lack of ordering) for speed. The 'hashable' package provides a type class that, like Ord, is implemented by all types that can be used as keys in these containers.

Documentation and tarballs from Hackage:

Development repos:

Rationale

Many applications that analyse large data sets ("Big Data") hold large lookup tables in memory. unordered-containers simultaneously targets such applications while also offering a faster drop-in replacement for e.g. Data.Map.

unordered-containers trades some functionality (lack of key-order related function) for [speed and lower memory usage](http://blog.johantibell.com/2012/03/announcing-unordered-containers-02.html). Lookup operations are more than twice as fast and insertion is between 33-50% faster than Data.Map. The memory overhead is ~4.5 words per key-value pair while Data.IntMap uses 8 (includes key) and Data.Map uses 6.

The unordered-containers API should be familiar to any Haskeller that has used the containers package.

The unordered-containers library is very widely used: according to http://packdeps.haskellers.com/, it has 122 reverse dependencies on Hackage.

Introduction to the API

The unordered-containers API should be familiar to any Haskeller that has used the containers package. It contains a subset of the functions in e.g. Data.Map. The idea is to eventually grow the API to be as rich as the the containers one (minus the operations that rely on key ordering).

The hashable API contains a single type class, which is in spirit similar to Ord:

class Hashable a where
    hash :: a -> Int
    hash = hashWithSalt defaultSalt

    hashWithSalt :: Int -> a -> Int
    hashWithSalt salt x = salt `combine` hash x

hashWithSalt computes a hash value for its argument, mixing in the given salt. The use of a salt allows the user to produce different hash values from the same input, which is important in some applications, like cuckoo hashing.

Design decisions

## Type class instead of higher-order function to generate hash values

Instead of using a type class to create hash values one could use a higher-order argument to the container constructor

empty :: (k -> Int)  -- ^ hash function
      -> (k -> k -> Bool)  -- ^ equality test
      -> HashMap k v

as done in Data.HashTable. This approach was not chosen for the following reasons:

  • It's less convenient: users rarely want to vary the hash function used for a given key type.
  • It's more error prone: hash functions are a specialist topic and letting the user choosing which function to use might lead to he/she using a bad one.
  • It's algorithmically less efficient for some operations: e.g. set-union can be done more efficiently if we know both sets used the same hash function.
  • It's faster: GHC does a better job specializing functions that use dictionaries (via INLIABLE) than functions that use higher-order arguments.

Open issues

  • Q: Why use an older version of hashable?
  • A: The latest version of hashable has some performance issues, mostly due to it design, that need to be addressed before it can be included in the platform.