Proposal: Add the unordered-containers package and the hashable package to the Haskell Platform
This is a proposal for the unordered-containers package version 0.2.3.0 and the hashable package version 184.108.40.206 to be included in the next major release of the Haskell platform.
Everyone is invited to review this proposal, following the standard procedure for proposing and reviewing packages.
Review comments should be sent to the libraries mailing list before Monday March 25th, 2013, which is the discussion deadline.
Proposal author: Johan Tibell
Package maintainer: Johan Tibell
The following individuals contributed to the review process:
The unordered-container' package implements hashing-based containers that trade some functionality (lack of ordering) for speed. The hashable package provides a type class that, like Ord, is implemented by all types that can be used as keys in these containers.
Documentation and tarballs from Hackage:
Many applications that analyse large data sets ("Big Data") hold large lookup tables in memory. unordered-containers simultaneously targets such applications while also offering a faster drop-in replacement for e.g. Data.Map.
unordered-containers trades some functionality (lack of key-order related function) for [speed and lower memory usage](http://blog.johantibell.com/2012/03/announcing-unordered-containers-02.html). Lookup operations are more than twice as fast and insertion is between 33-50% faster than Data.Map. The memory overhead is ~4.5 words per key-value pair while Data.IntMap uses 8 (includes key) and Data.Map uses 6.
The unordered-containers API should be familiar to any Haskeller that has used the containers package.
The unordered-containers library is very widely used: according to http://packdeps.haskellers.com/, it has 122 reverse dependencies on Hackage.
Introduction to the API
The unordered-containers API should be familiar to any Haskeller that has used the containers package. It contains a subset of the functions in e.g. Data.Map. The idea is to eventually grow the API to be as rich as the the containers one (minus the operations that rely on key ordering).
The hashable API contains a single type class, which is in spirit similar to Ord:
class Hashable a where hash :: a -> Int hash = hashWithSalt defaultSalt hashWithSalt :: Int -> a -> Int hashWithSalt salt x = salt `combine` hash x
hashWithSalt computes a hash value for its argument, mixing in the given salt. The use of a salt allows the user to produce different hash values from the same input, which is important in some applications, like cuckoo hashing.
Type class instead of higher-order function to generate hash values
Instead of using a type class to create hash values one could use a higher-order argument to the container constructor
empty :: (k -> Int) -- ^ hash function -> (k -> k -> Bool) -- ^ equality test -> HashMap k v
as done in Data.HashTable. This approach was not chosen for the following reasons:
- It's less convenient: users rarely want to vary the hash function used for a given key type.
- It's more error prone: hash functions are a specialist topic and letting the user choosing which function to use might lead to he/she using a bad one.
- It's algorithmically less efficient for some operations: e.g. set-union can be done more efficiently if we know both sets used the same hash function.
- It's faster: GHC does a better job specializing functions that use dictionaries (via INLIABLE) than functions that use higher-order arguments.
Insert vs lookup performance
The data structure (a hash array mapped trie) that's used to implement unordered-containers trades some insert performance for a much better lookup performance. This is a sensible trade-off for the majority of applications, but there are a few programs where using e.g. Data.IntMap might be better.
- Q: Why use an older version of hashable?
- A: The latest version of hashable has some performance issues, mostly due to it design, that need to be addressed before it can be included in the platform.