Opened 9 years ago
Closed 8 years ago
#5321 closed bug (fixed)
Very slow constraint solving for type families
Reported by: | simonpj | Owned by: | |
---|---|---|---|
Priority: | low | Milestone: | 7.6.1 |
Component: | Compiler (Type checker) | Version: | 7.0.3 |
Keywords: | Cc: | gershomb@…, dimitris@…, pho@… | |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Compile-time performance bug | Test Case: | perf/compiler/T5321Fun, T5321FD |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | ||
Wiki Page: |
Description
This post (to the ghc-users mailing list, from Gershom Bazerman) is in literate Haskell. It describes how certain performance leaks are introduced in type level programming. These leaks do not affect program runtimes, but can cause compile times to grow drastically. They exist both with Functional Dependencies and Type Families, but are currently worse with the former, and have grown worse with the new constraint solver in GHC 7. It is intended both as a guide to those encountering these issues, and as a motivation for the GHC development team to address such issues as the constraint solver is developed and improved.
> {-# OPTIONS_GHC -fcontext-stack=1000 #-} {-# LANGUAGE > FlexibleContexts, FlexibleInstances, FunctionalDependencies, > MultiParamTypeClasses, OverlappingInstances, TypeSynonymInstances, > TypeOperators, UndecidableInstances, TypeFamilies #-} module > TypePerformance where
Our running example, for simplicity's sake, is a type-level map of a single function. For reference, here is the code for a simple value-level map of a single function.
> vfoo = id > mapfoo (x : xs) = vfoo x : mapfoo xs > mapfoo [] = []
Because Haskell is a lazy language, this runs in O(n) time and constant stack.
We now lift map to the type level, to operate over HLists.
First, the basic HList types
> infixr 3 :* > data x :* xs = x :* xs deriving Show > data HNil = HNil deriving Show
Next, a large boring HList
> -- Adds ten cells > addData x = i :* i :* d :* d :* s :* > i :* i :* d :* d :* s :* > x > where i = 1 :: Int > d = 1 :: Double > s = "" > > -- Has 70 cells. > sampleData = addData $ addData $ addData $ addData $ addData $ > addData $ addData $ > HNil
Next, a simple polymorphic function to map
> class Foo x y | x -> y > where foo :: x -> y > foo = undefined > instance Foo Int Double > instance Foo Double Int > instance Foo String String
Now, our map
> class HMapFoo1 as bs | as -> bs where > hMapFoo1 :: as -> bs > > instance (Foo a b, HMapFoo1 as bs) => HMapFoo1 (a :* as) (b :* bs) where > hMapFoo1 (x :* xs) = foo x :* hMapFoo1 xs > > instance HMapFoo1 HNil HNil where > hMapFoo1 _ = HNil
If we enable the following line, compilation time is ~ 9 seconds.
> testHMapFoo1 = hMapFoo1 sampleData
Furthermore, playing with the size of sampleData, we see that the time spent in compilation is superlinear -- each additional cell costs a greater amount of time. This is because while Haskell is lazy at the value level, it is strict at the type level. Therefore, just as in a strict language, HMapFoo1's cost grows O(n2) because even as we induct over the as, we build up a stack of bs. Just as in a strict language, the solution is to make hMapFoo tail recursive through introducing an accumulator. This also reverses the hlist, but never mind that.
> class HMapFoo2 acc as bs | acc as -> bs where > hMapFoo2 :: acc -> as -> bs > > instance (Foo a b, HMapFoo2 (b :* bs) as res) => HMapFoo2 bs (a :* as) res where > hMapFoo2 acc (x :* xs) = hMapFoo2 (foo x :* acc) xs > > instance HMapFoo2 acc HNil acc where > hMapFoo2 acc _ = acc
If we enable the following line, compilation time is a much more satisfying ~0.5s.
> testHMapFoo2 = hMapFoo2 HNil sampleData
But wait, there's trouble on the horizon! Consider the following version:
> class HMapFoo3 acc as bs | acc as -> bs where > hMapFoo3 :: acc -> as -> bs > > instance (HMapFoo3 (b :* bs) as res, Foo a b) => HMapFoo3 bs (a :* as) res where > hMapFoo3 acc (x :* xs) = hMapFoo3 (foo x :* acc) xs > > instance HMapFoo3 acc HNil acc where > hMapFoo3 acc _ = acc
The only difference between hMapFoo2 and hMapFoo3 is that the order of constraints on the inductive case has been reversed, with the recursive constraint first and the immediately checkable constraint second. Now, if we enable the following line, compilation time rockets to ~6s!
> testHMapFoo3 = hMapFoo3 HNil sampleData
Slowdowns such as those described here are not a purely hypothetical issue, but have caused real problems in production code. The example given above is fairly simple. The constraints used are minimal and easily checked. In real programs, the constraints are more difficult, increasing constant factors significantly. These constant factors, combined with unexpected algorithmic slowdowns due to the type inference engine, can lead (and have lead) to compilation times of HList-style code becoming deeply unwieldy-to-unusable, and can lead (and have lead) to this occuring only well after this code has been introduced and used on smaller cases without trouble.
The constraint solver should certainly be smart enough to reduce the compile times of HMapFoo3 to those of HMapFoo2. In fact, with type families, the there is no difference (see below). Could the compiler be smart enough to do the same for HMapFoo1? I'm not sure. Certainly, it could at least knock down its own constant factors a bit, even if it can't improve the big-O performance there.
Appendix: Examples with Type Families
As the below code demonstrates, the same issues demonstrated with Functional Dependencies also appear with Type Families, although less horribly, as their code-path seems more optimized in the current constraint solver:
> class TFoo x where > type TFooFun x > tfoo :: x -> TFooFun x > tfoo = undefined > > instance TFoo Int where > type TFooFun Int = Double > instance TFoo Double where > type TFooFun Double = Int > instance TFoo String where > type TFooFun String = String > > class THMapFoo1 as where > type THMapFoo1Res as > thMapFoo1 :: as -> THMapFoo1Res as > > instance (TFoo a, THMapFoo1 as) => THMapFoo1 (a :* as) where > type THMapFoo1Res (a :* as) = TFooFun a :* THMapFoo1Res as > thMapFoo1 (x :* xs) = tfoo x :* thMapFoo1 xs > > instance THMapFoo1 HNil where > type THMapFoo1Res HNil = HNil > thMapFoo1 _ = HNil
The following, when enabled, takes ~3.5s. This demonstrates that slowdown occurs with type families as well.
> testTHMapFoo1 = thMapFoo1 sampleData > class THMapFoo2 acc as where > type THMapFoo2Res acc as > thMapFoo2 :: acc -> as -> THMapFoo2Res acc as > > instance (TFoo a, THMapFoo2 (TFooFun a :* acc) as) => THMapFoo2 acc (a :* as) where > type THMapFoo2Res acc (a :* as) = THMapFoo2Res (TFooFun a :* acc) as > thMapFoo2 acc (x :* xs) = thMapFoo2 (tfoo x :* acc) xs > > instance THMapFoo2 acc HNil where > type THMapFoo2Res acc HNil = acc > thMapFoo2 acc _ = acc
The following, when enabled, takes ~0.6s. This demonstrates that the tail recursive transform fixes the slowdown with type families just as with fundeps.
> testTHMapFoo2 = thMapFoo2 HNil sampleData > class THMapFoo3 acc as where > type THMapFoo3Res acc as > thMapFoo3 :: acc -> as -> THMapFoo3Res acc as > > instance (THMapFoo3 (TFooFun a :* acc) as, TFoo a) => THMapFoo3 acc (a :* as) where > type THMapFoo3Res acc (a :* as) = THMapFoo3Res (TFooFun a :* acc) as > thMapFoo3 acc (x :* xs) = thMapFoo3 (tfoo x :* acc) xs > > instance THMapFoo3 acc HNil where > type THMapFoo3Res acc HNil = acc > thMapFoo3 acc _ = acc
The following, when enabled, also takes ~0.6s. This demonstrates that, unlike the fundep case, the order of type class constraints does not, in this instance, affect the performance of type families.
> testTHMapFoo3 = thMapFoo3 HNil sampleData
Change History (6)
comment:1 Changed 9 years ago by
Cc: | pho@… added |
---|
comment:2 Changed 8 years ago by
Milestone: | → 7.4.1 |
---|
comment:3 Changed 8 years ago by
difficulty: | → Unknown |
---|---|
Test Case: | → perf/compiler/T5321Fun, T5321FD |
comment:4 Changed 8 years ago by
Milestone: | 7.4.1 → 7.6.1 |
---|---|
Priority: | normal → low |
comment:5 Changed 8 years ago by
With the changes I've made things have been improved. With functional dependencies:
testHMapFoo1 0.21sec 96MB testHMapFoo2 0.21sec 95MB testHMapFoo3 0.20sec 92MB
I dont understand very well your third allocation number Simon (you meant Mbytes?) With type families we get:
testHMapFoo1 0.21sec 117MB testHMapFoo2 0.29sec 124MB testHMapFoo3 0.29sec 125MB
So they are much more comparable (and faster too!). The slightly higher allocation I believe is probably caused because with type families there's evidence around whereas with Derived constraints from fundeps that's not the case.
This all is in my branch but I will push to HEAD shortly. The file is already in the testuite.
comment:6 Changed 8 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Changes are now in HEAD (I forget the exact commit) and regression tests added.
I've just tried this with the HEAD (roughly 7.4). Here's what I get.
With functional dependencies:
With type functions:
I'm a bit surprised that the type-function version is slower. Dimitrios: it might be worth profiling to see where the time is going.
But it's reassuring that the performance fragility wrt the order of constraints seems to have gone away.
NB: this is without
-dcore-lint
, which makes a big difference because in programs like this we get pretty big coercions.I'm adding
T5321Fun
andT5321FD
as performance tests. Currently all tests are run with-dcore-lint
on, so the performance numbers are much worse than those above.