Opened 8 years ago

Closed 7 years ago

#5321 closed bug (fixed)

Very slow constraint solving for type families

Reported by: simonpj Owned by:
Priority: low Milestone: 7.6.1
Component: Compiler (Type checker) Version: 7.0.3
Keywords: Cc: gershomb@…, dimitris@…, pho@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case: perf/compiler/T5321Fun, T5321FD
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

This post (to the ghc-users mailing list, from Gershom Bazerman) is in literate Haskell. It describes how certain performance leaks are introduced in type level programming. These leaks do not affect program runtimes, but can cause compile times to grow drastically. They exist both with Functional Dependencies and Type Families, but are currently worse with the former, and have grown worse with the new constraint solver in GHC 7. It is intended both as a guide to those encountering these issues, and as a motivation for the GHC development team to address such issues as the constraint solver is developed and improved.

> {-# OPTIONS_GHC -fcontext-stack=1000 #-} {-# LANGUAGE 
> FlexibleContexts, FlexibleInstances, FunctionalDependencies, 
> MultiParamTypeClasses, OverlappingInstances, TypeSynonymInstances, 
> TypeOperators, UndecidableInstances, TypeFamilies #-} module 
> TypePerformance where

Our running example, for simplicity's sake, is a type-level map of a single function. For reference, here is the code for a simple value-level map of a single function.

> vfoo = id
> mapfoo (x : xs) = vfoo x : mapfoo xs
> mapfoo [] = []

Because Haskell is a lazy language, this runs in O(n) time and constant stack.

We now lift map to the type level, to operate over HLists.

First, the basic HList types

> infixr 3 :*
> data x :* xs = x :* xs deriving Show
> data HNil = HNil deriving Show

Next, a large boring HList

> -- Adds ten cells
> addData x = i :* i :* d :* d :* s :* 
>             i :* i :* d :* d :* s :* 
>             x 
>     where i = 1 :: Int 
>           d = 1 :: Double 
>           s = "" 
> 
> -- Has 70 cells. 
> sampleData = addData $ addData $ addData $ addData $ addData $ 
>              addData $ addData $ 
>              HNil

Next, a simple polymorphic function to map

> class Foo x y | x -> y 
>     where foo :: x -> y 
>           foo = undefined

> instance Foo Int Double
> instance Foo Double Int
> instance Foo String String

Now, our map

> class HMapFoo1 as bs | as -> bs where 
>     hMapFoo1 :: as -> bs
> 
> instance (Foo a b, HMapFoo1 as bs) => HMapFoo1 (a :* as) (b :* bs) where 
>     hMapFoo1 (x :* xs) = foo x :* hMapFoo1 xs
> 
> instance HMapFoo1 HNil HNil where 
>     hMapFoo1 _ = HNil

If we enable the following line, compilation time is ~ 9 seconds.

  > testHMapFoo1 = hMapFoo1 sampleData 

Furthermore, playing with the size of sampleData, we see that the time spent in compilation is superlinear -- each additional cell costs a greater amount of time. This is because while Haskell is lazy at the value level, it is strict at the type level. Therefore, just as in a strict language, HMapFoo1's cost grows O(n2) because even as we induct over the as, we build up a stack of bs. Just as in a strict language, the solution is to make hMapFoo tail recursive through introducing an accumulator. This also reverses the hlist, but never mind that.

> class HMapFoo2 acc as bs | acc as -> bs where 
>     hMapFoo2 :: acc -> as -> bs
> 
> instance (Foo a b, HMapFoo2 (b :* bs) as res) => HMapFoo2 bs (a :* as) res where 
>     hMapFoo2 acc (x :* xs) = hMapFoo2 (foo x :* acc) xs
> 
> instance HMapFoo2 acc HNil acc where 
>     hMapFoo2 acc _ = acc

If we enable the following line, compilation time is a much more satisfying ~0.5s.

  > testHMapFoo2 = hMapFoo2 HNil sampleData 

But wait, there's trouble on the horizon! Consider the following version:

> class HMapFoo3 acc as bs | acc as -> bs where 
>     hMapFoo3 :: acc -> as -> bs
> 
> instance (HMapFoo3 (b :* bs) as res, Foo a b) => HMapFoo3 bs (a :* as) res where 
>     hMapFoo3 acc (x :* xs) = hMapFoo3 (foo x :* acc) xs
> 
> instance HMapFoo3 acc HNil acc where 
>     hMapFoo3 acc _ = acc

The only difference between hMapFoo2 and hMapFoo3 is that the order of constraints on the inductive case has been reversed, with the recursive constraint first and the immediately checkable constraint second. Now, if we enable the following line, compilation time rockets to ~6s!

  > testHMapFoo3 = hMapFoo3 HNil sampleData 

Slowdowns such as those described here are not a purely hypothetical issue, but have caused real problems in production code. The example given above is fairly simple. The constraints used are minimal and easily checked. In real programs, the constraints are more difficult, increasing constant factors significantly. These constant factors, combined with unexpected algorithmic slowdowns due to the type inference engine, can lead (and have lead) to compilation times of HList-style code becoming deeply unwieldy-to-unusable, and can lead (and have lead) to this occuring only well after this code has been introduced and used on smaller cases without trouble.

The constraint solver should certainly be smart enough to reduce the compile times of HMapFoo3 to those of HMapFoo2. In fact, with type families, the there is no difference (see below). Could the compiler be smart enough to do the same for HMapFoo1? I'm not sure. Certainly, it could at least knock down its own constant factors a bit, even if it can't improve the big-O performance there.


Appendix: Examples with Type Families

As the below code demonstrates, the same issues demonstrated with Functional Dependencies also appear with Type Families, although less horribly, as their code-path seems more optimized in the current constraint solver:

> class TFoo x where 
>      type TFooFun x 
>      tfoo :: x -> TFooFun x 
>      tfoo = undefined
> 
> instance TFoo Int where 
>     type TFooFun Int = Double
> instance TFoo Double where 
>     type TFooFun Double = Int
> instance TFoo String where 
>     type TFooFun String = String
> 
> class THMapFoo1 as where 
>       type THMapFoo1Res as 
>       thMapFoo1 :: as -> THMapFoo1Res as
> 
> instance (TFoo a, THMapFoo1 as) => THMapFoo1 (a :* as) where 
>     type THMapFoo1Res (a :* as) = TFooFun a :* THMapFoo1Res as 
>     thMapFoo1 (x :* xs) = tfoo x :* thMapFoo1 xs
> 
> instance THMapFoo1 HNil where 
>     type THMapFoo1Res HNil = HNil 
>     thMapFoo1 _ = HNil

The following, when enabled, takes ~3.5s. This demonstrates that slowdown occurs with type families as well.

  > testTHMapFoo1 = thMapFoo1 sampleData 

> class THMapFoo2 acc as where 
>       type THMapFoo2Res acc as 
>       thMapFoo2 :: acc -> as -> THMapFoo2Res acc as
> 
> instance (TFoo a, THMapFoo2 (TFooFun a :* acc) as) => THMapFoo2 acc (a :* as) where 
>     type THMapFoo2Res acc (a :* as) = THMapFoo2Res (TFooFun a :* acc) as 
>     thMapFoo2 acc (x :* xs) = thMapFoo2 (tfoo x :* acc) xs
> 
> instance THMapFoo2 acc HNil where 
>     type THMapFoo2Res acc HNil = acc 
>     thMapFoo2 acc _ = acc

The following, when enabled, takes ~0.6s. This demonstrates that the tail recursive transform fixes the slowdown with type families just as with fundeps.

  > testTHMapFoo2 = thMapFoo2 HNil sampleData 

> class THMapFoo3 acc as where 
>       type THMapFoo3Res acc as 
>       thMapFoo3 :: acc -> as -> THMapFoo3Res acc as
> 
> instance (THMapFoo3 (TFooFun a :* acc) as, TFoo a) => THMapFoo3 acc (a :* as) where 
>     type THMapFoo3Res acc (a :* as) = THMapFoo3Res (TFooFun a :* acc) as 
>     thMapFoo3 acc (x :* xs) = thMapFoo3 (tfoo x :* acc) xs
> 
> instance THMapFoo3 acc HNil where 
>     type THMapFoo3Res acc HNil = acc 
>     thMapFoo3 acc _ = acc

The following, when enabled, also takes ~0.6s. This demonstrates that, unlike the fundep case, the order of type class constraints does not, in this instance, affect the performance of type families.

  > testTHMapFoo3 = thMapFoo3 HNil sampleData 

Change History (6)

comment:1 Changed 8 years ago by PHO

Cc: pho@… added

comment:2 Changed 8 years ago by igloo

Milestone: 7.4.1

comment:3 Changed 8 years ago by simonpj

difficulty: Unknown
Test Case: perf/compiler/T5321Fun, T5321FD

I've just tried this with the HEAD (roughly 7.4). Here's what I get.

With functional dependencies:

testHMapFoo1     0.76sec    291Mbyte allocated
testHMapFoo2     0.41sec    161Mbyte allocated
testHMapFoo3     0.43sec    178Gbyte allocated

With type functions:

testHMapFoo1     0.71sec    297Mbyte allocated
testHMapFoo2     0.64sec    274Mbyte allocated
testHMapFoo3     0.65sec    274Gbyte allocated

I'm a bit surprised that the type-function version is slower. Dimitrios: it might be worth profiling to see where the time is going.

But it's reassuring that the performance fragility wrt the order of constraints seems to have gone away.

NB: this is without -dcore-lint, which makes a big difference because in programs like this we get pretty big coercions.

I'm adding T5321Fun and T5321FD as performance tests. Currently all tests are run with -dcore-lint on, so the performance numbers are much worse than those above.

comment:4 Changed 8 years ago by igloo

Milestone: 7.4.17.6.1
Priority: normallow

comment:5 Changed 7 years ago by dimitris

With the changes I've made things have been improved. With functional dependencies:

testHMapFoo1    0.21sec            96MB
testHMapFoo2    0.21sec            95MB
testHMapFoo3    0.20sec            92MB

I dont understand very well your third allocation number Simon (you meant Mbytes?) With type families we get:

testHMapFoo1    0.21sec           117MB
testHMapFoo2    0.29sec           124MB
testHMapFoo3    0.29sec           125MB 

So they are much more comparable (and faster too!). The slightly higher allocation I believe is probably caused because with type families there's evidence around whereas with Derived constraints from fundeps that's not the case.

This all is in my branch but I will push to HEAD shortly. The file is already in the testuite.

comment:6 Changed 7 years ago by simonpj

Resolution: fixed
Status: newclosed

Changes are now in HEAD (I forget the exact commit) and regression tests added.

Note: See TracTickets for help on using tickets.