Version 7 (modified by goldfire, 5 years ago)

--

# Kind inference for types and classes

## Baseline strategy (BASELINE)

This plan, originally due to Mark Jones, is the strategy that GHC 7.8 follows for ordinary, recursive term-level functions, and for recursive data types. I'll describe it for data types, with this example:

    data SS f a b = MkSS (TT a f) (SS f a b)
    data TT (a::k) (f::k -> *) :: * where
      MkTT :: f a -> SS f a Maybe -> SS f a Int -> TT a f

1. Identify which type constructors have Complete User-Supplied Kind signatures (CUSKs). In this example, TT does. Extend the environment with these fixed kinds:
       TT :: forall k. k -> (k->*) -> *

2. Perform strongly-connected component (SCC) analysis on the non-CUSK decls, ignoring dependencies on a type constructor with a CUSK. In our example, we get a single recursive SCC, containing SS.
3. For each SCC in turn:
• Bind the type constructor to a fresh meta-kind variable:
        SS :: kappa0

• Kind-check all the declarations of the SCC in this environment. This will generate some unifications, so in the end we get
        kappa0 ~ (kappa1 -> *) -> kappa1 -> kappa2 -> *

The kappa1 arises from instantiating TT at its call site in SS.
• Generalise. So we get
        SS :: forall k1 k2. (k1->*) -> k1 -> k2 -> *

4. Extend the environment with these generalised kind bindings, and kind-check the CUSK declarations.

The key point is that we can kind-check SS without looking at TT's definition at all, because we completely know TT's kind. That in turn means that we can exploit inferred polymorphism for SS when kind-checking TT, as we do here: TT uses SS at two different kinds, (SS f a Maybe) and (SS f a Int).

Note that for a non-recursive type or class declaration, (BASELINE) always works fine.
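
The SS/TT example can be tried directly in a recent GHC. What follows is only a sketch under some assumptions: it assumes a compiler with PolyKinds available and CUSK handling enabled (the default in current releases, as far as I know), and it writes `Type` from Data.Kind in place of `*`. The synonyms `AtMaybe` and `AtInt` are hypothetical names added for illustration; they are well-kinded only if SS ends up polymorphic in the kind of its third argument.

```haskell
{-# LANGUAGE GADTs, KindSignatures, PolyKinds #-}

import Data.Kind (Type)

-- No CUSK: the kind of SS is inferred and then generalised.
data SS f a b = MkSS (TT a f) (SS f a b)

-- CUSK: every type variable is annotated and the result kind is given.
data TT (a :: k) (f :: k -> Type) :: Type where
  MkTT :: f a -> SS f a Maybe -> SS f a Int -> TT a f

-- Hypothetical probes (not in the original text). Both are accepted only if
--   SS :: forall k1 k2. (k1 -> Type) -> k1 -> k2 -> Type
type AtMaybe = SS Maybe Int Maybe   -- third argument has kind Type -> Type
type AtInt   = SS Maybe Int Int     -- third argument has kind Type
```

Asking GHCi for `:kind SS` after loading this should show the generalised kind from step 3.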

## Partial kind signature strategy (PARTIAL)

The key idea is that all polymorphism is declared, so nothing gets to be kind-polymorphic unless you say so. But the payoff is that you can give partial kind signatures. Here's the strategy.

1. Sort the declarations into SCCs. No special treatment for CUSKs.
2. For each declaration, extend the environment with a kind binding that has a forall for each explicit user-written kind variable, but meta-kind variables otherwise. These kind annotations amount to partial kind signatures. For example
      data Foo (a :: k1 -> k1) b c = ...

would get a kind binding
      Foo :: forall k1. (k1->k1) -> kappa1 -> kappa2 -> *

Our earlier example would give
      TT :: forall k. k -> (k->*) -> *
      SS :: kappa3 -> kappa4 -> kappa5 -> *

3. Kind-check the declarations in this environment. At a call of Foo, say, we'd instantiate the forall k1 with a fresh meta-kind variable, but would share kappa1, kappa2 among all calls to Foo.
4. Default any unconstrained meta-kind variables to *.

That's it! No generalisation step. The only polymorphism is that declared by the user.
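
To make steps 3 and 4 concrete, here is a toy model of the shared meta-kind variables. It is a sketch, not GHC's implementation: the `Kind` type, `unify`, and `defaultKind` are hypothetical names, user-written foralls are left out entirely, and a substitution is just an association list.

```haskell
-- A toy kind language: *, arrows, and meta-kind variables (kappa_n).
data Kind = Star | Kind :-> Kind | Meta Int
  deriving (Eq, Show)
infixr 5 :->

-- A substitution maps meta-variable numbers to kinds.
type Subst = [(Int, Kind)]

-- Apply a substitution, chasing bindings until none applies.
apply :: Subst -> Kind -> Kind
apply _ Star      = Star
apply s (a :-> b) = apply s a :-> apply s b
apply s (Meta n)  = maybe (Meta n) (apply s) (lookup n s)

occurs :: Int -> Kind -> Bool
occurs n (Meta m)  = n == m
occurs n (a :-> b) = occurs n a || occurs n b
occurs _ Star      = False

-- First-order unification; Nothing signals a kind error.
unify :: Kind -> Kind -> Subst -> Maybe Subst
unify k1 k2 s = go (apply s k1) (apply s k2)
  where
    go Star Star           = Just s
    go (Meta n) k          = bind n k
    go k (Meta n)          = bind n k
    go (a :-> b) (c :-> d) = unify a c s >>= unify b d
    go _ _                 = Nothing
    bind n k
      | k == Meta n = Just s
      | occurs n k  = Nothing
      | otherwise   = Just ((n, k) : s)

-- Step 4 of (PARTIAL): any meta-variable still unsolved becomes *.
defaultKind :: Subst -> Kind -> Kind
defaultKind s k = go (apply s k)
  where
    go (Meta _)  = Star
    go (a :-> b) = go a :-> go b
    go Star      = Star

-- data PT f a = MkPT (f a): start from PT :: kappa1 -> kappa2 -> *;
-- the field (f a) :: * forces kappa1 ~ kappa2 -> *.
ptKind :: Maybe Kind
ptKind = do
  s <- unify (Meta 1) (Meta 2 :-> Star) []
  pure (defaultKind s (Meta 1 :-> Meta 2 :-> Star))

-- SS f a Maybe and SS f a Int both constrain the shared kappa5.
ssClash :: Maybe Subst
ssClash = unify (Meta 5) (Star :-> Star) [] >>= unify (Meta 5) Star
```

In this model `ptKind` evaluates to the monomorphic kind `(* -> *) -> * -> *`, and `ssClash` is `Nothing` because the two uses of SS disagree about the one shared variable.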

So our earlier SS/TT example would be rejected because it relies on SS being polymorphic in its third parameter. If you want the SS/TT example to work you could write

    data SS (f::k1->*) (a::k1) (b::k2) = MkSS (TT a f) (SS f a b)
    data TT (a::k) (f::k->*) where
      MkTT :: f a -> SS f a Maybe -> SS f a Int -> TT a f


I believe that if you want polymorphism in k, you must decorate all the places that k appears. For example, this won't work:

  data Foo (a::k) f = MkFoo (f a) (Foo a f)


because in step 2 we get the kind signature

  Foo :: forall k. k -> kappa1 -> *


where kappa1 is a unification variable shared among all calls to Foo in the SCC. I don't think we can then allow kappa1 to be unified with anything involving k. This is a tricky point.

Richard: I don't understand this last point. Are you saying that the described algorithm does not do this unification and thus would reject Foo? Or, are you saying that the proposed strategy should not do this unification, as a point of design that might be friendlier to users? End Richard

## Generalised partial kind signature strategy (PARGEN)

The (PARGEN) strategy is exactly like (PARTIAL) except that step 4 is different:

4. Generalise over any unconstrained meta-kind variable (that is not free in the environment), rather than defaulting to *.

Richard: What meta variables can be free in the environment at this point? We're operating at the top level. End Richard

So we use the partial kind signatures to express any polymorphism necessary for recursion inside the SCC, but perhaps infer yet more polymorphism that can be used after the SCC. Thus:

    data T f a = MkT (f a) (T f a)
    -- Success:  T :: forall k. (k->*) -> k -> *

    data S f a = MkS (f a) (S Maybe Int) (S Monad Maybe)
    -- Failure: needs polymorphic recursion

    data S2 f (a::k) = MkS2 (f a) (S2 Maybe Int) (S2 Monad Maybe)
    -- Success: needs polymorphic recursion
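
The first of these examples can be checked against a current GHC. This is a sketch under the assumption that PolyKinds is on; today's compiler is not literally (PARGEN), so it only shows that the generalised kind for T is what you get in practice. The names `firstField`, `t1` and `t2` are hypothetical helpers added for illustration.

```haskell
{-# LANGUAGE PolyKinds #-}

import Data.Proxy (Proxy (..))

-- Monomorphic recursion only: T is used at (f, a) inside its own definition.
data T f a = MkT (f a) (T f a)

-- Works at any k because T :: forall k. (k -> Type) -> k -> Type.
firstField :: T f a -> f a
firstField (MkT x _) = x

-- k = Type
t1 :: T Maybe Int
t1 = MkT (Just 1) t1

-- k = Type -> Type (f = Proxy, a = Maybe)
t2 :: T Proxy Maybe
t2 = MkT Proxy t2
```

Both `t1` and `t2` are lazily built infinite values, so only `firstField` should be forced on them.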


## All of the above (ALL)

Combine (BASELINE), for the CUSK stuff, with (PARGEN) for types with partial kind signatures.

## Type signatures

Richard: I'm not sure what the upshot of this section is. In type signatures, it feels like we're using an algorithm other than (BASELINE), because polymorphic recursion on kinds works just fine without any mention of kind variables. I suppose this is because the body of a function is considered outside of its type signature's SCC and is not considered when doing kind inference. Given that recursion in a type signature is not possible (we can't mention terms in types), I can't quite figure out what differentiates the strategies in type signatures. End Richard

Another place that we currently (i.e. using (BASELINE)) do kind generalisation is in type signatures. If you write

    f :: m a -> m a
    f = ...


then the type signature is kind-generalised thus:

    -- This user-written signature
    f :: m a -> m a
    -- means this (BASELINE)
    f :: forall k (a::k) (m::k->*). m a -> m a


And f's RHS had better be that polymorphic.
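
A small sketch of what that kind generalisation buys, assuming a GHC with PolyKinds enabled. The definitions `atType` and `atArrow` are hypothetical names added for illustration; the second one would be rejected if the signature were read with a :: *.

```haskell
{-# LANGUAGE PolyKinds #-}

import Data.Proxy (Proxy (..))

-- With PolyKinds the signature is read as
--   f :: forall k (m :: k -> Type) (a :: k). m a -> m a
f :: m a -> m a
f x = x

-- k = Type: m = Maybe, a = Char
atType :: Maybe Char
atType = f (Just 'c')

-- k = Type -> Type: m = Proxy, a = Maybe
atArrow :: Proxy Maybe
atArrow = f (Proxy :: Proxy Maybe)
```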

Under (PARTIAL) it would be consistent to say this:

    -- This user-written signature
    f :: m a -> m a
    -- means this (PARTIAL)
    f :: forall (a::*) (m::*->*). m a -> m a


If you want the kind-polymorphic one, you'd have to write thus

    -- This user-written signature
    f :: forall k (a::k) (m::k->*). m a -> m a
    -- means this (PARTIAL)
    f :: forall k (a::k) (m::k->*). m a -> m a


## Declarative typing rules

Richard: I'm similarly unsure of this section. I like thinking in terms of typing rules, but I want rules about datatype declarations, not function declarations. I agree with the conclusions here, but I can't figure out how (PARGEN) would look different from (BASELINE) in this presentation. End Richard

I think that (PARTIAL) has a nice declarative typing rule.

Here is what the conventional declarative typing rule looks like for a single self-recursive function, in the absence of polymorphism:

    G, f:t |- e:t
    G, f:t |- b:t'
    ---------------------------
    G |- letrec f = e in b : t'


Here the "t" is a monotype (no foralls) that the declarative typing rule clairvoyantly conjures up out of thin air.

Once you add Hindley-Milner style polymorphism, the rule gets a bit more complicated

    G, f:t |- e:t
    G, f:gen(G,t) |- b:t'
    ---------------------------
    G |- letrec f = e in b : t'


where gen(G,t) generalises t over the type variables that are free in t but not free in G.
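
For readers who prefer code to rules, the generalisation step can be sketched as follows. The types `Ty`, `Scheme` and `Env` and the helper names are assumptions made for this illustration; they are not taken from GHC.

```haskell
import Data.List (nub, (\\))

-- Monotypes: variables, constructors, and function arrows.
data Ty = TVar String | TCon String | Ty :=> Ty
  deriving (Eq, Show)
infixr 5 :=>

-- A type scheme: forall vs. t
data Scheme = Forall [String] Ty
  deriving (Eq, Show)

type Env = [(String, Scheme)]

-- Free type variables of a monotype, without duplicates.
ftvTy :: Ty -> [String]
ftvTy (TVar a)  = [a]
ftvTy (TCon _)  = []
ftvTy (a :=> b) = nub (ftvTy a ++ ftvTy b)

ftvScheme :: Scheme -> [String]
ftvScheme (Forall vs t) = ftvTy t \\ vs

ftvEnv :: Env -> [String]
ftvEnv env = nub (concatMap (ftvScheme . snd) env)

-- gen(G, t): quantify the variables free in t but not free in G.
gen :: Env -> Ty -> Scheme
gen env t = Forall (ftvTy t \\ ftvEnv env) t

-- In an empty environment, a -> a generalises over a.
idScheme :: Scheme
idScheme = gen [] (TVar "a" :=> TVar "a")

-- If a is free in the environment, only b is generalised.
partialScheme :: Scheme
partialScheme = gen [("x", Forall [] (TVar "a"))] (TVar "a" :=> TVar "b")
```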

The (PARTIAL) rule might look like this:

    t = forall vs. sig[t1..tn/_]
    vs \not\in ti
    G, f : t |- e : forall vs.t
    G, f : t |- b:t'
    ------------------------------------------------
    G |- letrec f :: forall vs. sig; f = e in b : t'


Here I'm expressing the user-specified knowledge as a signature forall vs.sig, with '_' for bits you don't want to specify.

       f :: forall a. _ -> a -> _


Then the rule instantiates each '_' with a clairvoyantly guessed monotype (provided it does not mention the 'vs', or 'a' in this example), and off you go.

## Reflection

I think we could reasonably switch to (PARTIAL) throughout.

As Richard's comments in TcHsType point out, we don't want maximal polymorphism. His example is:

    type family F a where
      F Int = Bool
      F Bool = Char


We could generate

   F :: forall k1 k2. k1 -> k2


so that (F Maybe) is well-kinded, but stuck. But that's probably not what we want. It would be better to get F :: * -> *.
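
One way to get that kind regardless of the inference strategy is to write it down. The sketch below assumes TypeFamilies and KindSignatures; the annotations on F and the values `flag` and `letter` are additions for illustration, not part of Richard's example.

```haskell
{-# LANGUAGE KindSignatures, TypeFamilies #-}

import Data.Kind (Type)

-- Annotating the argument and the result pins F :: Type -> Type.
type family F (a :: Type) :: Type where
  F Int  = Bool
  F Bool = Char

-- F Int reduces to Bool, so True is a value of this type.
flag :: F Int
flag = True

-- F Bool reduces to Char.
letter :: F Bool
letter = 'x'
```

With this declaration, `F Maybe` is a kind error rather than a well-kinded but stuck application.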

A second example:

    type family G a f b where
      G Int  f b = f b
      G Bool f b = Char -> f b


You could just about argue that the programmer intends

   G :: forall k. * -> (k->*) -> k -> *


It's quite similar to this:

  data PT f a = MkPT (f a)


which today, using (BASELINE), we infer to have kind

  PT :: forall k. (k->*) -> k -> *


But I'd be perfectly happy if PT got a monomorphic inferred kind, which is what (PARTIAL) would do:

  PT :: (*->*) -> * -> *


If you want the poly-kinded PT, use a signature:

    -- Any of these would do
    data PT f             (a :: k) = MkPT (f a)
    data PT (f :: k -> *) a        = MkPT (f a)
    data PT (f :: k -> *) (a :: k) = MkPT (f a)


One oddity is that we'd do (BASELINE) for terms and (PARTIAL) for types. But perhaps that's ok. They are different.

• Terms ought to be as polymorphic as possible but arguably not types. Examples above. Also, since kind polymorphism is still in its infancy, maybe it's no bad thing if all kind polymorphism is explicitly signalled every time a kind-polymorphic binder is introduced.
• Terms have well-established separate type signatures, but we don't have a syntax for separate kind signatures of types and classes.

If we moved from (BASELINE) to (PARTIAL), some programs that work now would fail:

• the original SS/TT example above
• a data type like PT where the user did actually want the kind-polymorphic version.

But that might be a price worth paying for the simplicity, uniformity, and predictability you'd get in exchange.

Richard: I think changing to (PARTIAL) throughout would be a mistake, as lots of code would fail to compile. Kind polymorphism by default in datatypes and classes has been around since 7.4, and I suspect there is quite a bit of code that such a change would disrupt.

On the other hand, I think changing to (PARGEN) throughout would work nicely. I believe that it would allow all current code to type-check (except for the weird example that probably should be rejected in #9201). If we were to choose (PARGEN) over (ALL), it's possible that some code would become more polymorphic, as (PARGEN) is more polymorphic than (BASELINE) in the presence of a CUSK. However, I don't believe that this could be a breaking change, and I would prefer going with (PARGEN) over (ALL) for the sake of simplicity -- no need to have two systems around.

I can't figure out a way that (BASELINE) and (PARGEN) are different in type signatures for terms. This version doesn't have quite as nice a declarative typing rule because the type is generalized over kind variables that go completely unmentioned in the type -- a straightforward forall ftv(t). t doesn't quite do it. We need to generalize over seen variables, infer kinds, and then generalize over meta-kind variables. But, this is what is done today.

(Because open type families do not have a body, they would still need their own kind inference story, where unconstrained meta-variables default to *.)

In comment 5 on ticket #9200, I discuss "good" polymorphism and "bad" polymorphism. This discussion, in retrospect, seems tangential at this point. It really only makes sense when discussing closed type families, which aren't at the heart of the problems here. End Richard