Opened 4 years ago

Closed 4 years ago

Last modified 2 years ago

#10447 closed bug (fixed)

DeriveFoldable rejects instances with constraints in last argument of data type

Reported by: RyanGlScott Owned by:
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.10.1
Keywords: deriving Cc: core-libraries-committee@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: GHC rejects valid program Test Case: deriving/should_run/T10447
Blocked By: Blocking:
Related Tickets: #8678 Differential Rev(s): Phab:D1031
Wiki Page:

Description (last modified by RyanGlScott)

Currently, the -XDeriveFoldable extension will reject any derived Foldable instance for a data type where the last argument of the type constructor is constrained. For example, using this data type from TcDeriv.hs as inspiration:

{-# LANGUAGE DeriveFoldable, GADTs, StandaloneDeriving #-}
module DeriveFoldableRejected where

data T a b where
    T4 :: Ord b => b -> T a b
    T5 :: b -> T b b
    T6 :: T a (b,b)

deriving instance Foldable (T a)

Compiling DeriveFoldableRejected.hs with GHC 7.10 will currently fail:

DeriveFoldableRejected.hs:9:1:
    Can't make a derived instance of ‘Foldable (T a)’:
      Constructor ‘T4’ must be truly polymorphic in the last argument of the data type
    In the stand-alone deriving instance for ‘Foldable (T a)’
Failed, modules loaded: none.

I don't think this restriction needs to apply to Foldable instances. Unlike Functor and Traversable instances, which require the last argument to be truly universal, Foldable instances can get away without this. To demonstrate, here's a slightly modified T data type, without the constraints:

{-# LANGUAGE DeriveFoldable, GADTs, StandaloneDeriving #-}
{-# OPTIONS_GHC -ddump-deriv #-}
module DeriveFoldableLegal where

data T a b where
    T45 :: b -> T a b
    T6  :: T a b

deriving instance Foldable (T a)

The output of -ddump-deriv is:

Derived instances:
  instance Data.Foldable.Foldable (DeriveFoldableLegal.T a_aDc) where
    Data.Foldable.foldr f_aDd z_aDe (DeriveFoldableLegal.T45 a1_aDf)
      = f_aDd a1_aDf z_aDe
    Data.Foldable.foldr f_aDg z_aDh DeriveFoldableLegal.T6 = z_aDh
    Data.Foldable.foldMap f_aDi (DeriveFoldableLegal.T45 a1_aDj)
      = f_aDi a1_aDj
    Data.Foldable.foldMap f_aDk DeriveFoldableLegal.T6
      = GHC.Base.mempty

Copying this back into DeriveFoldableRejected.hs (after some cleanup):

{-# LANGUAGE DeriveFoldable, GADTs, StandaloneDeriving #-}
module DeriveFoldableRejected where

data T a b where
    T4 :: Ord b => b -> T a b
    T5 :: b -> T b b
    T6 :: T a (b,b)

instance Foldable (T a) where
    foldr f z (T4 a) = f a z
    foldr f z (T5 a) = f a z
    foldr f z T6     = z
    
    foldMap f (T4 a) = f a
    foldMap f (T5 a) = f a
    foldMap f T6     = mempty

reveals that it will compile correctly with the generated code. Therefore, it seems like the check for universality in the last type argument shouldn't be used in -XDeriveFoldable.

Change History (26)

comment:1 Changed 4 years ago by RyanGlScott

Description: modified (diff)

comment:2 Changed 4 years ago by simonpj

Cc: core-libraries-committee@… added

Copying the Core Libraries Committee. What conditions are ok? For example, is this OK?

data T a where
  MkT :: (a~Int) => a -> T a

or this (which is equivalent)

data T a where
  MkT :: Int -> T Int

I think that the 'deriving' code for Functor, Foldable etc behaves specially when it comes across the universally-quantified type variable (a above), so it might get confused if it was Int and not a.

Disabling the test is easy: the question is whether that's the right thing to do.

Simon

comment:3 Changed 4 years ago by RyanGlScott

To be a bit more explicit, TcDeriv specifies these rules for DeriveFunctor, DeriveFoldable, and DeriveTraversable:

-- OK for Functor/Foldable/Traversable class
-- Currently: (a) at least one argument
--            (b) don't use argument contravariantly
--            (c) don't use argument in the wrong place, e.g. data T a = T (X a a)
--            (d) optionally: don't use function types
--            (e) no "stupid context" on data type

These conditions are OK. I propose that these additional rules from TcDeriv only be used for DeriveFunctor and DeriveTraversable, not for DeriveFoldable:

Note [Check that the type variable is truly universal]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For Functor, Foldable, Traversable, we must check that the *last argument*
of the type constructor is used truly universally quantified.  Example

   data T a b where
     T1 :: a -> b -> T a b      -- Fine! Vanilla H-98
     T2 :: b -> c -> T a b      -- Fine! Existential c, but we can still map over 'b'
     T3 :: b -> T Int b         -- Fine! Constraint 'a', but 'b' is still polymorphic
     T4 :: Ord b => b -> T a b  -- No!  'b' is constrained
     T5 :: b -> T b b           -- No!  'b' is constrained
     T6 :: T a (b,b)            -- No!  'b' is constrained

Notice that only the first of these constructors is vanilla H-98. We only
need to take care about the last argument (b in this case).  See Trac #8678.
Eg. for T1-T3 we can write

     fmap f (T1 a b) = T1 a (f b)
     fmap f (T2 b c) = T2 (f b) c
     fmap f (T3 x)   = T3 (f x)

These rules must hold for Functor and Traversable instances because the type signatures of fmap :: Functor t => (a -> b) -> t a -> t b and traverse :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b) require that the last type argument be truly polymorphic, since it must be instantiated with two arbitrary types, t a and t b. As evidence, if you try to make a Functor T instance using what DeriveFunctor would (hypothetically) generate:

{-# LANGUAGE GADTs #-}
module T where

data T a where
    MkT :: Int -> T Int

instance Functor T where
    fmap f (MkT a) = MkT (f a)

it will be rejected:

T.hs:8:22:
    Could not deduce (b ~ Int)
    from the context (a ~ Int)
      bound by a pattern with constructor
                 MkT :: Int -> T Int,
               in an equation for ‘fmap’
      at T.hs:8:13-17
      ‘b’ is a rigid type variable bound by
          the type signature for fmap :: (a -> b) -> T a -> T b at T.hs:8:5
    Expected type: T b
      Actual type: T Int
    Relevant bindings include
      f :: a -> b (bound at T.hs:8:10)
      fmap :: (a -> b) -> T a -> T b (bound at T.hs:8:5)
    In the expression: MkT (f a)
    In an equation for ‘fmap’: fmap f (MkT a) = MkT (f a)

On the other hand, Foldable instances do not require the last type argument to be truly polymorphic, since all of the Foldable methods only parameterize t over a single type variable:

class Foldable t where
    fold :: Monoid m => t m -> m
    foldMap :: Monoid m => (a -> m) -> t a -> m
    foldr :: (a -> b -> b) -> b -> t a -> b
    foldr' :: (a -> b -> b) -> b -> t a -> b
    foldl :: (b -> a -> b) -> b -> t a -> b
    foldl' :: (b -> a -> b) -> b -> t a -> b
    foldr1 :: (a -> a -> a) -> t a -> a
    foldl1 :: (a -> a -> a) -> t a -> a
    toList :: t a -> [a]
    null :: t a -> Bool
    length :: t a -> Int
    elem :: Eq a => a -> t a -> Bool
    maximum :: forall a. Ord a => t a -> a
    minimum :: forall a. Ord a => t a -> a
    sum :: Num a => t a -> a
    product :: Num a => t a -> a

As a result, the type parameter can be safely constrained. If you plug in the code that DeriveFoldable would (hypothetically) generate for T:

{-# LANGUAGE GADTs #-}
module T where

data T a where
    MkT :: Int -> T Int

instance Foldable T where
    foldr f z (MkT a) = f a z
    foldMap f (MkT a) = f a

it compiles without problems, since pattern-matching on the GADT constructor specializes foldr to the type (Int -> b -> b) -> b -> T Int -> b and foldMap to the type Monoid m => (Int -> m) -> T Int -> m.

Again, this only works with Foldable since the type signatures of its methods allow for t a to be refined safely. Functor and Traversable do not permit this, so the universal polymorphism check would still apply for them.

comment:4 Changed 4 years ago by ekmett

Foldable is safe in the presence of any constraint at all on a. It doesn't ever produce one, just consumes them. All occurrences of a in the Foldable dictionary are in negative position, and it can safely ignore any constraints on a you put in place.

Functor and Traversable need to keep the check, though.

comment:5 Changed 4 years ago by simonpj

OK, but can I ask: can you say precisely what Foldable code you would expect to get for these two data types?

data T1 a where
  MkT1 :: (a~Int) => a -> T1 a

data T2 a where
  MkT2 :: Int -> T2 Int

data T3 a where
  MkT3 :: (a~Int) => a -> T3 Int

All three are equivalent. But I am guessing that you intend that they are not equivalent to the "deriving Foldable" algorithm? If so, that had better be clearly stated in the user manual.

(Regrettably, the user manual is silent on how Foldable, Traversable, and Functor are generated. It'd be jolly good to have a wiki page that explained the deriving algorithm; here's how it's done for Eq/Ord etc.)

The documentation is probably the hard bit. The change to lift the condition is pretty easy: look at TcDeriv.cond_functorOK. (This same function actually handles Traversable and Foldable. A little refactoring would be needed to distinguish the Foldable case, perhaps by passing the class name instead of a boolean to the function.)

I'd be happy to review a patch.

Simon

comment:6 Changed 4 years ago by RyanGlScott

The code that I would expect DeriveFoldable to derive for T1, T2, or T3 would be:

instance Foldable T where
    foldr f z (MkT a) = f a z
    foldMap f (MkT a) = f a

which is the same code that you would get for data T a = MkT a deriving Foldable. I don't believe this would require any change to the deriving Foldable algorithm (but I'm not intimately familiar with the implementation details).

I agree that it would be nice to have more documentation on the algorithms themselves. I gained a better intuition for the Functor, Foldable, and Traversable algorithms by reading the comments in TcGenDeriv.hs. For example, here's a somewhat formal description of how deriving Foldable works:

The cases are:

  $(foldr 'a 'b)         =  \x z -> z     -- when b does not contain a
  $(foldr 'a 'a)         =  f
  $(foldr 'a '(b1,b2))   =  \x z -> case x of (x1,x2) -> $(foldr 'a 'b1) x1 ( $(foldr 'a 'b2) x2 z )
  $(foldr 'a '(T b1 b2)) =  \x z -> foldr $(foldr 'a 'b2) z x  -- when a only occurs in the last parameter, b2

comment:7 Changed 4 years ago by shachaf

You expect that the Foldable instance for data T2 a = (a ~ Int) => MkT2 Int give you the Int because there happens to be an Int equality constraint? That seems pretty unintuitive to me. What about e.g. data E = E Int, and data A a = A E vs. data A a = (a ~ Int) => A E, and then inlining E?

I think Simon's examples are worth thinking about, and derived Foldable instances in the presence of equality constraints need some more justification. It's true that Foldable is the wild west of type classes, so parametricity doesn't give us the right answer like for Functor, but I don't expect it to just look for types that happen to be equal.

comment:8 Changed 4 years ago by RyanGlScott

You expect that the Foldable instance for data T2 a = (a ~ Int) => MkT2 Int give you the Int because there happens to be an Int equality constraint?

That is what I am expecting, yes, especially since T1, T2, and T3 are equivalent representations of the same thing.

That seems pretty unintuitive to me. What about e.g. data E = E Int, and data A a = A E vs. data A a = (a ~ Int) => A E, and then inlining E?

I'm not sure how inlining would cause a derived Foldable instance for data A a = (a ~ Int) => A E to fail. Does inlining occur before DeriveFoldable generates its code? If so, I don't see how GHC could confuse E for Int.

derived Foldable instances in the presence of equality constraints need some more justification.

I find myself wanting to derive Foldable instances for deeply embedded DSLs that are represented as GADTs. Here's a simple example from hermit-ghci:

data ShellEffect :: * -> * where
  ShellEffect :: Value -> ShellEffect ()

(which could equivalently be represented as ShellEffect :: a ~ () => Value -> ShellEffect a)

This isn't a Functor or Traversable, but it readly admits a Foldable instance which DeriveFoldable would be able to infer:

instance Foldable ShellEffect where
    foldr f z (ShellEffect a) = z
    foldMap f (ShellEffect a) = mempty

comment:9 Changed 4 years ago by RyanGlScott

I'm probably making an assumption about how robust GHC's equality constraint solver is, but I would think that it'd be capable enough to infer whether an argument's type is equal to the type parameter. If it were undecidable, then wouldn't this (legal) code have to be rejected?

data S a b where
    MkS :: (a ~ Int) => Int -> S a b

deriving instance Foldable (S a)

comment:10 Changed 4 years ago by dolio

Also, for instance

    data Refine a where
      RefineI :: Refine Int
      RefineD :: Refine Double

    data Foo a = Foo (Refine a) Int

Should a Foldable instance for Foo test the Refine a and fold over the Int if it's RefineI?

comment:11 in reply to:  10 Changed 4 years ago by RyanGlScott

Replying to dolio:

Also, for instance

    data Refine a where
      RefineI :: Refine Int
      RefineD :: Refine Double

    data Foo a = Foo (Refine a) Int

Should a Foldable instance for Foo test the Refine a and fold over the Int if it's RefineI?

I assume that Refine is also an instance of Foldable? If so, the respective derived Foldable instances of Refine and Foo would be:

instance Foldable Refine where
    foldr f z RefineI = z
    foldr f z RefineD = z
    foldMap f RefineI = mempty
    foldMap f RefineD = mempty

instance Foldable Foo where
    foldr f z (Foo a1 a2) = foldr f z a1
    foldMap f (Foo a1 a2) = mappend (foldMap f a1) mempty

So no, the derived instance wouldn't fold over the Int, since the Foldable Foo instance would never pattern match on a constructor of Refine.

I don't see this as a problem, especially since this seems consistent with the behavior of deriving Generic1 statements, which also check if a field has the same type as the last type parameter. For example, if you derived a Generic1 instance for Foo, it would give:

type Rep1 Foo = D1 D1Foo (C1 C1_0Foo (S1 NoSelector (Rec1 Refine)
                                  :*: S1 NoSelector (Rec0 Int)))

So deriving Generic1 would also consider the a in Refine a to be distinct from Int, as evidenced by their differing Rec1 and Rec0 wrappers.

comment:12 Changed 4 years ago by dolio

I'm not asking if GHC would derive the instance I specified. I'm asking whether it's an okay Foldable instance based on the same reasoning as for:

data Bar a where
  Bar :: (a ~ Int) -> Int -> Bar a

and the like. "Can we automatically do this," is not the only relevant question. "Should we do this and why," also matters.

comment:13 Changed 4 years ago by RyanGlScott

I think it's entirely reasonable for a derived Foldable Foo instance not to fold over the Int argument. If it did fold over the Int argument, then the implementation of foldr and foldMap would have to pattern-match the Refine argument, making it resemble this:

instance Foldable Foo where
    foldr f z (Foo RefineI a2) = f a2 z
    foldr f z (Foo RefineD a2) = z
    foldMap f (Foo RefineI a2) = mappend mempty (f a2)
    foldMap f (Foo RefineD a2) = mappend mempty mempty

At this point, though, we are no longer defining foldr and foldMap in terms of the Foldable Refine. This feels off to me, since

  1. Almost every other deriving typeclass extension dispatches to arguments' implementations of that particular typeclass (e.g., in data Bar a = Bar (Maybe a) [a] deriving Show, the derived Show (Bar a) instance relies on the Show instances for Maybe a and [a]).
  2. If a user were to manually define an "untraditional" Foldable instance for Refine, the derived Foldable Foo instance wouldn't make use of it.

In addition, requiring derived instances to pattern-match constructors of GADT arguments could break other code, e.g.,

data Refine a where
    RefineI :: Refine Int
    RefineD :: Refine Double

instance Functor Refine where
    fmap = undefined

data Foo a = Foo (Refine a) Int deriving Functor

Since pattern-matching on Refine would refine the type of fmap in the derived Functor Foo instance to something that isn't universally polymorphic.

comment:14 Changed 4 years ago by ekmett

Simon's examples definitely give me pause.

So putting aside type equality constraints for a moment, the other cases seem unambiguous.

This would allow things like

data Set a where
  Empty :: Set a
  NonEmpty :: Ord a => Tree a

but wouldn't get us

data Val a where
  Int :: Int -> Val Int
  Char :: Char -> Val Char

To get there, it seems to me that shachaf's E example seems a step too far. Given

data Bar a = Bar a
data Foo a = Foo (Bar a b)

We don't attempt to walk into Bar and recursing into something that isn't the last argument or relying on expansion seems "deeply magic", and ill specified -- where does the recursion stop?

On the other hand, if we take Simon's examples, and say it'd be nice to generate the same code for semantically equal data declarations, and to allow whatever constraints we want, then using the type equality constraint (a ~ X) as part of the check for naked as and accepting a naked X as an a seems very coherent.

Then in the presence of (a ~ X), folding over all X's immediately inside the structure we're in and delegating all (f X)'s to the Foldable for f would be consistent with the existing behavior.

If you wanted something more exotic where you skipped over some of them, you'd have to write a hand-rolled instance.

comment:15 Changed 4 years ago by simonpj

This is interesting, but I lack the bandwidth to follow it in detail.

I urge against a design that requires the programmer to predict the behaviour of the constraint solver in order to decide what code the "deriving" will generate. I think it's be much better to say that it depends on the syntactic form of the declaration. So for MkT1 and MkT3 you'd get the first argument included in the fold, but for MkT2 you would not.

You need to think what to do about cases like this:

data S a where
  MKS :: b -> c -> S (b,c)

Perhaps y'all can work out a design, document it on a wiki page, and even implement it. (I can advise.)

I'd really love the same page to document the behaviour of deriving for Functor and Traversable too!

Many thanks

Simon

comment:16 Changed 4 years ago by dolio

Yeah, I'm not a big fan (at the moment) of having equality constraints and the like pick up fields that aren't syntactically identical. For instance:

data Foo b a where
  MkFoo :: (T b ~ a) => a -> b -> Int -> Foo b a
 deriving Foldable

Now if T b reduces to Int, we fold over the first and third argument, but if it doesn't, we only fold over the first (because we can't do anything else)? I could probably agree with having it succeed and always fold over the first field, because it's syntactically an a. But I feel like that should be the default behavior, not trying to deduce things from the constraints.

Also consider:

class (b ~ Int) => Foo a b where ...
class Foo a b => Bar a b where ...

data Quux a b where
  MkQ :: Bar a b => Int -> a -> b -> Quux a b

Does this pick up the Int because if we trace back from Bar a b we get b ~ Int? What about more complicated examples?

comment:17 Changed 4 years ago by RyanGlScott

Well, it seems like the most controversial part of this is folding over types that aren't syntactically equivalent to the return type. I'm certainly open to the idea of not folding over types if it's not immediately clear that they're equal, but I'm not sure how many levels of indirection would be considered acceptable. In this data type:

class (a ~ Int) => Foo a

data A a where
  A1 :: Ord a            => a        -> T a
  A2 ::                     Int      -> T Int
  A3 :: b ~ Int          => b        -> T Int
  A4 :: a ~ Int          => Int      -> T a
  A5 :: a ~ Int          => a        -> T a
  A6 :: (a ~ b, b ~ Int) => Int -> b -> T a
  A7 :: Foo a            => Int -> a -> T a

How we would we define "syntactic equality" here? It seems like A1 and A5 are pretty uncontroversial here, but the other constructors require (at least) one type substitution to infer that their fields could be folded over. Which of these constructors would you all consider to be acceptable for folding over? I'd need to get an idea of this before responding to dolio's questions.

Last edited 4 years ago by RyanGlScott (previous) (diff)

comment:18 Changed 4 years ago by simonpj

I vote for, well, syntactic equality. That means A1, A5, and A7, but none of the others.

Simon

comment:19 Changed 4 years ago by ekmett

That seems to make sense to me as well.

comment:20 Changed 4 years ago by RyanGlScott

OK, I can live with that. That way, I can derive Foldable for any GADT and still fold over all the terms that have types equal to the last type parameter (up to rewriting Con :: PlainTy -> PlainTy -> ... -> GADT PlainTy as Con :: a ~ PlainTy => a -> a -> ... -> GADT a).

I'm not the original author of the DeriveFunctor/Foldable/Traversable extensions, but I'd be willing to help write a wiki article on how they currently work (and how DeriveFoldable will eventually work). Is there a good template article from which I could adapt? All I know about the deriving algorithms is what I gleaned from comments in TcDeriv.hs, as well as from TcGenDeriv.hs, but they may be too formal for wiki purposes.

comment:21 Changed 4 years ago by RyanGlScott

Here's a (rough draft of) a wiki article about the DeriveFunctor, DeriveFoldable, and DeriveTraversable trio, included the proposed change from this feature request. Feedback is very much appreciated.

comment:22 Changed 4 years ago by RyanGlScott

Differential Rev(s): Phab:D1031

comment:23 Changed 4 years ago by Ben Gamari <ben@…>

In 2c5c29722c78e089eda0baa7ff89154b58f23165/ghc:

DeriveFoldable for data types with existential constraints (#10447)

Reviewers: dolio, shachaf, ekmett, austin, #core_libraries_committee,
simonpj, bgamari

Reviewed By: simonpj, bgamari

Subscribers: thomie, bgamari

Differential Revision: https://phabricator.haskell.org/D1031

GHC Trac Issues: #10447

comment:24 Changed 4 years ago by thomie

Milestone: 7.12.1
Resolution: fixed
Status: newclosed
Test Case: deriving/should_run/T10447

comment:25 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:26 Changed 2 years ago by RyanGlScott

Keywords: deriving added
Note: See TracTickets for help on using tickets.