Opened 3 years ago

Last modified 9 months ago

#13065 new feature request

Prohibit user-defined Generic and Generic1 instances

Reported by: dfeuer Owned by:
Priority: normal Milestone: 8.10.1
Component: Compiler Version: 8.0.1
Keywords: Generics Cc: RyanGlScott
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Other Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

User-defined Generic and Generic1 instances are problematic.

They are susceptible to breakage

Some details of the classes may change between GHC versions, and indeed have done so in the past. User-defined instances are likely to break in the face of various such "internal" changes. This is one reason why Data.Sequence, for example, does not have a Generic instance.

They require potentially-expensive consistency checks

GHC cannot assume that every type has at most one Generic and Generic1 instance, so it needs to look for possible alternatives at instance resolution time. According to Simon (and maybe also Simon), this may be partly responsible for the performance regressions seen in Phab:D2899.

Downsides

Prohibiting user-defined instances does have some costs. Suppose a type was originally defined concretely, exposing its constructors and a Generic instance. The implementer may decide later to make the type abstract, and export pattern synonyms to retain the same interface. But the Generic instance will either change or disappear. Someone relying on that instance could be in trouble. If the instance disappears, they'll be forced to write code by hand that they didn't need to before. If it changes, their code may change its behavior unexpectedly.

Change History (13)

comment:1 Changed 3 years ago by simonpj

Cc: RyanGlScott added

I'd like to know what Ryan Scott thinks of this idea. We have some classes (like Typeable) where you can't give user instances, but it seems less clear-cut for Generic.

The efficiency issue is to do with the costs of doing the type-family consistency checks for all the Generic instances.

comment:2 Changed 3 years ago by dfeuer

Yes, sorry. I forgot the real troubles were Rep and Rep1 rather than Generic and Generic1. One possible variation might be to disassociate the families from the classes, and only restrict the former.

comment:3 in reply to:  description Changed 3 years ago by RyanGlScott

I believe I support the general idea in this proposal, although I'd like to add some clarifications to the reasons you've listed.

Clarification number one is that hand-written Generic instances are already forbidden as of GHC 7.10, provided you have the Safe extension enabled.


Replying to dfeuer:

User-defined Generic and Generic1 instances are problematic.

Agreed. Generic is truly unique in that any given Generic instance can and should be determined solely by the algebraic structure of the datatype. Any other use of Generic is an abuse of its intended purpose, in my opinion.

They are susceptible to breakage

Some details of the classes may change between GHC versions, and indeed have done so in the past. User-defined instances are likely to break in the face of various such "internal" changes. This is one reason why Data.Sequence, for example, does not have a Generic instance.

It's true that we've changed the internal details of GHC.Generics before, but believe me when I say that we tried to preserve backwards compatibility as much as possible :) There are a good many other breaking changes we could make, but we haven't yet (see #7492). With CPP, it's actually quite feasible to maintain fancy type-level generics hackery all the way back to GHC 7.2.

That being said, I would argue that Data.Sequence shouldn't have a Generic instance for a different reason: it's an abstract type. A Generic instance necessarily leaks every implementation detail about your datatype, so having a Generic instance for an abstract type is nonsense.

They require potentially-expensive consistency checks

GHC cannot assume that every type has at most one Generic and Generic1 instance, so it needs to look for possible alternatives at instance resolution time. According to Simon (and maybe also Simon), this may be partly responsible for the performance regressions seen in Phab:D2899.

If we're pursing this goal in the name of compiler performance (see also #12731, which I believe is very relevant here), we need to be careful in stating our goals. As I stated here, it wouldn't be enough to disallow hand-written Generic instances. You'd also need to check that there's only one Generic instance per datatype, lest you end up with a scenario like this (which is currently possible in GHC 8.0):

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}

import GHC.Generics

data Foo a = Foo a
deriving instance {-# OVERLAPPABLE #-} Generic (Foo a)
deriving instance {-# OVERLAPPING  #-} Generic (Foo Int)

Downsides

Prohibiting user-defined instances does have some costs. Suppose a type was originally defined concretely, exposing its constructors and a Generic instance. The implementer may decide later to make the type abstract, and export pattern synonyms to retain the same interface. But the Generic instance will either change or disappear. Someone relying on that instance could be in trouble. If the instance disappears, they'll be forced to write code by hand that they didn't need to before. If it changes, their code may change its behavior unexpectedly.

I don't follow this argument. If the definition of a datatype changes, then either it's Generic instance must change, or you should just remove the Generic instance altogether. Full stop. Anything else would be a lie.


To contribute one more "downside" of this proposal, drawing from my own experience, there is only one scenario where I've found hand-written Generic instances to be necessary: backwards compatibility. In old versions of GHC, there were numerous bugs that prevented you from deriving Generic instances for sufficiently polykinded datatypes, forcing you to resort to manual implementation. But these bugs have all been squashed, AFAICT, so I don't see any reason why we'd need to worry about this going forward.

In fact, manually implementing Generic instances is precisely how the generic-deriving library allows you to define Generic instances via Template Haskell (in case deriving Generic is buggy on an old GHC). So if we did implement this proposal, that part of generic-deriving would no longer work. But that's a sacrifice I think I'd be OK with.


If we do decide to press on with this propsal, let's not let this Trac ticket be the end of the discussion. Let's advertise this change as a concrete GHC proposal first. I bet there's someone out there who is relying on hand-written Generic instances who hasn't spoken up, and I'd hate to break their code without warning.

comment:4 Changed 3 years ago by rwbarton

Can you comment on why this is different from our stance on Data, which addresses largely similar problems but for which custom instances are apparently approved (e.g., Data.Map.Map has one)?

comment:5 Changed 3 years ago by RyanGlScott

I'm not as experienced with Data.Data as I am with GHC.Generics, so take my comments with a grain of salt. But the impression that I've gathered after working with it a little bit is that Data.Data has slightly different goals than GHC.Generics.

With the former (Data), you mostly care about generically walking over data structures. That seems to explain why Data.Data gives you only a bare-minimum amount of metadata for each datatype and constructor, as they're only useful insofar as they serve as signposts when you're deep in a datatype traversal. Moreover, the documentation in Data.Data mentions the phrase "public representation" in several places. It seems acceptable to lie a little in certain Data instances—e.g., in primitive datatypes like Int, or abstract types like Map—in order to give programmers some interface that they can walk over.

With the latter (Generic), there's an expectation that the representation type provides a faithful view of the datatype's structure and metadata. It occurs to me that we really don't state this expectation anywhere in the documentation, but this is nonetheless the reasoning that David Terei alluded to when he made the change to disallow hand-written Generic instances in 578fbeca31dd3d755e24e910c3a7327f92bc4ee3. If we allow users to hand-write their Generic instances, then I feel that we are allowing them to shoot themselves in the foot.

Of course, the other thing that Data and Generic have in common is that they facilitate the ability to reduce boilerplate code. But even if you can't write Generic instances for your abstract types, you can still use GHC.Generics to help reduce boilerplate: just use the Generic instance for an isomorphic type. For instance, if you want to think of Data.Map as a list of key-value pairs, then you have the power to do so:

module Main (main) where

import Data.Map
import Generics.Deriving.Eq
import GHC.Generics

eqMapKVList :: (GEq k, GEq v) => Map k v -> Map k v -> Bool
eqMapKVList x y = geq (toListRep x) (toListRep y)
  where
    toListRep :: Map k v -> Rep [(k, v)] ()
    toListRep = from . toList

main :: IO ()
main = do
    print $ eqMapKVList (fromList [('a', 'b')]) (fromList [('a', 'b')])
    print $ eqMapKVList (fromList [('a', 'b')]) (fromList [('a', 'c')])

comment:6 Changed 3 years ago by dfeuer

One place I think we still can't derive (seemingly reasonable) Generic instances is with GADTs. For example, today I can write

{-# LANGUAGE DeriveGeneric, StandaloneDeriving, GADTs, DataKinds, KindSignatures,
     FlexibleInstances, TypeFamilies #-}

module GenGADT where
import GHC.Generics

data Foo :: Bool -> * -> * where
  X :: a -> Foo 'False a
  Y :: a -> Foo 'True a

instance Generic (Foo 'False a) where
  type Rep (Foo 'False a) =
     D1 ('MetaData "Foo 'False" "GenGADT" "" 'False)
       (C1 ('MetaCons "X" 'PrefixI 'False)
         (Rec0 a))
  to (M1 (M1 (K1 a))) = X a
  from (X a) = M1 (M1 (K1 a))

instance Generic (Foo 'True a) where
  type Rep (Foo 'True a) =
     D1 ('MetaData "Foo 'True" "GenGADT" "" 'False)
       (C1 ('MetaCons "Y" 'PrefixI 'False)
         (Rec0 a))
  to (M1 (M1 (K1 a))) = Y a
  from (Y a) = M1 (M1 (K1 a))

but I don't think GHC is able to derive such instances.

I'm more concerned about backwards compatibility issues, though. As soon as a library chooses to derive a Generic instance for a type, that instance becomes part of the library API. Users may well come to rely on the existence of that instance, and also some of its details. If we prohibit custom instances, won't that strongly discourage libraries from deriving Generic for any but the most trivial exposed types? Let me get back to pattern synonyms. Suppose we have

data Tree a = Tree a [Tree a] deriving Generic

and we decide we want to play around with bifunctors, so we redefine this as

--newtype Fix p a = In {out :: p (Fix p a) a}
--instance Bifunctor p => Functor (Fix p) where ...

data TreeF t a = TreeF a [t]
instance Bifunctor TreeF where ...
newtype Tree a = Tree (Fix Tree) deriving Functor
-- et cetera

We can recover most of the original interface using bidirectional pattern synonyms to work around the newtypes. But if we can't write our own Generic instance, we'll break everything. Existing library users won't define instances for TreeF, so their Generic-derived instances for Tree will no longer pass the type checker. Ouch. A recent containers version added Generic and Generic1 for Data.Tree; had the prohibition been under discussion at the time, I'd have thought twice and thrice about whether that was wise.

I don't think this issue was quite as significant before the rise of pattern synonyms; in that era, any structural change to an exported transparent datatype was necessarily a breaking one. With pattern synonyms, there are more places we can change representations, and therefore more reasons to avoid preventing such changes.

comment:7 in reply to:  6 Changed 3 years ago by RyanGlScott

GHC isn't able to derive Generic instances for GADTs since as it stands today, there's no sensible way to do it. The ones you hand-defined are bogus, since they would have you believe that there's two separate datatypes named Foo, each with different constructors. And then there's the separate issue that they don't capture the 'True and 'False existential equality information.

I don't see what pattern synonyms change here. The issue is fundamentally about what promises Generic instances make, and if we want them to be honest, then we cannot allow hand-defined Generic instances at all, even if they might be temptingly convenient. GHC.Generics is a feature which necessarily mirrors the definition of datatype, so if a datatype's internal structure changes, then its Generic instance must change too. Full stop.

comment:8 Changed 3 years ago by RyanGlScott

About two weeks have passed since this was first proposed, and there doesn't seem to be any definite consensus. In fact, I'm not sure if you even want this to be implemented anymore, since you seem reticent to give up the ability to define custom Generic instances. If you do wish to pursue this further, I'd highly recommend creating a GHC proposal for it, since this seems to be a topic of contention.

comment:9 in reply to:  8 Changed 3 years ago by dfeuer

Replying to RyanGlScott:

About two weeks have passed since this was first proposed, and there doesn't seem to be any definite consensus. In fact, I'm not sure if you even want this to be implemented anymore, since you seem reticent to give up the ability to define custom Generic instances. If you do wish to pursue this further, I'd highly recommend creating a GHC proposal for it, since this seems to be a topic of contention.

I am of two minds. I see very good reasons to make this change, and I see very good reasons to be wary of it. I'm not the main person pushing for it; I'm just the one who wrote it up.

comment:10 Changed 3 years ago by simonpj

OK, well, since there are good reasons to be wary let's not rush to make a change. The status quo is by definition cheaper.

While the idea was precipitated by the question of tuples (and the overhead of reading interface files etc), it should not be driven by that. Maybe there's another way to address the perf question. It was only that, if there was a quick consensus, it seemed like an easy way.

comment:11 Changed 20 months ago by bgamari

Milestone: 8.4.18.6.1

This ticket won't be resolved in 8.4; remilestoning for 8.6. Do holler if you are affected by this or would otherwise like to work on it.

comment:12 Changed 15 months ago by bgamari

Milestone: 8.6.18.8.1

These won't be fixed for 8.6, bumping to 8.8.

comment:13 Changed 9 months ago by osa1

Milestone: 8.8.18.10.1

Bumping milestones of low-priority tickets.

Note: See TracTickets for help on using tickets.