Changes between Version 41 and Version 42 of Records/OverloadedRecordFields/Plan

Mar 2, 2014 1:56:09 PM (6 years ago)

Replace with redirect to new OverloadedRecordFields index page


  • Records/OverloadedRecordFields/Plan

    v41 v42  
    1 = Overloaded record fields: a plan =
    3 This is a plan to implement overloaded record fields, along the lines of SPJ's [wiki:Records/OverloadedRecordFields Simple Overloaded Record Fields] proposal, as a Google Summer of Code project. (See the [ GSoC project details], for reference.) The page on [wiki:Records Records] gives the motivation and many options.  In particular, the proposal for [wiki:Records/DeclaredOverloadedRecordFields Declared Overloaded Record Fields] is closely related but makes some different design decisions.
    5 This page describes the design. Separate [wiki:Records/OverloadedRecordFields/Implementation notes on the implementation] are available, but not necessarily comprehensible. Development of the extension is taking place on forks of the [ ghc], [ packages-base] and [ haddock] repositories (on branch 'overloaded-record-fields'). A [ prototype implementation] is also available.
    7 === Motivation ===
    9 A serious limitation of the Haskell record system is the inability to overload field names in record types: for example, if the data types
    11 {{{
    12 data Person  = Person  { personId :: Int, name :: String }
    13 data Address = Address { personId :: Int, address :: String }
    14 }}}
    16 are declared in the same module, there is no way to determine which type an occurrence of the `personId` record selector refers to. A common workaround is to use a unique prefix for each record type, but this leads to less clear code and obfuscates relationships between fields of different records. Qualified names can be used to distinguish record selectors from different modules, but using one module per record is often impractical.
    18 Instead, we want to be able to write polymorphic record projections, so that the ambiguous identifier `personId` is resolved using the type of `e`. In general, this requires a new form of constraint `r { x :: t }` stating that type `r` has a field `x` of type `t`. For example, the following declaration should be accepted:
    20 {{{
    21 getPersonId :: r { personId :: Int } => r -> Int
    22 getPersonId e = personId e
    23 }}}
    25 A constraint `R { x :: t }` is solved if `R` is a datatype that has a field `x` of type `t` in scope. An error is generated if `R` has no field called `x`, it has the wrong type, or the field is not in scope.
    28 == Design ==
    30 In the sequel, we will describe the `-XOverloadedRecordFields` extension, which permits multiple field declarations with the same label and introduces new record field constraints.
    32 Previous versions of this proposal suggested changing the lexical syntax so that record projections could be written postfix, immediately following a dot. For example, `e.personId` would be roughly equivalent to `personId e`. This would be a breaking change (when the extension was enabled) as composition would need spaces around the dot operator. However, it would mean that the field name would not have to be in scope, allowing better library separation. For example, `e.personId` would be valid even if no `personId` fields were in scope.
    34 In the light of feedback, we propose '''no changes to dot syntax''' for the time being. In the future, we could add a separate extension to treat [wiki:Records/DeclaredOverloadedRecordFields/DotPostfix dot as postfix function application]. Note that the [ lens] library encourages the use of dot with no spaces, as composition is used to chain lenses.
    37 === Record field constraints ===
    39 A record field constraint is introduced when a field is used in an expression. If every `x` in scope is a record field, then an occurrence of `x` has type `a { x :: b } => a -> b` (roughly) instead of generating an ambiguity error. The overloaded `x` is translated using a typeclass, described below. If there are any normal identifiers `x` in scope (as well as fields) then a use of `x` leads to an ambiguity error.
    41 A record field constraint `r { x :: t }` is syntactic sugar for the constraint `Has r "x" t`, where
    43 {{{
    44 type family FldTy (r :: *) (n :: Symbol) :: *
    46 class t ~ FldTy r n => Has r (n :: Symbol) t where
    47   getField :: Proxy# n -> r -> t
    48 }}}
    50 Recall that `Symbol` is the kind of type-level strings. Roughly speaking, an occurrence of a field name `x` is translated into `getField (proxy# :: Proxy# "x")`. (Actually a slightly more general translation is used, as [#Lensintegration discussed below].) The type `Proxy#` is zero-width, so it will be erased at runtime, and is used to pass in the type-level string argument, since we don't have explicit type application (yet).
    52 The syntactic sugar extends to conjunctions:  `r {x :: tx, y :: ty}` means `(Has r "x" tx, Has r "y" ty)`. Note also that `r` and `t` might be arbitrary types, not just type variables or type constructors.  For example, `T (Maybe v) { x :: [Maybe v] }` means `Has (T (Maybe b)) "x" [Maybe v]`.  To make these desugarings accepted, `-XOverloadedRecordFields` implies `-XFlexibleContexts` and `-XConstraintKinds`.
    54 Instances for the `Has` typeclass and `FldTy` type family are automatically generated (for modules with `-XOverloadedRecordFields` enabled) using the record fields that are in scope. For example, the data type
    56 {{{
    57 data T a = MkT { x :: [a] }
    58 }}}
    60 has the corresponding instances
    62 {{{
    63 type instance FldTy (T a) "x" = [a]
    65 instance b ~ [a] => Has (T a) "x" b where
    66   getField _ (MkT x) = x
    67 }}}
    69 The bare type variable `b` in the instance head is important, so that we get an instance match from the first two parameters only, then the equality constraint `(b ~ [a])` improves `b`. For example, if the constraint `Has (T c) "x" d` is encountered during type inference, the instance will match and generate the constraints `(a ~ c, b ~ d, b ~ [a])`. Moreover, the `FldTy` type family ensures that the third parameter is functionally dependent on the first two, which is needed to [#Troubleinparadise avoid ambiguity errors when composing overloaded fields].
    71 The reason for using a three-parameter class, rather than just two parameters and a type family, is to support the syntactic sugar and improve type inference error messags. With a two-parameter class we could easily end up inferring types like the following, and it would be hard to reapply the sugar:
    73 {{{
    74 f :: (Has r "x", Has r "y", FldTy r "x" ~ Int, FldTy r "y" ~ Int) => r -> Int
    75 f r = x r + y r :: Int
    76 }}}
    78 Moreover, error messages would tend to be reported in terms of unification failures for `FldTy` rather than unsolved `Has` class constraints.
    81 === Representation hiding ===
    83 At present, a datatype in one module can declare a field, but if the selector function is not exported, then the field is hidden from clients of the module. It is important to support this. Typeclasses in general have no controls over their scope, but for implicitly generated `Has` instances, the instance is available for a module if `-XOverloadedRecordFields` is enabled for that module and the record field selector function is in scope. Instances are not exported from the module that defines the datatype, but are created implicitly when needed by the typechecker.
    85 This enables representation hiding: just like at present, exporting the field selector permits access to the field. For example, consider the following module:
    87 {{{
    88 module M ( R(x) ) where
    90 data R = R { x :: Int }
    91 data S = S { x :: Bool }
    92 }}}
    94 Any module that imports `M` will have access to the `x` field from `R` but not from `S`, because the instance `Has R "x" Int` will be available but the instance `Has S "x" Bool` will not be. Thus `R { x :: Int }` will be solved but `S { x :: Bool }` will not.
    97 === Multiple modules and automatic instance generation ===
    99 Note that `Has` instances are generated on a per-module basis, using the fields that are in scope for that module, and automatically generated instances are never exported. Thus it doesn't matter whether `-XOverloadedRecordFields` was on in the module that defined the datatype. The availability of the instances in a particular module depends only on whether the flag is enabled for that module.
    101 Suppose module `M` imports module `N`, `N` imports module `O`, and only `N` has the extension enabled. Now `N` can project any field in scope (including those defined in `O`), but `M` cannot access any `Has` instances.
    103 This means that
    104  * the extension is required whenever a `Has` constraint must be solved;
    105  * no new mechanism for hiding instances is required; and
    106  * records defined in existing modules (or other packages) without the extension can still be overloaded.
    109 === Higher-rank fields ===
    111 Higher-rank fields, such as in the declaration
    113 {{{
    114 data U = MkU { x :: forall a . a -> a }
    115 }}}
    117 cannot be overloaded. If such a field is in scope for a module with `-XOverloadedRecordFields` enabled, no Has or Upd instances will be produced. The user can always declare the selector function manually. This is similar to the current situation for existentially quantified variables in fields, which do not give rise to selector functions at all.
    119 Bidirectional type inference for higher-rank types relies on inferring the type of functions, so that types can be pushed in to the arguments. However, the type of an overloaded field cannot immediately be inferred (as some constraint solving is required). This is why higher-rank and overloaded fields are incompatible.
    121 Some previous variants of the design supported rank-1 universally quantified fields (but not rank-2 and above). However, these prevent the third parameter of the `Has` class from being a function of the first two, and hence obstruct type inference for compositions of selectors.
    124 === Qualified names ===
    126 A qualified name must refer to a unique field; it cannot be overloaded. Consider the following example:
    128 {{{
    129 module M where
    130   data S = MkS { foo :: Int }
    132 module N where
    133   data T = MkT { foo :: Int }
    134   data U = MkU { foo :: Int }
    136 module O where
    137   import M
    138   import N
    140   f x = x
    141   g x = x
    142   h x = foo x
    143 }}}
    145 Here `f` is okay, because `` is unambiguous, but `g` is forbidden. This is because we have no way to support polymorphism over fields only from one module. The user must write `h` instead, making it explicit that the field is not qualified.
    148 == Record update ==
    150 Supporting polymorphic record update is rather more complex than polymorphic lookup. In particular:
    151  * the type of the record may change as a result of the update;
    152  * multiple fields must be updated simultaneously for an update to be type correct (so iterated single update is not enough); and
    153  * records may include higher-rank components.
    155 These problems have already been [wiki:Records/OverloadedRecordFields#Recordupdates described in some detail]. In the interests of doing something, even if imperfect, the traditional record update syntax will support only non-overloaded update (that is, update of a unique known record type). Where overloading mean that the fields alone do not determine the type being updated, a type signature may be required. For example,
    156 {{{
    157 e { x = t }
    158 }}}
    159 currently relies on the name `x` to determine the datatype of the record. If this is ambiguous, a type signature can be given either to `e` or to the whole expression. Thus either
    160 {{{
    161   e :: T Int { x = t }
    162 }}}
    163 or
    164 {{{
    165   e { x = t } :: T Int
    166 }}}
    167 will be accepted. (Really only the type constructor is needed, whereas this approach requires the whole type to be specified, but it seems simpler than inventing a whole new syntax.)
    170 === Limited type-changing update ===
    172 As noted above, supporting a polymorphic version of the existing record update syntax (in its full generality) is difficult. However, we can generate instances of the following class and type family, which permit type-changing update of single fields:
    174 {{{
    175 type family UpdTy (r :: *) (n:: Symbol) (a :: *) :: *
    177 class (Has r n (FldTy r n), r ~ UpdTy r n (FldTy r n)) =>
    178           Upd (r :: *) (n :: Symbol) (t :: *) where
    179   setField :: Proxy# n -> r -> t -> UpdTy r n t
    180 }}}
    182 For example, the datatype `T` would give rise to these instances:
    184 {{{
    185 data T a = MkT { x :: [a] }
    187 type instance UpdTy (T a) "x" [c] = T c
    189 instance (b ~ [c]) => Upd (T a) "x" b where
    190   setField _ (MkT _) e = MkT e
    191 }}}
    193 The third parameter of the `Upd` class represents the new type being assigned to the field. Thus it is not functionally dependent on the first two. Consequently, we must use a bare type variable `b` in the instance declaration, with an equality constraint `b ~ [c]` postponed until after the instance matches.
    195 If a type variable is shared by multiple fields, it cannot be changed using `setField`. Moreover, the use of the `UpdTy` type family means that phantom type variables cannot be changed. For example, in
    197 {{{
    198 data V a b c = MkV { foo :: (a, b), bar :: a }
    199 }}}
    201 an update to `foo` must keep `a` and `c` the same, since `a` occurs in the
    202 type of `bar`, and `c` does not occur in the type of `foo`, but the update may change `b`.  Thus we generate:
    204 {{{
    205 type instance UpdTy (V a b c) "foo" (a, b') = V a b' c
    207 instance t ~ (a, b') => Upd (V a b c) "foo" t where
    208   setField _ (MkV _ bar) e = MkV e bar
    209 }}}
    212 === Lens integration ===
    214 It was implied above that a field like `foo` translates into `getField (proxy# :: Proxy# "foo") :: Has r "foo" t => r -> t`, but this is not quite the whole story. We would like fields to be usable as lenses (e.g. using packages such as [ lens], [ data-accessor] or [ data-lens]). This requires a slightly more general translation, using
    216 {{{
    217 field :: Accessor p r n t => Proxy# n -> p r t
    218 field z = accessField z (getField z) (setField z)
    219 }}}
    221 to translate `foo` to `field (proxy# :: Proxy# "foo") :: Accessor p r "foo" t => p r t`. The `Accessor` class is defined thus:
    223 {{{
    224 class Accessor (p :: * -> * -> *) (r :: *) (n :: Symbol) (t :: *) where
    225   accessField :: Proxy# n ->
    226                  (Has r n t => r -> t) ->
    227                  (forall a . Upd r n a => r -> a -> UpdTy r n a) ->
    228                  p r t
    229 }}}
    231 An instance of `Accessor p r n t` means that `p` may contain a getter and setter for the field `n` of type `t` in record type `r`. In particular, we can give an instance for functions that ignores the setter completely:
    233 {{{
    234 instance Has r n t => Accessor (->) r n t where
    235   accessor _ getter setter = getter
    236 }}}
    238 Thus, whenever a field `foo` is used at a function type (by applying it or composing it, for example), this instance will be selected. That is, `foo` translates to `field proxy#`, which computes to `accessor proxy# (getField proxy#) (setField proxy#)`, and hence to `getField proxy#` by the `Accessor` instance for functions.
    240 However, `p` does not have to be the function arrow. Suppose the `lens` library defined the following newtype wrapper:
    242 {{{
    243 newtype WrapLens n r a
    244   = MkWrapLens (forall b . Upd r n b => Lens r (UpdTy r n b) a b)
    246 instance m ~ n => Accessor (WrapLens m) r n where
    247   accessor _ getter setter = MkWrapLens (\ w s -> setter s <$> w (getter s))
    249 fieldLens :: Upd r n b => WrapLens n r a -> Lens r (UpdTy r n b) a b
    250 fieldLens (MkWrapLens l) = l
    251 }}}
    253 Now `fieldLens foo` is a lens whenever `foo` is an overloaded record field.
    255 Other lens libraries can define their own instances of `Accessor`, even if they do not support type-changing update, and the same machinery enables fields to be used with them. For example, here is another possible encoding of lenses:
    257 {{{
    258 data DataLens r a = DataLens
    259    { getDL :: r -> a
    260    , setDL :: r -> a -> r }
    262 instance Upd r n t => Accessor DataLens r n t where
    263   accessField _ g s = DataLens g s
    264 }}}
    266 Now an overloaded record field `foo` can be used as if it had type `DataLens r a`, and it will just work: we do not even need to use a combinator.
    269 === Type-changing update: phantom arguments ===
    271 Consider the datatype
    273 {{{
    274 data T a = MkT { foo :: Int }
    275 }}}
    277 where `a` is a phantom type argument (it does not occur in the type of `foo`). The traditional update syntax can change the phantom argument, for example if `r :: T Int` then `r { foo = 3 } :: T Bool` typechecks. However, `setField` cannot do so, because this is illegal:
    279 {{{
    280 type instance UpdTy (T a) "foo" Int = T b
    281 }}}
    283 Note that the result of the type family involves an unbound variable `b`.
    285 In general, a use of `setField` can change only type variables that occur in the field type being updated, and do not occur in any of the other fields' types.
    288 === Type-changing update: type families ===
    290 Consider the following definitions:
    292 {{{
    293 type family Goo a
    294 data T a = MkT { foo :: Goo a }
    295 }}}
    297 In order to change the type of the field `foo`, we would need to define something like this:
    299 {{{
    300 type instance UpdTy (T a) "foo" (Goo b) = T b
    301 }}}
    303 But pattern-matching on a type family (like `Goo`) doesn't work, because type families are not injective. Thus we cannot change type variables that appear only underneath type family applications. We generate an instance like this instead:
    305 {{{
    306 type instance UpdTy (T a) "foo" x = T b
    307 }}}
    309 On the other hand, in the datatype
    311 {{{
    312 data U a = MkU { bar :: a -> Goo a }
    313 }}}
    315 it is fine to change `a` when updating `bar`, because it occurs rigidly as well as under a type family, so we can generate this:
    317 {{{
    318 type instance UpdTy (U a) "bar" (b -> x) = U b
    319 }}}
    321 This is all a bit subtle. We could make updates entirely non-type-changing if the field type contains a type family, which would be simpler but somewhat unnecessarily restrictive.
    324 == Design choices ==
    326 === Scope issues, or, why we miss dot ===
    328 Consider the following example:
    330 {{{
    331 f :: r { g :: Int } => r -> Int
    332 f x = g x + 1
    333 }}}
    335 Q1. What happens if `g` is not in scope?
    337 A. The code gives an error. This is where dot-notation (or another syntactic form marking a field name) is better: `f x = x.g + 1` can work even if `g` is not in scope. Observe that something similar happens with implicit parameters: `f y = y + ?x` works even if `x` is not in scope, and introduces a new constraint `(?x :: Int)`.
    339 Q2. What if we add `data T = MkT { g :: Char }`?
    341 A. The code compiles correctly, even though the datatype is "obviously" irrelevant because the field `g` it declares has the wrong type, so it cannot be selected. This would not be the case if we treated `g` as an unambiguous reference to the only field of that name in scope.
    343 Q3. What if we subsequently add another datatype with a field `g`?
    345 A. The code still compiles correctly.
    348 === Syntax for record projections ===
    350 An advantage of distinguishing record projections syntactically (as in `x.g`) is that `g` is always treated as a record field, regardless of what is in scope. This allows better separation of concerns, as functions that manipulate records can be defined abstractly rather than referring to particular datatypes. We could consider using an operator less controversial than dot (for example, `#` has been suggested):
    352 {{{
    353 f x = x#g + 1
    354 bar xs = map (#baz) xs
    355 }}}
    357 This should not conflict with `-XMagicHash`, since that allows `#` only as a postfix name modifier. Note that it works perfectly with partial application.
    359 Another alternative, once we have `-XExplicitTypeApplication`, is to use the `field` function defined in `GHC.Records`:
    361 {{{
    362 field @"foo"
    363 }}}
    365 That is a bit long, however, and worse is the version needed at present:
    367 {{{
    368 field (proxy# :: Proxy# "foo")
    369 }}}
    372 === Unambiguous fields ===
    374 What if `foo` occurs in an expression, and there is only one datatype `T` with a field `foo` in scope? There are three obvious choices:
    376 1. Generate a polymorphic use of `field` as normal.
    377 2. Generate a use of `field`, specialised to the type `T`, but still polymorphic in the choice of `Accessor`.
    378 3. Use the record selector for `foo`, without any polymorphism.
    380 The first and second options are likely to be preferred by users who wish to write polymorphic code, while the third is better if the desire is only to overload field names but not write code that is polymorphic in the choice of datatype. The third option severely hampers the integration with lenses, because a field will only be a lens if it is ambiguous. However, the third would allow higher-rank fields to be used when unambiguous. This suggests a fourth option:
    382 4. Use the record selector for `foo` if it is applied to one or more arguments, and generate a use of `field` specialised to the type `T` otherwise.
    384 This makes higher-rank fields usable (though possibly requiring eta-expansion), and it allows lens integration. On the other hand, it is still an impediment to users wishing to write polymorphic code.
    386 Oh, and there's a fifth option:
    388 5. Generate a polymorphic use of `field` as normal, but when defaulting a constraint `Has r "foo" t`, choose the instance for `T`.
    390 This gives the maximum amount of polymorphism and the right behaviour in the presence of the monomorphism restriction, but defaulting is evil and confusing...
    392 At the moment we take the first option.
    395 === Record update: avoiding redundant annotations ===
    397 In an update `e { x = t }`, if `e` is a variable whose type is given explicitly in the context, we could look it up rather than requiring it to be given again. Thus
    398 {{{
    399 f :: T Int -> T Int
    400 f v = v { x = 5 }
    401 }}}
    402 would not require an extra annotation. On the other hand, we would need an annotation on the update in 
    403 {{{
    404   \v -> (v { x = 4 }, [v, w :: T Int])
    405 }}}
    406 because the type of `v` is only determined later, by constraint solving.
    408 Annoyingly, nested updates will require some annotations. In the following example, the outer update need not be annotated (since `v` is a variable that is explicitly given a type by the context) but the inner update must be (since `x v` is not a variable):
    409 {{{
    410   f :: T Int -> T Int
    411   f v = v { x = (x v){ y = 6 } }
    412 }}}
    415 === Hiding record selectors ===
    417 Optionally, we could [wiki:Records/DeclaredOverloadedRecordFields/NoMonoRecordFields add a flag `-XNoRecordSelectorFunctions`] to suppress the record selectors. Just as `-XOverloadedRecordFields` applies to a client module, and generates `Has` instances for that module, so `-XNoRecordSelectorFunctions` in a client module would hide all the record selectors that should otherwise be in scope. The idea is that another record system could use Template Haskell to generate functions in place of selectors, and these would not clash.
    419 Since the selectors are hidden by clients (on import) rather than on export, fields can still be used for record update and mentioned in import and export lists, to control access to them (as discussed in the [wiki:Records/OverloadedRecordFields/Plan#Representationhiding representation hiding] section).
    422 === Syntactic sugar for `Upd` constraints ===
    424 Should we have a special syntax for `Upd` constraints, just as `r { x :: t }` sugars `Has r "x" t`? What should it look like? Perhaps something like `r { x ::= t }`?
    427 == Remarks ==
    430 === Trouble in paradise ===
    432 [ Edward Kmett points out] that a previous version of this proposal, where the third parameter of the `Has` class was not functionally dependent on the first two, fell short in an important respect: composition of polymorphic record fields would lead to ambiguity errors, as the intermediate type cannot be determined. For example, suppose
    434 {{{
    435 foo :: Has b "foo" c => b -> c
    436 bar :: Has a "bar" b => a -> b
    437 }}}
    439 then
    441 {{{
    442 foo . bar :: (Has a "bar" b, Has b "foo" c) => a -> c
    443 }}}
    445 and `b` is an ambiguous type variable. This shows the need for the `FldTy` type family.
    448 === Virtual record fields ===
    450 We could imagine supporting virtual record fields by allowing the user to declare their own instances of `Has` and `FldTy` (and possibly `Upd` and `UpdTy`). For example, the user could write the following:
    452 {{{
    453 data Person = MkPerson { firstName :: String, lastName :: String }
    455 type instance FldTy Person "fullName" = String
    456 instance t ~ String => Has Person "fullName" t where
    457   getField _ p = firstName p ++ " " ++ lastName p
    458 }}}
    460 This means that the `Person` type can be used where a type with a field `fullName` is expected. Since no `Upd` and `UpdTy` instances are provided, the field cannot be updated.
    462 However, this does not bring `fullName` into scope as a field, [#Scopeissuesorwhywemissdot as previously observed]. Moreover, it is difficult to check the type family instances for consistency. For example, given the following declaration
    464 {{{
    465 type instance FldTy a "foo" = Int
    466 }}}
    468 we would need to check that any datatype with a field `foo` in scope gave it the type `Int`. For these reasons, user-defined instances of the classes are not currently permitted, so virtual fields are not available.