Changes between Version 11 and Version 12 of CabalDependency


Timestamp:
Aug 29, 2014 9:02:49 AM (5 years ago)
Author:
simonpj
Comment:

--

  • CabalDependency

    v11 v12  
    1 This page describes how GHC depends on and makes use of Cabal.
    2 
    3 = General
    4 
    5 GHC uses Cabal in a few ways
    6 
    7  * GHC ships with the Cabal library pre-installed. This is as a convenience to users, and as asked for in the original Cabal specification.
    8  * The GHC build system makes use of the Cabal library. See Building/Architecture/Idiom/Cabal
    9  * The representation for installed packages in the installed package database manipulated by ghc-pkg is conceptually defined by the Cabal specification, and in practice defined by the Cabal library (with types, parsers etc).
    10  * The ghc-pkg program depends on the Cabal library for the types, parsers etc of installed package information.
    11  * The bin-package-db library defines a binary serialization format for the package database read by GHC.
    12  * Historically the GHC library also depended on Cabal (both directly, and indirectly through bin-package-db) for the types of installed packages (for its in-memory representation of the package database). This is no longer the case.
    13 
    14 = Removal of the GHC library dependency on the Cabal library
    15 
    16 See ticket #8244
    17 
    18 The GHC library used to depend on the Cabal library directly, for the representation of installed packages. This was convenient for implementation but had a number of drawbacks:
    19 
    20  * Any package making use of the GHC library would be forced to use the same version of Cabal as GHC used. This was annoying because while the parts of Cabal that GHC used were not very fast moving, other parts of the library are, and so other packages did want to use a different version of Cabal.
    21  * Given the existing limitations and inconveniences of installing multiple versions of the same package, the GHC dependency on Cabal made it hard to upgrade Cabal separately. Of course this is really more of a limitation of the packaging side of things.
    22  * The fact that GHC depended directly on Cabal placed limitations on the implementation of Cabal. GHC must be very careful about which packages it needs to be able to build (so called boot packages). Because Cabal was a boot package, it could itself only depend on other boot packages. In particular, Cabal needs a decent parser combinator library, but no such library is available as a boot package (and GHC developers were understandably reluctant to add dependencies on parsec, mtl, text etc as would be required).
     1This page describes how GHC depends on and makes use of Cabal.  It describes the situation
     2after the changes implemented in #8244 are complete.
    23 3
    24 4 == Design of GHC-library's non-dependency on Cabal
    25 5
    26 Under the new approach, we have the following dependency structure for Cabal, ghc-pkg, GHC and bin-package-db:
    27 
     6Here is the overall architecture for Cabal, ghc-pkg, GHC, and bin-package-db:
    28 7 {{{
    29 8                      ........................
    30                      .                      .
    31                      .  +--------------+  +-v----------+  +------------+
    32                      .  |    cabal     |  |  ghc-pkg   |  |    GHC     |
    33                      .  |  executable  |  | executable |  | executable |
    34                      .  +--------------+  +---+----+---+  +--+---------+
    35            executes  .          |             |    |         |
    36 (an "up-dependency") .          |             |    |  +------v------+
    37                      .          |       +-----+    |  |     ghc     |
    38                      .          |       |          |  |   package   |
    39                      .          |       |          |  +-+-----------+
    40                      .          |       |          |    |
    41                      .        +-v-------v-+ +------v----v----+
    42                      .........+   Cabal   | | bin-package-db |
    43                               |  package  | |    package     |
    44                               +-----+-----+ +--------+-------+
     9                     :                      :
     10                     :  +--------------+  +-v----------+  +------------+
     11                     :  |    cabal     |  |  ghc-pkg   |  |    GHC     |   EXECUTABLES
     12                     :  |  executable  |  | executable |  | executable |
     13                     :  +--------------+  +---+----+---+  +--+---------+
     14           executes  :          |             |    |         |
     15(an "up-dependency") :          |             |    |  +------v------+
     16                     :          |       +-----+    |  |     ghc     |      PACKAGES
     17                     :          |       |          |  |   package   |
     18                     :          |       |          |  +-+-----------+
     19                     :          |       |          |    |
     20                     :        +-v-------v-+    +---v----v-------+
     21                     :........+   Cabal   |    | bin-package-db |          PACKAGES
     22                              |  package  |    |    package     |
     23                              +-----+-----+    +--------+-------+
    45 24                                     |                |
    46 25                                     |                |
    47                               ......v.......  .......v......
    48                               .    text    .  .   binary   .
    49                               .  database  .  .  database  .
    50                               ..............  ..............
     26                              ......v.......    .....v........
     27                              :    text    :    :   binary   :     DATA FILES
     28                              :  database  :    :  database  :
     29                              ..............    ..............
     30}}}
     31These components are:
    51 32
    52 }}}
     33 * Cabal:
     34   * The `cabal` executable, often called "cabal-install", is what you run from the command line (e.g. `cabal install pkg`, `cabal build`, etc.).
     35   * The `Cabal` package contains much of the guts of Cabal.
     36   GHC ships with the Cabal library pre-installed. This is a convenience to users, as requested in the original Cabal specification.
    53 37
    54 Cabal has a `InstalledPackageInfo` type, defined in the Cabal package, which defines a representation for installed packages as per the Cabal specification; however, now `bin-package-db` defines a new variant of the type which contains *only* the fields that GHC relies on. (Call this GHC's type.) ghc-pkg depends on both Cabal and bin-package-db, and is responsible for converting Cabal's types to GHC's types, as well as writing these contents to a binary database, as before. (Cabal invokes ghc-pkg in order to register packages in the installed package database, and as before doesn't directly know about this format.)
     38 * Package database:
     39   * The `ghc-pkg` executable ships with GHC and gives read/write access to the binary package database.
     40   * `bin-package-db` is a Haskell library that reads and writes the binary package database.
    55 41
    56 Now that there are two types for installed packages, what is the format of the database that bin-package-db writes? The ghc-pkg tool (as required by the Cabal spec) must consume, and regurgitate package descriptions in an external representation defined by the Cabal spec. Thus, the binary package database must contain all the information as per *Cabal's* type; better yet, it would be best if we directly used Cabal's library for this (so that we don't have to keep two representations in sync). However, doing this directly is a bit troublesome for GHC, which doesn't want to know anything about Cabal's types, and only wants its subset of the installed package info (GHC's type).
     42 * GHC consists of:
     43   * The `ghc` executable, which is a very thin layer on top of the `ghc` package
     44   * The `ghc` package, which implements the GHC API.  Most of GHC is in here.
    57 45
    58 We employ a trick in the binary database to support both cases: it contains all the packages in two different representations, once using Cabal types and once using GHC's types. These are contained in two sections of the package.cache binary file inside each package database directory. One section contains the Cabal representation. This section is read back by ghc-pkg when reading the package database. The other section contains the GHC representation. This section is read by GHC. The length of Cabal's section is explicitly recorded in the file, so GHC does not need to know anything about the internal contents of the other section to be able to read its own section. The ghc-pkg tool knows about the representation of both sections and writes both.
     46The GHC build system itself makes use of Cabal. See Building/Architecture/Idiom/Cabal.
    59 47
    60 Note that in principle ghc-pkg could keep just GHC's types in the binary cache file, and read the information for Cabal from the text files in the package database directory. The reason we keep Cabal's types in binary format as well is simply for performance. Reading all the text files is somewhat slow (both in terms of I/O and the parsing). The main case where this matters is `ghc-pkg dump` which is used by tools like `cabal` and `Setup.hs` to get all the installed packages. It's also worth noting that the section of the package.cache binary file that GHC reads comes first, and so it is not slowed down by the presence of the second section.
     48Things we want to be true:
     49 * You can upgrade Cabal (including the `Cabal` package and `cabal` executable) without upgrading GHC.
     50 * You can manually use `ghc-pkg` to install packages.  Hence, all manipulation of the package database must be done via `ghc-pkg`.
     51 * The "text database" is really just a bunch of text files, each describing one package, scattered in the file system in places that Cabal knows.  This is what operating system installers expect.
    61 52
    62 Notes
    63  * Cabal only reads/writes the binary package db via the `ghc-pkg` executable.
     53Consequences:
     54
     55 * Cabal only reads/writes the binary package db via the `ghc-pkg` executable.  Cabal cannot maintain a private binary cache of package information, because then it would not know about a package added by a manual call to `ghc-pkg`.  So if Cabal wants a binary cache, it has to rely on `ghc-pkg` to maintain it.
     56
    64 57  * GHC reads the binary package db via the `bin-package-db` library.
     58
    65 59  * Cabal communicates with `ghc-pkg` via text files representing the Cabal `InstalledPackageInfo` type.  The `Cabal` library offers a parser and pretty-printer for this type, which `ghc-pkg` uses.
    66  * Things we want to be true:
    67    * You can upgrade Cabal (including the `Cabal` package and `cabal` executable) without upgrading GHC.
    68    * You can manually use `ghc-pkg` to install packages, so all manipulation of the package database must be done via `ghc-pkg`.
     60
    69 61  * When you upgrade Cabal, you probably want to upgrade `ghc-pkg` too, since it depends on the `Cabal` package.  But you don't ''have'' to.  Cabal and `ghc-pkg` communicate through a text file interface. If new sexy Cabal adds a field, old `ghc-pkg` simply ignores it. Everything works, except that new sexy Cabal won't benefit from the new field being stored in the database.
     62
    70 63  * One consequence is that `bin-package-db` ''must not'' depend on Cabal (else you could not upgrade Cabal without upgrading GHC).  So `bin-package-db` cannot know what new sexy meta-data the upgraded Cabal (and `ghc-pkg`) are storing.  Yet it alone writes the binary file. So the read/write interface that `bin-package-db` offers has an argument for "and store this too in the binary file".
     64
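The forward-compatibility consequence above (an old `ghc-pkg` simply ignoring a field added by a newer Cabal) can be illustrated with a minimal Haskell sketch. This is not the real `Cabal` parser: the `PkgInfo` record, `parseField`, `parsePkgInfo` and the `shiny-new-field` entry are hypothetical names chosen for illustration, and the one-line `key: value` format is a simplification of the real text format.

```haskell
module Main (main) where

import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Data.Maybe (mapMaybe)

-- Hypothetical subset of fields that an "old" tool knows about.
data PkgInfo = PkgInfo
  { pkgName    :: String
  , pkgVersion :: String
  } deriving (Eq, Show)

-- Split one "key: value" line; lines without a usable key are skipped.
parseField :: String -> Maybe (String, String)
parseField line =
  case break (== ':') line of
    (key, ':' : val) | not (null (trim key)) -> Just (trim key, trim val)
    _                                        -> Nothing
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- Build the record from the fields this tool understands.
-- Unknown fields are never looked up, so they are silently ignored.
parsePkgInfo :: String -> Maybe PkgInfo
parsePkgInfo txt = PkgInfo <$> lookup "name" fields <*> lookup "version" fields
  where
    fields = mapMaybe parseField (lines txt)

main :: IO ()
main = do
  -- Text as a newer producer might emit it, including a field
  -- (hypothetical) that this older consumer has never heard of.
  let newerOutput = unlines
        [ "name: foo"
        , "version: 1.0"
        , "shiny-new-field: something"
        ]
  print (parsePkgInfo newerOutput)
```

In this sketch the extra field does not cause a parse failure; it is just not stored, which mirrors the trade-off described above.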
     65Implementation notes:
     66
     67 * `InstalledPackageInfo`:
     68   * The `Cabal` package defines an `InstalledPackageInfo` type, which is a representation for installed packages as per the Cabal specification.
     69   * `bin-package-db` defines a new variant of the type (with the same name) which contains ''only'' the fields that GHC relies on. (Call this GHC's `InstalledPackageInfo` type.)
     70   * `ghc-pkg` depends on both `Cabal` and `bin-package-db`, and is responsible for converting Cabal's types to GHC's types, as well as writing these contents to a binary database. Cabal invokes `ghc-pkg` in order to register packages in the installed package database, and as before doesn't directly know about this format.
     71
     72 * '''Binary format'''.  Now that there are two types for installed packages, what is the format of the database that `bin-package-db` writes? The `ghc-pkg` tool (as required by the Cabal spec) must consume and regurgitate package descriptions in an external representation defined by the Cabal spec. Thus, the binary package database must contain all the information as per ''Cabal's'' type; better yet, it would be best if we directly used Cabal's library for this (so that we don't have to keep two representations in sync). However, doing this directly is a bit troublesome for GHC, which doesn't want to know anything about Cabal's types, and only wants its subset of the installed package info (GHC's type).
     73
     74 We employ a trick in the binary database to support both cases: it contains all the packages in two different representations, once using Cabal types and once using GHC's types. These are contained in two sections of the package.cache binary file inside each package database directory. One section contains the Cabal representation. This section is read back by ghc-pkg when reading the package database. The other section contains the GHC representation. This section is read by GHC. The length of Cabal's section is explicitly recorded in the file, so GHC does not need to know anything about the internal contents of the other section to be able to read its own section. The ghc-pkg tool knows about the representation of both sections and writes both.
     75
     76 * Note that in principle `ghc-pkg` could keep just GHC's types in the binary cache file, and read the information for Cabal from the text files in the package database directory. The reason we keep Cabal's types in binary format as well is simply for performance. Reading all the text files is somewhat slow (both in terms of I/O and the parsing). The main case where this matters is `ghc-pkg dump` which is used by tools like `cabal` and `Setup.hs` to get all the installed packages. It's also worth noting that the section of the package.cache binary file that GHC reads comes first, and so it is not slowed down by the presence of the second section.
     77
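The two-section idea described in the implementation notes can be sketched in Haskell as follows. The byte layout here is an assumption made purely for illustration (an 8-byte big-endian length, then the section GHC reads, then an opaque trailing section owned by `ghc-pkg`); it is not the actual `package.cache` format, and `writeCache`, `splitCache` and `readGhcSection` are hypothetical names rather than the real `bin-package-db` API. The point is only that a reader can pick out its own section without understanding the other one.

```haskell
module Main (main) where

import Data.Bits (shiftL, shiftR, (.|.))
import qualified Data.ByteString as BS
import Data.Word (Word64)

-- Assumed layout (illustrative only):
--   8 bytes  big-endian length n of the first section
--   n bytes  first section (the part GHC reads)
--   rest     second section (opaque to GHC, understood by ghc-pkg)

-- Encode a length as 8 big-endian bytes.
encodeLen :: Word64 -> BS.ByteString
encodeLen n = BS.pack [fromIntegral (n `shiftR` (8 * i)) | i <- [7, 6 .. 0]]

-- Decode big-endian bytes back into a length.
decodeLen :: BS.ByteString -> Word64
decodeLen = BS.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Writer side: takes the GHC section plus an opaque
-- "and store this too" payload, and lays both out in one file image.
writeCache :: BS.ByteString -> BS.ByteString -> BS.ByteString
writeCache ghcPart extraPart =
  BS.concat [encodeLen (fromIntegral (BS.length ghcPart)), ghcPart, extraPart]

-- Split a file image into (first section, second section).
splitCache :: BS.ByteString -> Maybe (BS.ByteString, BS.ByteString)
splitCache file
  | BS.length hdr < 8           = Nothing
  | n < 0 || BS.length body < n = Nothing
  | otherwise                   = Just (BS.splitAt n body)
  where
    (hdr, body) = BS.splitAt 8 file
    n = fromIntegral (decodeLen hdr) :: Int

-- Reader side: only the first section is returned; the bytes of the
-- second section are never interpreted.
readGhcSection :: BS.ByteString -> Maybe BS.ByteString
readGhcSection = fmap fst . splitCache

main :: IO ()
main = do
  let ghcPart   = BS.pack [1, 2, 3]
      extraPart = BS.pack [9, 9, 9, 9]
      file      = writeCache ghcPart extraPart
  print (readGhcSection file == Just ghcPart)
  print (splitCache file == Just (ghcPart, extraPart))
```

Because the writer treats the second argument as raw bytes, a library in this style would not need to depend on the types used to produce that payload.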
    71 78
    72 79 == Technical details
     
    121 128 }}}
    122 129 It uses the class to convert between the on-disk UTF-8 representation and the internal representation (`String` for ghc-pkg, and things like newtype'd `FastString`s in GHC).
     130
     131---------------------------
     132= History: Removal of the GHC library dependency on the Cabal library
     133
     134See ticket #8244
     135
     136The GHC library used to depend on the Cabal library directly, for the representation of installed packages. This was convenient for implementation but had a number of drawbacks:
     137
     138 * Any package making use of the GHC library would be forced to use the same version of Cabal as GHC used. This was annoying because while the parts of Cabal that GHC used were not very fast moving, other parts of the library are, and so other packages did want to use a different version of Cabal.
     139 * Given the existing limitations and inconveniences of installing multiple versions of the same package, the GHC dependency on Cabal made it hard to upgrade Cabal separately. Of course this is really more of a limitation of the packaging side of things.
     140 * The fact that GHC depended directly on Cabal placed limitations on the implementation of Cabal. GHC must be very careful about which packages it needs to be able to build (so called boot packages). Because Cabal was a boot package, it could itself only depend on other boot packages. In particular, Cabal needs a decent parser combinator library, but no such library is available as a boot package (and GHC developers were understandably reluctant to add dependencies on parsec, mtl, text etc as would be required).
     141