Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#10622 closed task (fixed)

Rename Backpack packages to units

Reported by: ezyang Owned by: ezyang
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.11
Keywords: backpack Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D1057
Wiki Page:

Description (last modified by ezyang)

After today's weekly Backpack call, we have come to the conclusion that we have two different types of "packages" in the Backpack world:

  1. Cabal packages, which have a single .cabal file and are a unit of distribution which get uploaded to Hackage, and
  1. Backpack packages, of which there may be multiple defined in a Backpack file shipped with a Cabal package; and are the building blocks for modular development in the small.

It's really confusing to have both of these called packages: thus, we propose to rename all occurrences of Backpack package to unit. A Cabal package may contain MULTIPLE Backpack units, and old-style Cabal files will only define one unit. Every Cabal package has a distinguished unit (with the same name as the package) that serves as the publicly visible unit.
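As a rough illustration of the package/unit split described above, here is a minimal sketch. All type and function names (CabalPackage, publicUnit, oldStyle) are hypothetical and are not GHC's or Cabal's actual definitions.

```haskell
import Data.List (find)

-- Hypothetical names for illustration; not GHC's or Cabal's actual types.
newtype PackageName = PackageName String deriving (Eq, Show)
newtype UnitName    = UnitName String    deriving (Eq, Show)

-- A Cabal package is the unit of distribution; it contains one or more
-- Backpack units.
data CabalPackage = CabalPackage
  { pkgName  :: PackageName
  , pkgUnits :: [UnitName]
  } deriving Show

-- The distinguished (publicly visible) unit is the one whose name
-- matches the package name.
publicUnit :: CabalPackage -> Maybe UnitName
publicUnit pkg = find matches (pkgUnits pkg)
  where
    PackageName n = pkgName pkg
    matches (UnitName u) = u == n

-- An old-style package defines exactly one unit, named after the package.
oldStyle :: String -> CabalPackage
oldStyle n = CabalPackage (PackageName n) [UnitName n]
```

Under this sketch, a package q that ships units p and q would expose only the unit named q.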

A Cabal package remains

  • The unit of distribution
  • The unit that Hackage handles
  • The unit of versioning
  • The unit of ownership (who maintains it etc)

Here are some of the consequences:

  1. The "installed package database" no longer maintains a one-to-one mapping between Cabal packages and entries in the database. This invariant is being dropped for two reasons: (1) With a Nix-style database, a package foo-0.1 may be installed many times with different dependencies / source code, all of which live in the installed package database. (2) With Backpack, a package containing a Backpack file may install multiple units. To avoid having to rename *everything*, we'll keep calling this the installed package database, but really it's more like an installed *unit* database.
  1. We rename PackageKey to UnitKey, as it identifies a unit rather than a Cabal package. (I think this actually makes the function of these identifiers clearer.) We'll also distinguish Cabal-file level PackageNames from Backpack-file UnitNames. Installed units are identified by an InstalledUnitId instead of an InstalledPackageId.
  1. The source-level syntax of Backpack files will use unit in place of where package was used before.
  1. For old-style packages, Cabal will continue to write and register a single entry in the installed package database. For Backpack packages, Cabal will register as many entries as are necessary to install a package. The entry with the same UnitName as PackageName is publicly visible to other packages. If a Backpack file defines other packages, those packages are registered with different UnitNames (giving them different InstalledPackageIds) which are not publicly visible. The non-publicly visible packages will have their description/URL/etc fields blank, and have a pointer to the "real" package.
  1. If, when installing a unit, we discover that it is already present in the database, we check whether the ABI hashes are the same. If they are, we simply skip installing the unit but otherwise proceed. If the ABI hashes are not the same, we error: the units we are installing need to be recompiled against the unit present in the database.
  1. Dependency tracking should be fine-grained within a PACKAGE, and coarse-grained outside. So we need to let interface files track module dependencies for files which are not in the same unit, but are in the same package.
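The install-time ABI check from the list above could be sketched roughly as follows. This is an illustration only: the database is modelled as a plain association list, and UnitDb, InstallResult and installUnit are hypothetical names, not the real package database API.

```haskell
-- Hypothetical types for illustration; the real database format differs.
newtype InstalledUnitId = InstalledUnitId String deriving (Eq, Show)
newtype AbiHash         = AbiHash String         deriving (Eq, Show)

-- The installed unit database, modelled as an association list.
type UnitDb = [(InstalledUnitId, AbiHash)]

data InstallResult
  = Installed UnitDb             -- unit was absent; it is now registered
  | Skipped                      -- unit already present with the same ABI hash
  | AbiMismatch AbiHash AbiHash  -- present with a different hash: (in db, new)
  deriving (Eq, Show)

-- Register a unit, skip it, or report an error, depending on what the
-- database already contains for this InstalledUnitId.
installUnit :: InstalledUnitId -> AbiHash -> UnitDb -> InstallResult
installUnit uid new db =
  case lookup uid db of
    Nothing -> Installed ((uid, new) : db)
    Just old
      | old == new -> Skipped
      | otherwise  -> AbiMismatch old new
```

A caller would treat AbiMismatch as the error case: the units being installed need to be recompiled against the unit already in the database.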

Change History (14)

comment:1 Changed 4 years ago by ezyang

Description: modified (diff)

comment:2 Changed 4 years ago by ezyang

Differential Rev(s): Phab:D1057

comment:3 Changed 4 years ago by simonpj

Description: modified (diff)

comment:4 Changed 4 years ago by ezyang

Description: modified (diff)

comment:5 Changed 4 years ago by ezyang

Here is a counter-proposal, organized around avoiding changing the package database format:

Cabal already has limited support for multiple "packages" in a distribution unit, namely with its support for testing libraries. These libraries are never installed for other users to use, but internally, they can be installed and used like extra libraries. In this model, the database has a separate entry for each unit.

The downside is that the package database will no longer be organized by distribution units. But this was already the case: if I install a package multiple times with different dependencies, it will occur multiple times in the database.

The big upside is that the changes we have to make are now much smaller.

comment:6 Changed 4 years ago by ezyang

And here is a counter-counter-proposal, where we simply REDUCE sharing:

  1. The basic idea is to defer making libraries/bundles of hi files until we have a complete, definite package that has been Cabalized. So if we have something like:
    package p where
      signature H
      module P
    package q where
      module H
      include p
    
    distributed with a q.cabal, p NEVER SHOWS UP in the installed package database; not even the version of it instantiated with q.
  1. This obviously breaks type-checking, since when we build, p will still be compiled to a specific package key p(H -> q:H), but this package key won't be anywhere in our installed package database. So libraries like q will get a new type of entry in the installed package database: they are fat installed packages which can contain files for more than one package key. These keys are enumerated in the entry in the installed package database, and you just look in import-dirs/key to find the relevant interface files. So p's interface files will live in something like q-install-dir/p_KEY. Cabal also records the ABI hash of each of the sub-packages in a fat package.
  1. Suppose p is an indefinite package with a p.cabal of its own. Neither the generic p nor the instantiated versions of p have direct entries in the package database: you will only find hi files/libraries under the fat installs of other definite packages which used p.
  1. What happens in this situation, when q1 and q2 are built in parallel? (Suppose each package has its own Cabal file)
    package p where
      signature H
      module P
    package h where
      module H
    package q1 where
      include h
      include p
      module Q1
    package q2 where
      include h
      include p
      module Q2
    
    h is a normal package and can get installed as usual. q1 and q2 are FAT installed packages, they get installed with hi files and libraries for p(H -> h:H). In particular, this means that p instantiated with h is DUPLICATED in these two fat installed packages. To avoid disaster from incompatible duplicate packages, we verify that for every duplicated package key in the package database, the ABI hashes are the same. This will work great if we have deterministic builds, and not so great if they are nondeterministic.
  1. Let's say we instantiate a package, and we discover that a fat package which we don't directly depend on instantiated it already. What do we do? It should be OK to reuse it, but when Cabal goes and installs, it must copy the interface files and libraries from the other fat package into our new fat package.
  1. This makes the story great for distribution packagers: they don't have to worry about two (morally separate) packages depending on common files/libraries which need to be installed in the same location. This would require a subpackage, but Debian is unlikely to want to create lots and lots of little packages to get the sharing we're aiming for here.
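To make the fat installed package idea above more concrete, here is a minimal sketch. The FatPackage record, the directory layout, and the function names are assumptions made for illustration; they do not describe an actual database format.

```haskell
import Data.List (nub)

-- Hypothetical types for illustration only.
newtype PackageKey = PackageKey String deriving (Eq, Show)
newtype AbiHash    = AbiHash String    deriving (Eq, Show)

-- A fat installed package holds files for several package keys and
-- records the ABI hash of each one.
data FatPackage = FatPackage
  { fatInstallDir :: FilePath
  , fatContents   :: [(PackageKey, AbiHash)]
  } deriving Show

-- Directory holding the interface files for a key, if this fat package
-- contains that key (assumed layout: install-dir/key).
interfaceDir :: FatPackage -> PackageKey -> Maybe FilePath
interfaceDir fp key@(PackageKey k) =
  case lookup key (fatContents fp) of
    Just _  -> Just (fatInstallDir fp ++ "/" ++ k)
    Nothing -> Nothing

-- Package keys that occur with more than one distinct ABI hash across
-- the database; a non-empty result means incompatible duplicates.
conflictingKeys :: [FatPackage] -> [PackageKey]
conflictingKeys db =
  [ k
  | k <- nub (map fst entries)
  , length (nub [h | (k', h) <- entries, k' == k]) > 1
  ]
  where
    entries = concatMap fatContents db
```

In the q1/q2 example, both fat packages would carry an entry for p(H -> h:H); the check passes only if the two recorded ABI hashes agree.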

I actually kind of suspect this is what Simon wanted to do from the beginning; apologies for not figuring it out sooner.

comment:7 Changed 4 years ago by simonpj

This sounds plausible to me, though as usual with cabal/backpack I am not 100% sure of my ground.

I think of it like this:

  • Cabal concerns itself with Cabal packages (units of distribution and versioning), including choosing version numbers, downloading them, and figuring out if that particular package (instantiated with its transitive dependencies) is already installed.
  • GHC concerns itself with Backpack units.

Both share a single "installed package database".

But they use it in a different way. For GHC at least, it's just a cache: a place to record the result of previous work (including typechecking indefinite packages), so that we don't need to repeat it. There is no harm in repeating it, but it's a waste of time.

So Cabal need never see the previously-compiled indefinite packages; they are just a way for GHC to save time. Maybe that is what you are saying.

Another way to attack this is to ask "what questions does Cabal ask the installed package database?" and "what questions does GHC ask?". I think the two are different.

Simon

comment:8 Changed 4 years ago by ezyang

Duncan and I had a chat about this, which in particular clarified "What questions Cabal asks the installed package database." cabal-install currently queries the installed package database in order to determine what the installed packages are when it is making an install plan. However, this will soon no longer be the case with Nix-style versioning: Cabal will make an install plan using purely source package info, and then query the database to find out what products it already has installed.
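The Nix-style flow described here (plan first, then ask the database what is already present) could look roughly like this. It is a sketch under the assumption that a plan is just a list of unit identifiers; splitPlan is a hypothetical name, not a cabal-install function.

```haskell
import Data.List (partition)

-- Hypothetical identifier type for illustration.
newtype InstalledUnitId = InstalledUnitId String deriving (Eq, Show)

-- Given the units already in the database and a plan computed purely from
-- source package info, split the plan into (already installed, to build).
splitPlan :: [InstalledUnitId] -> [InstalledUnitId]
          -> ([InstalledUnitId], [InstalledUnitId])
splitPlan db plan = partition (`elem` db) plan
```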

So the key idea is: the installed package database is a database of units! There may be more junk in the database than you would expect from the packages you have installed, but in a Nix world where there will be many, many installs of the same package, ghc-pkg is already not going to be so useful for finding info about packages.

I'll update the top level description with the final plan.

comment:9 Changed 4 years ago by ezyang

Description: modified (diff)

comment:10 Changed 4 years ago by simonpj

Installed units continue to be identified by InstalledPackageId

Wouldn't InstalledUnitId be clearer? Indeed wouldn't InstalledPackageId be positively misleading?

comment:11 Changed 4 years ago by ezyang

Description: modified (diff)

OK. (I was thinking about BC for Cabal users but I guess it should not be too bad. We'll see.)

comment:12 Changed 4 years ago by ezyang

Description: modified (diff)

Add a remark about how dependency tracking should change between units in the same package.

comment:13 Changed 4 years ago by ezyang

Resolution: fixed
Status: new → closed

comment:14 Changed 4 years ago by thomie

Milestone: 8.0.1