Opened 5 years ago

Last modified 5 years ago

#10270 new bug

inconsistent semantics of type class instance visibility outside recursive modules

Reported by: skilpat Owned by: ezyang
Priority: normal Milestone:
Component: Compiler Version: 7.10.1
Keywords: Cc:
Operating System: MacOS X Architecture: x86_64 (amd64)
Type of failure: GHC rejects valid program Test Case:
Blocked By: Blocking:
Related Tickets: #9562 Differential Rev(s):
Wiki Page:

Description

When you have an instance defined in a module that's part of a recursive module loop, when should modules outside the loop see it? Right now, GHC behaves differently depending on whether it's in batch or in single-shot compilation mode. Which one is correct? Dunno!

Here's the example (code at the bottom). The gist is that you have two modules A and B that define data types T and U, respectively, which depend on each other, so A and B are recursive modules and we need a boot file -- say, for A.

Now suppose we define an instance Eq T in the implementation of A but not in the boot file. B imports the boot file, so it doesn't know about this instance. And then I have some third module, Main, outside the loop, which imports only B. And now the central question: Does Main know about the Eq T defined in A?

We can test how GHC answers this question by defining an (orphan) instance for Eq T in Main. In batch compilation mode, Main is rejected for defining a duplicate instance. In single-shot compilation mode, however, Main is accepted and any equality test in Main uses the locally defined instance; i.e., B doesn't know about A's Eq T and so neither does Main.

So which is correct? If you ask me, the latter semantics is correct, but I can see why the former might be argued as well (e.g., according to the fixed-point semantics of import/export described in (1)). In any case, the semantics between the two compilation modes should probably agree, right?

-- A.hs-boot
module A where
  data T -- a mutually recursive data type, along with B.U
  t :: T -- some value to test for Eq in Main


-- B.hs
module B(module A, module B) where
  -- export A.{T,t} since Main doesn't import A
  
  import {-# SOURCE #-} A

  -- mutually recursive data type across modules
  data U = U | UT T


-- A.hs
module A where
  import B

  -- mutually recursive data type across modules
  data T = T | TU U

  -- the true instance
  instance Eq T where
    _ == _ = True

  -- some value to test Eq instance in Main
  t :: T
  t = T


-- Main.hs
module Main where
  import B -- no import of A

  -- an orphan instance for Eq T.
  -- okay in one-shot mode; not okay in batch (--make) mode
  instance Eq T where
    _ == _ = False

  -- in one-shot mode, this prints False
  main = putStrLn $ show $ t == t

Commands and output for batch mode:

$ ghc --make Main
[1 of 4] Compiling A[boot]          ( A.hs-boot, A.o-boot )
[2 of 4] Compiling B                ( B.hs, B.o )
[3 of 4] Compiling A                ( A.hs, A.o )
[4 of 4] Compiling Main             ( Main.hs, Main.o )

A.hs:8:10:
    Duplicate instance declarations:
      instance Eq T -- Defined at A.hs:8:10
      instance Eq T -- Defined at Main.hs:7:10

Commands and output for single-shot mode:

$ ghc -c A.hs-boot
$ ghc -c B.hs
$ ghc -c A.hs
$ ghc -c Main.hs
$ ghc A.o B.o Main.o -o main
$ ./main
False

Tested on GHC 7.6.3, 7.8.3, and 7.10.1.

(1) A Formal Specification for the Haskell 98 Module System. Iavor S. Diatchki, Mark P. Jones, and Thomas Hallgren. Haskell '02. http://web.cecs.pdx.edu/~mpj/pubs/hsmods.pdf

Change History (6)

comment:1 Changed 5 years ago by ezyang

This is probably due to the retypecheck loop:

See bug #930.  This code fixes a long-standing bug in --make.  The
problem is that when compiling the modules *inside* a loop, a data
type that is only defined at the top of the loop looks opaque; but
after the loop is done, the structure of the data type becomes
apparent.

The difficulty is then that two different bits of code have
different notions of what the data type looks like.

The idea is that after we compile a module which also has an .hs-boot
file, we re-generate the ModDetails for each of the modules that
depends on the .hs-boot file, so that everyone points to the proper
TyCons, Ids etc. defined by the real module, not the boot module.
Fortunately re-generating a ModDetails from a ModIface is easy: the
function TcIface.typecheckIface does exactly that.

Picking the modules to re-typecheck is slightly tricky.  Starting from
the module graph consisting of the modules that have already been
compiled, we reverse the edges (so they point from the imported module
to the importing module), and depth-first-search from the .hs-boot
node.  This gives us all the modules that depend transitively on the
.hs-boot module, and those are exactly the modules that we need to
re-typecheck.

Following this fix, GHC can compile itself with --make -O2.

As a side effect, the instances also get dragged in. In one-shot mode, we use a different mechanism, the tcg_type_env_var, to ensure we can deal with the recursive case.

comment:2 Changed 5 years ago by goldfire

I think #9562 is related.

comment:3 Changed 5 years ago by simonpj

I rather agree that the one-shot story is the right one. And yes, the two should agree.

Edward I recall that we had extensive discussion about how to arrange that the right instances are in scope. We had a New Plan; but (a) I have forgotten the plan and (b) I'm not sure whether it was ever implemented. Can you remind me?

Simon

comment:4 Changed 5 years ago by ezyang

The plan was for orphan instances, to check if any given instance was in the set of orphan modules which are transitively imported from this module. We did implement it, see #2182.

Unfortunately, this doesn't directly help here because the instance in A.hs is NOT an orphan, so we just assume that it's in scope. So in some sense, this is the same problem that we were grappling with hsigs: sometimes, a non-orphan instance should NOT be visible, because it was not defined in the hs-boot/hsig file. We also spent a bit of time thinking about how to solve this, but all of our solutions boiled down to some variant of "put the set of transitively imported modules (including non-orphan modules)" into the interface, with various schemes for optimizing this since this results in quadratic growth of interface files.

Last edited 5 years ago by ezyang (previous) (diff)

comment:5 Changed 5 years ago by ezyang

Actually, the situation is a bit better here, because hs-boot operates strictly locally. Here is a fix proposal:

For non-orphan instances, we can't easily tell if an instance is supposed to be in scope unless we know whether or not the defining module is transitively reachable from the imports of a module. Fortunately, in --make, we do know this information from the import graph. So, for each module we compile, calculate a set of local hs/hs-boot modules which should be visible, and pass that along to the type-checker. Then, like for orphans, we only treat an instance as visible if it is in this set. (Note: we can't do this strategy for external instances, because we don't have the full dependency graph anymore.)

I thought of this technique when I noticed that --make behavior was inconsistent, depending on whether or not a module was compiled before or after we retypecheck an hs-boot loop; if you always make sure modules which transitively have a {-# SOURCE #-} import on a booted module are type-checked BEFORE you typecheck the real implementation (even if there isn't a dependency), you will ensure that only the correct interfaces are in scope. But this is awfully delicate and wouldn't work well in the parallel make case, so just calculating the reachable set of nodes seems better.

comment:6 Changed 5 years ago by ezyang

Owner: set to ezyang
Note: See TracTickets for help on using tickets.