Opened 3 years ago

Last modified 12 months ago

#13152 new feature request

Provide a mechanism to notify build system when .hi file is ready

Reported by: rwbarton
Owned by:
Priority: normal
Milestone: 8.10.1
Component: Driver
Version: 8.1
Keywords:
Cc: simonmar, niteria, thomie
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

In one-shot mode GHC typically finishes writing the interface file around halfway through compilation. A dependent module (if it doesn't use Template Haskell) only needs the interface file to start building. If we could start building a module as soon as all its dependencies' interface files are ready, we would cut the critical path by about a factor of 2; and parallelism in the GHC build tree is currently low enough that this should give significantly lower build times even on only modestly parallel systems.

The first obstruction to doing this is that there isn't a good way to know when the interface file becomes ready. I propose the following simple and flexible mechanism:

  • Add a GHC command-line argument -finterface-file-finished=N,F,str. When GHC has finished writing the interface file it uses the send system call to send the string str to file descriptor N using flags F.

The build system might then invoke GHC with file descriptor N open to a UNIX datagram socket, for example, and generate a unique str for each interface file dependency.
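Below is a rough sketch of the build-system side, written in bash to match the shell recipes elsewhere in this ticket. It rests on two assumptions that are not part of the proposal:

- Bash cannot create sockets, so a named pipe (FIFO) on Linux stands in for the UNIX datagram socket.
- The hypothetical fake_compile function stands in for ghc -c M.hs -finterface-file-finished=3,0,M. It writes its token with printf rather than send(2). A real GHC calling send(2) would need N to be an actual socket, since send(2) on a pipe or FIFO fails with ENOTSOCK.

Each module sends its own name as its unique str. This lets one channel report several interface files.

```bash
#!/usr/bin/env bash
# Sketch of the build-system side of -finterface-file-finished=N,F,str.
# Assumptions: Linux (opening a FIFO read-write is Linux behaviour), a FIFO in
# place of the UNIX datagram socket, and fake_compile as a hypothetical
# stand-in for "ghc -c M.hs -finterface-file-finished=3,0,M".

fake_compile() {   # usage: fake_compile MODULE DELAY_SECONDS
  local mod=$1 delay=$2
  sleep "$delay"
  : > "$mod.hi"                 # interface file finished
  printf '%s\n' "$mod" >&3      # unique token for this module on fd 3
  sleep "$delay"
  : > "$mod.o"                  # object file finished later
}

# Start one fake compile per MODULE:DELAY argument and report each
# interface file in the order its token arrives.
ready_order() {
  local dir spec token n=0
  dir=$(mktemp -d)
  mkfifo "$dir/ready"
  exec 3<> "$dir/ready"         # read-write open: never blocks, keeps FIFO open
  for spec in "$@"; do
    fake_compile "${spec%%:*}" "${spec##*:}" &   # child inherits fd 3
    n=$((n + 1))
  done
  while (( n > 0 )) && read -r token <&3; do
    echo "$token.hi ready"      # a real driver would start dependents here
    n=$((n - 1))
  done
  wait                          # let the remaining "codegen" finish
  exec 3>&-
  rm -rf "$dir"
}
```

For example, ready_order Slow:0.6 Fast:0.1 should print Fast.hi ready before Slow.hi ready. The driver learns about each interface file as it appears, not when each compile exits. Because the driver holds the FIFO open read-write, it never sees end-of-file. A real driver would also have to notice a compiler that exits without sending its token.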

Alternative suggestions are more than welcome.

I tentatively milestoned this for 8.2 since it would be nice to have this available in the bootstrapping compiler when we switch to Hadrian, and the GHC-side implementation should be simple.

Change History (13)

comment:1 Changed 3 years ago by ezyang

If GHC wrote out the interface file atomically, could this be implemented just by watching the output files of a GHC invocation? (Maybe not on Windows?)

comment:2 Changed 3 years ago by rwbarton

I'm not sure. What if the interface file already existed, and either (1) we checked that we don't need to recompile (in which case we currently don't touch the interface file at all), or (2) we do need to recompile and write out a new interface file?

Programming with inotify can be fiddly (for example, the man page says that it is possible for events to be lost), and I don't know whether we could reliably distinguish a new/unchanged-but-confirmed-valid interface file from a stale one.

comment:3 Changed 3 years ago by rwbarton

It might not even be that hard to retrofit use of this feature into the existing build system, changing a rule like

Foo.hi : Bar.hi Baz.hi
    ghc -c Foo.hs

to something like

Foo.hi : Bar.hi Baz.hi
    ( ( ghc -c Foo.hs -finterface-file-finished=1,0,x ; \
        buildserver finished Foo.o ) & ) | read -n 1

and then for the .o files

Foo.o : Foo.hi
    buildserver await Foo.o

Here buildserver await is a program that will block until the corresponding buildserver finished has been run.

comment:4 Changed 3 years ago by niteria

This could be a great improvement for us. I will ask someone from the Buck team to take a look and see what kind of interface is most convenient for Buck.

comment:5 Changed 3 years ago by niteria

Sadly, the way things are structured right now in Buck it would be hard to take advantage of this. Buck has dependencies between commands, not between resources like Make does.

That said, it sounds like a big win for GHC with relatively low effort.

comment:6 Changed 3 years ago by rwbarton

Hmm, I'm not familiar with Buck at all, but could you have an encoding with two commands per module, a command-to-build-.hi and a command-to-build-.o? In reality the command-to-build-.hi exits when ghc reports that it is done writing the .hi file but leaves ghc running in the background to finish building the .o file, and the command-to-build-.o just blocks until that finishes. The command-to-build-.hi only depends on other command-to-build-.his. That's how my make encoding really works anyways. Probably in a more sophisticated system you could express the idea of a command that produces two resources at different times directly.

In fact, now that I look at my make example again, it seems one wouldn't even need a server to do the blocking; probably advisory file locks are enough.

comment:7 Changed 3 years ago by rwbarton

Here's how this could look with locks instead of a buildserver. The semantics of the lock file is that if you ran the action to build Foo.hi and you can take the lock on Foo.o-lock, then Foo.o is ready.

Foo.hi : Bar.hi Baz.hi
    ( flock Foo.o-lock ghc -c Foo.hs -finterface-file-finished=1,0,x & ) \
      | read -n 1

Foo.o : Foo.hi
    flock Foo.o-lock true

Ideally we'd also propagate the exit status of ghc to the rule for Foo.o, and clean up the lock files.
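One way to cover both items in that last sentence is sketched below in bash. It assumes Linux with util-linux flock(1). It also uses a hypothetical fake_ghc function standing in for ghc -c M.hs -finterface-file-finished=1,0,x, since that flag does not exist. The stand-in simply writes the byte; a real send(2) would need fd 1 to be a socket rather than a pipe.

The protocol works in three steps:

1. The compiler subshell takes the lock before it can emit the ready byte.
2. It records ghc's exit status in M.o-status while still holding the lock.
3. Once the .o step can take the lock, it reads that status and deletes both files.

```bash
#!/usr/bin/env bash
# Sketch: lock protocol with exit-status propagation and lock-file cleanup.
# Assumes Linux with util-linux flock(1). fake_ghc is a hypothetical stand-in
# for "ghc -c M.hs -finterface-file-finished=1,0,x"; no such flag exists.

fake_ghc() {   # usage: fake_ghc MODULE [EXIT_STATUS]
  local mod=$1 status=${2:-0}
  sleep 0.1
  : > "$mod.hi"                  # interface file written
  printf x                       # ready byte on fd 1
  sleep 0.3                      # remainder of compilation (codegen)
  if (( status == 0 )); then
    : > "$mod.o"                 # object file only on success
  fi
  return "$status"
}

# Rule for M.hi: returns as soon as the interface file is ready.
# The optional second argument is only passed through to fake_ghc.
build_hi() {
  local mod=$1
  rm -f "$mod.o-status"
  (
    (
      exec 9> "$mod.o-lock"      # lock file stays open on fd 9
      flock 9                    # lock is held before the ready byte is sent
      fake_ghc "$mod" "${2:-0}"
      echo $? > "$mod.o-status"  # written while the lock is still held
    ) &                          # lock released when this subshell exits
  ) | { read -r -n 1 byte; }     # nonzero if ghc exits without signalling
}

# Rule for M.o: blocks until compilation finishes, then reports its status.
await_o() {
  local mod=$1 status
  flock "$mod.o-lock" true       # waits for the compiler subshell to exit
  status=$(< "$mod.o-status")
  rm -f "$mod.o-lock" "$mod.o-status"
  return "$status"
}
```

In a Makefile, the Foo.hi recipe would run the equivalent of build_hi Foo, and the Foo.o recipe would run await_o Foo. Once read has consumed the ready byte, the pipe has no reader left. A compiler that later writes to its stdout would therefore get SIGPIPE, so a dedicated descriptor for the ready byte may be safer.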

comment:8 Changed 3 years ago by niteria

You're right, I believe it could work with the encoding you proposed. The tricky bit would be ensuring that the processes, locks and handles don't get leaked.

comment:9 Changed 3 years ago by rwbarton

Milestone: changed from 8.2.1 to 8.4.1

comment:10 Changed 23 months ago by bgamari

Milestone: changed from 8.4.1 to 8.6.1

This ticket won't be resolved in 8.4; remilestoning for 8.6. Do holler if you are affected by this or would otherwise like to work on it.

comment:11 Changed 18 months ago by bgamari

This will not be addressed in GHC 8.6.

comment:12 Changed 18 months ago by bgamari

Milestone: changed from 8.6.1 to 8.8.1

These will not be addressed in GHC 8.6.

comment:13 Changed 12 months ago by osa1

Milestone: changed from 8.8.1 to 8.10.1

Bumping milestones of low-priority tickets.
