Opened 9 months ago

Last modified 7 months ago

#15971 new bug

Hadrian fails Shake's linter on Windows

Reported by: bgamari Owned by:
Priority: high Milestone: 8.8.1
Component: Build System (Hadrian) Version: 8.7
Keywords: Cc: alpmestan, snowleopard, NeilMitchell
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

While trying to bring up a Hadrian builder on Windows I found the following; excerpting:

shakeArgsWith                        0.001s    0%                           
Function shake                       0.407s    0%                           
Database read                        0.001s    0%                           
With database                        0.000s    0%                           
Running rules                     3065.940s   99%  =========================
Pool finished (2932 threads, 4 max)  0.002s    0%                           
Lint checking                        1.810s    0%                           
Total                             3068.160s  100%                           
Lint checking error - 520 values have changed since being depended upon:
  Key:  _build/stage1/gmp/objs/zero_p.o
  Old:  (Just File {mod=0x6D312A39,size=0x2EA,digest=0xACFF03E6} recomputed,"")
  New:  File {mod=0x7475B8F5,size=0x2EA,digest=0xACFF03E6}
       
  Key:  _build/stage1/gmp/objs/zero.o
  Old:  (Just File {mod=0x6DA1AAD9,size=0x2F8,digest=0x586B32DC} recomputed,"")
  New:  File {mod=0x74B61957,size=0x2F8,digest=0x586B32DC}
       
  Key:  _build/stage1/gmp/objs/xor_n.o
  Old:  (Just File {mod=0x6D9D3378,size=0x1DF,digest=0xFBB5353} recomputed,"")
  New:  File {mod=0x74B2AAEC,size=0x1DF,digest=0xFBB5353}

It looks like all of the gmp objects were somehow touched during the build, but their contents are unchanged. Strangely, none of these filenames appear elsewhere in the log, so it's unclear when they could have been touched.

Change History (13)

comment:1 Changed 9 months ago by bgamari

Cc: alpmestan snowleopard added

comment:2 Changed 9 months ago by snowleopard

Component: CompilerBuild System (Hadrian)

comment:3 Changed 9 months ago by snowleopard

I'm working on this.

Strangely, none of these filenames appear elsewhere in the log, so it's unclear when they could have been touched.

Just to clarify, they are produced by this build rule:

https://github.com/ghc/ghc/blob/93e86d6103757b43017535c92bc6970e9e2315a5/hadrian/src/Rules/Gmp.hs#L50-L67

Note that we do not know all GMP object files statically, before the build, and therefore cannot tell Shake which files we expect to produce as the result of running this rule.

I'm not entirely clear about how this causes the lint failure.

comment:4 Changed 9 months ago by snowleopard

Cc: NeilMitchell added

comment:5 Changed 9 months ago by NeilMitchell

This lint file means you created the files, as dependencies, then something went round and did a touch on them. Having a point of creation is fine, but what's doing the touch? If you don't know their names, how do they get into a list of .o files? Where is that rule?

comment:6 Changed 9 months ago by snowleopard

If you don't know their names, how do they get into a list of .o files? Where is that rule?

We unpack an archive which contains these .o files, here is the command that does this:

            build $ target gmpContext (Ar Unpack Stage1)
                [top -/- gmpPath -/- gmpLibrary] [gmpPath -/- gmpObjectsDir]

https://github.com/ghc/ghc/blob/93e86d6103757b43017535c92bc6970e9e2315a5/hadrian/src/Rules/Gmp.hs#L64-L65

Having a point of creation is fine, but what's doing the touch?

Right now the only place we ever refer to these .o files again is here:

        gmpPath <- gmpBuildPath
        need [gmpPath -/- gmpLibraryH]
        map unifyPath <$> getDirectoryFiles "" [gmpPath -/- gmpObjectsDir -/- "*.o"]

https://github.com/ghc/ghc/blob/93e86d6103757b43017535c92bc6970e9e2315a5/hadrian/src/Rules/Library.hs#L105-L107

Note that need [gmpPath -/- gmpLibraryH] records the dependency on the above rule. I initially thought that the reason for the lint failure is that we're using getDirectoryFiles instead of getDirectoryFilesIO, but we get the same lint error when using the untracked version too.

As far as I can see, we never touch or refer to these .o files anywhere else.

comment:7 Changed 9 months ago by NeilMitchell

Presumably you need the output of extraObjects at some point? When you do that it records their modification times. The --lint flag causes it to check the modification time at the end, and it spots it is different. Could the insertion into the archive be somehow modifying these files? Do you extract them twice?

comment:8 Changed 9 months ago by snowleopard

Presumably you need the output of extraObjects at some point?

Ah, yes, we do need them in a few places, e.g. in buildStaticLib:

    objs <- libraryObjects context -- this includes 'extraObjects'
    removeFile archivePath
    build $ target context (Ar Pack stage) objs [archivePath]

Could the insertion into the archive be somehow modifying these files? Do you extract them twice?

No, we extract GMP object files only once.

I'll try to model this situation in a small standalone example to make sense of it.

comment:9 Changed 7 months ago by snowleopard

Here is a simple MR: https://gitlab.haskell.org/ghc/ghc/merge_requests/99

Neil: As you can see, I switched to the untracked version of getDirectoryFiles, because GMP object files are generated. However, this does not fix the Lint failure.

I am a bit lost: I don't know how we could possibly touch these object files again. We unpack the archive with these objects. We then use getDirectoryFiles to get the list of objects, and need it when building the integer-gmp library.

Do you have any ideas how we could possibly mess this up? I have a suspicion that this is a Lint bug, but I couldn't reproduce it on a small example.

Until we figure out how to fix this, I think we'll need to drop the flag --lint when invoking Hadrian, to prevent CI bots from failing.

comment:10 Changed 7 months ago by NeilMitchell

The correctness criteria for getDirectoryFiles is nuanced - and you're actually obeying it - I just write a much simpler "no generated files" one in the docs because that's a reasonable safe approximation. As a result, I'm not surprised your change doesn't help.

What we are observing is the modification time changes. I know that on some file systems (e.g. NFS) the modification time is cached at a different resolution from the resolution it is stored on disk. Is there a chance we're hitting this? If so, the only current options are to disable lint checking (which is probably fine - maybe leave it on for your dev work) or disable use of modification times (which will slow down rechecking).

comment:11 Changed 7 months ago by snowleopard

Thanks Neil! I disabled the Lint for now. In fact, having it enabled all the time is probably unnecessary, since it takes up a little bit of time each run. Hadrian developers can pass --lint if need be.

P.S.: As for the possible reason: your explanation sounds plausible, but I don't know how to check or fix it.

Last edited 7 months ago by snowleopard (previous) (diff)

comment:12 Changed 7 months ago by bgamari

Relevant commit: 7218270dee1067db9c4f342b97e07ca89b80b82d.

snowleopard, please do format ticket references as #15971 instead of a URL. This ensures that cross-referencing Trac comments are generated.

comment:13 Changed 7 months ago by snowleopard

Ben: Sure, will do!

Note: See TracTickets for help on using tickets.