Opened 6 years ago

Last modified 4 years ago

#8144 new bug

Interface hashes include time stamp of dependent files (UsageFile mtime)

Reported by: nh2 Owned by:
Priority: normal Milestone:
Component: Test Suite Version: 7.6.3
Keywords: testcase Cc: tkn.akio@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

See https://github.com/nh2/ghc-bug-time-dependent-interface-hashes for a fast test case.

Having Test.hs:

{-# LANGUAGE CPP #-}

module Test () where

and an empty cabal_macros.h,

touch cabal_macros.h
ghc -c Test.hs -optP-include -optPcabal_macros.h

will generate you a different interface hash most of the time in ghc --show-iface Test.hi.

It looks like the modification time of the .h file makes it into the interface hash, but only at second resolution, and you get compilation IS NOT required if you manage to run it twice per second.

I think the interface hash should be independent from the time stamp of the header file.

This bug can trigger unnecessary recompilation and makes compilations with identical inputs non-deterministic.

Change History (38)

comment:1 Changed 6 years ago by akio

Cc: tkn.akio@… added

comment:2 Changed 6 years ago by nh2

(This happens for me with ghc-7.6.3, ghc-7.4.1 and some version of GHC 7.7.)

comment:3 Changed 6 years ago by simonpj

That certainly sounds wrong. Would anyone care to investigate?

Simon

comment:4 Changed 6 years ago by nh2

http://www.haskell.org/ghc/docs/7.6.3/html/libraries/ghc/HscTypes.html#t:Usage see

UsageFile { usg_file_path :: FilePath
          , usg_mtime     :: UTCTime }

https://github.com/ghc/ghc/blame/9082111dc/compiler/iface/MkIface.lhs#L603

iface_hash <- computeFingerprint ...

where computeFingerprint just uses the instance Binary Usage, so the mtime goes into the fingerprint.

comment:5 Changed 6 years ago by nh2

UsageFile was introduced with addDependentFile in https://github.com/ghc/ghc/commit/b994313.

The modification time being only (obtained) in second resolution is what triggers the inconsistent behaviour.

Regarding that commit, I am also not sure how https://github.com/ghc/ghc/commit/b994313#L3R1283 works:

checkModUsage _this_pkg UsageFile{ usg_file_path = file, usg_mtime = old_mtime } = do
  new_mtime <- liftIO $ getModificationTime file
  return $ old_mtime /= new_mtime

I might be wrong, but it looks to me as if this can only ever return False (UpToDate in current GHC head) because the mtime is so coarse - in other words if your computer is too slow, a second will pass and this will never be considered up to date?

comment:6 Changed 6 years ago by refold

Looks like #7473 is related.

comment:7 Changed 6 years ago by nh2

Summary: Interface hashes include time stamp of included .h file when CPP is usedInterface hashes include time stamp of dependent files (UsageFile mtime)

comment:8 in reply to:  6 ; Changed 6 years ago by jstolarek

Replying to refold:

Looks like #7473 is related.

According to discussion in #7473 getModificationTime was improved on Linux to return time with resolution better than 1 second. nh2, what platform are you on?

comment:9 in reply to:  8 Changed 6 years ago by nh2

Replying to jstolarek:

nh2, what platform are you on?

Linux on all the GHCs I mentioned above.

comment:10 Changed 6 years ago by nh2

Status: newpatch

comment:11 Changed 6 years ago by nh2

Owner: set to nh2

comment:12 Changed 6 years ago by simonmar

Looks good to me. Making iface hashes more deterministic is definitely the way to go.

comment:13 Changed 6 years ago by Austin Seipp <aseipp@…>

In 677820ee3a3aadbf2ed414deb3926381d94b13a8/ghc:

Fix interface hashes including time stamp of dependent files.

Fixes #8144.

Before, the modification time of e.g. #included files (and everything
that ends up as a UsageFile, e.g. via addDependentFile) was taken as
input for the interface hash of a module.

This lead to different hashes for identical inputs on every compilation.

We now use file content hashes instead.

This changes the interface file format.
You will get "Binary.get(Usage): 50" when you try to do an incremental
using .hi files that were created with a GHC 7.7 (only) older than this commit.

To calculate the md5 hash (`Fingerprint`) of a file in constant space,
there now is GHC.Fingerprint.getFileHash, and a fallback version
for older GHCs that needs to load the file into memory completely
(only used when compiling stage1 with an older GHC).

Signed-off-by: Austin Seipp <aseipp@pobox.com>

comment:14 Changed 6 years ago by thoughtpolice

This is now fixed but I have not yet added a test, so I will keep it open. @nh2, can you confirm this fixes your bug? Perhaps include a patch for a simpler test if possible :)

comment:15 Changed 6 years ago by Austin Seipp <aseipp@…>

In b6a572b4272274efe117d5e04f4b9d49e124009c/ghc:

Add some more comments to UsageFile.

This brings them up to date with the changes in #8144.

Signed-off-by: Austin Seipp <aseipp@pobox.com>

comment:16 Changed 6 years ago by nh2

There is small bug in the replacement code used for the stage 1 compiler: The input file is never read :P

(Doesn't become obvious until you try to backport my change to 7.6.)

Please pull:

https://github.com/nh2/ghc/compare/fix-interface-hashes-mtime

comment:17 Changed 6 years ago by thoughtpolice

Thanks, I've merged and I'm validating now.

comment:18 Changed 6 years ago by thoughtpolice

Merged

commit 95ebff9aed3e8570c7ac51148594c48a812c5b19
Author: Austin Seipp <aseipp@pobox.com>
Date:   Fri Aug 23 09:40:59 2013 -0500

    Fix validation failure in Fingerprint.hsc
    
    Signed-off-by: Austin Seipp <aseipp@pobox.com>

commit 41be8d30356d050938f61f06c56928d2c3eb2541
Author: Niklas Hambüchen <mail@nh2.me>
Date:   Fri Aug 23 21:46:14 2013 +0900

    Fingerprint.getFileHash: Fix not reading file at all.
    
    This lead to the stage1 compiler calculating random iface hashes.
    
    Signed-off-by: Austin Seipp <aseipp@pobox.com>

commit 48d7b0e963e88101464136419705e3c9e2e947f9
Author: Niklas Hambüchen <mail@nh2.me>
Date:   Fri Aug 23 21:46:02 2013 +0900

    Fingerprint: Fix comment typo
    
    Signed-off-by: Austin Seipp <aseipp@pobox.com>

comment:19 Changed 6 years ago by nh2

A backport to 7.6 can be found at https://github.com/nh2/ghc/releases/tag/fix-interface-hashes-mtime-7.6.3-release-tarball-tag (based on 7.6.3 release tarball, 64 bit Linux ghc binary also provided).

comment:20 Changed 6 years ago by ezyang

Resolution: fixed
Status: patchclosed

comment:21 Changed 6 years ago by ezyang

Keywords: testcase added
Owner: nh2 deleted
Resolution: fixed
Status: closednew

nh2: Would it be possible to get a test case on this one? (Or is there one already?)

comment:22 in reply to:  21 Changed 6 years ago by nh2

Replying to ezyang:

nh2: Would it be possible to get a test case on this one? (Or is there one already?)

I'll make one eventually (that means hopefully this week).

comment:23 Changed 5 years ago by rodlogic

Should this be closed or is a test case still pending?

comment:24 Changed 5 years ago by nh2

Still pending.

comment:25 Changed 5 years ago by nh2

I added a test case at https://github.com/nh2/ghc/compare/testcase-8144. Please pull.

comment:26 Changed 5 years ago by thoughtpolice

Status: newpatch

comment:27 Changed 5 years ago by thomie

Test LGTM.

comment:28 Changed 5 years ago by thoughtpolice

Milestone: 7.10.1
Status: patchinfoneeded

Niklas, can you please rebase this patch? I manually did it, but then the test started failing (the .stdout file had "compilation IS NOT required" once too many times). Can you please take this and verify it should work? Thanks!

comment:30 Changed 5 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:31 Changed 4 years ago by nomeata

What is the status of this? Stable interfaces are important for distributions like Debian!

comment:32 Changed 4 years ago by nomeata

Priority: normalhigh

Also, from reading the patch, it is not clear to me if the filename is also part of the ABI hash. It should not be, as build directory names may change, e.g. on automatic build machines.

Also see http://bugs.debian.org/785282

comment:33 Changed 4 years ago by nomeata

Priority: highnormal

Ah, wait: This is only waiting for a test case, right?

So then why do I still see this

./usr/lib/ghc/base-4.7.0.2/System/Info.hi
--- /dev/fd/63  2015-05-14 10:15:20.278404032 +0200
+++ /dev/fd/62  2015-05-14 10:15:20.278404032 +0200
@@ -5,7 +5,7 @@
 Way: Wanted [],
      got    []
 interface base:System.Info 7084
-  interface hash: 038b3a2a32cd43c6a82b096e3ef5c665
+  interface hash: 4ff1db7940716ca0e906d3f6b7af1166
   ABI hash: 67cae6f933216805d975ee28cd5ea72d
   export-list hash: 89e8b7a3bfc33bdb505eb586ca9fe64e
   orphan hash: 693e9af84d3dfcc71e640e005bdc5e2e
@@ -64,7 +64,7 @@
   divMod e11d60a38cd49de2f6058cc39227517a
 import safe Prelude 74043f272d60acec1777d3461cfe5ef4
   exports: be888345b248b32ace8f805dd49f3def
-addDependentFile "/build/ghc-LIQtDg/ghc-7.8.4/includes/ghcplatform.h"
+addDependentFile "/build/ghc-Na5lKv/ghc-7.8.4/includes/ghcplatform.h"
 addDependentFile "libraries/base/dist-install/build/autogen/cabal_macros.h"
 addDependentFile "/usr/include/stdc-predef.h"
 f85496685a82c42827c38c1ecec0b5fd

Looks like the filename is included in the hash then, because the files are identical in both builds...

comment:34 Changed 4 years ago by rwbarton

nomeata, sorry if I am missing something obvious, but this ticket seems to be about the time stamp not appearing in the interface hash, so the fact that this ticket is just waiting for a test is not incompatible with the interface hash including the file name.

Currently we need the .h file name in the interface file so that we can check whether the file fingerprint matches, and we use the interface hash to determine whether to write a new interface file, so it's not immediately obvious that we can avoid putting the file name in the interface hash. Maybe you'd like to create a new ticket about finding some way to avoid this?

comment:35 Changed 4 years ago by nomeata

and we use the interface hash to determine whether to write a new interface file,

Ah, this is new to me. I assumed that when a module is compiled, its interface is written, and the interface hash is only used to decide whether to rebuild another module or not.

But you are right, this deserves a separate ticket.

comment:36 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:37 Changed 4 years ago by thomie

Component: CompilerTest Suite
Status: infoneedednew

Bug is fixed. Just needs a test.

comment:38 Changed 4 years ago by thomie

Milestone: 8.0.1
Note: See TracTickets for help on using tickets.