Opened 7 years ago

Last modified 8 months ago

#7411 new bug

Exceptions are optimized away in certain situations

Reported by: SimonHengel Owned by: tdammers
Priority: high Milestone: 8.10.1
Component: Compiler Version: 7.6.1
Keywords: seq, deepseq, evaluate, exceptions, Simplifier Cc: dfeuer
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime Test Case: simplCore/should_fail/T7411
Blocked By: Blocking:
Related Tickets: #5129 #15225 Differential Rev(s):
Wiki Page:

Description (last modified by bgamari)

The issue came up in this thread on glasgow-haskell-users.

Steps to reproduce:

-- file Foo.hs
import Control.Exception
import Control.DeepSeq
main = evaluate (('a' : undefined) `deepseq` return () :: IO ())
$ ghc -fforce-recomp -fpedantic-bottoms -O Foo.hs

Expected result:

The program should fail with:

Foo: Prelude.undefined

Actual result:

The program succeeds.

Compiling the program with -fno-state-hack helps.

Attachments (5)

nofib-analyse.log.gz (30.7 KB) - added by tdammers 15 months ago.
nofib analyse output (gzipped because of size limit)
module-size-comparison (168.8 KB) - added by tdammers 15 months ago.
Comparing module sizes in a GHC build tree
nofib-gn-gnl.analyze.log (211.7 KB) - added by tdammers 15 months ago.
nofib analyse output: effect of NSH on libraries only vs. NSH on all of GHC
nofib-ns-nn.analyze.log (211.7 KB) - added by tdammers 15 months ago.
nofib analyse output: nofib compiled with NSH vs. normally, on a vanilla GHC build
nofib-ns-nn-gnl.analyze.log (211.7 KB) - added by tdammers 15 months ago.
nofib analyse output: nofib compiled with NSH vs. normally, on a GHC build with NSH libraries

Download all attachments as: .zip

Change History (56)

comment:1 Changed 6 years ago by igloo

difficulty: Unknown
Milestone: 7.8.1
Priority: normalhigh

Thanks for the report.

comment:2 Changed 5 years ago by thoughtpolice

Milestone: 7.8.37.8.4

Moving to 7.8.4.

comment:3 Changed 5 years ago by thoughtpolice

Milestone: 7.8.47.10.1

Moving (in bulk) to 7.10.4

comment:4 Changed 5 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1

comment:5 Changed 4 years ago by thomie

comment:6 Changed 4 years ago by Ben Gamari <ben@…>

In 15cb83d4/ghc:

Add testcase for #7411

comment:7 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:8 Changed 4 years ago by bgamari

Test Case: simplCore/should_fail/T7411

comment:9 Changed 4 years ago by bgamari

Description: modified (diff)

comment:10 Changed 3 years ago by Ben Gamari <ben@…>

In aa61174/ghc:

users-guide: Add references to various issues in bugs section

Test Plan: Read it

Reviewers: austin

Reviewed By: austin

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2052

GHC Trac Issues: #7411, #11197, #11554, #11715

comment:11 Changed 3 years ago by bgamari

Milestone: 8.0.18.0.2

comment:12 Changed 3 years ago by bgamari

Milestone: 8.0.28.0.3

This won't be fixed for 8.0.2.

comment:13 Changed 3 years ago by bgamari

Milestone: 8.0.38.2.1

At this point it is rather unlikely that there will be an 8.0.3. Re-milestoning.

comment:14 Changed 3 years ago by ThreeFx

Even with fno-state-hack this program succeeds under GHC version 8.0.1.

comment:15 Changed 2 years ago by bgamari

Milestone: 8.2.18.4.1

This won't be fixed for 8.2.

comment:16 Changed 19 months ago by bgamari

Milestone: 8.4.18.6.1

Nor for 8.4.1.

comment:17 Changed 17 months ago by tdammers

Compiling on GHC HEAD reproduces the problem:

../ghc/HEAD/_build/stage1/bin/ghc example.hs -fforce-recomp -O produces a program that runs without error, while ../ghc/HEAD/_build/stage1/bin/ghc example.hs -fforce-recomp behaves as it should.

Comparing simplified Core shows that with -O, deepseq no longer appears in the output, so it seems that inlining deepseq is part of the problem.

comment:18 Changed 17 months ago by tdammers

The relevant Core for the unoptimized run:

main
  = evaluate
      (deepseq
         ($fNFData[] $fNFDataChar)
         (: (C# 'a'#)
            (undefined
               ((pushCallStack
                   (unpackCString# "undefined"#,
                    SrcLoc
                      (unpackCString# "main"#)
                      (unpackCString# "Main"#)
                      (unpackCString# "example.hs"#)
                      (I# 4#)
                      (I# 25#)
                      (I# 4#)
                      (I# 34#))
                   emptyCallStack)
                `cast` <Co:4>)))
         (return $fMonadIO ()))

As expected, deepseq is retained. However, in the optimized version, we get this:

Rec {
main_go
main_go
  = \ ds_a2Nc ->
      case ds_a2Nc of {
        [] -> ();
        : x_a2Ng xs_a2Nh ->
          case x_a2Ng of { C# ipv_s2Np -> main_go xs_a2Nh }
      }
end Rec }

main2
main2 = undefined (lvl10_r4dT `cast` <Co:4>)

lvl11_r4dU
lvl11_r4dU = C# 'a'#

lvl12_r4dV
lvl12_r4dV = : lvl11_r4dU main2

lvl13_r4dW
lvl13_r4dW
  = \ eta_B1 -> case main_go lvl12_r4dV of { () -> (# eta_B1, () #) }

main1
main1 = \ s_a2Nw -> (# s_a2Nw, lvl13_r4dW `cast` <Co:3> #)

(boilerplate stripped for clarity)

Which, IIUC, does not respect the spirit of deepseq - specifically, the list construction in lvl12_r4dV isn't strict, so the undefined that main2 produces isn't being forced.

comment:19 Changed 17 months ago by tdammers

Owner: set to tdammers

comment:20 Changed 17 months ago by tdammers

Interestingly, the patch from #5129 does *not* fix this one too, so it's not just due to incorrectly optimizing seq away; there is more happening here.

comment:21 Changed 17 months ago by tdammers

Another data point: this only happens with lists, not with tuples - e.g., the following works as expected:

import Control.Exception
import Control.DeepSeq
main = evaluate (('a', undefined :: [Char]) `deepseq` return () :: IO ())

Which is baffling in that the NFData instances for tuples and lists are strikingly similar and *should* do the same thing.

It gets even more baffling when I implement a custom (but isomorphic) drop-in replacement for [], like so:

import Control.Exception
import Control.DeepSeq

data List a = Nil | Cons a (List a)

instance NFData a => NFData (List a) where
  rnf v = case v of
    Nil -> ()
    (Cons x (Cons y xs)) -> rnf x `seq` rnf y `seq` rnf xs

main = evaluate ((Cons 'a' $ Cons undefined $ Nil) `deepseq` return () :: IO ())

This example reproduces the problem; however, changing the instance like this behaves:

instance NFData a => NFData (List a) where
  rnf v = case v of
    Nil -> ()
    (Cons x (Cons y xs)) -> rnf x `seq` rnf y

In other words, if the NFData instance says *not* to look beyond the second list element, then the failure occurs, but if does look beyond, then it gets optimized away.

comment:22 Changed 17 months ago by tdammers

OK, boiled it down to a simpler test case that doesn't even involve deepseq:

import Control.Exception

dslist :: [a] -> b -> b
dslist xs b = go xs `seq` b
  where
    go [] = ()
    go (x:xs) = x `seq` go xs
    -- the following variation also reproduces the problem:
    -- go (x:xs) = go (x `seq` xs)

main = evaluate (('a':undefined) `dslist` return () :: IO ())

This is essentially what deepseq does on lists, with all the typeclass instances and type variables manually unrolled.

Note that in order to trigger the bug, we actually do need the go function; if instead we pass the b around and recurse directly, the problem goes away, like so:

import Control.Exception

dslist :: [a] -> b -> b
dslist xs b = go xs `seq` b
  where
    dslist [] a = a
    dslist (x:xs) a = x `seq` dslist xs a

main = evaluate (('a':undefined) `dslist` return () :: IO ())
Version 0, edited 17 months ago by tdammers (next)

comment:23 in reply to:  20 Changed 17 months ago by tdammers

Replying to tdammers:

Interestingly, the patch from #5129 does *not* fix this one too, so it's not just due to incorrectly optimizing seq away; there is more happening here.

Following the #5129 lead, I added some trace logging to the code touched in #5129; specifically I trapped this case:

        | SeqOp <- op
        -> all (expr_ok primop_ok) args

...which is what caught the seq# instances that were inappropriately optimized away previously.

However, for the offending examples, this trap doesn't fire, which means we're not hitting the SeqOp case at all. So apparently the seq call gets optimized away at an earlier stage.

comment:24 Changed 17 months ago by dfeuer

Cc: dfeuer added

comment:25 Changed 17 months ago by dfeuer

I think the -fno-state-hack is almost certainly a clue, although I'm not exactly sure how it ends up relating in that specific example. What I suspect is happening (without actually tracing simplification) is roughly this:

evaluate (undefined `seq` return ())
-- ===> inline
IO $ \s -> seq# (undefined `seq` IO (\s' -> (# s', () #))) s
-- ===> simplify the coercions for my sanity
IO $ \s -> seq# (IO $ undefined `seq` \s' -> (# s', () #)) s
-- ===> This is where I suspect things go sideways. I believe we generally
--      assume that it's okay to eta-expand IO things.
IO $ \s -> seq# (IO $ \s'' -> (undefined `seq` (\s' -> (# s', () #))) s'') s

We've effectively turned a bottom with an IO type into a non-bottom IO value that only returns a bottom. Then the seq# ends up going away:

-- ===> beta reduce past seq
IO $ \s -> seq# (IO $ \s'' -> undefined `seq` (# s'', () #)) s
-- ===> PrelRules.seqRule says we can eliminate seq# for WHNF things
IO $ \s -> (# s, IO $ \s'' -> undefined `seq` (# s'', () #))

I imagine the behavior is a bit fragile because it depends on GHC not recognizing that the undefined value is in fact undefined, or at least not recognizing it too early. I suspect the solution is likely to make -fpedantic-bottoms go a little further than it currently does, although I don't know enough to say just what.

comment:26 Changed 17 months ago by simonpj

Here's what is going on:

  • After a late float-out pass we get this
    lvl_s2Yn [Occ=Once]   -- Comes from 'return ()'
      :: GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
    [LclId]
    lvl_s2Yn
      = \ (s_a2z6 [Occ=Once, Dmd=<S,U>]
             :: GHC.Prim.State# GHC.Prim.RealWorld) ->
          (# s_a2z6, GHC.Tuple.() #)
    
    lvl_s2Yo [Occ=Once] :: GHC.Types.IO ()
    [LclId]  -- Comes from (('a':undefined) `dslist` return ())
    lvl_s2Yo
      = case go_s2WQ lvl_s2Ym of { __DEFAULT ->
        lvl_s2Yn
        `cast` (Sym (GHC.Types.N:IO[0] <()>_R)
                :: (GHC.Prim.State# GHC.Prim.RealWorld
                    -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #) :: *)
                   ~R# (GHC.Types.IO () :: *))
        }
    
    main_s2z7
      = \ (s_a2yI [Occ=Once, Dmd=<S,U>]
             :: GHC.Prim.State# GHC.Prim.RealWorld) ->
          GHC.Prim.seq#
            @ (GHC.Types.IO ()) @ GHC.Prim.RealWorld lvl_s2Yo s_a2yI
    

So far so good.

  • But then we see that lvl_s2Yn has arity 1 (which it does). And then we eta-expand lvl_s2Yo, moving the lambda outside the case go lvl_s2Ym, to get
    lvl_s2Yo
      = (\ (eta_B1 [Occ=Once] :: GHC.Prim.State# GHC.Prim.RealWorld) ->
           case go_s2WQ lvl_s2Ym of { () ->
           (lvl_s2Yn `cast` ...) eta_B1
           })
        `cast` ...
    
  • Aargh! Now lvl_s2Yo is a HNF, not bottom as it should be. And so seq# lvl_s2Y# x is (correctly) reduced to x.

How to avoid this? It's the notorious "state hack" that "justifies" floating the lambda out. The motivation is functions like this:

f :: [Int] -> IO ()
f xs = print ys >> print xs
  where
    ys = reverse xs

Without the state hack we get an arity-1 function thus

f = \xs. let ys = reverse xs in
         \s. case print ys s of
               (# s' ,_ #) -> print xs s'

That's signficantly less efficient than an arity-2 function; but that penalty may be worth paying if the application (f xs) is shared. For example:

  let fxs = f [1..100]
  in fxs >> fxs >> fxs

In the arity-1 case we'd reverse [1..100] once; in the arity-2 case we'd reverse it three times. In practice this doesn't seem to come up much.


I'm not sure it's worth fixing this. It doesn't seem to be harming anyone, an it's not clear how to fix it.

The only fix I can think of (apart from abandoning the state hack) would be to restrict the state hack a bit more so that it doesn't apply to an arity-zero binding like lvl_s2Yo -- but still does apply to f above. But if we did that, then this minor variant of #7411 would still misbehave:

{-# NOINLINE f #-}
f2 :: Int -> IO ()
f2 x = ('a':undefined) `dslist` return ()

main = evaluate (f2 3)

Here I've just given f an argument; like fabove it'll be eta-expanded. Morally, it's the same problem as before, but perhaps it just occurs less often.

I'm inclined to focus on other tickets that are causing users actual pain.

But it's a long time since I tried measuring the perf cost of simply switching the state hack off altogether. Would anyone like to to just try that?

comment:27 Changed 17 months ago by dfeuer

Simon, I don't see anything problematic about the floating. It looks to me as though the eta expansion past the case is where meaning shifts, but maybe I'm missing something?

comment:28 Changed 17 months ago by simonpj

I don't see anything problematic about the floating

If we eta-expand lvl_s2Yo its value goes from possibly-bottom (if there are bottoms in the list, which there are) to definitely non-bottom. As a result seq# lvl_s2yo e is optimised to e; which is incorrect here (that is precisely the bug reported), but absolutely fine for definite non-values in the first argument.

comment:29 in reply to:  26 Changed 17 months ago by tdammers

Replying to simonpj:

I'm not sure it's worth fixing this. It doesn't seem to be harming anyone, an it's not clear how to fix it.

OK, so here's what I think.

Right now, the simplifier will change the behavior of programs. This is completely unexpected and undocumented, and it's subtle enough to make it all-night debugging session material. I'm a bit uncomfortable with just shrugging this off.

Last resort would be to just document the problem (where?) and move on; but I think we should at least give the other options a fair try.

comment:30 Changed 17 months ago by simonpj

I think we should at least give the other options a fair try.

Do you have other options in mind? This has bugged us for a decade -- it's why it's called the "state hack".

Getting perf numbers for compiling everything (incl libraries) with -fno-state-hack would give us a bit of data to go on.

comment:31 in reply to:  30 Changed 17 months ago by tdammers

Replying to simonpj:

I think we should at least give the other options a fair try.

Do you have other options in mind? This has bugged us for a decade -- it's why it's called the "state hack".

Getting perf numbers for compiling everything (incl libraries) with -fno-state-hack would give us a bit of data to go on.

I'm thinking in two directions here.

The first one would be to just throw out the state hack altogether, as you suggested, or at least make -fno-state-hack the default and pepper the fstate-hack flag with dire warnings. Timing things with and without the state hack shouldn't be a very difficult task, so I suggest I put a bit of time into this.

For the second one, let me summarize what I think the core of the problem is: The equational reasoning strategies behind this eta expansion and seq are at odds. The eta expansion assumes that floating bottoms around will not make an effective difference (as per fast-and-loose reasoning), but this assumption doesn't hold when seq is involved, because seq is like this magical backdoor to bottoms. Eta expansion says that case x of { _ -> f } is equivalent to \eta -> case x of { _ -> f eta }, even if there are bottoms involved. But this breaks when seq is involved, because the eta expansion allows the argument to slip past seq.

So what we would need to fix this "properly" would be a way to mark the relevant code sections as "here be dragons", and make the state hack skip over those. E.g., in the example program discussed earlier:

import Control.Exception

dslist :: [a] -> b -> b
dslist [] b = b
dslist (x:xs) b = x `seq` dslist xs b

main = evaluate (('a':undefined) `dslist` return () :: IO ())

...we would mark the x `seq` dslist xs b part as "dangerous", and forbid eta-expanding across this boundary.

Or rather, in this bit here:

lvl_s2Yn [Occ=Once]   -- Comes from 'return ()'
  :: GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
[LclId]
lvl_s2Yn
  = \ (s_a2z6 [Occ=Once, Dmd=<S,U>]
         :: GHC.Prim.State# GHC.Prim.RealWorld) ->
      (# s_a2z6, GHC.Tuple.() #)

lvl_s2Yo [Occ=Once] :: GHC.Types.IO ()
[LclId]  -- Comes from (('a':undefined) `dslist` return ())
lvl_s2Yo
  = case go_s2WQ lvl_s2Ym of { __DEFAULT ->
    lvl_s2Yn
    `cast` (Sym (GHC.Types.N:IO[0] <()>_R)
            :: (GHC.Prim.State# GHC.Prim.RealWorld
                -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #) :: *)
               ~R# (GHC.Types.IO () :: *))
    }

main_s2z7
  = \ (s_a2yI [Occ=Once, Dmd=<S,U>]
         :: GHC.Prim.State# GHC.Prim.RealWorld) ->
      GHC.Prim.seq#
        @ (GHC.Types.IO ()) @ GHC.Prim.RealWorld lvl_s2Yo s_a2yI

...we would have to drag along the information that go_s2WQ is dangerous, and infer from that that we cannot eta-expand lvl_s2Yo.

But I suspect this would be a rather invasive change (extending Core with such markers), and detecting such sections correctly could be tricky. Or maybe not, I really don't know.

Last edited 16 months ago by tdammers (previous) (diff)

comment:32 Changed 16 months ago by tdammers

I ran two full builds of GHC (make clean; ./boot; ./configure; make -j2 on a dual-core i5 machine, 4 GB RAM, 7200 RPM HDD, debian stretch, stage0 = GHC 8.4.1), one with the prof flavour, the other with a modified prof flavour with GhcLibHcOpts += -fno-state-hack.

It turns out that the -fno-state-hack version is slightly faster:

Prof build:

real	66m5.407s
user	118m50.668s
sys	6m7.044s

Prof + -fno-state-hack build:

real	65m42.784s
user	118m32.892s
sys	6m5.252s

I haven't done any in-depth profiling yet, but it seems to me that removing the state hack would most not make things significantly worse overall.

Last edited 16 months ago by tdammers (previous) (diff)

comment:33 Changed 16 months ago by tdammers

Pinging simonpj: do you think we need more data than this before deciding to remove the state hack, or at least make -fno-state-hack the default?

comment:34 Changed 16 months ago by bgamari

As I said on IRC, we should have nofib numbers at least.

comment:35 Changed 16 months ago by tdammers

Ran a few test, and I'm getting perplexing results.

First of all, a full GHC build from scratch, with -fno-state-hack everywhere, gives mixed results:

Regular profiling build:

real	66m1.181s
user	118m55.812s
sys	6m9.724s

Profiling build with -fno-state-hack enabled:

real	66m25.030s
user	119m48.580s
sys	6m7.568s

So the difference is small, real and sys (which, I believe, is what counts) slightly in favor of removing the state hack, user in favor of keeping it.

I've also done nofib runs with both compilers, also enabling / disabling the state hack accordingly; however it turns out that disabling the state hack causes nofib to fail:

------------------------------------------------------------------------
== make all --no-print-directory;
 in /home/tobias/well-typed/devel/ghc/HEAD/nofib/shootout/fasta
------------------------------------------------------------------------
HC = /home/tobias/well-typed/devel/ghc/HEAD/inplace/bin/ghc-stage2
HC_OPTS = -O2 -Rghc-timing -H32m -hisuf hi -fno-state-hack -O2 -rtsopts -O2 -XBangPatterns -XOverloadedStrings -package bytestring
RUNTEST_OPTS = -ghc-timing
==nofib== fasta: time to compile Main follows...
/home/tobias/well-typed/devel/ghc/HEAD/inplace/bin/ghc-stage2 -O2 -Rghc-timing -H32m -hisuf hi -fno-state-hack -O2 -rtsopts -O2 -XBangPatterns -XOverloadedStrings -package bytestring -c Main.hs -o Main.o

Main.hs:30:1: warning: [-Wtabs]
    Tab character found here, and in 13 further locations.
    Please use spaces instead.
   |
30 |         let !k = min modulus (floor (fromIntegral modulus * (p::Float) + 1))
   | ^^^^^^^^
<<ghc: 401958512 bytes, 72 GCs, 10720054/28796976 avg/max bytes residency (6 samples), 61M in use, 0.001 INIT (0.001 elapsed), 0.342 MUT (0.420 elapsed), 0.240 GC (0.238 elapsed) :ghc>>
==nofib== fasta: size of Main.o follows...
   text	   data	    bss	    dec	    hex	filename
   7245	   2624	      0	   9869	   268d	Main.o
==nofib== fasta: time to link fasta follows...
<<ghc: 49626024 bytes, 45 GCs, 1356153/2672728 avg/max bytes residency (5 samples), 8M in use, 0.001 INIT (0.001 elapsed), 0.025 MUT (0.468 elapsed), 0.060 GC (0.060 elapsed) :ghc>>
==nofib== fasta: size of fasta follows...
   text	   data	    bss	    dec	    hex	filename
 708996	 127712	  16232	 852940	  d03cc	fasta
==nofib== fasta: time to run fasta follows...
../../runstdtest/runstdtest ./fasta  -o1 fasta.stdout -o1 fasta.stdout  -ghc-timing     2500000;   ../../runstdtest/runstdtest ./fasta  -o1 fasta.stdout -o1 fasta.stdout  -ghc-timing     2500000;   ../../runstdtest/runstdtest ./fasta  -o1 fasta.stdout -o1 fasta.stdout  -ghc-timing     2500000;   ../../runstdtest/runstdtest ./fasta  -o1 fasta.stdout -o1 fasta.stdout  -ghc-timing     2500000;   ../../runstdtest/runstdtest ./fasta  -o1 fasta.stdout -o1 fasta.stdout  -ghc-timing     2500000;
0.43user 0.02system 0:00.45elapsed 99%CPU (0avgtext+0avgdata 4824maxresident)k
0inputs+49656outputs (0major+608minor)pagefaults 0swaps
././fasta 2500000 < /dev/null
expected stdout not matched by reality
--- fasta.stdout	2018-06-02 03:00:43.887025433 +0200
+++ /tmp/runtest4673.1	2018-06-02 03:09:19.651755697 +0200
@@ -83337,333335 +83337,333335 @@
 cttBtatcatatgctaKggNcataaaSatgtaaaDcDRtBggDtctttataattcBgtcg
-tactDtDagcctatttSVHtHttKtgtHMaSattgWaHKHttttagacatWatgtRgaaa
-NtactMcSMtYtcMgRtacttctWBacgaaatatagScDtttgaagacacatagtVgYgt
-cattHWtMMWcStgttaggKtSgaYaaccWStcgBttgcgaMttBYatcWtgacaYcaga
-gtaBDtRacttttcWatMttDBcatWtatcttactaBgaYtcttgttttttttYaaScYa
-HgtgttNtSatcMtcVaaaStccRcctDaataataStcYtRDSaMtDttgttSagtRRca
#### ~3.1 million similar lines follow ####
-taatataagctgcgccaggggatttttccagatcatctggcctgtgtatatgttcaaatc
+gacgaatatattagttatagtttactatccaaataaattaagcgaatcgaaataaactgt
+gacgaatatattagttatagtttactatccaaataaattaagcgaatcgaaataaactgt
+gacgaatatattagttatagtttactatccaaataaattaagcgaatcgaaataaactgt
#### ~208k identical lines follow
+gacgaatatattagttatagtttactatccaaataaattaagcgaatcgaaataaactgt
 taatagccgagagaaattac
../../mk/target.mk:101: recipe for target 'runtests' failed
make[2]: *** [runtests] Error 1
Failed making all in fasta: 1
../mk/ghc-recurse.mk:65: recipe for target 'all' failed
make[1]: *** [all] Error 1
Failed making all in shootout: 1
mk/ghc-recurse.mk:65: recipe for target 'all' failed
make: *** [all] Error 1

comment:36 Changed 16 months ago by bgamari

The correctness issue is very concerning (especially given that this is a rather vanilla program). I would say that this deserves its own ticket. Let's focus on that before turning our attention to the performance issue.

comment:37 Changed 16 months ago by simonpj

A program that might run slower without the state hack is

f :: Char -> IO ()
f 'x' = return ()
f c   = putChar c

Now call that f many times. Without the state hack I think it'll have arity 1, returning a lambda that does the I/O.

We must find out what happens with fasta, yes, but what are the perf results from nofib?

comment:38 Changed 15 months ago by tdammers

comment:39 Changed 15 months ago by bgamari

Milestone: 8.6.18.8.1

This won't be fixed in 8.6. Bumping to 8.8.

comment:40 Changed 15 months ago by tdammers

To do:

  • turn on core linter
  • see if it still crashes when compiling with the stage1 compiler
  • compile nofib with -fno-state-hack, but with the "clean" compiler (compiled with state hack)

comment:41 Changed 15 months ago by tdammers

Did 4 full runs of compiling GHC from scratch and then compiling and running nofib, with different configurations. Here's what happened.

  1. "vanilla": compile everything normally (the prof build that comes with GHC). Everything works as expected.
  2. "nsh": compile everything with -fno-state-hack (GHC itself, libraries, nofib). Everything works as expected.
  3. "stage1": compile GHC normally, nofib with -fno-state-hack, but use stage1 compiler instead of stage2. Everything works as expected.
  4. "nofib-nsh": compile GHC normally, nofib with -fno-state-hack. GHC segfaults while compiling Main.hs from the bernouilli program (which simply happens to be the first one it encounters). Despite passing -dcore-lint, I get no linter output.

Here's the log:

------------------------------------------------------------------------
== Recursively making `all' in runstdtest nofib-analyse imaginary spectral real shootout ...
PWD = /home/tobias/well-typed/devel/ghc/HEAD/nofib
------------------------------------------------------------------------
------------------------------------------------------------------------
== make all ;
 in /home/tobias/well-typed/devel/ghc/HEAD/nofib/runstdtest
------------------------------------------------------------------------
rm -f -f runstdtest
echo '#!/usr/bin/perl'                               >> runstdtest
echo '$RM             = "rm -f";'             >> runstdtest
echo '$CONTEXT_DIFF   = "diff -U 1";'   >> runstdtest
cat runstdtest.prl                                >> runstdtest
chmod +x runstdtest
Finished making all in runstdtest: 0
------------------------------------------------------------------------
== make all ;
 in /home/tobias/well-typed/devel/ghc/HEAD/nofib/nofib-analyse
------------------------------------------------------------------------
make[1]: Nothing to be done for 'all'.
Finished making all in nofib-analyse: 0
------------------------------------------------------------------------
== make all ;
 in /home/tobias/well-typed/devel/ghc/HEAD/nofib/imaginary
------------------------------------------------------------------------
------------------------------------------------------------------------
== Recursively making `all' in bernouilli digits-of-e1 digits-of-e2 exp3_8 gen_regexps integrate paraffins primes queens rfib tak wheel-sieve1 wheel-sieve2 x2n1 kahan ...
PWD = /home/tobias/well-typed/devel/ghc/HEAD/nofib/imaginary
------------------------------------------------------------------------
------------------------------------------------------------------------
== make all --no-print-directory;
 in /home/tobias/well-typed/devel/ghc/HEAD/nofib/imaginary/bernouilli
------------------------------------------------------------------------
HC = /home/tobias/well-typed/devel/ghc/HEAD/inplace/bin/ghc-stage2
HC_OPTS = -O2 -Rghc-timing -H32m -hisuf hi -fno-state-hack -O2 -dcore-lint -rtsopts
RUNTEST_OPTS = -ghc-timing
==nofib== bernouilli: time to compile Main follows...
/home/tobias/well-typed/devel/ghc/HEAD/inplace/bin/ghc-stage2 -O2 -Rghc-timing -H32m -hisuf hi -fno-state-hack -O2 -dcore-lint -rtsopts -c Main.hs -o Main.o
../../mk/suffix.mk:23: recipe for target 'Main.o' failed
make[2]: *** [Main.o] Segmentation fault (core dumped)
Failed making all in bernouilli: 1
../mk/ghc-recurse.mk:65: recipe for target 'all' failed
make[1]: *** [all] Error 1
Failed making all in imaginary: 1
mk/ghc-recurse.mk:65: recipe for target 'all' failed
make: *** [all] Error 1

Unfortunately I hadn't done a run with GHC stage1 compiled with -fno-state-hack; I'll do that next.

comment:42 Changed 15 months ago by tdammers

Additional runs:

  • stage1 GHC without state hack and nofib without state hack -> succeeds
  • stage2 GHC without state hack and nofib with state hack -> succeeds

So either there is something strange going on with the combination GHC + libs with state hack, nofib without state hack; or my testing leaves artifacts somehow.

Changed 15 months ago by tdammers

Attachment: nofib-analyse.log.gz added

nofib analyse output (gzipped because of size limit)

comment:43 Changed 15 months ago by tdammers

OK, so it turns out that make clean doesn't actually clean things up thoroughly enough. I did another test run, building both GHC and nofib from a fresh checkout for each run, and now everything works just fine. So it seems that the core dumps I got earlier were somehow caused by artifacts from a previous build still being in the build tree somewhere; clearly if you link things together in ways they weren't meant to, strange behavior is to be expected.

Now armed with that, and a fixed fasta implementation from #15225, I managed to do six nofib runs; the interesting ones are:

  • Both GHC and nofib compiled with the state hack in place - this is the current situation (nofibrun-gs_ns2.log)
  • GHC with the state hack, nofib compiled with -fno-state-hack (nofibrun-gs-nn2.log
  • Both GHC and nofib compiled with -fno-state-hack (nofibrun-gn_nn2.log)

Results from nofib-analyse attached; key takeaway however is that the difference overall is small and slightly in favor of removing the state hack. A few interesting deviations that stand out:

  • GHC compiled with -fno-state-hack consistently produces larger binaries, by about 5%
  • Programs compiled with a GHC compiled with -fno-state-hack run faster by 3.2% on average, but there is a lot of deviation
  • Most of the above improvements seem to be from more efficient GC
  • Compile times differ drastically; vanilla GHC compiling programs with -fno-state-hack is 23.6% faster on average than the same compiler with the state hack; compiling GHC itself with -fno-state-hack and then compiling the nofib suite also with -fno-state-hack is almost 50% faster than compiling everything normally. The difference in allocations is even more whopping, with 35.5% and 58.3% respectively.

comment:44 Changed 15 months ago by simonpj

Well this is surprising news, but I think good news too. Maybe we can kill off the state hack altogether!

But before we do that, let's really understand our data. (A minor point: in the nfib-analyse output, the column headings are cut off in the summary table; can you use shorter file names?)

  • I'm astounded by the reduction in compile time (and compiler allocation), in both variants. Is that caused by (a) a faster GHC stage 2, or (b) smaller Core sizes? Could you pick a module with a particularly big reduction, and dig into why? I confess that I am a bit suspicious. Compile times don't usually halve!
  • Binary sizes go up 5% when "GHC is compiled with -fno-state-hack". I think that the important thing is that the libraries (which are linked into the nofib binary) are compiled with -fno-state-hack. I doubt it's caused by the fact that ghc-stage2 is compiled with -fno-state-hack. Can you confirm?
  • And if binary sizes go up by 5% because the libraries are getting bigger, is that a generic 5% increase in .o file size? Or do a few key modules get a lot bigger? It's puzzling because the per-moudule "Module Sizes" stats in the nofib log suggest that there is an essentially-zero effect on nofib module sizes.
  • As I wrote above, the motivation for the state hack is functions lie
    f :: [Int] -> IO ()
    f xs = print ys >> print xs
      where
        ys = reverse xs
    
    test 0 = return ()
    test n = f [1,n] >> test (n-1)
    
    which I expected to be much less efficient without the state hack. Would it be worth trying to demonstrate a program that does get worse in this way? I'm still amazed at how good the results are!

comment:45 Changed 15 months ago by simonpj

PS: the bad-ness of 'make clean' is also a worry. It wasted a lot of your time.

comment:46 in reply to:  44 Changed 15 months ago by tdammers

Replying to simonpj:

  • As I wrote above, the motivation for the state hack is functions lie
    f :: [Int] -> IO ()
    f xs = print ys >> print xs
      where
        ys = reverse xs
    
    test 0 = return ()
    test n = f [1,n] >> test (n-1)
    
    which I expected to be much less efficient without the state hack. Would it be worth trying to demonstrate a program that does get worse in this way? I'm still amazed at how good the results are!

OK, I did some experimenting, and it seems that the state hack makes absolutely no difference for such programs. Specifically, I used the following fully-fledged example program:

module Main where

import System.Environment
import System.IO

f :: Handle -> [Int] -> IO ()
f h xs = hPrint h ys >> hPrint h xs
  where
      ys = reverse xs

test :: Int -> Handle -> IO ()
test 0 h = return ()
test n h = f h [1,n] >> test (n-1) h

main = do
  args <- getArgs
  case args of
    x:y:_ -> withFile y WriteMode $ test (read x)
    x:_ -> test (read x) stdout
    _ -> test 0 stdout

This allows us to pass an arbitrary number of rounds to test on the command line, as well as an output file; passing /dev/null makes it easier to run the whole thing through time. Now given two GHC builds in ghc-prof (normal profiling build) and ghc-prof-nsh (GHC and boot libs built with -fno-state-hack), and with our example program in bad-ex.hs, I can run the following session:

tobias@zoidberg:~/well-typed/devel/ghc-7411/ > ./ghc-prof/inplace/bin/ghc-stage2 bad-ex.hs -fforce-recomp -O2 && time ./bad-ex 50000000 /dev/null      
[1 of 1] Compiling Main             ( bad-ex.hs, bad-ex.o )
Linking bad-ex ...
./bad-ex 50000000 /dev/null  53.50s user 0.20s system 99% cpu 53.713 total
tobias@zoidberg:~/well-typed/devel/ghc-7411/ > ./ghc-prof-nsh/inplace/bin/ghc-stage2 bad-ex.hs -fforce-recomp -fno-state-hack -O2 && time ./bad-ex 50000000 /dev/null
[1 of 1] Compiling Main             ( bad-ex.hs, bad-ex.o )
Linking bad-ex ...
./bad-ex 50000000 /dev/null  52.75s user 0.19s system 99% cpu 52.956 total

Which suggests that even for a program that is presumably about as bad as it gets, the state hack doesn't make a difference for the better.

comment:47 in reply to:  44 Changed 15 months ago by tdammers

Replying to simonpj:

  • Binary sizes go up 5% when "GHC is compiled with -fno-state-hack". I think that the important thing is that the libraries (which are linked into the nofib binary) are compiled with -fno-state-hack. I doubt it's caused by the fact that ghc-stage2 is compiled with -fno-state-hack. Can you confirm?
  • And if binary sizes go up by 5% because the libraries are getting bigger, is that a generic 5% increase in .o file size? Or do a few key modules get a lot bigger? It's puzzling because the per-moudule "Module Sizes" stats in the nofib log suggest that there is an essentially-zero effect on nofib module sizes.

I did some more analyzing, and unfortunately the answer is "it's complicated". I compared module sizes for every .o in the GHC build tree between the two versions, and I'm getting mixed results. Some modules get significantly smaller when disabling the state hack, on the order of 10-50%, with some outliers as high as 70%. On the other end of the spectrum, there are lots of modules that blow up by several megabytes per module, percentages typically somewhere between 25% and 80%. There is also a large number of modules where the difference is small or zero.

The differences aren't isolated in libraries; in all 3 categories, modules are mostly from /libraries, /bootstrapping, and /compiler. So it's definitely not just the libraries that blow up; it seems that GHC actually produces larger modules, but not always. I have not managed to find a pattern to the increase yet.

So here's the 30 modules with the biggest shrinkage:

diff       diff% hack     no hack   file
-598704      -4% 14142232 13543528 compiler/stage2/build/HsInstances.o
-232504     -36%  637000  404496 bootstrapping/Distribution/Simple/Utils.o
-222928     -34%  638856  415928 libraries/Cabal/Cabal/dist-boot/build/Distribution/Simple/Utils.o
-179816     -17% 1020496  840680 libraries/Cabal/Cabal/dist-boot/build/Distribution/PackageDescription/Check.o
-177744     -17% 1015896  838152 bootstrapping/Distribution/PackageDescription/Check.o
-144352     -24%  595592  451240 libraries/Cabal/Cabal/dist-boot/build/Distribution/Simple/Haddock.o
-142432     -14%  968752  826320 bootstrapping/Distribution/Simple/Configure.o
-137632     -14%  973336  835704 libraries/Cabal/Cabal/dist-boot/build/Distribution/Simple/Configure.o
-136384     -23%  583088  446704 bootstrapping/Distribution/Simple/Haddock.o
-131840     -13%  952160  820320 bootstrapping/Distribution/Simple/GHC.o
-125008     -13%  955352  830344 libraries/Cabal/Cabal/dist-boot/build/Distribution/Simple/GHC.o
-117728     -27%  422008  304280 bootstrapping/Distribution/Simple/Build.o
-116264     -27%  424320  308056 libraries/Cabal/Cabal/dist-boot/build/Distribution/Simple/Build.o
-108816     -18%  579808  470992 bootstrapping/Distribution/Simple/GHCJS.o
-103808     -11%  884696  780888 utils/ghc-pkg/dist/build/Main.o
-103280     -17%  581736  478456 libraries/Cabal/Cabal/dist-boot/build/Distribution/Simple/GHCJS.o
 -98288     -18%  532488  434200 utils/genprimopcode/dist/build/Main.o
 -92480     -16%  570120  477640 bootstrapping/Distribution/Simple.o
 -85024     -14%  573280  488256 libraries/Cabal/Cabal/dist-boot/build/Distribution/Simple.o
 -83632     -19%  433448  349816 utils/hsc2hs/dist/build/CrossCodegen.o
 -82520     -29%  281280  198760 bootstrapping/Distribution/Simple/SrcDist.o
 -81808     -28%  282832  201024 libraries/Cabal/Cabal/dist-boot/build/Distribution/Simple/SrcDist.o
 -78920     -55%  141696   62776 bootstrapping/Data/Text/Internal/Lazy/Encoding/Fusion.o
 -77768     -43%  180664  102896 bootstrapping/Data/Text/Encoding.o
 -72824     -71%  102400   29576 bootstrapping/Data/Text/Internal.o
 -69456     -49%  140888   71432 bootstrapping/System/FilePath/Posix.o
 -66136     -39%  168512  102376 bootstrapping/Distribution/PackageDescription/Quirks.o
 -61280     -15%  392096  330816 libraries/Cabal/Cabal/dist-boot/build/Distribution/Simple/PreProcess.o
 -59848     -35%  167816  107968 libraries/Cabal/Cabal/dist-boot/build/Distribution/PackageDescription/Quirks.o
 -58456     -15%  385944  327488 bootstrapping/Distribution/Simple/PreProcess.o

And the 30 worst offenders:

diff       diff% hack     no hack   file
 242872     +42%  330888  573760 libraries/Cabal/Cabal/dist-boot/build/Distribution/PackageDescription/Parsec.o
 252096     +18% 1130080 1382176 libraries/Cabal/Cabal/dist-boot/build/Language/Haskell/Extension.o
 276872      +9% 2649696 2926568 libraries/containers/dist-install/build/Data/Sequence/Internal.o
 282112     +50%  280448  562560 compiler/stage2/build/TcBackpack.o
 286208     +21% 1039800 1326008 bootstrapping/Language/Haskell/Extension.o
 293664     +43%  387048  680712 bootstrapping/Distribution/Types/GenericPackageDescription.o
 295776     +42%  400880  696656 libraries/Cabal/Cabal/dist-boot/build/Distribution/Types/GenericPackageDescription.o
 308792     +40%  462576  771368 libraries/template-haskell/dist-boot/build/Language/Haskell/TH/Ppr.o
 320800     +28%  786912 1107712 compiler/stage2/build/PPC/CodeGen.o
 326400     +38%  520592  846992 libraries/text/dist-install/build/Data/Text/Lazy/Builder/RealFloat.o
 350016     +32%  719512 1069528 compiler/stage2/build/PlatformConstants.o
 410272      +2% 14146704 14556976 libraries/base/dist-install/build/HSbase-4.12.0.0.o
 413960     +35%  748952 1162912 libraries/Cabal/Cabal/dist-install/build/Distribution/Simple/Haddock.o
 430824     +27% 1157400 1588224 compiler/stage2/build/X86/CodeGen.o
 450928     +29% 1078688 1529616 compiler/stage2/build/TcRnDriver.o
 530216    +100%       0  530216 libraries/xhtml/dist-install/build/HSxhtml-3000.2.2.o
 563744     +77%  166952  730696 bootstrapping/Distribution/Types/InstalledPackageInfo/FieldGrammar.o
 581784     +77%  170552  752336 libraries/Cabal/Cabal/dist-boot/build/Distribution/Types/InstalledPackageInfo/FieldGrammar.o
 583760     +44%  731800 1315560 libraries/ghci/dist-boot/build/GHCi/Message.o
 593424     +12% 4286120 4879544 libraries/containers/dist-install/build/HScontainers-0.5.11.0.o
 614096     +16% 3213320 3827416 libraries/text/dist-install/build/HStext-1.2.3.0.o
 747024     +41% 1063368 1810392 libraries/ghci/dist-boot/build/GHCi/TH/Binary.o
 952864     +27% 2503640 3456504 libraries/Cabal/Cabal/dist-boot/build/Distribution/SPDX/LicenseId.o
 992784     +29% 2330808 3323592 bootstrapping/Distribution/SPDX/LicenseId.o
1004432     +43% 1328968 2333400 libraries/Cabal/Cabal/dist-install/build/Distribution/Simple/GHC.o
1032072     +72%  382160 1414232 bootstrapping/Distribution/PackageDescription/FieldGrammar.o
1035032     +72%  392264 1427296 libraries/Cabal/Cabal/dist-boot/build/Distribution/PackageDescription/FieldGrammar.o
1212056     +29% 2905776 4117832 libraries/template-haskell/dist-boot/build/Language/Haskell/TH/Syntax.o
2236208     +62% 1317096 3553304 libraries/Cabal/Cabal/dist-install/build/Distribution/Simple/Configure.o
4569096     +10% 38406248 42975344 libraries/Cabal/Cabal/dist-install/build/HSCabal-2.3.0.0.o

I'll attach the full comparison log.

Changed 15 months ago by tdammers

Attachment: module-size-comparison added

Comparing module sizes in a GHC build tree

comment:48 Changed 15 months ago by tdammers

Oh dear. It turns out my bash script that I used to automatically build the required GHC's and run nofib had a bug that caused it to silently ignore mistyped build flavours, so instead of "GHC compiled with -fno-state-hack", I got "GHC compiled with the default flavour". Which explains the whopping difference: where I thought I was comparing "GHC compiled with -prof -fno-state-hack vs. GHC compiled with -prof", I was actually comparing "GHC release build vs. GHC prof build".

Fortunately, the nofib flags were correct, so the part about the effect of -fno-state-hack on the nofib suite remains valid; only those results that involve "GHC compiled with -fno-state-hack" are affected. I'll re-run the tests for "GHC compiled with -fno-state-hack", and I'll also throw in "Boot libraries compiled with -fno-state-hack, GHC compiled normally".

Last edited 15 months ago by tdammers (previous) (diff)

Changed 15 months ago by tdammers

Attachment: nofib-gn-gnl.analyze.log added

nofib analyse output: effect of NSH on libraries only vs. NSH on all of GHC

Changed 15 months ago by tdammers

Attachment: nofib-ns-nn.analyze.log added

nofib analyse output: nofib compiled with NSH vs. normally, on a vanilla GHC build

Changed 15 months ago by tdammers

Attachment: nofib-ns-nn-gnl.analyze.log added

nofib analyse output: nofib compiled with NSH vs. normally, on a GHC build with NSH libraries

comment:49 Changed 15 months ago by tdammers

New nofib results.

  • nofib-gn-gnl: GHC and boot libraries compiled with -fno-state-hack compiling nofib normally, vs. GHC compiled normally and boot libraries compiled with -fno-state-hack compiling nofib normally. This measures the effect of compiling GHC itself with -fno-state-hack.
  • nofib-ns-nn: GHC compiled normally compiling nofib normally vs. GHC compiled normally compiling nofib with -fno-state-hack. This measures the effect of the state hack on nofib when using a vanilla compiler.
  • nofib-ns-nn-gnl: GHC compiled normally with boot libraries compiled with -fno-state-hack compiling nofib normally vs. GHC compiled normally with boot libraries compiled with -fno-state-hack compiling nofib with -fno-state-hack. This measures the effect of the state hack on nofib when using a vanilla compiler with state-hack-free boot libraries.

Key observations:

  • Differences are now much closer to what we expect, specifically, whether we compile GHC and / or boot libraries with or without the state hack does not significantly change nofib performance (< 1%).
  • Compiling GHC itself and boot libraries without the state hack makes nofib compilation slightly slower than compiling only boot libraries without the state hack (~4%).
  • Compiling nofib without the state hack makes compilation significantly faster (~25%) than compiling nofib with the state hack, regardless of how GHC itself and the boot libraries were compiled.

comment:50 Changed 9 months ago by osa1

Milestone: 8.8.18.10.1

Bumping milestones of some high and highest priority tickets.

comment:51 Changed 8 months ago by simonpj

Keywords: Simplifier added
Note: See TracTickets for help on using tickets.