Opened 3 years ago

Closed 3 years ago

#12234 closed bug (fixed)

'deriving Eq' on recursive datatype makes ghc eat a lot of CPU and RAM

Reported by: slyfox Owned by:
Priority: normal Milestone: 8.2.1
Component: Compiler Version: 8.0.1
Keywords: deriving-perf Cc: RyanGlScott, niteria
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case: perf/should_compile/T12234
Blocked By: Blocking:
Related Tickets: #13056, #13081 Differential Rev(s):
Wiki Page:

Description (last modified by ezyang)

The example is slimmed down unit test of Annotations-0.2.1 hackage package.

If we try to compile Bug.hs with -O0 it compiles quickly. Trying it with -O1 makes GHC-8.0.1 takes a minute to finish.

-- Bug1.hs:
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE UndecidableInstances #-}

module Bug () where

import Prelude (Eq)

data ExprF rT = ExprF rT rT deriving Eq

newtype Expr   = Expr (Fix ExprF) deriving Eq
newtype Fix fT = In (fT (Fix fT))

deriving instance Eq (f (Fix f)) => Eq (Fix f)
$ time ghc-8.0.1 -c -O0 Bug1.hs -fforce-recomp
real    0m0.611s
user    0m0.549s
sys     0m0.053s

$ time ghc-8.0.1 -c -O1 Bug1.hs -fforce-recomp
real    1m2.199s
user    1m1.676s
sys     0m0.465s

7.10.2 for comparison is very quick in both O0/O1:

$ time ghc-7.10.2 -c -O0 Bug1.hs -fforce-recomp
real    0m0.220s
user    0m0.183s
sys     0m0.036s

$ time ghc-7.10.2 -c -O1 Bug1.hs -fforce-recomp
real    0m0.237s
user    0m0.213s
sys     0m0.023s

The real ExprF datatype uses more constructors and instances:

data ExprF rT
  =  Add  rT  rT
  |  Sub  rT  rT
  |  Mul  rT  rT
  |  Div  rT  rT
  |  Num  Int
  deriving (Eq, Show)

That requires a lot of time and space to finish (Bug2.hs in attach). I've stopped it after 5 minutes (took ~8GB RAM).

Attachments (2)

Bug1.hs (311 bytes) - added by slyfox 3 years ago.
Bug1.hs makes a minute to compile
Bug2.hs (387 bytes) - added by slyfox 3 years ago.
Bug2.hs takes more than 5 minutes to compile

Download all attachments as: .zip

Change History (20)

Changed 3 years ago by slyfox

Attachment: Bug1.hs added

Bug1.hs makes a minute to compile

Changed 3 years ago by slyfox

Attachment: Bug2.hs added

Bug2.hs takes more than 5 minutes to compile

comment:1 Changed 3 years ago by osa1

Nice find. Here is some stats:

=========================================Bug=========================================
CodeGen                                 19976.00 ms  40880.83 mb  62.9% of total time
Simplifier (7)                          10302.00 ms   9418.43 mb  32.4% of total time
CorePrep                                  762.00 ms    323.56 mb   2.4% of total time
CoreTidy                                  469.00 ms    274.51 mb   1.5% of total time
Common sub-expression                      87.00 ms     60.19 mb   0.3% of total time
Float inwards (2)                          81.00 ms     51.34 mb   0.3% of total time
Renamer/typechecker                        47.00 ms     44.12 mb   0.1% of total time
Demand analysis                            21.00 ms     19.45 mb   0.1% of total time
Worker Wrapper binds                        3.00 ms      1.89 mb   0.0% of total time
Parser                                      2.00 ms      0.50 mb   0.0% of total time
Specialise                                  0.00 ms      0.23 mb   0.0% of total time
Desugar                                     0.00 ms      0.65 mb   0.0% of total time
Called arity analysis                       0.00 ms      0.82 mb   0.0% of total time
-------------------------------------------------------------------------------------
Total                                   31750.00 ms  51076.53 mb 100.0% of total time

======================Total======================
CodeGen                                    62.92%
Simplifier                                 32.45%
CorePrep                                    2.40%
CoreTidy                                    1.48%
Common sub-expression                       0.27%
Float inwards                               0.26%
Renamer/typechecker                         0.15%
Demand analysis                             0.07%
Worker Wrapper binds                        0.01%
Parser                                      0.01%
Specialise                                  0.00%
Desugar                                     0.00%
Called arity analysis                       0.00%

Simplifier is running 7 times! More interestingly, I wanted to have a look at Core and STG, and tried -ddump-simpl. It took about 5 minutes until I killed the process and realized that generated incomplete Core output is 664M already

-rw-r--r--   1 omer users 664M Jun 26 14:18 Bug1.dump-simpl

comment:2 Changed 3 years ago by ezyang

Description: modified (diff)

comment:3 Changed 3 years ago by vladki

Just in case, removing one of the parameters in the ExprF constructor makes the problem go away:

data ExprF rT = ExprF rT deriving Eq

comment:4 Changed 3 years ago by RyanGlScott

Keywords: deriving-perf added

comment:5 Changed 3 years ago by simonpj

Type of failure: None/UnknownCompile-time performance bug

comment:6 Changed 3 years ago by RyanGlScott

Cc: RyanGlScott added

Yikes.

Interestingly, the way the Eq (Fix f) instance is constructed plays a large role in the slowdown. Since it's a newtype, GHC defaults to deriving that instance via GeneralizedNewtypeDeriving, i.e.,

instance Eq (f (Fix f)) => Eq (Fix f) where
  (==) = coerce ((==) :: f (Fix f) -> f (Fix f) -> Bool) :: Fix f -> Fix f -> Bool

But if you implement it the "stock" way, i.e.,

instance Eq (f (Fix f)) => Eq (Fix f) where
  In x == In y = x == y

Then it compiles instantly. So perhaps the coercion solver is to blame here?

comment:7 Changed 3 years ago by goldfire

The solver for Coercible is known to struggle with recursive newtypes, where coercibility is (I believe) undecidable. Maybe we should use "stock" for recursive newtypes?

comment:8 in reply to:  7 Changed 3 years ago by RyanGlScott

Cc: niteria added

Replying to goldfire:

Maybe we should use "stock" for recursive newtypes?

Maybe. But it bothers me that this used to work fine in 7.10.3. I did some git archaeology and discovered that this commit ( http://git.haskell.org/ghc.git/commit/9cb192ce4b34a472041010df9c30f5d741eb0261 ) is responsible for this regression, but I don't understand why a change to the digraph code led to this.

niteria, any thoughts as the author of the commit?

comment:9 Changed 3 years ago by niteria

Shot in the dark: does solving coercibility for recursive newtypes involve finding fixpoints?

I've run into one place in the code generator once where not returning the elements of a set in the Unique order would make it never find a fixpoint, because the insertion order would oscillate between 2 values.

Otherwise, maybe using Unique order for SCC had some nice properties in this case.

But under normal circumstances, I wouldn't expect my commit to cause that.

comment:10 Changed 3 years ago by niteria

I was able to reproduce it, and indeed my commit has some effect. I narrowed it down with https://phabricator.haskell.org/P127 to ordering inside occAnalRecBind. It appears that the simplifier is sensitive to this order in this case.

comment:11 Changed 3 years ago by niteria

Simplifier stats reveal significantly more Class op == rule rewrites, and passing -fno-enable-rewrite-rules fixes the issue.

comment:12 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

In 517d03e/ghc:

Fix an asymptotic bug in the occurrence analyser

Trac #12425 and #12234 showed up a major and long-standing
bug in the occurrence analyser, whereby it could generate
explonentially large program!

There's a lot of commentary on #12425; and it's all described
in Note [Loop breakers, node scoring, and stability]

I did quite a lot of refactoring to make the code comprehensibe
again (its structure had bit-rotted rather), so the patch
looks bigger than it really is.

Hurrah!

I did a nofib run to check that I hadn't inadertently ruined
anything:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
          fluid          -0.3%     -1.5%      0.01      0.01     +0.0%
         parser          -0.9%     +0.6%      0.04      0.04     +0.0%
         prolog          -0.1%     +1.2%      0.00      0.00     +0.0%

--------------------------------------------------------------------------------
            Min          -0.9%     -1.5%     -8.6%     -8.7%     +0.0%
            Max          +0.1%     +1.2%     +7.7%     +7.8%     +2.4%
 Geometric Mean          -0.2%     -0.0%     -0.2%     -0.3%     +0.0%

I checked what happened in 'prolog'.  It seems that we have a
recursive data structure something like this

   f :: [blah]
   f x = build (\cn.  ...g...  )

   g :: [blah2]
   g y = ....(foldr k z (f y))....

If we inline 'f' into 'g' we get better fusion than the other
way round, but we don't have any way to spot that at the moment.
(I wonder if we could do worker/wrapper for functions returning
a 'build'?)  It was happening before by a fluke.

Anyway I decided to accept this; it's relatively rare I think.

comment:13 Changed 3 years ago by simonpj

Milestone: 8.0.3
Status: newmerge
Test Case: perf/should_compile/T12234

Fixed! Thanks for a great report.

comment:14 Changed 3 years ago by RyanGlScott

#13056 demonstrates another occurrence of this bug, this time involving Foldable. But unlike the program in this ticket, it doesn't require any GHC extensions like UndecidableInstances to trigger it.

comment:15 Changed 3 years ago by Ryan Scott <ryan.gl.scott@…>

In 5088110/ghc:

Add performance test for #13056

This performance regression was fixed by commit
517d03e41b4f5c144d1ad684539340421be2be2a (#12234). Let's add a performance test
to ensure that it doesn't break again.

comment:16 Changed 3 years ago by RyanGlScott

#13081 is yet another occurrence of this.

comment:17 Changed 3 years ago by bgamari

Milestone: 8.0.38.2.1

At this point it is rather unlikely that there will be an 8.0.3. Re-milestoning.

comment:18 Changed 3 years ago by bgamari

Resolution: fixed
Status: mergeclosed
Note: See TracTickets for help on using tickets.