Opened 3 years ago

Last modified 19 months ago

#12506 new bug

Compile-time regression in GHC 8.

Reported by: deech Owned by: bgamari
Priority: high Milestone: 8.4.1
Component: Compiler Version: 8.0.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by deech)

This bug is entirely reproducible but unfortunately takes some setup. I don't have a minimal example.

The regression is happening in FLTKHS (https://github.com/deech/fltkhs), the Haskell bindings to the FLTK (http://fltk.org) toolkit, and in particular in this GUI (https://github.com/deech/fltkhs/blob/master/images/tree-complex-windows.png). In GHC 7.8 it takes 5-9 seconds to compile and link; in 7.10.3 that goes up to 12-15 seconds; and finally in GHC 8 it shoots up to 45 seconds. While these numbers come from my i7, 16GB RAM Linux laptop, the compile-time regression is proportionally reproducible across machines and operating systems.

Here are the steps to reproduce it:

  • Install FLTK: please follow the instructions here (https://github.com/deech/fltkhs#linux-and-bsd), but only the first set, which covers installing FLTK 1.3.
  • Clone the repo with the offending application: git clone https://github.com/deech/fltkhs-fluid-demos
  • Build the application and reproduce the problem:
    • cabal install c2hs
    • cabal install fltkhs
    • At the root of the repo do cabal build fltkhs-fluid-tree

Attachments (2)

test.hs (4.1 KB) - added by tdammers 19 months ago.
Test case to reproduce
test1000.hs (11.9 KB) - added by tdammers 19 months ago.
The actual test input used when comparing 7.10.3 against HEAD (8.4.x)


Change History (34)

comment:1 Changed 3 years ago by deech

Description: modified (diff)

comment:2 Changed 3 years ago by simonpj

Type of failure: None/Unknown → Compile-time performance bug

comment:3 Changed 3 years ago by bgamari

Owner: set to bgamari
Priority: normal → high

I'll be adding this one to my queue. Thanks deech.

comment:4 Changed 3 years ago by deech

Description: modified (diff)

comment:5 Changed 3 years ago by deech

Description: modified (diff)

comment:6 in reply to:  3 Changed 3 years ago by deech

Replying to bgamari:

I'll be adding this one to my queue. Thanks deech.

Great. Let me know if I can help.

comment:7 Changed 3 years ago by deech

Description: modified (diff)

comment:8 Changed 3 years ago by bgamari

The issue here seems to be the Tree module. With -O1 compilation takes four minutes with much of the time being spent in the simplifier.

However, deech said that his real concern was the compilation time with -O0. The culprit here appears to be desugaring, which takes over 10 seconds of the 30-second total compilation time (with a profiled compiler).

Looking into the profile, it seems that the culprit is simpleOptPgm. In particular, it appears that the majority of desugaring time is spent in optCoercion called from the Cast branch of simple_opt_expr.go.

comment:9 Changed 3 years ago by bgamari

Looking at the desugared Core of Tree (which is 2.4 gigabytes!), it seems we end up with lots of equivalent yet repeated equality coercions at the top level. For instance, there are 64 coercions of the form,

-- RHS size: {terms: 2, types: 22, coercions: 691}
$d~_aDMR
  :: (Match
        (Graphics.UI.FLTK.LowLevel.Hierarchy.CMenuItem Base) :: *)
     ~
     (FindOp MenuItem MenuItem (SetLabelsize ()) :: *)

However, it turns out that this is a drop in the bucket compared to,

-- RHS size: {terms: 8,634, types: 10,620, coercions: 1,621,550}
make_window :: IO (Ref Window)
Last edited 3 years ago by bgamari (previous) (diff)

comment:10 Changed 3 years ago by bgamari

I should clarify that I suspect the large term counts here are really the result of the library itself making heavy use of type functions, essentially making this yet another manifestation of #8095. The high desugaring times observed in compiling Tree are due to the fact that we run simpleOptPgm during desugaring.

I didn't yet have a chance to look at how compilation time regressed from 7.8 to 7.10 to 8.0. When I get a chance, the best next step will likely be to focus on the 7.10 to 8.0 regression (since it is both newer, and therefore easier to debug, and larger). First verify that the desugared Core is equivalent, then look at compiler profiles to see whether the compiler implementation has regressed.

Last edited 3 years ago by simonpj (previous) (diff)
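The growth pattern described above can be illustrated with a hypothetical miniature (the `Reduce` family below is invented for illustration, not taken from fltkhs): every type-family reduction step is witnessed by an axiom coercion in the desugared Core, so deep or repeated reductions inflate the coercion counts in much the same way as the dumps on this ticket.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, UndecidableInstances #-}
-- A hypothetical miniature of a type-function-heavy binding layer.
-- Each reduction step of Reduce is recorded as a coercion in Core;
-- compiling with -ddump-ds makes the coercion growth visible.
module Main where

import Data.Kind (Type)
import GHC.TypeLits (Nat, type (-))

-- Reduce n takes n type-family reduction steps to reach ().
type family Reduce (n :: Nat) :: Type where
  Reduce 0 = ()
  Reduce n = Reduce (n - 1)

-- The type checker must unwind all ten steps here, leaving a
-- ten-step coercion behind the cast on the right-hand side.
unit :: Reduce 10
unit = ()

main :: IO ()
main = print unit
```

Deepening the family (or repeating `unit`-like bindings many times, as the real bindings do) grows the coercion payload without growing the term payload.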

comment:11 Changed 2 years ago by dfeuer

Based on the date, it appears this ticket likely refers to fltkhs-0.4.0.9 (commit 60ace4c5e7743f5a24ac959fba877aeb27d80618) from June 25, 2016.

comment:12 Changed 2 years ago by deech

This is still reproducible from the latest FLTKHS with GHC 8.0.2.

comment:13 Changed 2 years ago by dfeuer

It appears that the absurd number of coercions is present in 7.10 as well as 8.0. So we'll need to look at something else to find the perf difference between those versions.

comment:14 Changed 19 months ago by tdammers

I tried reproducing the problem with GHC 8.2.1, but unfortunately the build instructions no longer work with Cabal 2.0.x. It seems that fltkhs doesn't support Cabal 2.0, and [this github issue](https://github.com/deech/fltkhs/issues/38) tells me this might not get fixed anytime soon.

Is there an up-to-date way of reproducing the issue that I could use? Preferably one that doesn't involve stack, simply because I want to hook a custom-built GHC directly from git into the build so I can play with some things and cherry-pick selected changes, and this isn't trivial with stack.

comment:15 in reply to:  14 Changed 19 months ago by deech

Replying to tdammers:

I tried reproducing the problem with GHC 8.2.1, but unfortunately the build instructions no longer work with Cabal 2.0.x. It seems that fltkhs doesn't support Cabal 2.0, and [this github issue](https://github.com/deech/fltkhs/issues/38) tells me this might not get fixed anytime soon.

Is there an up-to-date way of reproducing the issue that I could use? Preferably one that doesn't involve stack, simply because I want to hook a custom-built GHC directly from git into the build so I can play with some things and cherry-pick selected changes, and this isn't trivial with stack.

I should have closed that issue, the master branch and the latest stable release should work with Cabal 2.0.0.2 and GHC 8.2.1.

I'm happy to help you through it.

Last edited 19 months ago by deech (previous) (diff)

comment:16 Changed 19 months ago by deech

I've put together a reduced example that exhibits the same issues. It's much easier to work with. In putting this together I noticed there is a vast difference in compilation time and memory between GHC 8.0.2 and GHC 8.2.2: the former takes 45 seconds and uses 1.4GB of RAM to compile; the latter, 10 seconds and 200MB, so that is a great improvement.

However, while compilation speed is much improved between 8.0.2 and 8.2.2, that still only brings it back to par with GHC 7.10, which is the GHC version against which I reported the issue in the first place, so the problem still exists.

Hopefully this reduced example helps clarify the underlying issue.

comment:17 Changed 19 months ago by deech

I've added a rough comparison table showing compile times and memory usage going back to 7.8.4.

comment:18 Changed 19 months ago by bgamari

Thanks deech! We can try to have someone look at this soon.

comment:19 in reply to:  16 Changed 19 months ago by tdammers

Replying to deech:

I've put together a reduced example that exhibits the same issues. It's much easier to work with. In putting this together I noticed there is a vast difference in compilation time and memory between GHC 8.0.2 and GHC 8.2.2: the former takes 45 seconds and uses 1.4GB of RAM to compile; the latter, 10 seconds and 200MB, so that is a great improvement.

However, while compilation speed is much improved between 8.0.2 and 8.2.2, that still only brings it back to par with GHC 7.10, which is the GHC version against which I reported the issue in the first place, so the problem still exists.

Hopefully this reduced example helps clarify the underlying issue.

Great, that makes things so much easier! Thanks a lot, I'll look into it.

comment:20 Changed 19 months ago by tdammers

OK, got it compiled against GHC head. The output of -ddump-timings shows two obvious suspects:

[5 of 5] Compiling Main             ( app/Main.hs, app/Main.o )
*** Parser [Main]:
Parser [Main]: alloc=14721280 time=9.927
*** Renamer/typechecker [Main]:
Renamer/typechecker [Main]: alloc=585898184 time=956.617
*** Desugar [Main]:
Desugar [Main]: alloc=76966472 time=214.242
*** Simplifier [Main]:
Simplifier [Main]: alloc=3469053424 time=2430.489
*** CoreTidy [Main]:
CoreTidy [Main]: alloc=220130032 time=371.644
*** CorePrep [Main]:
CorePrep [Main]: alloc=7744 time=0.030
*** CodeGen [Main]:
CodeGen [Main]: alloc=3436008728 time=4730.477
Linking app/Main ...

In other words, while compiling Main, we get very long timings on the Simplifier and CodeGen steps. I suspect that the latter is caused by the former.
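For reference, output in that shape can be ranked mechanically. The sketch below parses lines in the `Phase [Module]: alloc=… time=…` form shown above (the line format is assumed from this particular dump, not from any stable GHC interface) and sorts the phases slowest-first:

```haskell
-- Hypothetical helper for ranking -ddump-timings output by phase time.
module Main where

import Data.List (sortBy)
import Data.Maybe (mapMaybe)
import Data.Ord (Down (..), comparing)

-- Parse one timing line such as
--   "Simplifier [Main]: alloc=3469053424 time=2430.489"
-- into (phase name, milliseconds); other lines (e.g. "*** ..." headers)
-- fail the pattern match and are skipped.
parsePhase :: String -> Maybe (String, Double)
parsePhase l = case words l of
  (name : _mod : _alloc : ('t':'i':'m':'e':'=':t) : _) -> Just (name, read t)
  _ -> Nothing

-- Rank phases by time, slowest first.
rank :: String -> [(String, Double)]
rank = sortBy (comparing (Down . snd)) . mapMaybe parsePhase . lines

main :: IO ()
main = mapM_ print (rank sample)
  where
    sample = unlines
      [ "Parser [Main]: alloc=14721280 time=9.927"
      , "Simplifier [Main]: alloc=3469053424 time=2430.489"
      , "CodeGen [Main]: alloc=3436008728 time=4730.477"
      ]
```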

comment:21 Changed 19 months ago by tdammers

It turns out that Main.hs produces about 2.8G of Core. This part seems to be the culprit:

-- RHS size: {terms: 6,002,
              types: 8,003,
              coercions: 1,415,000,
              joins: 0/0}
main :: IO ()

comment:22 Changed 19 months ago by simonpj

That's one big coercion! Can you distil a standalone test case?

comment:23 Changed 19 months ago by tdammers

I tried; however, concatenating the relevant modules into one reduces the impact a bit. I'll attach a reduced version anyway, where we still get this:

-- RHS size: {terms: 8, types: 11, coercions: 1,424, joins: 0/0}
main :: IO ()

Changed 19 months ago by tdammers

Attachment: test.hs added

Test case to reproduce

comment:24 Changed 19 months ago by tdammers

Copying the last line a thousand times or so increases the number of coercions to similar levels as the original, but this simpler case makes the dumped Core easier to handle.
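That duplication step can be sketched as a small pure function (the file names, and the assumption that only the final line is repeated, are mine, not necessarily the exact mechanics used for the real attachment):

```haskell
-- Hypothetical sketch of producing a test1000.hs-style input by
-- repeating the final line of a smaller test file n times.
module Main where

-- Repeat the last line of the source n times in place of one copy.
blowUp :: Int -> String -> String
blowUp n src = unlines (init ls ++ replicate n (last ls))
  where ls = lines src

main :: IO ()
main = do
  src <- readFile "test.hs"                  -- the small attachment
  writeFile "test1000.hs" (blowUp 1000 src)  -- the blown-up input
```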

comment:25 Changed 19 months ago by tdammers

  • Make sure this is truly a regression
  • Figure out whether the huge coercion is actually the cause
  • Bisect?

comment:26 Changed 19 months ago by bgamari

Milestone: 8.4.1

For the record, this is another case which would stand to be improved by the coercion dropping proposed in #8095.

comment:27 Changed 19 months ago by tdammers

I can confirm that this *is* a regression; I'm seeing ~34 seconds overall compilation time on 7.10.3, vs. ~341 seconds on GHC HEAD.

Changed 19 months ago by tdammers

Attachment: test1000.hs added

The actual test input used when comparing 7.10.3 against HEAD (8.4.x)

comment:28 Changed 19 months ago by simonpj

I can confirm that this *is* a regression; I'm seeing ~34 seconds overall compilation time on 7.10.3, vs. ~341 seconds on GHC HEAD.

Aha! Excellent. Does that show on (some version of) the repro case test.hs?

PS: maybe comment:27 answers that affirmatively; I missed that.

Last edited 19 months ago by simonpj (previous) (diff)

comment:29 in reply to:  28 Changed 19 months ago by tdammers

Replying to simonpj:

I can confirm that this *is* a regression; I'm seeing ~34 seconds overall compilation time on 7.10.3, vs. ~341 seconds on GHC HEAD.

Aha! Excellent. Does that show on (some version of) the repro case test.hs?

PS: maybe comment:27 answers that affirmatively; I missed that.

Yes on both; test1000.hs is pretty much an exact replica of the code in the original test case, I really just unrolled the imports to get it into one file, and played with the number of repetitions in main.

comment:30 Changed 19 months ago by tdammers

The problem persists with D4394 applied, so the idea that that particular nonlinearity in coercion handling is the cause doesn't hold.

comment:31 Changed 19 months ago by simonpj

The problem persists with D4394 applied

I was not expecting that D4394 would help. The problem here is that the coercions are too big. The encouraging news is that 7.10.3 didn't have that problem, so it's an existence proof that we don't need big coercions.

comment:32 Changed 19 months ago by simonpj

I've had a look. I don't think it's fruitful for you to investigate further, Tobias, until I or Richard have had a look at the constraints generated etc.
