Opened 18 months ago
Last modified 17 months ago
#14944 new bug
Compile speed regression
Reported by: | br1 | Owned by: | dfeuer |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Compiler | Version: | 8.4.1 |
Keywords: | Cc: | ||
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Compile-time performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | ||
Wiki Page: |
Description (last modified by )
Compile time with -O went from 5 seconds in 7.6.1 to 12 seconds in 8.2.2 and then down to 10 seconds in 8.4.1.
Attachments (8)
Change History (24)
Changed 18 months ago by
Changed 18 months ago by
comment:1 Changed 18 months ago by
Description: | modified (diff) |
---|
comment:2 Changed 18 months ago by
Owner: | set to dfeuer |
---|
comment:3 Changed 18 months ago by
The reason for the big version gap seems likely to be the fact that the code didn't compile at all under 7.8.4 or 7.10.3 without extra help. Here are the times I get:
7.8.4
This version needed -fcontext-stack=197
to compile.
<<ghc: 2588144288 bytes, 592 GCs, 25821661/80891272 avg/max bytes residency (13 samples), 193M in use, 0.00 INIT (0.00 elapsed), 2.29 MUT (2.62 elapsed), 1.11 GC (1.19 elapsed) :ghc>>
7.10.3
This version needed -fcontext-stack=197
to compile.
<<ghc: 5430939912 bytes, 1222 GCs, 39907279/99169560 avg/max bytes residency (14 samples), 271M in use, 0.001 INIT (0.001 elapsed), 5.319 MUT (5.672 elapsed), 1.918 GC (1.999 elapsed) :ghc>>
8.0.2
<<ghc: 9858859496 bytes, 2263 GCs, 37541990/110403672 avg/max bytes residency (13 samples), 291M in use, 0.000 INIT (0.001 elapsed), 7.649 MUT (8.086 elapsed), 1.850 GC (1.959 elapsed) :ghc>>
8.2.2
<<ghc: 5344419472 bytes, 543 GCs, 33955491/118029880 avg/max bytes residency (14 samples), 265M in use, 0.000 INIT (0.001 elapsed), 5.230 MUT (5.678 elapsed), 1.589 GC (1.674 elapsed) :ghc>>
8.4.1
<<ghc: 5160053968 bytes, 577 GCs, 30157126/79839112 avg/max bytes residency (14 samples), 207M in use, 0.000 INIT (0.005 elapsed), 4.415 MUT (4.930 elapsed), 1.509 GC (1.654 elapsed) :ghc>>
So there was a major regression from 7.8.4 to 7.10.3, another large one from 7.10.3 to 8.0.2, a large improvement from 8.0.2 to 8.2.2, and a smaller improvement from 8.2.2 to 8.4.1. I can verify that we're still running considerably slower, and allocating considerably more, than 7.8.4. FYI, while we no longer need -fcontext-stack
, that's because the default limit has been raised or removed. It seems versions 8.0.2 through 8.4.1 need a reduction depth of 199.
comment:6 Changed 18 months ago by
Yes, the size of the program changes quite a lot. The most notable thing about this program is the sheer number of types, which is enormous in every case, but varies quite a lot. In each case I give the result of CorePrep
from Paper
(which always compiles reasonably quickly) along with pertinent-looking information about Main
(which takes a while).
7.8.4
Paper:
{terms: 1,719, types: 4,049, coercions: 768}
Main:
Result size of Desugar (after optimization) = {terms: 3,973, types: 630,023, coercions: 196} ... Result size of Simplifier = {terms: 3,384, types: 628,456, coercions: 5} Result size of Specialise = {terms: 4,785, types: 865,456, coercions: 5} Result size of Float out(FOS {Lam = Just 0, Consts = True, PAPs = False}) = {terms: 6,373, types: 1,100,676, coercions: 5} Result size of Simplifier iteration=1 = {terms: 8,674, types: 1,377,245, coercions: 211,797} Result size of Simplifier iteration=2 = {terms: 4,051, types: 730,227, coercions: 84,460} -- things go up and down a few times Result size of Tidy Core = {terms: 5,157, types: 1,104,496, coercions: 84,451}
7.10.3
Paper:
Result size of CorePrep = {terms: 1,736, types: 3,973, coercions: 778}
Main:
Result size of Desugar (after optimization) = {terms: 3,973, types: 630,023, coercions: 196} Result size of Specialise = {terms: 12,088, types: 2,092,764, coercions: 392} *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}): Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) = {terms: 13,674, types: 2,327,982, coercions: 392} Result size of Simplifier iteration=1 = {terms: 14,824, types: 2,416,989, coercions: 371,022} Result size of Simplifier iteration=2 = {terms: 10,101, types: 2,096,313, coercions: 321,082} Result size of Simplifier iteration=1 = {terms: 12,295, types: 2,650,706, coercions: 321,081} ... nothing much changes for a long time -- Sudden size drop in core tidy Result size of Tidy Core = {terms: 5,146, types: 1,104,452, coercions: 84,448}
This comment is getting long; I'll continue in the next one.
comment:7 Changed 18 months ago by
8.0.2
Paper:
Result size of CorePrep = {terms: 2,704, types: 5,400, coercions: 996}
Main:
Result size of Desugar (after optimization) = {terms: 4,574, types: 632,608, coercions: 196} Result size of Specialise = {terms: 13,274, types: 2,100,628, coercions: 392} Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) = {terms: 14,860, types: 2,335,846, coercions: 392} Result size of Simplifier iteration=1 = {terms: 15,614, types: 2,423,788, coercions: 366,704} Result size of Simplifier iteration=2 = {terms: 10,121, types: 2,098,814, coercions: 322,579} .... Result size of Simplifier = {terms: 12,570, types: 2,807,500, coercions: 320,217} Result size of Tidy Core = {terms: 5,208, types: 1,104,414, coercions: 84,359}
8.2.2
Paper:
Result size of CorePrep = {terms: 3,280, types: 5,538, coercions: 1,000, joins: 2/50}
Main:
Result size of Desugar (after optimization) = {terms: 4,564, types: 632,574, coercions: 196, joins: 0/982} Result size of Simplifier iteration=1 = {terms: 6,858, types: 1,023,896, coercions: 86,509, joins: 0/786} Result size of Simplifier iteration=2 = {terms: 4,014, types: 726,796, coercions: 85,137, joins: 1/2} ... Result size of Specialise = {terms: 9,329, types: 1,646,724, coercions: 168,346, joins: 2/2} *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) [Main]: Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) = {terms: 10,823, types: 1,939,926, coercions: 168,346, joins: 1/2} Result size of Simplifier iteration=1 = {terms: 9,648, types: 1,706,501, coercions: 227,811, joins: 2/2} Result size of Simplifier iteration=2 = {terms: 9,349, types: 1,647,213, coercions: 168,353, joins: 2/2} ... Result size of Tidy Core = {terms: 5,212, types: 1,104,424, coercions: 84,359, joins: 2/2}
comment:8 Changed 18 months ago by
8.4.1
Paper:
Result size of CorePrep = {terms: 3,228, types: 5,423, coercions: 1,008, joins: 3/50}
Main:
Result size of Desugar (after optimization) = {terms: 4,564, types: 632,574, coercions: 196, joins: 0/982} Result size of Simplifier iteration=1 = {terms: 6,858, types: 1,023,896, coercions: 86,509, joins: 0/786} Result size of Simplifier iteration=2 = {terms: 4,014, types: 726,796, coercions: 85,137, joins: 1/2} Result size of Specialise = {terms: 9,329, types: 1,646,724, coercions: 168,346, joins: 2/2} Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) = {terms: 10,431, types: 1,862,310, coercions: 168,346, joins: 1/2} Result size of Simplifier iteration=1 = {terms: 9,648, types: 1,706,501, coercions: 227,811, joins: 2/2} Result size of Simplifier iteration=2 = {terms: 9,349, types: 1,647,213, coercions: 168,353, joins: 2/2} ... Result size of Simplifier = {terms: 9,334, types: 1,647,199, coercions: 168,352, joins: 2/2} -- What happened here?????? Result size of Simplifier iteration=1 = {terms: 110,206, types: 1,818,532, coercions: 379,933, joins: 593/20,293} Result size of Simplifier = {terms: 8,914, types: 1,272,932, coercions: 85,550, joins: 5/203} .... Result size of Demand analysis = {terms: 8,323, types: 1,272,755, coercions: 85,550, joins: 5/6} !!! Demand analysis [Main]: finished in 200.01 milliseconds, allocated 115.130 megabytes *** CoreTidy [Main]: Result size of Tidy Core = {terms: 1,867, types: 166,695, coercions: 1,988, joins: 5/6}
comment:9 Changed 18 months ago by
Thanks for this data. Several interesting things here.
- There's a big jump in compilation cost between 7.8 and 7.10. (Like a factor of 2 in compiler allocation, according to comment:3. According to comment:6, the result of Specialise is much bigger in 7.10 than in 7.8. This extra code appears to be discarded at the end, but it persists through the compilation pipeline. Questions: what are the extra specialisations? Do we need them? Why are they eventually discarded? Could we discard them earlier?
- As you say in comment:8 it seems that GHC 8.4 has a new, and huge (temporary) blow-up in porgram size. It'd be really good to work out what this is. (You'll presumably need to cut down the program size to make this tractable.)
comment:10 Changed 18 months ago by
In case this helps any, the simplifier runs iteration 1,2,1,2,3,1,2,3, and then the next iteration (1) is the one that blows up the term count in 8.4.1. It appears that the blow-up is non-linear. When I increase the number of lines in the Main.hs
source, the ratio of terms in that iteration to terms in the previous one increases. Similarly, the ratio of time and allocation in that simplifier run compared to the previous one increases. I've tried looking at a very cut-down version with -ddump-simpl-iterations
, but there are still an awful lot of terms to look at. I'll keep at it.
comment:11 Changed 17 months ago by
Running with a development build of either 8.2.2 or 8.4ish (I haven't checked others yet) turns up an interesting warning when compiling Paper.hs
: the simplifier bails out after 4 iterations. So it seems at least possible that whatever goes wrong in Main.hs
may start in Paper.hs
, if we end up with incompletely simplified unfoldings. One problem I'm facing is that this source code is ... not very simple. It's full of rather involved constraint-based programming that's presumably documented in a paper somewhere.
comment:12 Changed 17 months ago by
@dfeuer Here's the paper https://www.fing.edu.uy/~mviera/papers/pepm13.pdf
comment:13 Changed 17 months ago by
Simon suggests -dshow-passes
comparing iterations in different GHC versions.
Changed 17 months ago by
Attachment: | passes-7.8 added |
---|
ghc -O -dshow-passes -fforce-recomp -fcontext-stack=400 Main.hs
Changed 17 months ago by
Attachment: | passes-7.10 added |
---|
Changed 17 months ago by
Attachment: | passes-8.0 added |
---|
Changed 17 months ago by
Attachment: | passes-8.2 added |
---|
Changed 17 months ago by
Attachment: | passes-8.4 added |
---|
comment:15 Changed 17 months ago by
I'm going to attach -v
results too, so we can look at how many simplifier passes were used. Is there a way to find out when the simplifier bailed out without a debug build? I don't really know how to get a debug build of 7.8. Building 7.8 with 7.8 didn't work, and I don't have 7.6 lying around.
comment:16 Changed 17 months ago by
In 7.8, Main starts compiling like this (from -dshow-passes):
Result size of Desugar (after optimization) = {terms: 3,973, types: 630,023, coercions: 196} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 3,384, types: 628,452, coercions: 5} Result size of Simplifier iteration=2 = {terms: 3,386, types: 628,458, coercions: 5} Result size of Simplifier iteration=3 = {terms: 3,384, types: 628,456, coercions: 5} Result size of Simplifier = {terms: 3,384, types: 628,456, coercions: 5} *** Specialise: Result size of Specialise = {terms: 4,785, types: 865,456, coercions: 5}
In 7.10, on the other hand, it starts like this:
Result size of Desugar (after optimization) = {terms: 3,973, types: 630,023, coercions: 196} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 4,362, types: 630,802, coercions: 196} Result size of Simplifier = {terms: 4,362, types: 630,802, coercions: 196} *** Specialise: Result size of Specialise = {terms: 12,088, types: 2,092,764, coercions: 392}
So 7.8 immediately tossed a slew of coercions in the first simplifier iteration, and the simplification otherwise kept sizes pretty much the same. In 7.10, we only do one round of simplification, the coercion count doesn't go down, the term count goes up a bit, and then specialization goes wild. No, I don't know what any of this really means.
David will characterise this.