Opened 2 years ago

Last modified 16 months ago

#13745 new task

Investigate compile-time regressions in regex-tdfa-1.2.2

Reported by: bgamari Owned by:
Priority: high Milestone: 8.2.3
Component: Compiler Version: 8.0.1
Keywords: Cc: asr, hvr, simonpj, michalt, dfeuer, ntc2, gbaz
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

hvr points out that GHC 8.2 takes significantly longer to compile regex-tdfa-1.2.2 than 8.0.2. Interestingly, it seems that compiler allocations fell dramatically while compilation time rose.

GHC 8.0.2:

<<ghc: 89544610040 bytes, 2301 GCs, 189183965/391103872 avg/max bytes
residency (29 samples), 1075M in use, 0.001 INIT (0.001 elapsed),
55.747 MUT (61.520 elapsed), 23.276 GC (23.278 elapsed) :ghc>>

GHC 8.2.1:

<<ghc: 115210990008 bytes, 6383 GCs, 116487828/234031864 avg/max bytes
residency (77 samples), 670M in use, 0.001 INIT (0.000 elapsed),
64.154 MUT (68.262 elapsed), 37.114 GC (37.077 elapsed) :ghc>>

Reproducing

$ cabal unpack regex-tdfa-1.2.2
$ cd regex-tdfa-1.2.2
$ cabal install --only-dependencies
$ cabal configure
$ cabal build
$ time ghc -idist/build/autogen -O2 Text/Regex/TDFA.hs \
   -XMagicHash -XFlexibleInstances -XMultiParamTypeClasses \
   -XRecursiveDo -XBangPatterns -XRankNTypes -XUnboxedTuples \
   -XFlexibleContexts -XUnliftedFFITypes

Attachments (3)

regex-tdfa-reduced.tar.gz (17.4 KB) - added by RyanGlScott 2 years ago.
pre-set-levels.gz (200.2 KB) - added by dfeuer 2 years ago.
Before setlevels patch
post-set-levels.bz2 (94.6 KB) - added by dfeuer 2 years ago.
After setlevels patch

Download all attachments as: .zip

Change History (25)

comment:1 Changed 2 years ago by asr

Cc: asr added

comment:2 Changed 2 years ago by hvr

Cc: hvr added

Changed 2 years ago by RyanGlScott

Attachment: regex-tdfa-reduced.tar.gz added

comment:3 Changed 2 years ago by RyanGlScott

I've attached a minimized version of regex-tdfa-1.2.2 with no external dependencies. Moreover, the slowdown in GHC 8.2 seems primarily concentrated at the *Engine files, so I only included what was necessary to support one of them. Here are the times that I got:

$ time /opt/ghc/8.0.2/bin/ghc -O2 -fforce-recomp TextRegexTDFANewDFAEngine_FA.hs
real    0m10.139s
user    0m9.996s
sys     0m0.116s

$ time /opt/ghc/8.2.1/bin/ghc -O2 -fforce-recomp TextRegexTDFANewDFAEngine_FA.hs
real    0m18.072s
user    0m17.912s
sys     0m0.156s

The TextRegexTDFANewDFAEngine_FA.hs file in particular is what is contributing to the difference in time, as all other modules take about the same time to compile on 8.0.2 and 8.2.1-rc2.

comment:4 Changed 2 years ago by RyanGlScott

Cc: simonpj added

A sizeable chunk of that increase in time came from commit 2effe18ab51d66474724d38b20e49cc1b8738f60 (The Early Inline Patch):

Commit 55efc9718b520ef354e32c15c4b49cdfecce412f (Combine identical case alternatives in CSE)
-----
real	0m10.682s
user	0m10.392s
sys	0m0.272s

Commit 2effe18ab51d66474724d38b20e49cc1b8738f60 (The Early Inline Patch)
-----
real	0m13.742s
user	0m13.524s
sys	0m0.204s

What's responsible for the other ~5 seconds remains to be investigated.

comment:5 Changed 2 years ago by RyanGlScott

Found it. The other culprit is commit 6eb52cfc2e31df2561860f43d41766464ccfe8af (Improve SetLevels for join points):

Commit 871b63e4ea95d4c516d31378d0475167e75caa01 (Improve pretty-printing of types)
-----
real    0m13.918s
user    0m13.708s
sys     0m0.192s

Commit 6eb52cfc2e31df2561860f43d41766464ccfe8af (Improve SetLevels for join points)
-----
real    0m19.008s
user    0m18.804s
sys     0m0.188s

comment:6 Changed 2 years ago by michalt

Cc: michalt added

comment:7 Changed 2 years ago by bgamari

Milestone: 8.2.18.2.2

This is certainly a significant regression, but I think it will need to wait until 8.2.2 at the earliest for resolution.

comment:8 Changed 2 years ago by dfeuer

Going from 8.0.2 to 8.2.1, things look very different right near the beginning, presumably as a result of the early inline patch. In 8.0.2, the first simplifier pass gives

*** Simplifier [TextRegexTDFANewDFAEngine_FA]:
Result size of Simplifier iteration=1
  = {terms: 6,220, types: 14,606, coercions: 1,563}

whereas in 8.2.1 it gives

*** Simplifier [TextRegexTDFANewDFAEngine_FA]:
Result size of Simplifier iteration=1
  = {terms: 17,641, types: 38,397, coercions: 10,258, joins: 30/233}

Things remain pretty bad until after float out, when they first get surprisingly good and then sort of oscillate. I'll take a look at the differences across the commits Ryan points out.

comment:9 Changed 2 years ago by dfeuer

In "Improve SetLevels for join points", everything is fine until liberate case. With the previous commit, this produces

34,318 terms, 54,571 types, 3,409 coercions, and 456/760 joins.

With the SetLevels commit, this goes up to

57,468 terms, 83,821 types, 3,439 coercions, and 856/1,310 joins.

It never makes up for that.

comment:10 Changed 2 years ago by bgamari

Hmm, it may be helpful to see what the additional bindings look like. This is when I would usually use my ghc-dump utility to dump a summary of the top-level bindings and their types. This should be easily diffable.

comment:11 in reply to:  10 Changed 2 years ago by dfeuer

Replying to bgamari:

Hmm, it may be helpful to see what the additional bindings look like. This is when I would usually use my ghc-dump utility to dump a summary of the top-level bindings and their types. This should be easily diffable.

Unfortunately, this seems ... challenging. I'm running the "before" version now. It's been going for many minutes, and is really chewing up disk space. In particular, TextRegexTDFANewDFAEngine_FA.pass-0017.cbor has already passed five gigabytes.

comment:12 Changed 2 years ago by dfeuer

I was able to get enough to compare the results of LiberateCase. Things generally look the same at the top level, but after the SetLevels patch there's a bunch more local code. Everything is rather huge, so it's hard to see just what the deal is, but interestingly it really seems to be added code more than replaced code. Virtually everything that used to be there is still there, but now there's more. I'll attach diffable dump files.

Changed 2 years ago by dfeuer

Attachment: pre-set-levels.gz added

Before setlevels patch

Changed 2 years ago by dfeuer

Attachment: post-set-levels.bz2 added

After setlevels patch

comment:13 Changed 2 years ago by dfeuer

Cc: dfeuer added

comment:14 Changed 2 years ago by bgamari

Milestone: 8.2.28.2.3

It doesn't look like this is going to happen for 8.2.2.

comment:15 Changed 2 years ago by rahulmutt

I recently merged the DmdAnal module from ghc master into Eta and observed a 2-3x blowup in generated codesize for the exact same module - Text.Regex.TDFA.NewDFA.Engine_FA. Given that the Eta code base does not yet contain the early inlining patch or anything relating to join points, I think one thing to be looking at is the impact of the new changes to the demand analyzer and how the inliner reacts to strictness/usage information.

comment:16 Changed 2 years ago by bgamari

rahulmutt, thanks for the datapoint. Can you say more specifically which revision of DmdAnal you were previously using and which you updated to?

comment:17 Changed 2 years ago by rahulmutt

Before: 63205f719287cb011388b4beddf30d3229238b9f (DmdAnal in ghc-7.10.3a-release tag)

After: b5b7d820afd8fca098bf1f4a7380d425ca6be31d

I still need to merge in the recent occurrence analyzer changes since I recall reading that OneShotInfo is not longer handled in the demand analyzer and has moved to the occurence analyzer. Nonetheless, I don't think lack of OneShotInfo would cause the observed magnitude of code size difference, but I may be wrong.

comment:18 Changed 22 months ago by ntc2

I'm observing an apparently exponential run time performance regression when combining GHC 8.2.1, Data.Text.Lazy, and Text.RE.TDFA. Here's my repo with simple test code illustrating the regression and timing stats: https://github.com/ntc2/ghc-8.2.1-regex-lazy-text-bug.

In summary, for the problematic combination of GHC version and libraries, the run times are 3s, 10s, 22s, and 40s for counting regex matches in files with 10000, 20000, 30000, and 40000 lines, respectively. For all of the unproblematic combinations -- i.e. GHC 8.0.2, String, strict Data.Text, or building with profiling -- the run time is always about 1s!

And this is a Heisenbug, in that the performance problem goes away if I build with profiling support!

EDIT: I see this issue is categorized as a "compile-time performance bug", so perhaps the category should be expanded, or a separate issue should be opened.

Last edited 22 months ago by ntc2 (previous) (diff)

comment:19 Changed 22 months ago by nomeata

@ntc2, that is an interesting finding, but maybe worth a separate issue.

Since you seem to have a nice setup to verify the problem, could you bisect which commit in GHC did this?

comment:20 Changed 22 months ago by ntc2

Cc: ntc2 added

comment:21 Changed 22 months ago by ntc2

@nomeata, I'll create a separate issue and try the bisection. Never done Git bisection before, but found a GHC docs page here: https://ghc.haskell.org/trac/ghc/wiki/WorkingConventions/Bisection.

(also, I thought commenting added me to the CC list, but apparently not)

EDIT: created separate Issue #14519

Last edited 22 months ago by ntc2 (previous) (diff)

comment:22 Changed 16 months ago by gbaz

Cc: gbaz added
Note: See TracTickets for help on using tickets.