Opened 4 years ago

Last modified 20 months ago

#11195 new bug

New pattern-match check can be non-performant

Reported by: goldfire Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.11
Keywords: PatternMatchWarnings Cc: gkaracha
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time crash Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D1676, Phab:D1795
Wiki Page:

Description

In my work in merging Phab:D808 (and originally pointed out by thomie), I ran into a peculiar bug: the new pattern-match check took gobs of memory (~10GB) to check OptCoercion in the stage-2 compiler. My analysis got only as far as nailing the problem down to pmTraverse, which took 80% of the time, and got called roughly a gajillion times.

To reproduce, simply compile OptCoercion (with -package ghc) and you'll see your memory usage explode.

So that you'll be able to build GHC until this is fixed, I've disable the pattern-match check in that file.

There is a good chance that this is caused by an interaction between the pattern-match check and the type=kind code. I'm happy to probe further, but I'd want George's help.

NB: This applies only after my patch is merged. Which should be today. For some value of today. Hopefully today's value of today.

Attachments (1)

T11195.hs (5.7 KB) - added by bgamari 4 years ago.
Fix testcase

Download all attachments as: .zip

Change History (20)

comment:1 Changed 4 years ago by bgamari

Looking at the code I'd suspect the problem is either opt_trans_rule or opt_co4 as these both have a substantial number of patterns each with many guards. I'll try distilling a testcase from one of these.

comment:2 Changed 4 years ago by bgamari

Cc: gkaracha added
Milestone: 8.0.1
Owner: set to goldfire
Type of failure: None/UnknownCompile-time crash

comment:3 Changed 4 years ago by bgamari

$ inplace/bin/ghc-stage2 Test.hs -package ghc -O +RTS -s
[1 of 1] Compiling Test             ( Test.hs, Test.o )
  22,902,711,336 bytes allocated in the heap
  33,446,941,712 bytes copied during GC
   2,960,993,984 bytes maximum residency (21 sample(s))
      15,073,104 bytes maximum slop
            7835 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       530 colls,     0 par   16.236s  61.556s     0.1161s    29.5767s
  Gen  1        21 colls,     0 par    8.853s  69.183s     3.2944s    61.3812s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    6.645s  ( 60.820s elapsed)
  GC      time   25.089s  (130.739s elapsed)
  EXIT    time    0.251s  (  1.432s elapsed)
  Total   time   31.996s  (192.991s elapsed)

  Alloc rate    3,446,736,733 bytes per MUT second

  Productivity  21.6% of total user, 3.6% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

Changed 4 years ago by bgamari

Attachment: T11195.hs added

Fix testcase

comment:4 in reply to:  1 Changed 4 years ago by gkaracha

Replying to bgamari:

Looking at the code I'd suspect the problem is either opt_trans_rule or opt_co4 as these both have a substantial number of patterns each with many guards. I'll try distilling a testcase from one of these.

Yes, I think so too. For example, concerning this part:

-- Push transitivity inside forall
opt_trans_rule is co1 co2
  | ForAllCo tv1 eta1 r1 <- co1
  , Just (tv2,eta2,r2) <- etaForAllCo_maybe co2 = undefined

  | ForAllCo tv2 eta2 r2 <- co2
  , Just (tv1,eta1,r1) <- etaForAllCo_maybe co1 = undefined

Considering only the guards ForAllCo tv1 eta1 r1 <- co1 and ForAllCo tv2 eta2 r2 <- co2 will generate as missing cases the following:

TcRefl         _ _ ~ co1
TcTyConAppCo _ _ _ ~ co1
TcAppCo        _ _ ~ co1
TcCoVarCo        _ ~ co1
         ...
TcCoercion       _ ~ co1

ForAllCo tv1 eta1 r1 ~ co1, TcRefl         _ _ ~ co2
ForAllCo tv1 eta1 r1 ~ co1, TcTyConAppCo _ _ _ ~ co2
ForAllCo tv1 eta1 r1 ~ co1, TcAppCo        _ _ ~ co2
ForAllCo tv1 eta1 r1 ~ co1, TcCoVarCo        _ ~ co2
                         ...
ForAllCo tv1 eta1 r1 ~ co1, TcCoercion       _ ~ co2

This will be passed to the next clause and each one will be checked individually so the numbers will be multiplied and the sets will really grow exponentially. Guards are expensive by definition so I cannot think of a nice way out of this. Some options I can think of though are:

  1. Restructure the code to use less guards (or use them differently)
  2. See if I can make the check to eagerly solve some constraints to decrease the size of the sets it generates (I am not sure if it will be enough though, since, the above sets are all consistent for example so pruning eagerly wouldn't decrease the size of the above uncovered set)
  3. Oversimplify the check to drop all the guards

I hate to admit it but after all these performance failures (#11160, #11161, ... and now this), I am really tempted to drop the guards. The new checker will still be accurate when it comes to GADTs but will have almost the same performance as the old checker, by not reasoning about guards at all, like it used to do. I am not sure about the price we are willing to pay for the additional reasoning (I feel that we are paying a lot) so I would like some input on this. What do you think?

In any case I will investigate it more and see if this really has anything to do with the type=kind change or if it is inherent to the check. I assume the latter but if I am mistaken I am sure we can sort it out with Richard.

George

comment:5 Changed 4 years ago by Richard Eisenberg <eir@…>

In 67465497/ghc:

Add kind equalities to GHC.

This implements the ideas originally put forward in
"System FC with Explicit Kind Equality" (ICFP'13).

There are several noteworthy changes with this patch:
 * We now have casts in types. These change the kind
   of a type. See new constructor `CastTy`.

 * All types and all constructors can be promoted.
   This includes GADT constructors. GADT pattern matches
   take place in type family equations. In Core,
   types can now be applied to coercions via the
   `CoercionTy` constructor.

 * Coercions can now be heterogeneous, relating types
   of different kinds. A coercion proving `t1 :: k1 ~ t2 :: k2`
   proves both that `t1` and `t2` are the same and also that
   `k1` and `k2` are the same.

 * The `Coercion` type has been significantly enhanced.
   The documentation in `docs/core-spec/core-spec.pdf` reflects
   the new reality.

 * The type of `*` is now `*`. No more `BOX`.

 * Users can write explicit kind variables in their code,
   anywhere they can write type variables. For backward compatibility,
   automatic inference of kind-variable binding is still permitted.

 * The new extension `TypeInType` turns on the new user-facing
   features.

 * Type families and synonyms are now promoted to kinds. This causes
   trouble with parsing `*`, leading to the somewhat awkward new
   `HsAppsTy` constructor for `HsType`. This is dispatched with in
   the renamer, where the kind `*` can be told apart from a
   type-level multiplication operator. Without `-XTypeInType` the
   old behavior persists. With `-XTypeInType`, you need to import
   `Data.Kind` to get `*`, also known as `Type`.

 * The kind-checking algorithms in TcHsType have been significantly
   rewritten to allow for enhanced kinds.

 * The new features are still quite experimental and may be in flux.

 * TODO: Several open tickets: #11195, #11196, #11197, #11198, #11203.

 * TODO: Update user manual.

Tickets addressed: #9017, #9173, #7961, #10524, #8566, #11142.
Updates Haddock submodule.

comment:6 Changed 4 years ago by Ben Gamari <ben@…>

In bec5350d/ghc:

Adding flags: -ffull-guard-reasoning and too-many-guards

Introduction of two new flags, for more precise control over the new
pattern match checker's behaviour when reasoning about guards. This is
supposed to address #11195 (and maybe more performance bugs related to
the NP-Hardness of coverage checking).

Expected behaviour:

  * When `-ffull-guard-reasoning` is on, run the new pattern match
    checker in its full power

  * When `-ffull-guard-reasoning` is off (the default), for every
    match, check a metric to see whether pattern match checking for it
    has high probability of being non performant (at the the moment we
    check whether the number of guards is over 20 but I would like to
    use a more precise measure in the future). If the probability is
    high:

    - Oversimplify the guards (less expressive but more performant)
      and run the checker, and

    - Issue a warning about the simplification that happened.

A new flag `-Wtoo-many-guards/-Wno-too-many-guards` suppresses the
warning about the simplification (useful when combined with -Werror).

Test Plan: validate

Reviewers: goldfire, austin, hvr, bgamari

Reviewed By: bgamari

Subscribers: mpickering, thomie

Differential Revision: https://phabricator.haskell.org/D1676

GHC Trac Issues: #11195

comment:7 Changed 4 years ago by bgamari

Differential Rev(s): Phab:D1676
Resolution: fixed
Status: newclosed

The above patch teaches the pattern checker to run the patterns which it will check through a heuristic to determine whether it may result in the blow-up documented in this ticket. If there is a good chance things will explode the checker will run a simplified, but more performant, check. This should prevent this issue from affecting users in most cases.

comment:8 Changed 4 years ago by simonpj

Keywords: PatternMatchWarnings added
Milestone: 8.0.18.2.1
Owner: goldfire deleted
Resolution: fixed
Status: closednew

Re-opening for 8.2.

comment:9 Changed 4 years ago by Ben Gamari <ben@…>

In 28f951e/ghc:

Overhaul the Overhauled Pattern Match Checker

Overhaul the Overhauled Pattern Match Checker

* Changed the representation of Value Set Abstractions. Instead of
using a prefix tree, we now use a list of Value Vector Abstractions.
The set of constraints Delta for every Value Vector Abstraction is the
oracle state so that we solve everything only once.

* Instead of doing everything lazily, we prune at once (and in general
everything is much stricter). Hence, an example written with pattern
guards is checked in almost the same time as the equivalent with
pattern matching.

* Do not store the covered and the divergent sets at all. Since what we
only need is a yes/no (does this clause cover anything? Does it force
any thunk?) We just keep a boolean for each.

* Removed flags `-Wtoo-many-guards` and `-ffull-guard-reasoning`.
Replaced with `fmax-pmcheck-iterations=n`. Still debatable what should
the default `n` be.

* When a guard is for sure not going to contribute anything, we treat
it as such: The oracle is not called and cases `CGuard`, `UGuard` and
`DGuard` from the paper are not happening at all (the generation of a
fresh variable, the unfolding of the pattern list etc.). his combined
with the above seems to be enough to drop the memory increase for test
T783 down to 18.7%.

* Do not export function `dsPmWarn` (it is now called directly from
within `checkSingle` and `checkMatches`).

* Make `PmExprVar` hold a `Name` instead of an `Id`. The term oracle
does not handle type information so using `Id` was a waste of
time/space.

* Added testcases T11195, T11303b (data families) and T11374

The patch addresses at least the following:
Trac #11195, #11276, #11303, #11374, #11162

Test Plan: validate

Reviewers: goldfire, bgamari, hvr, austin

Subscribers: simonpj, thomie

Differential Revision: https://phabricator.haskell.org/D1795

comment:10 Changed 4 years ago by gkaracha

I think that commit 28f951edfe50ea5182065144340061ec326781f5 addresses this completely:

Author: George Karachalias <george.karachalias@gmail.com>
Date:   Wed Feb 3 19:06:45 2016 +0100

    Overhaul the Overhauled Pattern Match Checker

Hence, I am closing this. Feel free to reopen in case performance issues arise again due to pattern match checking.

comment:11 Changed 4 years ago by gkaracha

Differential Rev(s): Phab:D1676Phab:D1676, Phab:D1795
Resolution: fixed
Status: newclosed

comment:12 Changed 3 years ago by simonpj

Resolution: fixed
Status: closednew

I'm re-opening this because compilation still takes a ridiculously long time (minutes) and memory (6.7G allocated). Almost all of this is in the pattern-match checking. It can't be right!

Simon

comment:13 Changed 3 years ago by bgamari

simonpj, are you referring to the compilation of OptCoercion again?

comment:14 in reply to:  13 Changed 3 years ago by simonpj

Replying to bgamari:

simonpj, are you referring to the compilation of OptCoercion again?

I think so. It was careless of me not to be more specific.

comment:15 Changed 3 years ago by bgamari

Milestone: 8.2.18.4.1
Priority: highesthigh

I don't believe this will get fixed for 8.2 (assuming it is indeed still a problem; I've not seen such terrible performance checking OptCoercion).

comment:16 Changed 3 years ago by rwbarton

It's very strange. I also see OptCoercion taking several minutes and gobs of memory to compile, when I copy it somewhere and build it with -package ghc; but according to my logs from investigating join points it never takes particularly long to compile as part of the ghc build.

comment:17 Changed 3 years ago by simonpj

Suggestion: switch on -ticky or profiling for the regular build process, to try to get an idea of where that time and space is going.

comment:18 Changed 3 years ago by bgamari

I also noticed while looking at the early inline patch that T11195, which is derived from OptCoercion, also seems to take quite a long time (and consequently occasionally fails).

comment:19 Changed 20 months ago by bgamari

Milestone: 8.4.1
Priority: highnormal

Given how there has been little progress on this, un-milestoning and downgrading priority.

Note: See TracTickets for help on using tickets.