Opened 2 years ago

Closed 23 months ago

#14667 closed bug (fixed)

Compiling a function with a lot of alternatives bottlenecks on insertIntHeap

Reported by: niteria Owned by:
Priority: normal Milestone: 8.6.1
Component: Compiler Version:
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): phab:D4329
Wiki Page:

Description

I have a module that looks like this:

module A10000 where

data A = A
  | A00001
  | A00002
...
  | A10000

f :: A -> Int
f A00001 = 19900001
f A00002 = 19900002
...
f A10000 = 19910000

Compiling with -s gives:

[1 of 1] Compiling A10000           ( A10000.hs, A10000.o )                                   
                                               
A10000.hs:10004:1: warning:                                                                   
    Pattern match checker exceeded (2000000) iterations in                                    
    an equation for ‘f’. (Use -fmax-pmcheck-iterations=n                                      
    to set the maximun number of iterations to n)                                                                                                                                            
      |
10004 | f A00001 = 19900001
      | ^^^^^^^^^^^^^^^^^^^...
  95,068,628,968 bytes allocated in the heap
  14,166,421,680 bytes copied during GC
     312,247,488 bytes maximum residency (19 sample(s))
       3,267,024 bytes maximum slop
             857 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     35163 colls,     0 par    5.191s   5.170s     0.0001s    0.0724s
  Gen  1        19 colls,     0 par    1.240s   1.236s     0.0650s    0.1872s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   23.100s  ( 23.341s elapsed)
  GC      time    6.431s  (  6.405s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time   29.532s  ( 29.748s elapsed)

With profiling enabled I get:

  total time  =       22.67 secs   (22673 ticks @ 1000 us, 1 processor)                                                                                                                      
  total alloc = 36,143,653,320 bytes  (excludes profiling overheads)                                                                                                                         
                                                                                                                                                                                             
COST CENTRE     MODULE         SRC                                                 %time %alloc  ticks     bytes                                                                             
                                                                                                                                                                                             
insertIntHeap   Hoopl.Dataflow compiler/cmm/Hoopl/Dataflow.hs:(450,1)-(454,38)      73.9   77.5  16746 28001680176                                                                           
SimplTopBinds   SimplCore      compiler/simplCore/SimplCore.hs:770:39-74             9.4    1.8   2136 647116224                                                                             
deSugar         HscMain        compiler/main/HscMain.hs:511:7-44                     2.6    4.2    599 1530056552                                                                            
pprNativeCode   AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)    2.3    2.9    524 1030493864                                                                            
StgCmm          HscMain        compiler/main/HscMain.hs:(1426,13)-(1427,62)          1.3    1.4    288 497401376                                                                             
RegAlloc-linear AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)    0.9    1.1    201 383789200                                                                             
tc_rn_src_decls TcRnDriver     compiler/typecheck/TcRnDriver.hs:(494,4)-(556,7)      0.8    1.2    176 441973152                                                                             
Parser          HscMain        compiler/main/HscMain.hs:(316,5)-(384,20)             0.5    1.1    113 411448880                                                                             

After a local patch that basically reverts https://phabricator.haskell.org/rGHC5a1a2633553 I get:

[1 of 1] Compiling A10000           ( A10000.hs, A10000.o )                                   
                                               
A10000.hs:10004:1: warning:                                                                   
    Pattern match checker exceeded (2000000) iterations in                                    
    an equation for ‘f’. (Use -fmax-pmcheck-iterations=n
    to set the maximun number of iterations to n)
      |
10004 | f A00001 = 19900001
      | ^^^^^^^^^^^^^^^^^^^...
  15,162,268,144 bytes allocated in the heap
   4,870,184,600 bytes copied during GC
     323,794,936 bytes maximum residency (19 sample(s))
       3,074,056 bytes maximum slop
             886 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       811 colls,     0 par    1.828s   1.821s     0.0022s    0.0770s
  Gen  1        19 colls,     0 par    1.217s   1.213s     0.0638s    0.1820s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    5.842s  (  6.144s elapsed)
  GC      time    3.046s  (  3.034s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time    8.888s  (  9.179s elapsed)

For a file of smaller size with 1000 constructors, it still gives a 10% win.

This example is artificial, but it looks like something that someone could write for a sparse enum that looks like this in C++:

enum access_t { read = 1, write = 2, exec = 4 };

Change History (12)

comment:1 Changed 2 years ago by Bartosz Nitka <niteria@…>

In cf2c029c/ghc:

Fix quadratic behavior of prepareAlts

Summary:
This code is quadratic and a simple test case I used
managed to tickle it.

The example (same one as #14667) looks like this:
```
module A10000 where

 data A = A
   | A00001
   | A00002
 ...
   | A10000

 f :: A -> Int
 f A00001 = 19900001
 f A00002 = 19900002
 ...
 f A10000 = 19910000
```

Applied on top of a fix for #14667, it gives a 30% compile time
improvement.

Test Plan: ./validate

Reviewers: simonpj, bgamari

Subscribers: rwbarton, thomie, simonmar, carter

Differential Revision: https://phabricator.haskell.org/D4307

comment:2 Changed 23 months ago by bgamari

niteria, is there anything more needed here?

comment:3 Changed 23 months ago by niteria

The commit referenced above is related, but solves a different problem. I didn't expect it to be referenced here.

The issue still persists, I'd like to revert https://phabricator.haskell.org/rGHC5a1a2633553, but that commit was supposed to be a performance win, so it'd be helpful if I knew that's what we want to do.

I hoped someone would tell me what to test before putting up a diff with the revert. Should I run nofib, or is ./validate enough?

comment:4 Changed 23 months ago by niteria

Owner: set to niteria

comment:5 Changed 23 months ago by simonpj

The commit in comment:1 says "Applied on top of a fix for #14667 it gives a 30% improvement". So that implies that you have a separate fix for this ticket. Can you

  • Characterise what causes the perf problem that this ticket identifies?
  • Describe your proposed fix?

If you want to revert Simon M's patch you'd better discuss it with him. Perhaps he had a case that was improved by it.

comment:6 Changed 23 months ago by simonmar

That diff of mine basically replaces a set with a list, so it's not entirely surprising that it can be responsible for something being accidentally quadratic. I think at the time it was a small improvement because N was small in everything I measured, and for small N a list is a bit faster than an IntSet, but again it's not entirely surprising to find that it's possible to make it behave badly.

I'm totally fine with reverting it, but do a nofib run first to check for adverse effects.

comment:7 Changed 23 months ago by simonpj

I wonder why you replaced a set with a list? Was it just perf? Doing so would yield at best a small perf improvement.

In any case, a well-implemented set abstraction should probably keep small sets as lists anyway (if that's a perf improvement). If it doesn't, it'd be better to improve the set abstraction than to commit to a list regardless of size.

comment:8 Changed 23 months ago by niteria

Differential Rev(s): phab:D4329
Status: newpatch

comment:9 Changed 23 months ago by Ben Gamari <ben@…>

In 88297438/ghc:

Use IntSet in Dataflow

Before this change, a list was used as a substitute for a heap.
This led to quadratic behavior on a simple program (see new
test case).

This change replaces it with IntSet in effect reverting
5a1a2633553. @simonmar said it's fine to revert as long as nofib
results are good.

Test Plan:
new test case:

20% improvement
3x improvement when N=10000

nofib:

I run it twice for before and after because the compile time
results are noisy.

- Compile Allocations:

```
          before    before re-run    after     after re-run
-1 s.d.   -----     -0.0%            -0.1%     -0.1%
+1 s.d.   -----     +0.0%            +0.1%     +0.1%
Average   -----     +0.0%            -0.0%     -0.0%
```
- Compile Time:

```
          before    before re-run    after     after re-run
-1 s.d.   -----     -0.1%            -2.3%     -2.6%
+1 s.d.   -----     +5.2%            +3.7%     +4.4%
Average   -----     +2.5%            +0.7%     +0.8%

```
I checked each case and couldn't find consistent slow-down/speed-up on
compile time. Full results here: P173

Reviewers: simonpj, simonmar, bgamari

Reviewed By: bgamari

Subscribers: rwbarton, thomie, carter, simonmar

GHC Trac Issues: #14667

Differential Revision: https://phabricator.haskell.org/D4329

comment:10 Changed 23 months ago by bgamari

Milestone: 8.6.1
Resolution: fixed
Status: patchclosed

comment:11 Changed 23 months ago by bgamari

Owner: niteria deleted
Resolution: fixed
Status: closednew

Actually, there may be more left to do here; niteria, close this when you feel appropriate.

comment:12 Changed 23 months ago by niteria

Resolution: fixed
Status: newclosed

All good, thanks!

Note: See TracTickets for help on using tickets.