Opened 14 months ago

Last modified 10 months ago

#15304 new bug

Huge increase of compile time and memory use from 8.0.2 to 8.2.2 or 8.4.2

Reported by: NathanWaivio
Owned by: tdammers
Priority: high
Milestone: 8.6.1
Component: Compiler
Version: 8.4.2
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: x86_64 (amd64)
Type of failure: Compile-time performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

I am the author of the cl3 library on Hackage. I have noticed a huge increase in compile time and memory use when testing with 8.2.2 and 8.4.2:

  • ghc-8.0.2 compiled in 4:17.33 using 3.5 GB.
  • ghc-8.2.2 compiled in 26:40.15 using 32.8 GB.

This is an increase of about 6x in time and 9x in memory. It is not all bad: my nbody benchmark has improved by about 35% between ghc-8.0.2 and ghc-8.4.2, so the increased compilation time and memory usage do buy much better runtime performance.

Could you suggest some workarounds to help others compile on systems with fewer resources? I have 64 GB of memory in my system and would like to test some -fno-* GHC options. Could you point me in the right direction? The library consists almost entirely of pure functions. I am also interested in other options, such as rewriting code to make it easier on the compiler, or using NOINLINE on a trouble spot, and in how to find that trouble spot.

Attachments (4)

ghc-8.2.2-v.txt (27.0 KB) - added by NathanWaivio 14 months ago.
verbose output of ghc-8.2.2
ghc-8.0.2-v.txt (23.9 KB) - added by NathanWaivio 14 months ago.
verbose output of ghc-8.0.2
ghc-8.4.2-v.txt (27.4 KB) - added by NathanWaivio 14 months ago.
Attached verbose output of ghc-8.4.2. Noticed that while compiling the memory usage increased from 10GB to 32GB during CodeGen.
Cl3.hs (52.4 KB) - added by monoidal 13 months ago.


Change History (22)

comment:1 Changed 14 months ago by nomeata

A quick first thing to do is to run ghc with -v. It will print statistics about each core-to-core pass (size of the AST, and in recent versions memory consumption), and maybe you can spot one pass where the size of the AST drastically increases.
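Reading the per-pass sizes by eye gets tedious for long logs, so the following is a minimal sketch (not part of GHC) that pulls the term counts out of saved `-v` or `-dshow-passes` output. It assumes the "Result size of ..." / "{terms: N, ...}" layout seen in the logs on this ticket; the names `passSizes` and `biggestJump` are made up for illustration.

```haskell
import Data.Char (isDigit)
import Data.List (maximumBy, stripPrefix, tails)
import Data.Maybe (listToMaybe, mapMaybe)
import Data.Ord (comparing)

-- | Text following the first occurrence of a substring, if any.
afterInfix :: String -> String -> Maybe String
afterInfix needle hay =
  listToMaybe [rest | t <- tails hay, Just rest <- [stripPrefix needle t]]

-- | Parse the number after "terms: " (e.g. "109,632") on a single line.
termCount :: String -> Maybe Int
termCount line = do
  rest <- afterInfix "terms: " line
  let digits = filter isDigit (takeWhile (\c -> isDigit c || c == ',') rest)
  if null digits then Nothing else Just (read digits)

-- | (pass name, term count) for every "Result size of ..." entry.
-- Assumption: the count appears on a later line, as in the logs here.
-- For multi-line pass names only the first line of the name is kept.
passSizes :: String -> [(String, Int)]
passSizes = go . lines
  where
    go [] = []
    go (l : rest)
      | Just name <- stripPrefix "Result size of " l
      , (n : _) <- mapMaybe termCount rest
      = (name, n) : go rest
      | otherwise = go rest

-- | The entry whose term count grew most relative to the previous entry.
biggestJump :: [(String, Int)] -> Maybe (String, Double)
biggestJump ps
  | null jumps = Nothing
  | otherwise  = Just (maximumBy (comparing snd) jumps)
  where
    jumps = [ (name, fromIntegral b / fromIntegral a)
            | ((_, a), (name, b)) <- zip ps (drop 1 ps), a > 0 ]
```

Loaded in GHCi, something like `biggestJump . passSizes <$> readFile "ghc-8.2.2-v.txt"` should point at the pass where the AST grows fastest.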

Changed 14 months ago by NathanWaivio

Attachment: ghc-8.2.2-v.txt added

verbose output of ghc-8.2.2

comment:2 Changed 14 months ago by NathanWaivio

All of the numbers look pretty big to me.

Changed 14 months ago by NathanWaivio

Attachment: ghc-8.0.2-v.txt added

verbose output of ghc-8.0.2

comment:3 Changed 14 months ago by NathanWaivio

I have now been able to analyze the differences between ghc-8.0.2 and ghc-8.2.2.

Here is a table of the growth of the number of terms per iteration of the Simplifier for the module Algebra.Geometric.Cl3:

Simplifier iteration |       1 |       2 |          3 |          4
---------------------+---------+---------+------------+-----------
ghc-8.0.2            | 118,835 | 138,330 |    172,291 |    516,767
ghc-8.2.2            | 149,046 | 185,066 | 15,190,006 | 12,310,166

Apparently the largest difference occurs in Simplifier Iteration 3 between the Worker Wrapper and Float Out.
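To put a number on that jump, here is a small sketch that computes the growth factor from one iteration to the next, using the term counts from the table above. The names `terms802`, `terms822` and `growth` are illustrative only.

```haskell
-- Term counts per Simplifier iteration, copied from the table above.
terms802, terms822 :: [Int]
terms802 = [118835, 138330, 172291, 516767]
terms822 = [149046, 185066, 15190006, 12310166]

-- | Growth factor of each iteration relative to the one before it.
growth :: [Int] -> [Double]
growth xs = zipWith (\a b -> fromIntegral b / fromIntegral a) xs (drop 1 xs)
```

With these figures, `growth terms822` shows iteration 3 at roughly 82 times the size of iteration 2, whereas the largest step in `growth terms802` is about 3x.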

What is going on with Simplifier Iteration 3?

The other module, Algebra.Geometric.Cl3.JonesCalculus, actually has fewer terms in ghc-8.2.2 than ghc-8.0.2.

Last edited 14 months ago by NathanWaivio

Changed 14 months ago by NathanWaivio

Attachment: ghc-8.4.2-v.txt added

Attached verbose output of ghc-8.4.2. Noticed that while compiling the memory usage increased from 10GB to 32GB during CodeGen.

comment:4 Changed 14 months ago by bgamari

Priority: normal → high

comment:5 Changed 14 months ago by tdammers

Owner: set to tdammers

comment:6 Changed 13 months ago by tdammers

Steps to reproduce:

  1. Check out from git: git clone https://github.com/waivio/cl3.git; cd cl3
  2. Install the random package (e.g. cabal install random)
  3. cd cl3
  4. ghc -c Algebra/Geometric/Cl3.hs -fforce-recomp -O2

Without -O2, things compile in under 0.1 seconds on my machine; adding -O2 makes it take minutes.

comment:7 Changed 13 months ago by tdammers

It seems that the Eq instance is at the core of the problem: replacing it with the following allows GHC to compile the module in a timely fashion, and compilation succeeds even with a heap limit of only 128M:

instance Eq Cl3 where
    (==) = undefined

This instance also compiles fast:

instance Eq Cl3 where
    (R a0) == (R b0) = True
    _ == _ = undefined

But this one seems to blow up:

instance Eq Cl3 where
    (R a0) == (R b0) = a0 == b0
    _ == _ = undefined

Last edited 13 months ago by tdammers

comment:8 Changed 13 months ago by tdammers

Further investigation reveals that it's not necessarily the Eq instance itself that causes trouble; commenting out various parts of the module allows me to make it blow up or not blow up even with the full original Eq in place. It seems that multiple factors contribute to the blowup, and removing enough of them "fixes" the problem.

comment:9 Changed 13 months ago by simonpj

Here's -dshow-passes with HEAD

simonpj@cam-05-unx:~/tmp/cl3$ ~/5builds/HEAD-5/inplace/bin/ghc-stage2 -c -dshow-passes src/Algebra/Geometric/Cl3.hs -O2
Glasgow Haskell Compiler, Version 8.7.20180710, stage 2 booted by GHC version 8.2.2
Using binary package database: /home/simonpj/5builds/HEAD-5/inplace/lib/package.conf.d/package.cache
Using binary package database: /home/simonpj/.ghc/x86_64-linux-8.7.20180710/package.conf.d/package.cache
package flags []
loading package database /home/simonpj/5builds/HEAD-5/inplace/lib/package.conf.d
loading package database /home/simonpj/.ghc/x86_64-linux-8.7.20180710/package.conf.d
wired-in package ghc-prim mapped to ghc-prim-0.5.3
wired-in package integer-gmp mapped to integer-gmp-1.0.2.0
wired-in package base mapped to base-4.12.0.0
wired-in package rts mapped to rts
wired-in package template-haskell mapped to template-haskell-2.14.0.0
wired-in package ghc mapped to ghc-8.7
*** Checking old interface for Algebra.Geometric.Cl3 (use -ddump-hi-diffs for more details):
*** Parser [Algebra.Geometric.Cl3]:
!!! Parser [Algebra.Geometric.Cl3]: finished in 281.59 milliseconds, allocated 124.657 megabytes
*** Renamer/typechecker [Algebra.Geometric.Cl3]:
!!! Renamer/typechecker [Algebra.Geometric.Cl3]: finished in 1729.18 milliseconds, allocated 582.499 megabytes
*** Desugar [Algebra.Geometric.Cl3]:
Result size of Desugar (before optimization)
  = {terms: 46,524, types: 55,087, coercions: 2,210, joins: 0/9,539}
Result size of Desugar (after optimization)
  = {terms: 26,825, types: 34,697, coercions: 4,390, joins: 1/660}
!!! Desugar [Algebra.Geometric.Cl3]: finished in 463.17 milliseconds, allocated 198.217 megabytes
*** Simplifier [Algebra.Geometric.Cl3]:
Result size of Simplifier iteration=1
  = {terms: 29,453, types: 35,248, coercions: 7,271, joins: 1/982}
Result size of Simplifier iteration=2
  = {terms: 26,430, types: 32,412, coercions: 5,036, joins: 1/207}
Result size of Simplifier iteration=3
  = {terms: 26,370, types: 32,315, coercions: 4,924, joins: 1/198}
Result size of Simplifier
  = {terms: 26,370, types: 32,315, coercions: 4,924, joins: 1/198}
!!! Simplifier [Algebra.Geometric.Cl3]: finished in 1478.89 milliseconds, allocated 532.541 megabytes
*** Specialise [Algebra.Geometric.Cl3]:
Result size of Specialise
  = {terms: 27,077, types: 33,084, coercions: 4,924, joins: 1/226}
!!! Specialise [Algebra.Geometric.Cl3]: finished in 34.41 milliseconds, allocated 20.874 megabytes
*** Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Algebra.Geometric.Cl3]:
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = False})
  = {terms: 29,888, types: 35,057, coercions: 4,924, joins: 1/217}
!!! Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Algebra.Geometric.Cl3]: finished in 303.86 milliseconds, allocated 133.719 megabytes
*** Simplifier [Algebra.Geometric.Cl3]:
Result size of Simplifier iteration=1
  = {terms: 109,632,
     types: 50,922,
     coercions: 4,826,
     joins: 183/7,875}
Result size of Simplifier iteration=2
  = {terms: 93,026,
     types: 52,819,
     coercions: 4,899,
     joins: 185/1,646}
Result size of Simplifier iteration=3
  = {terms: 135,959,
     types: 55,173,
     coercions: 4,892,
     joins: 99/2,772}
Result size of Simplifier iteration=4
  = {terms: 131,354, types: 52,485, coercions: 4,892, joins: 53/529}
Result size of Simplifier
  = {terms: 131,354, types: 52,485, coercions: 4,892, joins: 53/529}
!!! Simplifier [Algebra.Geometric.Cl3]: finished in 4415.46 milliseconds, allocated 1573.215 megabytes
*** Simplifier [Algebra.Geometric.Cl3]:
Result size of Simplifier iteration=1
  = {terms: 130,205, types: 52,159, coercions: 4,892, joins: 37/519}
Result size of Simplifier iteration=2
  = {terms: 128,591, types: 51,440, coercions: 4,892, joins: 37/513}
Result size of Simplifier
  = {terms: 128,591, types: 51,440, coercions: 4,892, joins: 37/513}
!!! Simplifier [Algebra.Geometric.Cl3]: finished in 3285.00 milliseconds, allocated 1248.401 megabytes
*** Simplifier [Algebra.Geometric.Cl3]:
Result size of Simplifier iteration=1
  = {terms: 129,119, types: 51,615, coercions: 4,892, joins: 37/538}
Result size of Simplifier iteration=2
  = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533}
Result size of Simplifier
  = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533}
!!! Simplifier [Algebra.Geometric.Cl3]: finished in 3415.70 milliseconds, allocated 1218.423 megabytes
*** Float inwards [Algebra.Geometric.Cl3]:
Result size of Float inwards
  = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533}
!!! Float inwards [Algebra.Geometric.Cl3]: finished in 218.06 milliseconds, allocated 143.267 megabytes
*** Called arity analysis [Algebra.Geometric.Cl3]:
Result size of Called arity analysis
  = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533}
!!! Called arity analysis [Algebra.Geometric.Cl3]: finished in 160.66 milliseconds, allocated 86.793 megabytes
*** Simplifier [Algebra.Geometric.Cl3]:
Result size of Simplifier
  = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533}
!!! Simplifier [Algebra.Geometric.Cl3]: finished in 943.43 milliseconds, allocated 405.961 megabytes
*** Demand analysis [Algebra.Geometric.Cl3]:
Result size of Demand analysis
  = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533}
!!! Demand analysis [Algebra.Geometric.Cl3]: finished in 1513.03 milliseconds, allocated 552.224 megabytes
*** Worker Wrapper binds [Algebra.Geometric.Cl3]:
Result size of Worker Wrapper binds
  = {terms: 130,487, types: 54,247, coercions: 4,892, joins: 38/701}
!!! Worker Wrapper binds [Algebra.Geometric.Cl3]: finished in 36.37 milliseconds, allocated 10.774 megabytes
*** Simplifier [Algebra.Geometric.Cl3]:
Result size of Simplifier iteration=1
  = {terms: 143,685,
     types: 58,766,
     coercions: 4,815,
     joins: 96/1,750}
Result size of Simplifier iteration=2
  = {terms: 175,579,
     types: 63,298,
     coercions: 4,815,
     joins: 173/1,293}
  C-c C-c*** Deleting temp files:
*** Deleting temp dirs:

I had to stop it with Ctrl-C.

Changed 13 months ago by monoidal

Attachment: Cl3.hs added

comment:10 Changed 13 months ago by monoidal

Based on comment:6, I attached a slightly smaller version that does not depend on random.

On my machine:

  • ghc-8.0.2 -O -c Cl3.hs -fforce-recomp takes 2 secs
  • ghc-8.2.1 -O -c Cl3.hs -fforce-recomp takes 57 secs
  • ghc-8.4.3 -O -c Cl3.hs -fforce-recomp takes 95 secs

If I don't add -O, every version takes 2 secs.

comment:11 Changed 13 months ago by tdammers

Great, thanks a lot for that. 95 seconds is a lot more manageable.

comment:12 Changed 13 months ago by monoidal

You can make the process faster by removing methods from instance Floating Cl3 - it seems that every one of the 12 methods slows down the compilation by a few seconds. For example, if you keep only exp and log, it takes 22 sec (and the root cause should still be present).

comment:13 Changed 13 months ago by tdammers

This is very useful. The reduced Cl3.hs with all Floating methods except exp and log removed gives a much cleaner result. Looking at the -ddump-rule-firings output, here are counts of how often each rule fired on 8.0.2 and 8.4.3:

rule                      |  8.0  8.4
--------------------------+------------
*##                       |  127   44
+##                       |   26  372
^2/Integer                |   55   55
Class op -                |  419  417
Class op /                |   15    8
Class op *                | 1698 1693
Class op **               |    3    2
Class op +                |  737  734
Class op abs              |    5    5
Class op atan2            |    1    1
Class op cos              |    7    5
Class op cosh             |    3    2
Class op exp              |   14   10
Class op fromInteger      |   13    8
Class op fromRational     |    4    2
Class op log              |   20   14
Class op log1p            |    6    4
Class op negate           |  109  106
Class op $p1Floating      |   25   16
Class op $p1Fractional    |   20   12
Class op pi               |    1    1
Class op recip            |    4    3
Class op sin              |    7    5
Class op sinh             |    3    2
Class op sqrt             |   14   14
doubleFromInteger         |    9    7
SC:$clog0                 |    6    0
SC:$w$catan20             |    1    0

So it looks like a change to the way SpecConstrs are handled is preventing either specializations, or RULES that follow from them.
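For anyone who wants to repeat the comparison, a tally like the one in the table can be sketched in a few lines. This assumes that each firing shows up in the -ddump-rule-firings output as a line starting with "Rule fired: "; the exact suffix after the rule name varies between GHC versions, so the names may need normalising before comparing two compilers. `ruleCounts` is a made-up helper name.

```haskell
import Data.List (group, sort, stripPrefix)
import Data.Maybe (mapMaybe)

-- | Count how often each rule name appears in saved -ddump-rule-firings
-- output. Assumption: one "Rule fired: <name>" line per firing; all other
-- lines are ignored.
ruleCounts :: String -> [(String, Int)]
ruleCounts out = [(r, length g) | g@(r : _) <- group (sort fired)]
  where
    fired = mapMaybe (stripPrefix "Rule fired: ") (lines out)
```

Running it over the output of both compilers and lining the results up by rule name gives the two columns above.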

comment:14 Changed 13 months ago by tdammers

Compiling with -fno-spec-constr makes no difference performance-wise, but gets rid of the SC:... rule firings, so those weren't the culprit after all. Investigating further.

comment:15 Changed 13 months ago by tdammers

OK, next experiment: compile with -fno-specialise and diff the -ddump-simpl-stats output (minus the PreInline and PostInline parts), which gives us:

  • Cl3.dump-simpl-stats

    --- old (GHC 8.0.2)
    +++ new (GHC 8.4.3)
    @@ -1,74 +1,78 @@
     
     ==================== FloatOut stats: ====================
    -2018-07-12 13:46:13.85196043 UTC
    +2018-07-12 14:25:41.716397419 UTC
     
    -140 Lets floated to top level; 0 Lets floated elsewhere; from 35 Lambda groups
    +131 Lets floated to top level; 0 Lets floated elsewhere; from 22 Lambda groups
     
     
     ==================== FloatOut stats: ====================
    -2018-07-12 13:46:17.767931325 UTC
    +2018-07-12 14:26:18.231633767 UTC
     
    -24 Lets floated to top level; 0 Lets floated elsewhere; from 42 Lambda groups
    +1 Lets floated to top level; 0 Lets floated elsewhere; from 25 Lambda groups
     
     
     ==================== Grand total simplifier statistics ====================
    -2018-07-12 13:46:19.653844196 UTC
    +2018-07-12 14:26:39.421166819 UTC
     
    -Total ticks:     44196
    +Total ticks:     91304
     
    -6609 UnfoldingDone
    -  1681 GHC.Float.$fNumDouble_$c*
    +5077 UnfoldingDone
       1681 GHC.Float.timesDouble
    -  729 GHC.Float.$fNumDouble_$c+
       729 GHC.Float.plusDouble
    -  415 GHC.Float.$fNumDouble_$c-
       415 GHC.Float.minusDouble
    +  177 $j_s6gs
    +  177 $j_s6kx
       137 Algebra.Geometric.Cl3.$WAPS
    +  132 $j_s6gn
    +  132 $j_s6gq
    +  132 $j_s6ks
    +  132 $j_s6kv
    +  121 $j_s6cS
    +  121 $j_s6gX
    +  110 $j_s6PQ
    +  110 $j_s71Q
       102 GHC.Base.$
    -  102 GHC.Float.$fNumDouble_$cnegate
       102 GHC.Float.negateDouble
    -  62 $j_s5Li
    -  62 $j_s5LN
    +  44 $j_s6gm
    +  44 $j_s6gr
    +  44 $j_s6kr
    +  44 $j_s6kw
    +  25 Algebra.Geometric.Cl3.$WR
    +  22 $j_s6gi
    +  22 $j_s6go
    +  22 $j_s6kn
    +  22 $j_s6kt
       19 Algebra.Geometric.Cl3.$WH
       19 Algebra.Geometric.Cl3.$WODD
    -  17 Algebra.Geometric.Cl3.$WR
       17 Algebra.Geometric.Cl3.$WBPV
    -  15 $c/_a1Af
    -  14 GHC.Float.$fFloatingDouble_$csqrt
    +  17 $j_s6cc
       14 GHC.Float.sqrtDouble
       14 Algebra.Geometric.Cl3.$WC
    -  12 $j_s5Le
    -  12 $j_s5Lg
    -  12 $j_s5LJ
    -  12 $j_s5LL
       11 Algebra.Geometric.Cl3.$WPV
       11 Algebra.Geometric.Cl3.$WTPV
    -  10 $j_s5Lf
    -  10 $j_s5LK
    -  9 $cfromInteger_a3eU
    -  8 $c*_a1He
    -  7 $c+_a1Az
    -  6 $clog1p_a1zz
    -  6 $s$clog_s614
    -  5 $cnegate_a36c
    +  8 $c*_a2h6
    +  8 GHC.Float.$fNumDouble_$cfromInteger
    +  7 $cfromInteger_a3OM
    +  7 $j_s6gp
    +  7 $j_s6ku
    +  6 $c/_a2ad
    +  6 $c+_a2at
       5 Algebra.Geometric.Cl3.$WV3
       5 Algebra.Geometric.Cl3.$WBV
    -  5 lvl_s5VO
    -  5 lvl_s5VP
    -  4 $c-_a1H7
    -  4 GHC.Float.$fNumDouble_$cabs
    -  4 $j_s5Lb
    -  4 $j_s5Lc
    -  4 $j_s5Ld
    -  4 $j_s5Lh
    -  4 $j_s5LG
    -  4 $j_s5LH
    -  4 $j_s5LI
    -  4 $j_s5LM
    -  3 $c**_a1yF
    +  4 $clog1p_a29F
    +  4 $cnegate_a3G4
    +  4 GHC.Float.fabsDouble
    +  4 $j_s6c8
    +  4 $j_s6ca
    +  4 $j_s6gj
    +  4 $j_s6gk
    +  4 $j_s6ko
    +  4 $j_s6kp
    +  3 Algebra.Geometric.Cl3.projEigs
       3 Algebra.Geometric.Cl3.$WI
    -  3 $j_s5J8
    -  3 $j_s5Jo
    +  2 $c**_a28T
    +  2 $crecip_a2ak
    +  2 $c-_a2h1
       2 GHC.Base.$!
       2 GHC.Float.$dm**
       2 GHC.Float.$dmexpm1
    @@ -82,318 +86,359 @@
       2 GHC.Real.$dm/
       2 GHC.Real.$dmrecip
       2 GHC.Num.$dm-
    +  2 Algebra.Geometric.Cl3.spectraldcmp
       2 Algebra.Geometric.Cl3.reduce
    -  2 $dNum_s5Id
    -  2 $dNum_s5It
    -  2 lvl_s5VK
    -  1 $cabs_a36g
    -  1 GHC.Float.$fRealFloatDouble_$catan2
    -  1 GHC.Float.$fNumDouble_$cfromInteger
    -  1 GHC.Float.$fFractionalDouble_$crecip
    -  1 GHC.Float.$fFloatingDouble_$ccos
    -  1 GHC.Float.$fFloatingDouble_$csin
    -  1 GHC.Float.$fFloatingDouble_$clog
    -  1 GHC.Float.$fFloatingDouble_$cexp
    -  1 GHC.Float.sinDouble
    -  1 GHC.Float.logDouble
    -  1 GHC.Float.expDouble
    +  2 $j_s6gl
    +  2 $j_s6kq
    +  1 $cabs_a3G8
       1 GHC.Float.cosDouble
    -  1 Algebra.Geometric.Cl3.projEigs
    -  1 $s$dmlog1pexp_s5Ge
    -  1 lvl_s5I8
    -  1 lvl_s5I9
    -  1 $dFractional_s5Ib
    -  1 lvl_s5Ik
    -  1 lvl_s5Il
    -  1 lvl_s5Im
    -  1 lvl_s5In
    -  1 lvl_s5Io
    -  1 lvl_s5Ip
    -  1 $dFractional_s5Ir
    -  1 lvl_s5Iw
    -  1 lvl_s5W0
    -3337 RuleFired
    -  1695 Class op *
    -  737 Class op +
    -  419 Class op -
    -  127 *##
    -  109 Class op negate
    +  1 GHC.Float.expDouble
    +  1 GHC.Float.logDouble
    +  1 GHC.Float.sinDouble
    +  1 GHC.Float.$fFractionalDouble_$crecip
    +  1 GHC.Float.$fRealFloatDouble_$catan2
    +  1 lvl_s6br
    +  1 lvl_s6bs
    +  1 lvl_s6bu
    +  1 lvl_s6bv
    +  1 lvl_s6bx
    +  1 $j_s6c9
    +  1 lvl_s7OD
    +  1 lvl_s7OE
    +3532 RuleFired
    +  1691 Class op *
    +  734 Class op +
    +  417 Class op -
    +  372 +##
    +  106 Class op negate
       55 ^2/Integer
    -  26 +##
    -  25 Class op $p1Floating
    -  20 Class op $p1Fractional
    -  15 Class op /
    -  14 Class op exp
    -  14 Class op log
    +  44 *##
    +  16 Class op $p1Floating
       14 Class op sqrt
    -  13 Class op fromInteger
    -  9 doubleFromInteger
    -  6 Class op log1p
    -  6 SC:$clog0
    +  12 Class op $p1Fractional
    +  10 Class op exp
    +  10 Class op log
    +  8 Class op /
    +  8 Class op fromInteger
    +  7 doubleFromInteger
       5 Class op abs
    -  4 Class op cos
    -  4 Class op fromRational
    -  4 Class op recip
    -  4 Class op sin
    -  3 Class op **
    -  3 Class op cosh
    -  3 Class op sinh
    +  4 Class op log1p
    +  3 Class op cos
    +  3 Class op recip
    +  3 Class op sin
    +  2 Class op **
    +  2 Class op cosh
    +  2 Class op fromRational
    +  2 Class op sinh
       1 Class op atan2
       1 Class op pi
    -  1 SC:$w$catan20
    -25 LetFloatFromLet 25
    -1 EtaReduction 1 x_a5bv
    -9652 BetaReduction
    -  1681 ds_a5IW
    -  1681 ds1_a5IX
    -  729 ds_a5IM
    -  729 ds1_a5IN
    -  415 ds_a5Jb
    -  415 ds1_a5Jc
    -  137 dt_a1fe
    -  137 dt_a1ff
    -  137 dt_a1fg
    -  137 dt_a1fh
    -  137 dt_a1fi
    -  137 dt_a1fj
    -  137 dt_a1fk
    -  137 dt_a1fl
    -  124 dt_d5iD
    -  124 dt_d5iE
    -  124 dt_d5iF
    -  124 dt_d5iG
    -  124 dt_d5iH
    -  124 dt_d5iI
    -  124 dt_d5iJ
    -  124 dt_d5iK
    -  102 a_12
    -  102 b_13
    -  102 r_1j
    -  102 tpl_B1
    -  102 tpl_B2
    -  102 ds_a5JB
    -  55 a_a5nS
    -  55 $dNum_a5nT
    -  55 $dIntegral_a5nU
    -  55 x_a5nV
    -  24 dt_d5iP
    -  24 dt_d5iQ
    -  24 dt_d5iR
    -  24 dt_d5iS
    -  24 dt_d5j1
    -  24 dt_d5j2
    -  24 dt_d5j3
    -  24 dt_d5j4
    -  20 dt_d5iT
    -  20 dt_d5iU
    -  20 dt_d5iV
    -  20 dt_d5iW
    -  20 dt_d5iX
    -  20 dt_d5iY
    -  19 dt_a1eq
    -  19 dt_a1er
    -  19 dt_a1es
    -  19 dt_a1et
    -  19 dt_a1eU
    -  19 dt_a1eV
    -  19 dt_a1eW
    -  19 dt_a1eX
    -  17 dt_a1dS
    -  17 dt_a1eG
    -  17 dt_a1eH
    -  17 dt_a1eI
    -  17 dt_a1eJ
    -  17 dt_a1eK
    -  17 dt_a1eL
    -  16 eta_a53E
    -  16 eta1_a53F
    -  14 dt_a1eA
    -  14 dt_a1eB
    -  14 ds_a5IH
    -  12 sc_s60u
    -  12 sc_s60v
    -  11 dt_a1eg
    -  11 dt_a1eh
    -  11 dt_a1ei
    -  11 dt_a1ej
    -  11 dt_a1f4
    -  11 dt_a1f5
    -  11 dt_a1f6
    -  11 dt_a1f7
    -  9 int_a1bM
    -  8 ds_d4ON
    -  8 ds_d4OO
    -  8 dt_d5iL
    -  8 dt_d5iM
    -  8 dt_d5iN
    -  8 dt_d5iO
    -  8 dt_d5j5
    -  8 dt_d5j6
    -  8 dt_d5j7
    -  8 dt_d5j8
    -  8 dt_d5ja
    -  8 dt_d5jb
    -  8 dt_d5jc
    -  8 dt_d5jd
    -  8 dt_d5je
    -  8 dt_d5jf
    -  7 eta_a53o
    -  7 ds_d4CK
    -  7 ds_d4CL
    -  6 y_X5Nf
    -  5 x_a1bN
    -  5 dt_a1dW
    -  5 dt_a1dX
    -  5 dt_a1dY
    -  5 dt_a1e4
    -  5 dt_a1e5
    -  5 dt_a1e6
    -  5 x_a5bv
    -  5 y_a5bw
    -  4 eta_a53a
    -  4 eta1_a53b
    -  4 x_a5mY
    -  3 dt_a1ec
    -  2 cliffor_aIx
    -  2 r_aIz
    -  2 a_a533
    -  2 $dFloating_a534
    -  2 a_a538
    -  2 $dFloating_a539
    -  2 a_a53c
    -  2 $dFloating_a53d
    -  2 a_a53g
    -  2 $dFloating_a53h
    -  2 a_a53j
    -  2 $dFloating_a53k
    -  2 a_a53m
    -  2 $dFloating_a53n
    -  2 a_a53r
    -  2 $dFloating_a53s
    -  2 a_a53w
    -  2 $dFloating_a53x
    -  2 a_a53z
    -  2 $dFloating_a53A
    -  2 a_a53C
    -  2 $dFractional_a53D
    -  2 a_a53G
    -  2 $dFractional_a53H
    -  2 a_a5bt
    -  2 $dNum_a5bu
    -  2 a_a5mH
    -  2 b_a5mI
    -  2 f_a5mJ
    -  2 x_a5mK
    -  2 dt_d5et
    -  2 dt_d5eu
    -  1 eta_a535
    -  1 eta_a53e
    -  1 eta1_a53f
    -  1 eta_a53i
    -  1 eta_a53l
    -  1 eta_a53t
    -  1 x_a53y
    -  1 eta_a53B
    -  1 eta_a53I
    -  1 i_a5mT
    -  1 x_a5oJ
    -  1 w_a5p8
    -  1 w1_a5p9
    -  1 ds_a5K6
    -  1 ds_a5Kc
    -  1 ds_a5Kh
    -  1 ds_a5Kp
    -  1 sc_a5PU
    -  1 sc1_a5PV
    -  1 ds_d4AC
    -  1 ds_d4BO
    -  1 ds_d4CI
    -  1 ds_d50R
    -  1 ds_d52M
    -  1 w_s5RT
    -9 CaseOfCase
    -  2 wild_X9
    -  2 wild_XV
    -  2 dt_X1dU
    -  2 wild1_a5J2
    -  1 ww_s5RV
    -7664 KnownBranch
    -  1685 wild1_a5J2
    -  1681 wild_a5IY
    -  729 wild_a5IO
    -  729 wild1_a5IS
    -  415 wild_a5Jd
    -  415 wild1_a5Jh
    -  248 wild_X9
    -  137 dt_X1fn
    -  137 dt_X1fq
    -  137 dt_X1ft
    -  137 dt_X1fw
    -  137 dt_X1fz
    -  137 dt_X1fC
    -  137 dt_X1fF
    -  137 dt_X1fI
    -  102 wild_a5JC
    -  36 dt_X1eZ
    -  36 dt_X1f2
    -  25 wild_XV
    -  21 dt_X1dU
    -  19 dt_X1ev
    -  19 dt_X1ey
    -  19 dt_X1eB
    -  19 dt_X1eE
    -  19 dt_X1f5
    -  19 dt_X1f8
    -  19 ww_s5RV
    -  17 dt_X1eN
    -  17 dt_X1eQ
    -  17 dt_X1eT
    -  17 dt_X1eW
    -  14 dt_X1eD
    -  14 dt_X1eG
    -  14 wild_a5II
    -  12 wild_Xf
    -  11 dt_X1el
    -  11 dt_X1eo
    -  11 dt_X1er
    -  11 dt_X1eu
    -  11 dt_X1f9
    -  11 dt_X1fc
    -  11 dt_X1ff
    -  11 dt_X1fi
    -  9 wild_a5mU
    -  8 wild_Xg
    -  8 dt_X1ee
    -  6 wild_Xd
    -  5 wild_Xb
    -  5 wild_XW
    -  5 dt_X1e0
    -  5 dt_X1e3
    -  5 dt_X1e6
    -  5 dt_X1e8
    -  5 dt_X1eb
    -  4 wild_Xa
    +687 LetFloatFromLet 687
    +2 EtaReduction
    +  1 eta_B2
    +  1 x_a5TK
    +14750 BetaReduction
    +  1681 ds_a65p
    +  1681 ds1_a65q
    +  729 ds_a64o
    +  729 ds1_a64p
    +  415 ds_a65z
    +  415 ds1_a65A
    +  354 dt_d5SW
    +  354 dt_d5SX
    +  354 dt_d5SY
    +  354 dt_d5SZ
    +  354 dt_d5T0
    +  354 dt_d5T1
    +  354 dt_d5T2
    +  354 dt_d5T3
    +  264 dt_d5Po
    +  264 dt_d5Pp
    +  264 dt_d5Pq
    +  264 dt_d5Pr
    +  264 dt_d5Rw
    +  264 dt_d5Rx
    +  264 dt_d5Ry
    +  264 dt_d5Rz
    +  242 vx_a63F
    +  226 ds_d5kP
    +  137 dt_a1Gh
    +  137 dt_a1Gi
    +  137 dt_a1Gj
    +  137 dt_a1Gk
    +  137 dt_a1Gl
    +  137 dt_a1Gm
    +  137 dt_a1Gn
    +  137 dt_a1Go
    +  102 a_11
    +  102 b_12
    +  102 r_1i
    +  102 v_B1
    +  102 v_B2
    +  102 ds_a65W
    +  88 dt_d5OG
    +  88 dt_d5OH
    +  88 dt_d5OI
    +  88 dt_d5OJ
    +  88 dt_d5Se
    +  88 dt_d5Sf
    +  88 dt_d5Sg
    +  88 dt_d5Sh
    +  55 a_a64W
    +  55 $dNum_a64X
    +  55 $dIntegral_a64Y
    +  55 x_a64Z
    +  44 dt_d5LY
    +  44 dt_d5Q6
    +  44 dt_d5Q7
    +  25 dt_a1EV
    +  19 dt_a1Ft
    +  19 dt_a1Fu
    +  19 dt_a1Fv
    +  19 dt_a1Fw
    +  19 dt_a1FX
    +  19 dt_a1FY
    +  19 dt_a1FZ
    +  19 dt_a1G0
    +  17 dt_a1FJ
    +  17 dt_a1FK
    +  17 dt_a1FL
    +  17 dt_a1FM
    +  17 dt_a1FN
    +  17 dt_a1FO
    +  17 dt_d5Uj
    +  17 dt_d5Uk
    +  17 dt_d5Ul
    +  17 dt_d5Um
    +  17 dt_d5Un
    +  17 dt_d5Uo
    +  17 dt_d5Up
    +  17 dt_d5Uq
    +  14 dt_a1FD
    +  14 dt_a1FE
    +  14 ds_a64j
    +  14 dt_d5QM
    +  14 dt_d5QN
    +  14 dt_d5QO
    +  14 dt_d5QP
    +  14 dt_d5QQ
    +  14 dt_d5QR
    +  11 dt_a1Fj
    +  11 dt_a1Fk
    +  11 dt_a1Fl
    +  11 dt_a1Fm
    +  11 dt_a1G7
    +  11 dt_a1G8
    +  11 dt_a1G9
    +  11 dt_a1Ga
    +  8 i_a63N
    +  8 ds_d5wR
    +  8 ds_d5wS
    +  8 dt_d5MD
    +  8 dt_d5ME
    +  8 dt_d5MF
    +  8 dt_d5Nk
    +  8 dt_d5Nl
    +  8 dt_d5Nm
    +  7 int_a1BT
    +  6 ds_d5kO
    +  6 w_s6Mj
    +  6 w_s6Mk
    +  5 dt_a1EZ
    +  5 dt_a1F0
    +  5 dt_a1F1
    +  5 dt_a1F7
    +  5 dt_a1F8
    +  5 dt_a1F9
    +  5 x_a5LB
    +  4 x_a1BU
    +  4 ds_a63S
    +  4 dt_d5O1
    +  4 dt_d5TZ
    +  4 dt_d5U0
    +  4 dt_d5U1
    +  4 dt_d5U2
    +  4 dt_d5Ub
    +  4 dt_d5Uc
    +  4 dt_d5Ud
    +  4 dt_d5Ue
    +  3 dt_a1Ff
    +  3 x_a5Li
    +  3 y_a5Lj
    +  3 x_a5LX
    +  3 w_s6Mz
    +  2 function_a1BV
    +  2 cliffor_a1BW
    +  2 r_a1C1
    +  2 a_a5L7
    +  2 $dFloating_a5L8
    +  2 a_a5Le
    +  2 $dFloating_a5Lf
    +  2 a_a5Lk
    +  2 $dFloating_a5Ll
    +  2 a_a5Lp
    +  2 $dFloating_a5Lq
    +  2 a_a5Lt
    +  2 $dFloating_a5Lu
    +  2 a_a5Lx
    +  2 $dFloating_a5Ly
    +  2 a_a5LC
    +  2 $dFloating_a5LD
    +  2 a_a5LH
    +  2 $dFloating_a5LI
    +  2 a_a5LK
    +  2 $dFloating_a5LL
    +  2 a_a5LP
    +  2 $dFractional_a5LQ
    +  2 a_a5LU
    +  2 $dFractional_a5LV
    +  2 a_a5TI
    +  2 $dNum_a5TJ
    +  2 a_a63B
    +  2 b_a63C
    +  2 f_a63D
    +  2 x_a63E
    +  1 x_a5La
    +  1 x_a5Ln
    +  1 y_a5Lo
    +  1 x_a5Ls
    +  1 x_a5Lw
    +  1 x_a5LG
    +  1 x_a5LJ
    +  1 x_a5LO
    +  1 x_a5LS
    +  1 y_a5LT
    +  1 x_a5TK
    +  1 y_a5TL
    +  1 x_a66T
    +  1 ds_a67a
    +  1 ds_a67g
    +  1 ds_a67l
    +  1 ds_a67s
    +  1 w_a67O
    +  1 w1_a67P
    +  1 dt_d5TO
    +  1 dt_d5TP
    +  1 dt_d5TQ
    +  1 dt_d5TR
    +  1 dt_d5TS
    +  1 dt_d5TT
    +  1 dt_d5TV
    +  1 dt_d5TW
    +  1 dt_d5TX
    +  1 dt_d5TY
    +  1 dt_d5U5
    +  1 dt_d5U6
    +  1 dt_d5U7
    +  1 dt_d5U8
    +  1 dt_d5U9
    +  1 dt_d5Ua
    +  1 dt_d5Uf
    +  1 dt_d5Ug
    +  1 dt_d5Uh
    +  1 dt_d5Ui
    +  1 dt_d5XU
    +  1 dt_d5XV
    +  1 w_s6M9
    +7 CaseOfCase
    +  2 wild_X16
    +  2 vx_a63F
    +  1 wild_X9
    +  1 wild_X15
    +  1 ww_s6Mb
    +13525 KnownBranch
    +  1681 wild_a65r
    +  1681 wild1_a65v
    +  1211 wild_Xo
    +  888 wild_Xf
    +  884 wild_Xg
    +  729 wild_a64q
    +  729 wild1_a64u
    +  565 wild_X9
    +  444 wild_Xd
    +  415 wild_a65B
    +  415 wild1_a65F
    +  266 wild_Xa
    +  242 wild_X1i
    +  242 vx_a63F
    +  223 wild_Xb
    +  223 wild_X16
    +  222 wild_Xj
    +  222 wild_Xn
    +  221 wild_Xc
    +  178 wild_Xe
    +  155 wild_Xk
    +  137 dt_X1Gq
    +  137 dt_X1Gt
    +  137 dt_X1Gw
    +  137 dt_X1Gz
    +  137 dt_X1GC
    +  137 dt_X1GF
    +  137 dt_X1GI
    +  137 dt_X1GL
    +  102 wild_a65X
    +  36 dt_X1G2
    +  36 dt_X1G5
    +  25 dt_X1EX
    +  19 dt_X1Fy
    +  19 dt_X1FB
    +  19 dt_X1FE
    +  19 dt_X1FH
    +  19 dt_X1G8
    +  19 dt_X1Gb
    +  17 dt_X1FQ
    +  17 dt_X1FT
    +  17 dt_X1FW
    +  17 dt_X1FZ
    +  14 dt_X1FG
    +  14 dt_X1FJ
    +  14 wild_a64k
    +  12 wild_X15
    +  11 dt_X1Fo
    +  11 dt_X1Fr
    +  11 dt_X1Fu
    +  11 dt_X1Fx
    +  11 dt_X1Gc
    +  11 dt_X1Gf
    +  11 dt_X1Gi
    +  11 dt_X1Gl
    +  11 ww_s6Mb
    +  8 dt_X1Fh
    +  7 wild_a63O
    +  5 dt_X1F3
    +  5 dt_X1F6
    +  5 dt_X1F9
    +  5 dt_X1Fb
    +  5 dt_X1Fe
    +  4 wild_X2i
    +  4 wild_a63T
    +  1 wild_X2a
    +  1 wild_a66U
    +  1 wild_a67b
    +  1 wild_a67h
    +  1 wild_a67m
    +  1 wild_a67t
    +  1 ww_a67Q
    +  1 ds_d5JX
    +  1 ww_s6MB
    +  1 ww_s6MH
    +  1 ww_s6Ne
    +18 SimplifierDone 18
    +76 AltMerge
    +  8 wild_X4B
    +  8 wild_X4C
    +  8 wild_X4E
    +  8 wild_X4F
    +  8 wild_X4G
    +  6 wild_X4D
       4 wild_Xe
    -  4 wild_a5mZ
    -  3 wild_Xj
    -  3 wild_Xn
    +  4 wild_Xg
    +  4 wild_X4H
    +  3 wild_X4x
    +  3 wild_X4y
    +  3 wild_X4I
    +  2 wild_Xa
       2 wild_Xc
    -  2 wild_Xk
    -  2 wild_Xo
    -  1 wild_X1C
    -  1 wild_X1M
    -  1 wild_X1O
    -  1 wild_a5oK
    -  1 ww_a5pa
    -  1 wild_a5K7
    -  1 wild_a5Kd
    -  1 wild_a5Ki
    -  1 wild_a5Kq
    -  1 ds_d4AR
    -  1 ds_d4AS
    -  1 ds_d51T
    -  1 ww_s5Sj
    -2 CaseMerge 2 vx_a5mL
    -1 CaseIdentity 1 wild_X9
    -21 SimplifierDone 21
    +  2 wild_X4z
    +  2 wild_X4A
    +  1 wild_X4J
     

(Lines marked "-" / old / before: GHC 8.0.2; lines marked "+" / new / after: GHC 8.4.3.)

comment:16 Changed 13 months ago by NathanWaivio

I have found an undocumented flag: "-fno-worker-wrapper". With this flag, the original code compiles in 43.20 seconds (a 37x improvement in time) and uses at most 660 MB (a 48x improvement in space) with GHC 8.4.2. All of the tests pass for the library, and the benchmark still performs at the improved speed. It seems like the worker/wrapper transformation is causing issues with this code. Why is that? Perhaps worker/wrapper shouldn't be run in certain circumstances. What do you think?
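For others who want to try this workaround without changing the flags for a whole project, the flag can be scoped to a single module with an OPTIONS_GHC pragma. The sketch below is a made-up module, not code from cl3; only the pragma line matters, and `sumSquares` is a placeholder for the real module contents.

```haskell
{-# OPTIONS_GHC -fno-worker-wrapper #-}
-- Hypothetical module body: the pragma above disables the worker/wrapper
-- pass for this module only, leaving other modules at their usual -O/-O2
-- behaviour.

-- | A strict accumulating loop, the kind of function that worker/wrapper
-- would normally consider splitting.
sumSquares :: [Double] -> Double
sumSquares = go 0
  where
    go acc []       = acc
    go acc (x : xs) = let acc' = acc + x * x in acc' `seq` go acc' xs
```

The same flag can instead be passed on the command line or via ghc-options in the package description, at the cost of applying it to every module.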

comment:17 Changed 13 months ago by tdammers

Sounds like a promising lead, if not the key to figuring this out. Worker/wrapper is generally a good thing to do; in theory, all it should do is turn arguments to recursive function calls into closures, thus reducing the argument-passing overhead. But I can imagine that this would also get in the way of inlinings and RULES.

So what we're looking for is code that follows the pattern that makes it a candidate for worker/wrapper optimization (parameters passed through recursive calls unchanged), such that the worker/wrapper optimization breaks up expressions that might otherwise trigger useful inlinings or rewrites.
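As general background, and not taken from cl3, here is a hand-written sketch of the kind of split GHC's worker/wrapper pass typically produces for a function that is strict in boxed arguments: a small wrapper that unboxes and reboxes, and a worker that loops on the raw values. The names `sumTo`, `sumTo'` and `wSumTo` are invented for illustration, and the code GHC actually generates will differ in detail.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#), (-#))

-- Original definition: strict in both boxed Int arguments.
sumTo :: Int -> Int -> Int
sumTo acc 0 = acc
sumTo acc n = acc `seq` sumTo (acc + n) (n - 1)

-- Wrapper: unboxes the arguments once, calls the worker, reboxes the result.
-- It is small, so it is a natural candidate for inlining at call sites.
sumTo' :: Int -> Int -> Int
sumTo' (I# acc) (I# n) = I# (wSumTo acc n)
{-# INLINE sumTo' #-}

-- Worker: the actual loop, operating on unboxed Int# values only.
wSumTo :: Int# -> Int# -> Int#
wSumTo acc 0# = acc
wSumTo acc n  = wSumTo (acc +# n) (n -# 1#)
```

Both versions compute the same result for non-negative `n`. The relevance to this ticket is that the split changes which expressions later simplifier passes can inline or match with RULES.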

comment:18 Changed 10 months ago by NathanWaivio

I've continued to investigate and found an alternative workaround: when I change every $! in the code to $, I get the improved compile performance even with worker/wrapper active. I'm not sure what that means.
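One possibly relevant difference: `f $! x` forces `x` before calling `f`, while `f $ x` passes it lazily, so swapping one for the other changes the strictness information that demand analysis (and hence worker/wrapper) has to work with. The sketch below only demonstrates that semantic difference; it makes no claim about why cl3 in particular blows up, and all names in it are made up.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- | Ignores its argument entirely.
alwaysOne :: Int -> Int
alwaysOne _ = 1

-- Lazy application: the argument is never evaluated.
lazyApp :: Int
lazyApp = alwaysOne $ error "never forced"

-- Strict application: ($!) evaluates the argument first, so this is bottom.
strictApp :: Int
strictApp = alwaysOne $! error "forced by $!"

-- | True if forcing the value raises an ErrorCall.
throws :: Int -> IO Bool
throws x = do
  r <- try (evaluate x) :: IO (Either ErrorCall Int)
  pure (either (const True) (const False) r)
```

Here `throws lazyApp` yields False and `throws strictApp` yields True, even though `alwaysOne` never looks at its argument.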
