Opened 9 years ago

Closed 8 years ago

#4856 closed bug (fixed)

Performance regression in the type checker for GADTs and type families

Reported by: chak
Owned by:
Priority: normal
Milestone: 7.2.1
Component: Compiler (Type checker)
Version: 7.0.1
Keywords:
Cc: dimitris@…, verdelyi@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

GHC 7.0.2 RC1 shows poor performance when compiling Data.Array.Accelerate. In particular, GHC 7.0.2 takes several minutes to type check one module, and its resident memory grows to 350 MB. In contrast, GHC 6.12.3 compiles the same module in a few seconds.

How to reproduce the problem:

  1. Download the latest version of Data.Array.Accelerate with darcs from http://code.haskell.org/accelerate/
  2. Change directory to the new accelerate darcs repo.
  3. Invoke GHCi as ghci -Iinclude.
  4. Issue the GHCi command: :l Data/Array/Accelerate/Smart.hs

You will notice that Data.Array.Accelerate.Array.Sugar, which is heavy in type classes and type families, already takes a noticeable time to compile; this alone is a performance regression from 6.12.3. The module Data.Array.Accelerate.Smart takes much longer still.
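For readers unfamiliar with this style of code, the fragment below is a small, hypothetical illustration of the ingredients mentioned above: a type class with an associated type family and a GADT-indexed expression type whose constructors carry class constraints. The names `Elt`, `EltRepr`, `Exp` and `eval` are assumptions chosen for illustration; this is not code from Data.Array.Accelerate, and a fragment this small will not reproduce the regression.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}
-- Hypothetical miniature EDSL; names are illustrative, not Accelerate's API.

-- A class with an associated type family mapping a surface type to a representation type.
class Elt a where
  type EltRepr a
  fromElt :: a -> EltRepr a
  toElt   :: EltRepr a -> a

instance Elt Int where
  type EltRepr Int = Int
  fromElt = id
  toElt   = id

-- Tuple representations are built from the representations of the components.
instance (Elt a, Elt b) => Elt (a, b) where
  type EltRepr (a, b) = (EltRepr a, EltRepr b)
  fromElt (x, y) = (fromElt x, fromElt y)
  toElt (x, y)   = (toElt x, toElt y)

-- A GADT whose type index records the type of value each expression produces.
data Exp t where
  Const :: Elt t => t -> Exp t
  Pair  :: (Elt a, Elt b) => Exp a -> Exp b -> Exp (a, b)
  Fst   :: (Elt a, Elt b) => Exp (a, b) -> Exp a
  Add   :: Exp Int -> Exp Int -> Exp Int

-- Each equation is checked under the type equalities and class constraints
-- brought into scope by matching on the GADT constructor.
eval :: Exp t -> t
eval (Const c)  = toElt (fromElt c)  -- round-trip through EltRepr t
eval (Pair a b) = (eval a, eval b)
eval (Fst p)    = fst (eval p)
eval (Add a b)  = eval a + eval b
```

Loaded into GHCi, `eval (Fst (Pair (Add (Const 1) (Const 2)) (Const (5 :: Int))))` evaluates to `3`. Real EDSLs contain many more constructors, instances and type-family equations of this kind, which is where the constraint solver spends its time.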

Change History (11)

comment:1 Changed 9 years ago by chak

Type of failure: GHC rejects valid program → Compile-time performance bug

comment:2 Changed 9 years ago by chak

I should probably point out that the performance regression is dramatic. Instead of a few seconds, Smart.hs takes something like 10 minutes to compile on my laptop. That makes it practically infeasible to work on Data.Array.Accelerate with 7.0.1 (or HEAD).

comment:3 Changed 9 years ago by igloo

Milestone: 7.0.3

I'll milestone this as 7.0.3 for now, but if 7.0.2 ends up being the DPH release, then we'll be going through the 7.0.3 tickets for it too.

comment:4 Changed 9 years ago by igloo

Priority: normal → high

comment:5 Changed 9 years ago by chak

Please don't delay it until 7.0.3. This has nothing to do with DPH. Data.Array.Accelerate is an array EDSL for GPU programming: http://hackage.haskell.org/package/accelerate

comment:6 Changed 9 years ago by simonpj

I'm working on it.

comment:7 Changed 9 years ago by simonpj

Cc: dimitris@… added

OK, after this commit

Wed Jan 12 06:56:04 PST 2011  simonpj@microsoft.com
  * Major refactoring of the type inference engine
  
  This patch embodies many, many changes to the constraint solver, which
  make it simpler, more robust, and more beautiful.  But it has taken
  me ages to get right. The forcing issue was some obscure programs
  involving recursive dictionaries, but these eventually led to a
  massive refactoring sweep.

I compiled accelerate 0.9.0.0, with these results

ghc 6.12.3:
  real   0m51.776s
  user   0m48.310s
  sys    0m1.750s

HEAD:
  real    0m49.007s
  user    0m46.420s
  sys     0m1.380s

So HEAD is just slightly faster. This conceals a difference, though:

  • 6.12 is faster on Data.Array.Accelerate.Array.Sugar (5s vs 12s)
  • 6.12 is slower on Data.Array.Accelerate.Smart
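The "slightly faster" verdict is simple arithmetic on the wall-clock ("real") times. The helpers below are a hypothetical sketch that expresses it as a percentage reduction relative to the 6.12.3 build, alongside the per-module ratio for Sugar; the function and constant names are assumptions for illustration.

```haskell
-- Percentage reduction in wall-clock time from a baseline build to a new one.
percentFaster :: Double -> Double -> Double
percentFaster baseline new = 100 * (baseline - new) / baseline

-- "real" seconds for the full accelerate 0.9.0.0 build, copied from the timings above.
ghc6123Real, headReal :: Double
ghc6123Real = 51.776
headReal    = 49.007

-- How many times slower the second measurement is than the first.
slowdown :: Double -> Double -> Double
slowdown fast slow = slow / fast
```

In GHCi, `percentFaster ghc6123Real headReal` is about 5.35, so HEAD's full build is roughly 5% faster, while `slowdown 5 12` gives 2.4 for Sugar alone. A whole-build total can therefore hide a large per-module regression that is offset elsewhere.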

Dimitrios is going to investigate why we are slower on Sugar, so I'll leave the ticket open; meanwhile, I think it's fast enough to use.

Simon

comment:8 Changed 9 years ago by verdelyi

Cc: verdelyi@… added

comment:9 Changed 8 years ago by igloo

Priority: high → normal

comment:10 Changed 8 years ago by dimitris

Simon, can you check that the performance is acceptable now and better than 6.12 and previous HEAD?

comment:11 Changed 8 years ago by simonpj

Resolution: fixed
Status: new → closed

OK, I ran a complete build of the accelerate library thus:

  runhaskell Setup.hs clean
  runhaskell Setup.hs configure --with-ghc=/home/simonpj/builds/validate-HEAD/inplace/bin/ghc-stage2
  time runhaskell Setup.hs build

I added +RTS -s -RTS to the opts in accelerate.cabal. Headline results:

  • GHC 6.12.3: 45 sec
  • GHC 7.0.3: 40 sec
  • GHC 7.2: 35 sec

Result, happiness. Details:

===================== GHC 6.12.3 ============================
  14,320,835,376 bytes allocated in the heap
   4,597,762,168 bytes copied during GC
     229,976,136 bytes maximum residency (27 sample(s))
      10,919,512 bytes maximum slop
             662 MB total memory in use (11 MB lost due to fragmentation)

  Generation 0: 27057 collections,     0 parallel,  9.89s,  9.95s elapsed
  Generation 1:    27 collections,     0 parallel,  5.90s,  6.33s elapsed

  Parallel GC work balance: -nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    ( 27.49s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    ( 27.54s)       0.00s    (  0.00s)
  Task  2 (bound)  :   26.04s    ( 27.54s)      15.79s    ( 16.28s)

  SPARKS: 0 (0 converted, 0 pruned)

  INIT  time    0.01s  (  0.00s elapsed)
  MUT   time   25.45s  ( 27.54s elapsed)
  GC    time   15.79s  ( 16.28s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   41.25s  ( 43.82s elapsed)

  %GC time      38.3%  (37.1% elapsed)

  Alloc rate    562,483,714 bytes per MUT second

  Productivity  61.7% of total user, 58.1% of total elapsed

Registering accelerate-0.9.0.0...

real	0m45.512s
user	0m43.370s
sys	0m1.100s

====================== GHC 7.0.3 =========================
  24,541,831,600 bytes allocated in the heap
   3,779,826,928 bytes copied during GC
     107,669,136 bytes maximum residency (28 sample(s))
      10,089,928 bytes maximum slop
             275 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 46519 collections,     0 parallel,  9.59s,  9.59s elapsed
  Generation 1:    28 collections,     0 parallel,  4.82s,  4.82s elapsed

  Parallel GC work balance: -nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    ( 24.32s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    ( 24.37s)       0.00s    (  0.00s)
  Task  2 (bound)  :   22.80s    ( 24.37s)      14.40s    ( 14.41s)

  SPARKS: 0 (0 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   22.81s  ( 24.37s elapsed)
  GC    time   14.40s  ( 14.41s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   37.21s  ( 38.78s elapsed)

  %GC time      38.7%  (37.2% elapsed)

  Alloc rate    1,075,947,339 bytes per MUT second

  Productivity  61.3% of total user, 58.8% of total elapsed

Registering accelerate-0.9.0.0...

real	0m40.254s
user	0m38.500s
sys	0m1.030s

================= GHC HEAD = 7.2 ===================
  15,255,857,904 bytes allocated in the heap
   3,268,823,408 bytes copied during GC
      76,216,592 bytes maximum residency (35 sample(s))
       2,914,432 bytes maximum slop
             196 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     29181 colls,     0 par    7.54s    7.54s     0.0003s    0.0042s
  Gen  1        35 colls,     0 par    4.56s    4.56s     0.1303s    0.2876s

  Parallel GC work balance: -nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    ( 33.58s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    ( 33.58s)       0.00s    (  0.00s)
  Task  2 (bound)  :   19.74s    ( 21.47s)      12.09s    ( 12.10s)

  SPARKS: 0 (0 converted, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   19.75s  ( 21.47s elapsed)
  GC      time   12.10s  ( 12.10s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   31.85s  ( 33.58s elapsed)

  Alloc rate    772,436,650 bytes per MUT second

  Productivity  62.0% of total user, 58.8% of total elapsed

Registering accelerate-0.9.0.0...

real	0m35.221s
user	0m33.700s
sys	0m0.770s
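The derived lines in each summary follow from the raw figures: productivity is approximately MUT time divided by total time, %GC is GC time divided by total time, and the allocation rate is bytes allocated per MUT second. The sketch below recomputes them from numbers copied out of the three runs; `RtsSummary` and the helper names are hypothetical, and small discrepancies (for example in the allocation rate) come from rounding of the printed times.

```haskell
-- Summary figures copied from the +RTS -s output above (CPU seconds).
data RtsSummary = RtsSummary
  { label          :: String
  , bytesAllocated :: Double
  , mutTime        :: Double
  , gcTime         :: Double
  , totalTime      :: Double
  }

-- Share of total time spent in the mutator (GHC itself), as a percentage.
productivity :: RtsSummary -> Double
productivity s = 100 * mutTime s / totalTime s

-- Share of total time spent in garbage collection, as a percentage.
gcPercent :: RtsSummary -> Double
gcPercent s = 100 * gcTime s / totalTime s

-- Bytes allocated per second of mutator time.
allocRate :: RtsSummary -> Double
allocRate s = bytesAllocated s / mutTime s

runs :: [RtsSummary]
runs =
  [ RtsSummary "GHC 6.12.3" 14320835376 25.45 15.79 41.25
  , RtsSummary "GHC 7.0.3"  24541831600 22.81 14.40 37.21
  , RtsSummary "GHC HEAD"   15255857904 19.75 12.10 31.85
  ]
```

Evaluating `map productivity runs` in GHCi gives values that round to the printed 61.7%, 61.3% and 62.0%. `gcPercent` for the HEAD run comes out near 38.0%, a figure that run's output does not print.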

So I'm happy. It seems that Sugar is still a bit slow to compile compared with 6.12.3, and 6.12 allocates less overall than either 7.0.3 or HEAD. But the bottom line is good, so I'll close this ticket.
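The allocation claim can be checked against the figures above by expressing each run relative to 6.12.3. This is a hypothetical sketch; `measurements` and `relativeTo612` are illustrative names, and the numbers are copied from the "bytes allocated" and "maximum residency" lines of each run.

```haskell
-- (version, bytes allocated, maximum residency in bytes), from the +RTS -s output above.
measurements :: [(String, Integer, Integer)]
measurements =
  [ ("GHC 6.12.3", 14320835376, 229976136)
  , ("GHC 7.0.3",  24541831600, 107669136)
  , ("GHC HEAD",   15255857904,  76216592)
  ]

-- Ratio of each version's allocation and residency to the first (baseline) entry.
relativeTo612 :: [(String, Integer, Integer)] -> [(String, Double, Double)]
relativeTo612 ms@((_, baseAlloc, baseRes) : _) =
  [ (v, ratio a baseAlloc, ratio r baseRes) | (v, a, r) <- ms ]
  where
    ratio x y = fromIntegral x / fromIntegral y
relativeTo612 [] = []
```

The result shows 7.0.3 allocating about 1.71 times as much as 6.12.3 and HEAD about 1.07 times, while their maximum residency is roughly 47% and 33% of 6.12.3's. Total allocation and peak memory use therefore move in opposite directions here.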

Simon
