Opened 9 years ago
Closed 9 years ago
#4856 closed bug (fixed)
Performance regression in the type checker for GADTs and type families
Reported by: | chak | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.2.1 |
Component: | Compiler (Type checker) | Version: | 7.0.1 |
Keywords: | Cc: | dimitris@…, verdelyi@… | |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Compile-time performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | ||
Wiki Page: |
Description
GHC 7.0.2 RC1 shows poor performance when compiling Data.Array.Accelerate. In particular, to type check one module, GHC 7.0.2 takes several minutes, while GHC's resident memory grows to 350MB. In contrast, GHC 6.12.3 compiles the same module in a few seconds.
How to reproduce the problem:
- Download the latest version of Data.Array.Accelerate with darcs from http://code.haskell.org/accelerate/
- Change directory to the new accelerate darcs repo.
- Invoke GHCi as ghci -Iinclude
- Issue the GHCi command: :l Data/Array/Accelerate/Smart.hs
You will notice that Data.Array.Accelerate.Array.Sugar, which is heavy in type classes and type families, already requires a noticeable time to compile; this is already a performance regression from 6.12.3. The module Data.Array.Accelerate.Smart takes much longer still.
Change History (11)
comment:1 Changed 9 years ago by
Type of failure: | GHC rejects valid program → Compile-time performance bug |
---|
comment:2 Changed 9 years ago by
comment:3 Changed 9 years ago by
Milestone: | → 7.0.3 |
---|
I'll milestone this as 7.0.3 for now, but if 7.0.2 ends up being the DPH-release then we'll be going through the 7.0.3 tickets for it too.
comment:4 Changed 9 years ago by
Priority: | normal → high |
---|
comment:5 Changed 9 years ago by
Please don't delay it until 7.0.3. This has nothing to do with DPH. Data.Array.Accelerate is an array EDSL for GPU programming: http://hackage.haskell.org/package/accelerate
comment:7 Changed 9 years ago by
Cc: | dimitris@… added |
---|
OK after this commit
Wed Jan 12 06:56:04 PST 2011 simonpj@microsoft.com
  * Major refactoring of the type inference engine

  This patch embodies many, many changes to the constraint solver, which
  make it simpler, more robust, and more beautiful. But it has taken
  me ages to get right. The forcing issue was some obscure programs
  involving recursive dictionaries, but these eventually led to a
  massive refactoring sweep.
I compiled accelerate 0.9.0.0, with these results:
ghc 6.12.3: real 0m51.776s user 0m48.310s sys 0m1.750s
HEAD:       real 0m49.007s user 0m46.420s sys 0m1.380s
So HEAD is just slightly faster. This conceals a difference, though:
- 6.12 is faster on Data.Array.Accelerate.Array.Sugar (5s vs 12s)
- 6.12 is slower on Data.Array.Accelerate.Smart
Dimitrios is going to investigate why we are slower on Sugar, and that's why I'll leave the ticket open, but meanwhile I think it's fast enough to use.
Simon
comment:8 Changed 9 years ago by
Cc: | verdelyi@… added |
---|
comment:9 Changed 9 years ago by
Priority: | high → normal |
---|
comment:10 Changed 9 years ago by
Simon, can you check that the performance is acceptable now and better than 6.12 and previous HEAD?
comment:11 Changed 9 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
OK, I ran a complete build of the accelerate library thus:
  runhaskell Setup.hs clean
  runhaskell Setup.hs configure --with-ghc=/home/simonpj/builds/validate-HEAD/inplace/bin/ghc-stage2
  time runhaskell Setup.hs build
I added +RTS -s -RTS to the opts in accelerate.cabal. Headline results:
- GHC 6.12.3: 45 sec
- GHC 7.0.3: 40 sec
- GHC 7.2: 35 sec
Result, happiness. Details:
===================== GHC 6.12.3 ============================
  14,320,835,376 bytes allocated in the heap
   4,597,762,168 bytes copied during GC
     229,976,136 bytes maximum residency (27 sample(s))
      10,919,512 bytes maximum slop
             662 MB total memory in use (11 MB lost due to fragmentation)

  Generation 0: 27057 collections, 0 parallel, 9.89s, 9.95s elapsed
  Generation 1:    27 collections, 0 parallel, 5.90s, 6.33s elapsed

  Parallel GC work balance: -nan (0 / 0, ideal 1)

                      MUT time (elapsed)    GC time (elapsed)
  Task 0 (worker) :    0.00s  ( 27.49s)     0.00s  (  0.00s)
  Task 1 (worker) :    0.00s  ( 27.54s)     0.00s  (  0.00s)
  Task 2 (bound)  :   26.04s  ( 27.54s)    15.79s  ( 16.28s)

  SPARKS: 0 (0 converted, 0 pruned)

  INIT  time    0.01s  (  0.00s elapsed)
  MUT   time   25.45s  ( 27.54s elapsed)
  GC    time   15.79s  ( 16.28s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   41.25s  ( 43.82s elapsed)

  %GC time      38.3%  (37.1% elapsed)
  Alloc rate    562,483,714 bytes per MUT second
  Productivity  61.7% of total user, 58.1% of total elapsed

Registering accelerate-0.9.0.0...

real 0m45.512s
user 0m43.370s
sys  0m1.100s

====================== GHC 7.0.3 =========================
  24,541,831,600 bytes allocated in the heap
   3,779,826,928 bytes copied during GC
     107,669,136 bytes maximum residency (28 sample(s))
      10,089,928 bytes maximum slop
             275 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 46519 collections, 0 parallel, 9.59s, 9.59s elapsed
  Generation 1:    28 collections, 0 parallel, 4.82s, 4.82s elapsed

  Parallel GC work balance: -nan (0 / 0, ideal 1)

                      MUT time (elapsed)    GC time (elapsed)
  Task 0 (worker) :    0.00s  ( 24.32s)     0.00s  (  0.00s)
  Task 1 (worker) :    0.00s  ( 24.37s)     0.00s  (  0.00s)
  Task 2 (bound)  :   22.80s  ( 24.37s)    14.40s  ( 14.41s)

  SPARKS: 0 (0 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   22.81s  ( 24.37s elapsed)
  GC    time   14.40s  ( 14.41s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   37.21s  ( 38.78s elapsed)

  %GC time      38.7%  (37.2% elapsed)
  Alloc rate    1,075,947,339 bytes per MUT second
  Productivity  61.3% of total user, 58.8% of total elapsed

Registering accelerate-0.9.0.0...

real 0m40.254s
user 0m38.500s
sys  0m1.030s

================= GHC HEAD = 7.2 ===================
  15,255,857,904 bytes allocated in the heap
   3,268,823,408 bytes copied during GC
      76,216,592 bytes maximum residency (35 sample(s))
       2,914,432 bytes maximum slop
             196 MB total memory in use (0 MB lost due to fragmentation)

                                  Tot time (elapsed)  Avg pause  Max pause
  Gen 0  29181 colls, 0 par    7.54s    7.54s     0.0003s    0.0042s
  Gen 1     35 colls, 0 par    4.56s    4.56s     0.1303s    0.2876s

  Parallel GC work balance: -nan (0 / 0, ideal 1)

                      MUT time (elapsed)    GC time (elapsed)
  Task 0 (worker) :    0.00s  ( 33.58s)     0.00s  (  0.00s)
  Task 1 (worker) :    0.00s  ( 33.58s)     0.00s  (  0.00s)
  Task 2 (bound)  :   19.74s  ( 21.47s)    12.09s  ( 12.10s)

  SPARKS: 0 (0 converted, 0 dud, 0 GC'd, 0 fizzled)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   19.75s  ( 21.47s elapsed)
  GC    time   12.10s  ( 12.10s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   31.85s  ( 33.58s elapsed)

  Alloc rate    772,436,650 bytes per MUT second
  Productivity  62.0% of total user, 58.8% of total elapsed

Registering accelerate-0.9.0.0...

real 0m35.221s
user 0m33.700s
sys  0m0.770s
So I'm happy. It seems that Sugar is still a bit slow to compile compared with 6.12.3, and 6.12 allocates less overall than either 7.0.3 or HEAD. But the bottom line is good, so I'll close this ticket.
Simon
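For readers who want the three runs side by side, here is a small sketch that expresses each compiler's allocation, maximum residency and wall-clock time as a ratio against GHC 6.12.3. The numbers are copied from the +RTS -s output and the `time` results quoted above; the dictionary layout and the helper name `relative_to_baseline` are made up for this illustration and are not part of any GHC tooling.

```python
# Figures copied from the +RTS -s and `time` output quoted in comment:11.
# The field names below are illustrative labels, not GHC terminology.
runs = {
    "GHC 6.12.3": {
        "alloc_bytes": 14_320_835_376,
        "max_residency_bytes": 229_976_136,
        "real_seconds": 45.512,
    },
    "GHC 7.0.3": {
        "alloc_bytes": 24_541_831_600,
        "max_residency_bytes": 107_669_136,
        "real_seconds": 40.254,
    },
    "GHC HEAD (7.2)": {
        "alloc_bytes": 15_255_857_904,
        "max_residency_bytes": 76_216_592,
        "real_seconds": 35.221,
    },
}


def relative_to_baseline(runs, baseline="GHC 6.12.3"):
    """Return every metric of every run divided by the baseline's value."""
    base = runs[baseline]
    return {
        name: {metric: value / base[metric] for metric, value in stats.items()}
        for name, stats in runs.items()
    }


ratios = relative_to_baseline(runs)
for name, r in ratios.items():
    print(
        f"{name:15s} "
        f"alloc x{r['alloc_bytes']:.2f}  "
        f"residency x{r['max_residency_bytes']:.2f}  "
        f"wall-clock x{r['real_seconds']:.2f}"
    )
```

Read this way, 7.0.3 allocates roughly 1.7 times as much as 6.12.3 while HEAD is within about 7% of it, and both newer compilers finish sooner with less than half the maximum residency. This matches the remark above that 6.12 allocates less overall even though the bottom line favours HEAD.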
I should probably point out that the performance regression is dramatic. Instead of a few seconds, Smart.hs takes something like 10min on my laptop to compile. It makes it practically infeasible to work on Data.Array.Accelerate with 7.0.1 (or the HEAD).