Opened 10 years ago

Closed 10 years ago

#3331 closed bug (fixed)

control-monad-queue performance regression

Reported by: lpsmith Owned by:
Priority: normal Milestone: 6.12 branch
Component: Compiler Version: 6.10.3
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

$ cabal unpack control-monad-queue
$ cd control-monad-queue-0.0.9.1/tests
$ ghc --make -O2 Time.hs -i..
$ ./Time allison 34 20
$ ./Time corec1  34 20
$ ./Time corec2  34 20

Runs anywhere from 16-32% slower on GHC 6.10.3 than GHC 6.8.3, and allocates significantly more memory.

Attachments (2)

A.hs (987 bytes) - added by igloo 10 years ago.
Time.hs (2.2 KB) - added by igloo 10 years ago.

Download all attachments as: .zip

Change History (7)

comment:1 Changed 10 years ago by igloo

difficulty: Unknown

Here are some numbers when compiling with -O and -O2, and with 6.8.2 and 6.10.3.

In particular, note the bytes allocated in the heap (the first number after <<ghc:, which is roughly half for 6.8.2 -O2.

$ ghc-6.8.2 --make Time -O2 -Wall
$ time ./Time allison 34 20 +RTS -tstderr
./Time allison 34 20 +RTS -tstderr
allison
Timings: [74,70,69,69,69,70,69,69,69,69,69,68,69,68,69,69,68,70,68,69]
Sum:     1384
Minimum: 68
Maximum: 74
Mean:    69.2
Stddev:  1.2489995996796797
<<ghc: 7698344320 bytes, 14933 GCs, 10787493/25911296 avg/max bytes residency (248 samples), 74M in use, 0.00 INIT (0.00 elapsed), 3.36 MUT (3.47 elapsed), 10.50 GC (10.46 elapsed) :ghc>>
./Time allison 34 20 +RTS -tstderr  13.86s user 0.07s system 99% cpu 13.939 total
$ ghc-6.8.2 --make Time -O -Wall
$ time ./Time allison 34 20 +RTS -tstderr
./Time allison 34 20 +RTS -tstderr 
allison
Timings: [88,82,82,82,82,82,82,82,83,82,82,81,83,82,83,82,82,82,82,83]
Sum:     1649
Minimum: 81
Maximum: 88
Mean:    82.45
Stddev:  1.3592277219068187
<<ghc: 15028226896 bytes, 28919 GCs, 10697618/26333184 avg/max bytes residency (253 samples), 74M in use, 0.00 INIT (0.00 elapsed), 4.91 MUT (4.92 elapsed), 11.57 GC (11.64 elapsed) :ghc>>
./Time allison 34 20 +RTS -tstderr  16.48s user 0.09s system 100% cpu 16.568 total
$ ghc-6.10.3 --make Time -O2 -Wall
$ time ./Time allison 34 20 +RTS -t
./Time allison 34 20 +RTS -t 
allison
Timings: [92,86,86,86,86,86,87,86,86,86,85,85,85,85,85,85,85,86,85,85]
Sum:     1718
Minimum: 85
Maximum: 92
Mean:    85.9
Stddev:  1.5132745950421556
<<ghc: 15229851816 bytes, 29050 GCs, 10585871/25713256 avg/max bytes residency (254 samples), 76M in use, 0.00 INIT (0.00 elapsed), 5.64 MUT (6.26 elapsed), 11.56 GC (11.00 elapsed) :ghc>>
./Time allison 34 20 +RTS -t  17.20s user 0.07s system 100% cpu 17.261 total
$ ghc-6.10.3 --make Time -O -Wall
$ time ./Time allison 34 20 +RTS -t
./Time allison 34 20 +RTS -t 
allison
Timings: [102,96,95,96,94,95,96,95,98,96,96,96,95,94,95,96,95,96,94,96]
Sum:     1916
Minimum: 94
Maximum: 102
Mean:    95.8
Stddev:  1.6911534525287764
<<ghc: 18790076976 bytes, 35841 GCs, 11806004/25067440 avg/max bytes residency (229 samples), 74M in use, 0.00 INIT (0.00 elapsed), 7.46 MUT (7.43 elapsed), 11.67 GC (11.81 elapsed) :ghc>>
./Time allison 34 20 +RTS -t  19.13s user 0.12s system 100% cpu 19.252 total

Changed 10 years ago by igloo

Attachment: A.hs added

Changed 10 years ago by igloo

Attachment: Time.hs added

comment:2 Changed 10 years ago by igloo

The above numbers are for the attached Time.hs and A.hs.

comment:3 Changed 10 years ago by igloo

Milestone: 6.12 branch

comment:4 Changed 10 years ago by simonmar

Type of failure: Runtime performance bug

comment:5 Changed 10 years ago by simonpj

Resolution: fixed
Status: newclosed

Happily the HEAD is better than 6.08. Here are the 6.08 timings for 'allison':

bash-3.2$ ./Time allison 34 20 +RTS -sstderr 
allison
Timings: [84,77,77,78,77,76,78,77,76,76,78,76,76,77,77,77,77,76,76,77]
Sum:     1543
Minimum: 76
Maximum: 84
Mean:    77.15
Stddev:  1.7109938632268673
7,698,631,424 bytes allocated in the heap
5,371,953,432 bytes copied during GC (scavenged)
    293,760 bytes copied during GC (not scavenged)
 25,944,064 bytes maximum residency (241 sample(s))

      14685 collections in generation 0 (  6.06s)
        241 collections in generation 1 (  5.46s)

         74 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    3.92s  (  3.89s elapsed)
  GC    time   11.52s  ( 11.64s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   15.44s  ( 15.53s elapsed)

  %GC time      74.6%  (74.9% elapsed)

  Alloc rate    1,964,733,507 bytes per MUT second

  Productivity  25.4% of total user, 25.2% of total elapsed


real	0m15.545s
user	0m15.437s
sys	0m0.107s

And with the HEAD:

bash-3.2$ ./Time allison 34 20 +RTS -sstderr 
allison
Timings: [80,73,72,72,71,72,72,72,73,72,72,72,71,72,71,71,71,71,71,71]
Sum:     1442
Minimum: 71
Maximum: 80
Mean:    72.1
Stddev:  1.9209372712298545
   7,698,487,056 bytes allocated in the heap
   5,357,441,720 bytes copied during GC
      25,815,200 bytes maximum residency (238 sample(s))
         848,400 bytes maximum slop
              74 MB total memory in use (1 MB lost due to fragmentation)

  Generation 0: 14447 collections,     0 parallel,  5.50s,  5.55s elapsed
  Generation 1:   238 collections,     0 parallel,  4.85s,  4.91s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    4.07s  (  4.06s elapsed)
  GC    time   10.36s  ( 10.47s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   14.42s  ( 14.53s elapsed)

  %GC time      71.8%  (72.0% elapsed)

  Alloc rate    1,891,795,331 bytes per MUT second

  Productivity  28.2% of total user, 28.0% of total elapsed


real	0m14.550s
user	0m14.426s
sys	0m0.121s

Notice that allocation is essentially unchanged, but elapsed time seems a bit better. I don't know why.

When I tried this initially with HEAD I got:

bash-3.2$ time ./Time allison 34 20 +RTS -sstderr
./Time allison 34 20 +RTS -sstderr 
allison
Timings: [79,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Sum:     79
Minimum: 0
Maximum: 79
Mean:    3.95
Stddev:  17.21765082698566
     385,451,912 bytes allocated in the heap
     267,303,984 bytes copied during GC
      24,657,920 bytes maximum residency (13 sample(s))
         805,320 bytes maximum slop
              73 MB total memory in use (1 MB lost due to fragmentation)

  Generation 0:   723 collections,     0 parallel,  0.28s,  0.28s elapsed
  Generation 1:    13 collections,     0 parallel,  0.24s,  0.31s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.21s  (  0.21s elapsed)
  GC    time    0.52s  (  0.59s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.73s  (  0.80s elapsed)

  %GC time      71.4%  (73.5% elapsed)

  Alloc rate    1,853,428,245 bytes per MUT second

  Productivity  28.5% of total user, 25.9% of total elapsed


real	0m0.818s
user	0m0.726s
sys	0m0.088s

So the HEAD is better at some kind of inlining or fusion or full laziness: it's combining all the test runs into one! So to get the fair comparision I had to change Time.hs. The two changes are:

test' string f i n = do
  ts <- mapM (test string . f) [(k,i) | k <- [1..n]]
  ... blah...

and

fromQueue f (_,n) = length (f (T.fib n)) == fib n - 1

fromUnit  f (_,n) = (f (T.fib n)) == ()

That is, the test functions take a parameter they don't use. Otherwise full laziness makes test' too efficient. You may want to make this change in the package itself.

I have not tried with GHC 6.12, because even if it's slower, I doubt we'll go back to fix it.

Simon

Note: See TracTickets for help on using tickets.