Opened 9 years ago

Closed 8 years ago

#4965 closed bug (fixed)

60% performance regression in continuation-heavy code between 6.12 and 7

Reported by: bos
Owned by: igloo
Priority: high
Milestone: 7.2.1
Component: Compiler
Version: 7.0.1
Cc: kolmodin@…, johan.tibell@…, hvr@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug

Description

I have a JSON parsing library whose performance has regressed by a little over 60% between 6.12.3 and the current head of the 7.0 branch.

The library uses my attoparsec parsing library, which is written in a heavily continuation-based style.
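
For readers who haven't seen the style, here is a minimal sketch of a continuation-passing-style parser. It is not attoparsec's actual definition (the real Parser also carries an "added input" buffer and an end-of-input flag, and its Result type supports resumable partial parses); the module and names below are hypothetical, and the sketch only shows the shape of the code involved.

{-# LANGUAGE RankNTypes #-}
module CPSSketch where

import qualified Data.ByteString as B
import Data.Word (Word8)

data Result r
  = Fail String            -- parse error message
  | Done B.ByteString r    -- leftover input and final result

type Failure   r = B.ByteString -> String -> Result r
type Success a r = B.ByteString -> a -> Result r

-- Every parser is a function of the remaining input plus two
-- continuations: one to call on failure, one to call on success.
newtype Parser a = Parser
  { runParser :: forall r. B.ByteString -> Failure r -> Success a r -> Result r }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \inp kf ks ->
    p inp kf (\inp' a -> ks inp' (f a))

instance Applicative Parser where
  pure a = Parser $ \inp _kf ks -> ks inp a
  Parser pf <*> Parser pa = Parser $ \inp kf ks ->
    pf inp kf (\inp' f -> pa inp' kf (\inp'' a -> ks inp'' (f a)))

instance Monad Parser where
  return = pure
  Parser p >>= f = Parser $ \inp kf ks ->
    p inp kf (\inp' a -> runParser (f a) inp' kf ks)

-- Consume a single byte, failing on empty input.
anyWord8 :: Parser Word8
anyWord8 = Parser $ \inp kf ks ->
  case B.uncons inp of
    Nothing        -> kf inp "end of input"
    Just (w, rest) -> ks rest w

-- Run a parser to completion.
parse :: Parser a -> B.ByteString -> Result a
parse (Parser p) inp = p inp (\_ msg -> Fail msg) Done

Code in this style is very sensitive to how well the compiler handles the higher-order continuation arguments (inlining, arity analysis, eta-expansion), which is why a compiler change can move its performance so much.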

Building the code

Reproducing the problem is pretty easy.

You need two git repos:

git clone git://github.com/bos/attoparsec
git clone git://github.com/mailrank/aeson

They're easily built:

(cd attoparsec && cabal install)
(cd aeson && cabal install)

Measuring performance

Inside the aeson repo are some benchmarks, of which AesonParse is the only interesting one for our purposes.

cd aeson/benchmarks && make AesonParse

There's also prepackaged test data, so you can run a benchmark on the spot.

./AesonParse 1000 json-data/twitter100.json

This program accepts a number of loop iterations as its first argument, and a data file as its second. It prints its execution time, so a smaller number indicates better performance.
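
For reference, a driver with the same command-line behaviour can be sketched as follows. This is not the actual AesonParse source (which drives attoparsec directly and may time things differently); the module and helper names are hypothetical, and it assumes the present-day Data.Aeson.decode API.

module Main (main) where

import Control.Monad (foldM)
import Data.Aeson (Value, decode)
import qualified Data.ByteString.Lazy as L
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Environment (getArgs)

main :: IO ()
main = do
  [countArg, path] <- getArgs
  let iters = read countArg :: Int
  contents <- L.readFile path
  start <- getCurrentTime
  let step acc i = do
        -- The dependence on i stops GHC from sharing a single parse
        -- result across all iterations.
        let input = if i == (0 :: Int) then L.empty else contents
        case (decode input :: Maybe Value) of
          Just _  -> return (acc + 1)
          Nothing -> return acc
  good <- foldM step (0 :: Int) [1 .. iters]
  end <- getCurrentTime
  putStrLn (path ++ ":")
  putStrLn ("  " ++ show good ++ " good, " ++ show (diffUTCTime end start))

As with the real benchmark, build with -O2 and pass the iteration count and a JSON file on the command line; a smaller reported time indicates better performance.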

Performance with 6.12.3

Here's what I see with 6.12.3, on my MacBook Pro running OS X 10.6.5.

$ ./AesonParse 1000 json-data/twitter100.json 
json-data/twitter100.json:
  1000 good, 1.847631s

This number is good - it's about as fast as Python's C-based json parser.

Performance with GHC 7

You can pass a ghc argument to make to build the benchmark with a different compiler.

make clean
make AesonParse ghc=/usr/local/ghc/7/bin/ghc

And with the benchmark rebuilt, here's the performance:

$ ./AesonParse7 1000 json-data/twitter100.json 
json-data/twitter100.json:
  1000 good, 2.939818s

This is one of the very few instances I've seen of a performance regression with GHC 7, and unfortunately it's very large.

Please let me know what other information I can provide.

Attachments (3)

aeson-ghc-6.12.3-32bit.pdf (20.6 KB) - added by bos 8 years ago.
Heap profile (+RTS -hT) with 32-bit GHC 6.12.3
aeson-ghc-7.0.3-32bit.pdf (21.0 KB) - added by bos 8 years ago.
Heap profile (+RTS -hT) with 32-bit GHC 7.0.3
aeson-ghc-head-32bit.pdf (20.5 KB) - added by bos 8 years ago.
Heap profile (+RTS -hT) with 32-bit GHC HEAD (April 5)


Change History (18)

comment:1 Changed 9 years ago by simonpj

That's bad. Thanks for telling us. We'll look into it.

comment:2 Changed 9 years ago by simonmar

You said you're using the 7.0 branch, which has some known performance regressions (see #4951). I've pushed some patches today that improve things, so if you could try again with the latest branch and let us know the results, that would be helpful.

comment:3 Changed 9 years ago by bos

Simon, your patches from today do indeed improve the performance:

./AesonParse7new 1000 json-data/twitter100.json 
json-data/twitter100.json:
  1000 good, 2.284175s

That brings us from a 60% regression to 25%.

comment:4 Changed 9 years ago by maltem

See also this post suggesting that this is a problem only on 64-bit.

comment:5 Changed 9 years ago by simonpj

OK, I managed to get it built. Here are my numbers (on a 64-bit Linux machine). With 6.12.3:

./AesonParse 1000 json-data/twitter100.json +RTS -s 
json-data/twitter100.json:
  1000 good, 3.456034s
   4,198,375,016 bytes allocated in the heap
     498,417,024 bytes copied during GC
         275,208 bytes maximum residency (334 sample(s))
         162,648 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  7005 collections,     0 parallel,  0.93s,  0.94s elapsed
  Generation 1:   334 collections,     0 parallel,  0.13s,  0.18s elapsed

  INIT  time    0.01s  (  0.00s elapsed)
  MUT   time    2.34s  (  2.33s elapsed)
  GC    time    1.06s  (  1.13s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    3.41s  (  3.46s elapsed)

  %GC time      31.1%  (32.7% elapsed)

  Alloc rate    1,786,542,560 bytes per MUT second

  Productivity  68.6% of total user, 67.7% of total elapsed

With HEAD

./AesonParse 1000 json-data/twitter100.json +RTS -s 
json-data/twitter100.json:
  1000 good, 3.594115s
   4,033,775,472 bytes allocated in the heap
     504,649,504 bytes copied during GC
         260,344 bytes maximum residency (334 sample(s))
         156,400 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  7037 collections,     0 parallel,  0.90s,  0.90s elapsed
  Generation 1:   334 collections,     0 parallel,  0.14s,  0.14s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    2.55s  (  2.55s elapsed)
  GC    time    1.04s  (  1.04s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    3.60s  (  3.60s elapsed)

  %GC time      28.9%  (28.9% elapsed)

  Alloc rate    1,578,226,103 bytes per MUT second

  Productivity  71.0% of total user, 71.0% of total elapsed

So HEAD allocates a bit less, but consistently takes a bit longer (3.6s against 3.45s). At the moment I don't think I can unpick where that time is going; if it were going into bigger allocation then ticky would show it, but it isn't. So I'm not sure how to proceed here.

Simon

comment:6 Changed 9 years ago by tibbe

You might want to try on a 32-bit machine as well. Lennart noticed some big 32-bit vs 64-bit differences in other continuation-heavy code (Data.Binary): http://lennartkolmodin.blogspot.com/2011/02/binary-by-numbers.html

comment:7 Changed 9 years ago by kolmodin

Cc: kolmodin@… added

comment:8 Changed 9 years ago by tibbe

Cc: johan.tibell@… added

comment:9 Changed 9 years ago by igloo

Milestone: 7.2.1
Priority: normal → high

comment:10 Changed 8 years ago by bos

Confirmed that we're down to a 22% (or so) regression in 7.0.3. I'm glad to see the performance has further improved with HEAD - thanks. When I have time, I'll build HEAD and see what I can see.

Changed 8 years ago by bos

Attachment: aeson-ghc-6.12.3-32bit.pdf added

Heap profile (+RTS -hT) with 32-bit GHC 6.12.3

Changed 8 years ago by bos

Attachment: aeson-ghc-7.0.3-32bit.pdf added

Heap profile (+RTS -hT) with 32-bit GHC 7.0.3

comment:11 Changed 8 years ago by bos

I've attached two heap profile outputs, one generated with 6.12.3 and another with 7.0.3, both 32-bit.

For this particular test case (an 8.3MB JSON file), 6.12.3 is about 66% faster than 7.0.3, and uses a little over half the memory. It looks like the 22% performance difference only applies to small inputs. The larger the input, the bigger the advantage that 6.12.3 has.

64-bit GHC 7.0.3 uses almost 300MB of heap here, which is pretty painful.

Changed 8 years ago by bos

Attachment: aeson-ghc-head-32bit.pdf added

Heap profile (+RTS -hT) with 32-bit GHC HEAD (April 5)

comment:12 Changed 8 years ago by bos

I checked with HEAD, and performance is back to 6.12.3 levels: allocation is about the same, and performance is too, even on the large input.

However, I notice that -hT doesn't seem to be counting wall clock time properly as of 7.0.3, and this persists in HEAD too. Look at the PDFs I attached to this bug: the first (with 6.12.3) counts time correctly, while the second and third under-count by a large amount. For example, the actual wall-clock time for HEAD was the same as 6.12.3, about 15 seconds, but the X axis in the chart only runs up to 5 seconds. Weird.

comment:13 Changed 8 years ago by hvr

Cc: hvr@… added

comment:14 Changed 8 years ago by simonpj

Owner: set to igloo

Well it's a real pity about 7.0.3, but I rather doubt it's worth the upheaval of yet another release.

I've made a new ticket #5096 for the profiling question.

Ian: it'd be ideal to have a perf/ test, so that we'll know if this one goes bad on us again. But there is a big wad of library code involved, so maybe it's infeasible. Any opinions, anyone?

And then, I suppose, close.

comment:15 Changed 8 years ago by simonmar

Resolution: fixed
Status: new → closed

Not feasible to add a test, I think.

Note: See TracTickets for help on using tickets.