Opened 11 years ago

Last modified 4 years ago

#3061 new bug

GHC's GC default heap growth strategy is not as good as other runtimes

Reported by: dons Owned by:
Priority: lowest Milestone:
Component: Runtime System Version: 6.10.1
Keywords: performance, GC Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case: yes
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

This is a GC performance ticket. The benchmark is the binary-trees benchmark, and the issue is whether or not GHC's ability to grow the heap without a heap check is comparably worse than other similar language runtimes.

Looking at the binary-trees benchmark, we see that GHC does very well on a parallel system, when we give a GC hint to the size.

E.g. the attached binary-trees program is very very competitive:

Compile with:

    ghc -O2 -fasm --make -threaded A.hs 

Run with:

    $ /A 20  +RTS -N4 -H340M

And we get:

    $ time ./A +RTS -N4 -H340M -RTS 20          
stretch tree of depth 21	 check: -1
2097152	 trees of depth 4	 check: -2097152
524288	 trees of depth 6	 check: -524288
131072	 trees of depth 8	 check: -131072
32768	 trees of depth 10	 check: -32768
8192	 trees of depth 12	 check: -8192
2048	 trees of depth 14	 check: -2048
512	 trees of depth 16	 check: -512
128	 trees of depth 18	 check: -128
32	 trees of depth 20	 check: -32
long lived tree of depth 20	 check: -1
./A +RTS -N4 -H340M -RTS 20  17.08s user 0.30s system 315% cpu 5.511 total

However, without that GC hint performance deteriorates dramatically,

$ time ./A +RTS -N4  -RTS 20      
stretch tree of depth 21	 check: -1
2097152	 trees of depth 4	 check: -2097152
524288	 trees of depth 6	 check: -524288
131072	 trees of depth 8	 check: -131072
32768	 trees of depth 10	 check: -32768
8192	 trees of depth 12	 check: -8192
2048	 trees of depth 14	 check: -2048
512	 trees of depth 16	 check: -512
128	 trees of depth 18	 check: -128
32	 trees of depth 20	 check: -32
long lived tree of depth 20	 check: -1
./A +RTS -N4 -RTS 20  33.83s user 0.42s system 136% cpu 25.033 total

So 5x slow down.

Could the GC heap growth algorithm be tuned? Other language runtimes, that are slower than Haskell's on this benchmark when our GC hint is in play, don't seem to suffer as badly without the hint:

http://shootout.alioth.debian.org/u64q/benchmark.php?test=binarytrees&lang=all

See e.g. Erlang. So: is there a better growth algorithm?

Attachments (1)

A.hs (2.3 KB) - added by dons 11 years ago.
A parallel binary trees program

Download all attachments as: .zip

Change History (19)

Changed 11 years ago by dons

Attachment: A.hs added

A parallel binary trees program

comment:1 Changed 11 years ago by simonmar

difficulty: Unknown
Milestone: 6.12 branch

So you're not allowed to use GC settings in the shootout? I think this benchmark is pathological for our default GC settings. The reason is that we use a small fixed-size allocation area, which is usually good for cache behaviour, but in this benchmark when we get to the larger tree sizes, we always GC before the tree has been constructed, and the GC therefore has to copy the tree, possibly multiple times. In fact this is very similar to a standard GC benchmark that we use.

I'd be quite interested to know what other runtimes do here. I played a little with scaling up the size of the allocation area if we find we're copying a lot, but didn't see a dramatic improvement.

comment:2 Changed 11 years ago by simonmar

Type: bugrun-time performance bug

comment:3 in reply to:  1 Changed 11 years ago by dons

Replying to simonmar:

So you're not allowed to use GC settings in the shootout? I think this benchmark is pathological for our default GC settings. The reason is that we use a small fixed-size allocation area, which is usually good for cache behaviour, but in this benchmark when we get to the larger tree sizes, we always GC before the tree has been constructed, and the GC therefore has to copy the tree, possibly multiple times. In fact this is very similar to a standard GC benchmark that we use.

Currently you can't, no. But the question is why can the other languages do better without the hint (e.g. Java is twice as fast, though on single core we're much better)? I wrote more about this benchmark in a blog post:

http://donsbot.wordpress.com/2009/03/04/playing-with-ghcs-parallel-runtime/

comment:4 Changed 11 years ago by simonmar

I'm guessing Java -server, and Scala, which also uses Java -server, uses quite aggressive heap settings, using more memory (they both use over 500M). Whereas we opt for more a conservative policy by default, to use less memory. You could probably argue that -server is a GC setting :-)

comment:5 Changed 11 years ago by dons

BTW, here's the ruling for the shootout:

http://alioth.debian.org/tracker/index.php?func=detail&aid=311528&group_id=30402&atid=411005

"For many language implementations the binary-trees programs are all about GC and GC work can be evaded by setting a
larger initial heap, hinting that the heap will be in some size range, controlling when GC happens etc

We've taken an arbitrary approach - just default settings for all language implementations.

Maybe there's a better (rather than different arbitrary) approach?

Nothing persuasive has been suggested yet."

I'm out of ideas for making binary-trees parallelise better then :)

comment:6 Changed 10 years ago by simonmar

Type of failure: Runtime performance bug

comment:7 Changed 9 years ago by igloo

Milestone: 6.12 branch6.12.3

comment:8 Changed 9 years ago by igloo

Milestone: 6.12.36.14.1
Priority: normallow

comment:9 Changed 9 years ago by igloo

Milestone: 7.0.17.0.2

comment:10 Changed 9 years ago by igloo

Milestone: 7.0.27.2.1

comment:11 Changed 8 years ago by igloo

Milestone: 7.2.17.4.1

comment:12 Changed 8 years ago by igloo

Milestone: 7.4.17.6.1
Priority: lowlowest

comment:13 Changed 7 years ago by igloo

Milestone: 7.6.17.6.2

comment:14 Changed 5 years ago by thoughtpolice

Milestone: 7.6.27.10.1

Moving to 7.10.1.

comment:15 Changed 5 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:16 Changed 5 years ago by thoughtpolice

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:17 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:18 Changed 4 years ago by thomie

Milestone: 8.0.1
Note: See TracTickets for help on using tickets.