Opened 4 years ago

Closed 3 years ago

#11707 closed bug (fixed)

Don't desugar large lists with build

Reported by: bgamari Owned by:
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.10.3
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D2007, Phab:D2023
Wiki Page:

Description

When looking at T9661 I noticed that a massive list was being inlined into listArray, where foldr/build was being applied, turning a bunch of massive static data into a massive block of code.

It seems likely that keeping this as static data would be preferable here.

Change History (16)

comment:1 Changed 4 years ago by bgamari

There is currently a GHC flag which controls whether build desugaring is used at all, -fsimple-list-literals. To roughly characterize the effect of build on allocation, I did a nofib against 3f60ce8751c860a2bd0ddbe87429136a9d97449b run with and without -fsimple-list-literals. The major allocation changes were,

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           ansi          +0.0%    +10.8%     0.000     0.004      0.0%
           bspt          +0.1%     +2.7%     0.004     0.004      0.0%
      cacheprof          +0.1%     +3.3%     0.181     0.181      0.0%
          eliza          -0.2%     +0.8%     0.000     0.000      0.0%
          fluid          -0.1%     +0.7%     0.004     0.004      0.0%
         fulsom          +0.0%     +0.7%     0.125     0.125    -21.4%
      integrate          +0.0%   +152.6%     0.081     0.081     +7.4%
           lift          +0.0%     +4.5%     0.000     0.000      0.0%
       nucleic2          -0.3%     +2.4%     0.024     0.024      0.0%
         parser          -0.4%     +0.6%     0.011     0.012      0.0%
        reptile          -0.0%     +0.1%     0.004     0.004      0.0%
        rewrite          +0.1%    +96.7%     0.007     0.012      0.0%
(those tests with no change in allocations omitted)
--------------------------------------------------------------------------------
            Min          -0.4%     -0.2%     -1.0%     -2.0%    -21.4%
            Max          +0.1%   +152.6%     +0.3%     +0.6%     +7.4%
 Geometric Mean          -0.0%     +1.9%     -0.2%     -0.2%     -0.2%

There are a few things to note here:

  • Allocations, if they change at all, increase without build fusion, as we would expect
  • A fulsom had its total memory usage dramatically reduced without build fusion; this may be worth further investigation.
  • Runtime didn't appreciably change in any case
  • Binary sizes don't change

On the other hand, if I compile T9961 with -fsimple-list-literals, I see a dramatic reduction in compiler allocations (from 745044392 to 518841472, about 30%). Unfortunately it's not easy to measure the runtime cost of list construction with this test. I'll try to write up a simple Criterion benchmark to do this.

comment:2 Changed 4 years ago by bgamari

It turns out that getting fusion to fire as it does in T9961 is actually quite tricky. For instance, while listArray fuses (producing a large string of writeArray#s) in this case,

arr :: Array Int Int
arr = listArray (0,10) [ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1 ]  

It does not in this case,

arr = listArray (0,10) [ 1,1,1,1,1,1,1,1,1,1 ]  

I then realized that only the prefix of negated elements will be unrolled. For instance,

arr = listArray (0,10) [ -1,-1,-1,1,1,1,1,1,1,1 ]

will fuse the first three elements into the construction loop, which will look something like this,

Test2.arr1 =
  \ (@ s) (s1# :: GHC.Prim.State# s) ->
    case GHC.Prim.newArray# @ Int @ s 11 (GHC.Arr.arrEleBottom @ Int) s1#
    of _ { (# ipv, ipv1 #) ->
    case GHC.Prim.writeArray# @ s @ Int ipv1 0 Test2.arr13 ipv
    of s4# { __DEFAULT ->
    case GHC.Prim.writeArray# @ s @ Int ipv1 1 Test2.arr13 s4#
    of s4#1 { __DEFAULT ->
    letrec {
      go :: [Int] -> GHC.Prim.Int# -> GHC.Prim.State# s -> GHC.Prim.State# s
      go =
        \ (ds :: [Int]) (eta :: GHC.Prim.Int#) (eta1 :: GHC.Prim.State# s) ->
          case ds of _ {
            [] -> eta1;
            : y ys ->
              case GHC.Prim.writeArray# @ s @ Int ipv1 eta y eta1
              of s4#2 { __DEFAULT ->
              case eta of wild1 {
                __DEFAULT -> go ys (GHC.Prim.+# wild1 1) s4#2;
                10 -> s4#2
              }
              }
          }; } in {- ... -}

where arr11 = I# -1.

Last edited 4 years ago by bgamari (previous) (diff)

comment:3 Changed 4 years ago by simonpj

Why do integrate and rewrite regress so much?

comment:4 in reply to:  3 Changed 4 years ago by bgamari

Replying to simonpj:

Why do integrate and rewrite regress so much?

In the case of integrate we have,

  let  d = (u-l)/8.0 in
     d * sum 
      [ (f l)*0.5,
        f (l+d),
        f (l+(2.0*d)),
        f (l+(3.0*d)),
        f (l+(4.0*d)),
        f (u-(3.0*d)),
        f (u-(2.0*d)),
        f (u-d),
        (f u)*0.5]

which is precisely the sort of pattern I was concerned about.

comment:5 Changed 4 years ago by bgamari

I'm not as certain in the case of rewrite, which is a much larger program. This is one possibility,

group_rules = map parse_eqn [ 
              "(a * b) * c = a * (b * c)", 
              "E * x = x", 
               "I(x) * x = E" ]

Perhaps the more likely culprit is,

p_term = seQ q_func [p_ident, look_for '(', 
                      list_of p_expr ',', look_for ')'] ## p_prim

where, seQ involves foldr on its list argument and was clearly designed expecting fusion.

Last edited 4 years ago by bgamari (previous) (diff)

comment:6 Changed 4 years ago by bgamari

I've split out the strange inconsistency in unrolling discussed in comment:2 into another ticket, #11710.

comment:7 Changed 4 years ago by bgamari

I've pushed a simple approach to avoid build-based list desugaring for large lists to Phab:D2007. Here we simply threshold on the list length. I have a few concerns about this approach,

  • This threshold introduces a "performance cliff", where users will suddenly see drastically different performance characteristics as their literal lists grow.
  • There isn't really a good way to choose this size. Ideally we'd want a smaller threshold with larger consumers and vice-versa, but we have no way of knowing what will be consuming our list in the desugaring.

From a runtime performance perspective, applying build more liberally should rarely hurt on "moderately" sized lists (it can only expose further optimization opportunities; if no fusion is possible it will eventually get rule-rewritten back to a list)

Consequently, the threshold we introduce here is a bit concerning since we are limiting contexts in which we'll use build, potentially hiding optimizations.

Regardless, nofib shows no change in allocations after this change, so it at very least doesn't hurt any of the cases there. Moreover, it shows a 30% reduction in compiler allocations in T9961 (from 745044392 to 745044392 bytes).

Last edited 4 years ago by bgamari (previous) (diff)

comment:8 Changed 4 years ago by bgamari

Differential Rev(s): Phab:D2007
Status: newpatch

comment:9 Changed 4 years ago by simonpj

See #11710; I think we should abandon the "static tail" idea described in Note [Desugaring explicit lists] entirely.

Then, to avoid code bloat on very long lists, let's have the patch in this ticket, which ensures that if the list is long we revert to cons-list for. The goal here is just to eliminate mega-code-bloat in extreme cases. (Hence no real need for a flag to control maxBuildLength.

I don't really like it; but I don't want BOTH hacks. One is enough!

Not worth burning much midnight oil on this one.

comment:10 Changed 3 years ago by Ben Gamari <ben@…>

In b735e99d/ghc:

DsExpr: Don't build/foldr huge lists

Desugaring long lists with build trades large static data for large
code, which is likely a poor trade-off. See #11707.

Test Plan: Validate, nofib

Reviewers: simonpj, austin

Reviewed By: austin

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2007

GHC Trac Issues: #11707

comment:11 Changed 3 years ago by bgamari

Milestone: 8.0.1
Status: patchmerge

comment:12 Changed 3 years ago by simonpj

What of comment:9? I'd prefer not to remove complexity as we add it!

comment:13 Changed 3 years ago by simonpj

Ah: see Phab:D2023

comment:14 Changed 3 years ago by bgamari

Differential Rev(s): Phab:D2007Phab:D2007, Phab:D2023
Status: mergenew

comment:15 Changed 3 years ago by bgamari

Status: newpatch

comment:16 Changed 3 years ago by bgamari

Resolution: fixed
Status: patchclosed
Note: See TracTickets for help on using tickets.