Opened 11 years ago

Last modified 3 years ago

#2607 new bug

Inlining defeats selector thunk optimisation

Reported by: simonmar Owned by:
Priority: normal Milestone:
Component: Compiler Version: 6.8.3
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #12241 Differential Rev(s):
Wiki Page:


From a post on haskell-cafe.

Lev Walkin wrote:

I wondered why would a contemporary GHC 6.8.3 exhibit such a leak? After all, the technique was known in 2000 (and afir by Wadler in '87) and one would assume Joe English's reference to "most other Haskell systems" ought to mean GHC.

Thanks for this nice example - Don Stewart pointed me to it, and Simon PJ and I just spent some time this morning diagnosing it.

Incedentally, with GHC 6.8 you can just run the program with "+RTS -hT" to get a basic space profile, there's no need to compile it for profiling - this is tremendously useful for quick profiling jobs. And in this case we see the the heap is filling up with (:) and Tree constructors, no thunks.

Here's the short story: GHC does have the space leak optimisation you refer to, and it is working correctly, but it doesn't cover all the cases you might want it to cover. In particular, optimisations sometimes interact badly with the space leak avoidance, and that's what is happening here. We've known about the problem for some time, but this is the first time I've seen a nice small example that demonstrates it.

>     -- Lazily build a tree out of a sequence of tree-building events
>     build :: [TreeEvent] -> ([UnconsumedEvent], [Tree String])
>     build (Start str : es) =
>             let (es', subnodes) = build es
>                 (spill, siblings) = build es'
>             in (spill, (Tree str subnodes : siblings))
>     build (Leaf str : es) =
>             let (spill, siblings) = build es
>             in (spill, Tree str [] : siblings)
>     build (Stop : es) = (es, [])
>     build [] = ([], [])

So here's the long story. Look at the first equation for build:

>     build (Start str : es) =
>             let (es', subnodes) = build es
>                 (spill, siblings) = build es'
>             in (spill, (Tree str subnodes : siblings))

this turns into

      x = build es
      es' = fst x
      subnodes = snd x

      y = build es'
      spill = fst y
      siblings = snd y

now, it's the "siblings" binding we're interested in, because this one is never demanded - in this example, "subnodes" ends up being an infinite list of trees, and we never get to evaluate "siblings". So anything referred to by siblings will remain in the heap.

The space-leak avoidance optimisation works on all those "fst" and "snd" bindings: in a binding like "siblings = snd y", when y is evaluated to a pair, the GC will automatically reduce "snd y", so releasing the first component of the pair. This all works fine.

But the optimiser sees the above code and spots that es' only occurs once, in the right hand side of the binding for y, and so it inlines it. Now we have

      x = build es
      subnodes = snd x

      y = build (fst x)
      spill = fst y
      siblings = snd y

Now, usually this is a good idea, but in this case we lost the special space-leak avoidance on the "fst x" expression, because it is now embedded in an expression. In fact in this case the thunk goes away entirely, because build is strict.

But now, when the program runs, the thunk for siblings retains y, which retains x, which evaluates to a pair, the second component of which evaluates to an infintely growing list of Trees (the first components is a chain of "fst y" expressions that constantly get reduced by the GC and don't take up any space).

Attachments (1)

selector.hs (1.6 KB) - added by simonmar 11 years ago.

Download all attachments as: .zip

Change History (10)

Changed 11 years ago by simonmar

Attachment: selector.hs added

comment:1 Changed 11 years ago by simonmar

Architecture: UnknownUnknown/Multiple

comment:2 Changed 11 years ago by simonmar

Operating System: UnknownUnknown/Multiple

comment:3 Changed 9 years ago by simonmar

Type of failure: None/Unknown

comment:5 Changed 9 years ago by simonpj

The way GHC's current mechanisms fail is this. We start thus:

   p = e
   x = case p of (a,_) -> a
   y = case p of (_,b) -> b
   z = case x of
           [] -> []
           (t:ts) -> foo ts

The bindings for x and y look like selector thunks, and GHC's RTS does the Wadler-thing on them. But what happens is that x only occurs once so the optimiser inlines it in z (we assume there are other uses of 'y')

   p = e
   y = case p of (_,b) -> b
   z = case p of 
        (a,_) -> case a of 
                  [] -> []
                  (t:ts) -> foo ts

Now z doesn't look like a selector thunk any more.

One "solution" might be to restrict the optimiser, but what if the above "optimised" program was exactly what the programmer wrote. Then it'd have a space leak. Or what if the optimiser's inlining led to a useful cascade of transformations that eliminated the leak some other way?

I had an idea on my bicycle this morning. In the code for 'z' we see the pattern

  z = ...(case p72 of (a, _) -> e)...

Assuming p72 is in scope at the z-binding, we can immediately see that we are picking just a little part of p72, and that is a leaky thing to do. So we could transform to

  z = ...(e)...
  a = case p72 of (a, _) -> a

thereby precisely reversing what went wrong, and making a thunk for 'a' that fits what GHC thinks of as a selector thunk.

So that's the idea. Very late in the compiler (something like at STG level, or just before), identify any case expression that:

  • Scutinises a variable
  • The variable is in scope at an enclosing let-binding, without passing any intervening lambdas
  • Has exactly one alternative (ie it's a product type)
  • Has some dead binders in the pattern

In that case, float the case as above.


comment:6 Changed 9 years ago by simonmar

I like it. I'm intrigued to know how often it would fire, and what affect it would have on performance in general.

comment:7 Changed 9 years ago by batterseapower

Does GHC deal with selector thunks on sum types as well? Presumably you even can transform:

z = ... (case p72 of (x:_) -> e1; [] -> e2)


z = ... (case p72 of (_:_) -> e1; [] -> e2)
x = case p72 of (x:_) -> x

This even extends to the case where both case branches bind variables, since the new selector thunks will only be evaluated in branches that guarantee the selector pattern match will succeed.

The sad part is that the FVs of p72 will be kept alive by the non-selector expression:

case p72 of (_:_) -> e1; [] -> e2

Even though we don't want either field. We could transform like this instead:

z = ... (case tag of 1 -> e1; 2 -> e2)
tag = case p72 of (_:_) -> 1; [] -> 2
x = case p72 of (x:_) -> x

And then have the garbage collector speculatively evaluate "tag-selector thunks" too. This would eliminate Wadler-style space leaks through sum types. Would it be worth the complexity cost though?

comment:8 Changed 9 years ago by batterseapower

Type of failure: None/UnknownRuntime performance bug

comment:9 Changed 3 years ago by dfeuer

#12241 provides another example of the selector thunk trick being defeated by the simplifier.

Note: See TracTickets for help on using tickets.