wiki:NestedClosures

Nested closure representation in STG

Progess on this is tracked by Trac #14461.

The problem

Consider this function (taken from comment:87 of #7258):

f10 :: A (Int, Int, Int, Int, Int, Int, Int, Int, Int, Int)
f10 =
  A (\i0 ->
  A (\i1 ->
  A (\i2 ->
  A (\i3 ->
  A (\i4 ->
  A (\i5 ->
  A (\i6 ->
  A (\i7 ->
  A (\i8 ->
  A (\i9 ->
  N (i0, i1, i2, i3, i4, i5, i6, i7, i8, i9)
  ))))))))))

In STG form the body of f10 looks like this

let f0 = [] \[i0].
         let f1 = [i0] \[i1].
                  let f2 = [i0,i1] \[i2].
                           let f3 = [i0,i1,i2] \[i3].
                                    N (i0,i1,i2,i3)
                           in A f3
                  in A f2
         in A f1
in A f0

(I have shortened it to n=3, but you can see the pattern.) Reminder: the form

   let f = [fv1,...,fvn] \[arg1,...,argm]. body

binds f to a heap-allocated function closure with free variables fv1,...,fvn, and arguments arg1,..., argm].

Consider the code for function f2:

  • On entry R0 points to the closure, and R1 points to the argument i2
  • The first thing it does is to allocate a closure for f3, which has free vars [i0,i1,i2]
  • So it fetches the first two by indirection from R0; the third, i2 is already in R1.
  • It allocates a 4-word closure, andn stores the three free varaiables in it. (The fourth word is the info pointer.)

So the closure for f3 looks like

--------------------------
| f3_info | i0 | i1 | i2 |
--------------------------

Clearly we generate a quadratic amount of code, and execute a quadratic number of loads and stores. This is not good.

A solution

An obvious solution is to share the already-allocated closures. For example, the closure for f3 could point to the (already-allocated) closure for f2, thus

--------------------
| f3_info | . | i2 |
------------|-------
            |
            V
            ---------------------
            | f2_info | i0 | i1 |
            ---------------------

If we did this all the way down we'd get a linear amount of code, we'd execute a linear number of loads and stores. Function closures would no longer be "flat", but that's ok.

Problem: Updatable thunks

While this solution is quite simple it poses some problems with respect to updatable thunks: If an inner closure refers to its outer closure, which happens to be an updatable thunk, we can't guarantee safe access to outers free variables for inner! When outer is forced its value will be written back to outers closure and outers info table is updated to be an indirection instead of a thunk. outers free variables are now considered garbage and we might segfault when trying to access these in inner.

When exactly can we do this?

The basic setup is this:

  let p = [a,b,c] \[x,y].
          .....(let { q = [a,b,c,d,e,f] \[r,s]. blah } in ...)...

That is:

  • Start with some "parent" closure p, with free variables [a,b,c]
  • At some arbitary place in p's code we see a binding for q, with free variables [a,b,c,d,e,f]: a superset of the free-vars of p.
  • In that case we can replace the free vars a,b,c of q's closure with p.

We might express the transformation like this:

  let p = [a,b,c] \[x,y].
          .....(let { q = [p{a,b,c},d,e,f] \[r,s]. blah } in ...)...

Here the notation p{a,b,c} in q's free-var list indicates that we store a single pointer (to p) in q's closure; but that makes available the variables a,b,c in q's RHS blah.

This can be nested. In original example we'd get

let f0 = [] \[i0].
         let f1 = [i0] \[i1].
                  let f2 = [i0,i1] \[i2].
                           let f3 = [f2{i0,i1},i2] \[i3].
                                    let f4 = [f3{f2{i0,i1},i2},i3] \[i4].
                                             N (i0,i1,i2,i3,i4)
                           in A f3
                  in A f2
         in A f1
in A f0

I took it up to n=4 to illustrate. Notice that

  • For f2 there was no point in storing a pointer to f1; that's no more compact than storing i0 directly.
  • For f4 we have this structure, reflected in its free-var list:
    --------------------
    | f4_info | . | i3 |
    ------------|-------
                |
                V
                --------------------
                | f3_info | . | i2 |
                ------------|-------
                            |
                            V
                            ---------------------
                            | f2_info | i0 | i1 |
                            ---------------------
    

Things to think about

  • In the above example I have used a pointer to the parent closure as soon as doing so saves a single word. But we could choose to do so only when it saves, say, three words. Then in small (common) cases we would get no nesting at all; but for large cases we woudl nest in "chunks" of three. Sounds easy and attractive.
  • You might wonder if you could use a pointer to a parent closure even if the child doesn't use all the free vars of the parent. E.g
    let f = [i1, ..., i20] \[v].
            let g = [i2, ... i20] \[w]. blah
            in ...
    in ...
    
    Here g shares all of f's free variables except i1. Making g's closure point to f's would be sound, but would risk a space leak by keeping i1 alive for too long.

This would be fixable, by keeping a liveness bit-map in g's info-table, and using it when scavenging g's closure. But that would be new work.

  • There may be many candidate parents of q, but there is always a "best" one (i.e. with the maximum number of free vars).
  • One could be more ambitious and look for "peer" sharing too
      let f = [] \[x].
              case x of (c,d,,e) ->
                 let g = [c,d,e] \[r]. blah1
                     h = [c,d,e] \[r]. blah2
                 in ...
    
    Here perhaps h could point to g. Or maybe they can both point to a shared record. There is plenty of material in the literature describing schemes like this, dating back at least thirty years. But our goal for now is simply to gather the low-hanging fruit; hence the "parent" idea.

  • All this affects what free variables to save across a case. For example
      let p = [a,b,c] \[x,y].
              case x of K d e _ f) ->
              let { q = [p{a,b,c},d,e,f] \[r,s]. blah } in ...
    
    Here we must save p (but not a,b,c) across the case.

Implementation

Because of the business about what to save across a case expression, I think we should do the analysis on STG, and record the results in STG free-var lists.

Last modified 2 years ago Last modified on Nov 29, 2017 7:41:31 PM