Opened 4 years ago

Last modified 3 years ago

#10613 new feature request

Mechanism for checking that we only enter single-entry thunks once

Reported by: bgamari Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.10.1
Keywords: Cc: simonmar, simonpj
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Other Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

During this week's GHC call it came up that it would be nice to have a linting mechanism for verifying that we only enter single-entry thunks once, as currently this invariant appears to be completely unchecked.

Currently we do not record whether a thunk is intended to be single-entry when we emit its code. Moreover, I don't believe there is any room for this left in the info table to record this fact. There are two ways I can think of accomodating this,

  • stealing a bit from the thunk type field
  • add a flags field to StgDebugInfo such that the lint requires the debug way

As far as implementing the check itself, I think it should be rather straightforward. When we enter a thunk we simply want to check whether it is single-entry. If it is then we replace it with a special type of BLACKHOLE-like thunk which crashes the program (or merely emits a warning) on entry.

Change History (24)

comment:1 Changed 4 years ago by bgamari

Simonmar, could you confirm that there are no blatant inaccuracies in this description? If not, do you have an opinion on which approach is preferable?

comment:2 Changed 4 years ago by rwbarton

For what it's worth, while investigating #10414, I ran the whole test suite with -feager-blackholing, which (at the time) caused all single-entry thunks to be overwritten with a black hole on entry. There were no test failures, which gives me a reasonably high degree of confidence that the "single-entry thunks are entered only once" invariant is maintained, outside of the situations involving multiple threads entering a closure simultaneously discussed in #10414.

comment:3 Changed 4 years ago by nomeata

A little motivation for anyone considering to work on this: Right now I would be very happy to have this feature.

comment:4 Changed 4 years ago by nomeata

This is me thinking out aloud about this feature. Ignore me if you want, but if you are curious, feel free to jump in.

Widening the scope of this ticket a bit (or shifting it?): Wouldn’t it be nice if we not only link the single-entry thunks, but also find out about missed opportunities for single-entry thunks? Such a counting would subsume the linting above.

I’m considering to build on ticky-ticky debugging, as it already provides a lot of the necessary infrastructure. In particular, it counts entries to thunks, when -ticky-dyn-thunk is enabled. The number is not quite what we want, though: It seems that if a certain static thunk is created n times, then it will report n entries; whereas in the scope of this ticket we want to know that each allocated thunk was entered once.

Also, currently a dynamic (i.e. an instance of a static) thunk that is not marked as single-entry will always be reported to be entered 0 or 1 times, as the result is shared. How do we get better numbers here? One way might be a special heap object, let’s call it a “counting indirection”. It behaves mostly like a regular indirection, i.e. it’s entry code would count a tick and then jump to the entry code of the indirectee, but the crucial difference would be that the garbage collector will 'not' erase it when it evacuates memory. It will, however, evacuate the indirectee as usual. This way, the sharing semantics are preserved, but every entry via the thunk will be counted, not just the first.

These counting indirections would be created by the code that creates a thunk. This way, this counting can be done independent of -ticky-dyn-thunk. These heap objects can also carry additional data used in counting, e.g. the name of the thunk (in CorePrep names) and whether GHC created this thunk as single-entry or not (which would not affect the behaviour of the counting indirection, but be useful for the reporting).

I’m not sure yet what’s the best data structure to do the counting, if we indeed want to count uses of each dynamic instance of the thunk (or rather, of the counting indirection) separately. Maybe statically allocate 'm' (maybe 1024) counters per thunk, and an index, and the first 'm' instances of the thunk use the these counter (each, upon first entry, incrementing the index to allocate its slot), and if there are more than 1024 entries, then simply stop counting.

I kind-of like this design, but I’m sure I’m missing some rather ugly details.

comment:5 in reply to:  4 Changed 4 years ago by simonpj

It seems that if a certain static thunk is created n times, then it will report n entries; whereas in the scope of this ticket we want to know that each allocated thunk was entered once.

Yes exactly. Let's call the code that allocates the thunk the thunk allocation site. For some of these sites, cardinality analysis may say "this is a single-entry thunk". Call those the single-entry thunk allocation sites and the others the multi-entry thunk allocation sites.

  • For each single-entry thunk allocation site, we'd like to know if any of the thunks it allocates are entered more than once. (This would mean that the analysis was unsound.)
  • For each multi-entry allocation site, we'd like to know if all the thunks it allocates are entered at most once. This is a possible missed opportunity to make it a single-entry site. (Only "possible" because in a different run it might be entered more than once.)
  • For all allocation sites we'd like to know if every thunk allocated there was never entered; that's a possible missed opportunity for absence.

A counting indirection sounds like the right kind of thing.

I would also like to count lambdas! So for the STG binding

f = \xy. e
  • How many times do I call f (ie enter the closure)? That is, is it a one-shot lambda.
  • Do I ever partially apply it, or is it always saturated?

Similar technology might do the job.

comment:6 Changed 4 years ago by nomeata

A counting indirection sounds like the right kind of thing.

Looking a bit into this IND_COUNTING now; so I’ll go into “dumping notes into trac” mode, for the future benefit of me or whoever else continues.

There is already a closure type IND_PERM which is not removed by the GC; our counting indirections requires the same behavior. So this is a good start, I guess. The following comments are observations from looking at every mention of IND_PERM in the code.

  • Cmm.h has macros to ENTER a closure, which shortcuts indirections without calling their entry code. This is not what we want for an IND_COUNTING; so we should use the default code there, to actually enter the counting closure.
  • Should the interpreter also count entries? I don’t see any reference to ticky in Interpreter.c, so probably not.
  • Should rts/Stable.c preserve IND_COUNTING? It does not preserve IND_PERM, so probably not.
  • Fun fact: tricky and profiling do currently not mix. As I plan to do this counting as part of ticky, this means I can ignore problems with profiling in the RTS (the closures still have to work, of course).
  • scavenge_mark_stack does nothing for IND_PERM with a comment I don’t fully understand. Will revisit.

Weird. I only find lots of code that handles IND_PERM, but none that actually creates closures of that type. Is all that dead code, likely since 2010’s reimplementation of BLACHOLES by Simon Marlow? (changeset:5d52d9b/ghc)

comment:7 Changed 4 years ago by nomeata

Created Phab:D1821 to let Harbormaster check if IND_PERM is really unused.

comment:8 Changed 4 years ago by simonmar

Why not emit some update code into a single entry thunk that updates the thunk with something that will explode if it is entered again? That seems a lot easier than allocating indirctions.

comment:9 Changed 4 years ago by nomeata

Why not emit some update code into a single entry thunk that updates the thunk with something that will explode if it is entered again? That seems a lot easier than allocating indirctions.

Right, that would be sufficient to detect thunks that are wrongly marked single-entry. But we also want to learn about thunks that are not marked single-entry, but should be.

comment:10 Changed 4 years ago by simonmar

Right, that would be sufficient to detect thunks that are wrongly marked single-entry. But we also want to learn about thunks that are not marked single-entry, but should be.

Do you expect to learn much by doing that? We know the analysis is inaccurate, and I expect you'll just find a lot of thunks that are used a long way from the definition site.

comment:11 Changed 4 years ago by simonpj

Yes we expect to learn a lot. From a scientific point of view (and writing the journal paper) I'm very interested to know how many of the single-entry thunks are found by the analysis. Let's say there are N thunk allocation sites in the code. We find that 15% are single-entry. If we found that only 20% were entered once (that 15% plus another 5%) that would mean there was at most another 5% that a more sophisticated analysis might find. But the number might be much larger.

If the number is large, we might start to characterise the missed opportunities further. Perhaps some are being missed because of a bug in the analysis. Others might be caught by a simple improvement in the analysis.

That's the thinking.

comment:12 Changed 4 years ago by Ben Gamari <ben@…>

In f42db157/ghc:

Remove unused IND_PERM

it seems that this closure type has not been in use since 5d52d9, so all
this is dead and untested code. This removes it. Some of the code might
be useful for a counting indirection as described in #10613, so when
implementing that, have a look at what this commit removes.

Test Plan: validate on harbormaster

Reviewers: austin, bgamari, simonmar

Reviewed By: simonmar

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1821

comment:13 Changed 4 years ago by nomeata

I have made initial, still hackish, progress. After a few iterations I can now run all of nofib with my counting indirections and it does not segfault any more (phew).

Here is a report from integrate, note the new columns #Alloc, Single and Multpile.

    Entries      Alloc    Alloc'd     #Alloc     Single   Multiple  Non-void Arguments      STG Name
--------------------------------------------------------------------------------
      50000     800000          0          0          0          0   1 D                    main@main:Main.main6{v r7sX}
          0          0    8000000     500000          0          0   0                      sat_s7u6{v} (main@main:Main) (thk) in r7sQ
          0          0    8000000     500000          0          0   0                      sat_s7u3{v} (main@main:Main) (thk) in r7sQ
          0          0    8000000     500000          0          0   0                      sat_s7tZ{v} (main@main:Main) (thk) in r7sQ
          0          0    8000000     500000          0          0   0                      sat_s7tU{v} (main@main:Main) (thk) in r7sQ
          0          0    8000000     500000          0          0   0                      sat_s7tP{v} (main@main:Main) (thk) in r7sQ
          0          0    8000000     500000          0          0   0                      sat_s7tK{v} (main@main:Main) (thk) in r7sQ
          0          0    8000000     500000          0          0   0                      sat_s7tF{v} (main@main:Main) (thk) in r7sQ
          0          0    8000000     500000          0          0   0                      sat_s7tA{v} (main@main:Main) (thk) in r7sQ
    4050000          0    7200000     450000          0     450000   1 D                    sat_s7uz{v} (main@main:Main) in s7uB
          0          0    8000000     500000          0          0   0                      sat_s7tu{v} (main@main:Main) (thk) in r7sQ
     500000   72000000          0          0          0          0   3 dd>                  main@main:Main.$wintegrate1D{v r7sQ}
     450000   32400000     800000      50000          0      50000   1 D                    sat_s7uB{v} (main@main:Main) in r7t3
      50000     800000          0          0          0          0   1 D                    main@main:Main.main10{v r7t2}
      50000    3600000          0          0          0          0   2 DD                   main@main:Main.main11{v r7t3}
      50000          0          0          0          0          0   4 LLid                 main@main:Main.$wgo{v r7t4}
          2          0         24          1          1          0   0                      sat_s7vH{v} (main@main:Main) (thk) in r7sR
          1         64          0          0          0          0   0                      main@main:Main.main1{v r7sR}
          1          0          0          0          0          0   0                      main@main:Main.main15{v r7t9}
          1          0          0          0          0          0   0                      main@main::Main.main{v 01D}

It is still buggy (or not fully understood), e.g.:

  • I thought I only enabled it for thunks, but it also seems to be present for two functions, and only there count multiple entries... Weird.
  • Also, these 500000-times allocated thunks are not really unused, are they?.
  • Two total entries for a thunk (sat_s7vH) that has been allocated once and called once? Maybe there was a heap check inbetween?

But still: You can tell that sat_s7uB has been allocated 50000 and each instance is called 9 times – this matches the 9-element-list in the code.

The implementation uses a new closure type that carries a pointer to a ticky counter struct, and its own little counter:

typedef struct {
    StgHeader   header;
    StgClosure *indirectee;
    const void *ent_counter; // A StgEntCounter
    StgWord     entries;
} StgIndCounting;

Upon entry, this increments the private counter, and if it does so the first time, increases Single, and on the second time decreases Single and increases Multiple. Otherwise, it passes R1 onto the indirectee, preserving the tag.

comment:14 Changed 4 years ago by simonpj

Good work!

comment:15 Changed 4 years ago by nomeata

Nooo, where is the longish text I wrote here yesterday? :-( I hate it when that happens. Darn, let’s see what I can remember. grnrl.

I thought I only enabled it for thunks, but it also seems to be present for two functions, and only there count multiple entries... Weird.

Still weird. If I add my counting indirection to everything, I get segfaults. So I guard it with idRepArity id == 0 for now, thinking I would only add it to thunks. But the above report shows some that have “Non-void arguments”, but still idRepArity id == 0; this is one example:

let {
  sat_s7uy [Occ=Once] :: GHC.Types.Double -> GHC.Types.Double
  [LclId, Str=DmdType] =
      \r srt:SRT:[] [x_s7ux] GHC.Float.timesDouble x_s7ux y_s7uu;
} in 
  case Main.$wintegrate1D 0.0## ww3_s7uw sat_s7uy of ww4_s7uz {
    __DEFAULT -> GHC.Types.D# [ww4_s7uz];
  };

Why does this have a zero idRepArity? Why does it not crash? Is it because it is used as an argument to a function, and hence no fast calls occur?

Also, these 500000-times allocated thunks are not really unused, are they?.

No, they are not. These are constructors, where entries are not counted. I extended the ticky report to be a bit more explicit about the kind of things these names are, i.e. distinguish cons from thunks, and for thunks, indicate which are one-shot and which are static.

Two total entries for a thunk (sat_s7vH) that has been allocated once and called once? Maybe there was a heap check inbetween?

No, this is simply a bug in the ticky code, where there are two calls to tickyEnterThunk (likely introduced by 99d4e5b4a0bd32813ff8c74e91d2dcf6b3555176, possibly due to a merge mistake).

The latter two improvements are at Phab:D2014, mostly to get phab to test the commit (which does not seem to happen... why?).

I now need to wrap my head around how to do the counting for functions. My counting indirection does not work as it is, because functions are entered differently. So I either have to create a counting indirection that works for functions (but I don’t know yet if a single one would work for all types of functions), or add the counting code to the function itself. In this case, the function closure needs an additional field for the counter.

comment:16 Changed 4 years ago by simonpj

If you want to count how many times a function is entered, why not just put a ticky counter at the start of the function's code?

Simon

comment:17 Changed 4 years ago by nomeata

If you want to count how many times a function is entered, why not just put a ticky counter at the start of the function's code?

The problem is that I need to count each dynamic instance separately. So I need a separate counter for each allocated closure, so the only place it can go is inside the closer as another non-pointer argument.

(For static functions, things are easier of course.)

comment:18 Changed 4 years ago by simonpj

Oh ok, well, same deal. When you enter a (dynamically allocated) function there's a register pointing to the function closure. So the ticky code at the start of the function's code can bump the count in ClosurePtr[4] (or whatever the appropriate offset is. No?

comment:19 Changed 4 years ago by nomeata

Yes! That’s the idea, and conceptually straight-forward. I’m just a bit worried that the change will be too invasive to the compiler’s code (info tables, offset calculation etc.)

But I guess I should just do it, and see how bad it is. In the worst case we keep it on a separate branch just for the investigation for the paper.

comment:20 Changed 4 years ago by nomeata

Ok, I implemented it, and nofib still runs (yay). The code is yet unrepresentable, though :-)

Preliminary observations (gotta run now):

In all of nofib, there is one single entry thunk that is entered multiple times (something in gamteb, allocated 2048 times, each time with two entries). Did not investigate yet, though.

On first glance, plenty of thunks which appear to be single-entry, but are not marked as such.

comment:21 Changed 4 years ago by nomeata

I started to put my code on wip/T10613. Currently, the extra counters require -ticky -ticky-allocd -ticky-dyn-thunk to be shown.

comment:22 Changed 4 years ago by Ben Gamari <ben@…>

In 6c2c853b/ghc:

Various ticky-related work

this Diff contains small, self-contained changes as I work towards
fixing #10613. It is mostly created to let harbormaster do its job, but
feedback is welcome as well.

Please do not merge this via arc; I’d like to push the individual
patches as layed out here. I might push mostly trivial ones even without
review, as long as the build passes.

Reviewers: austin, bgamari

Reviewed By: bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2014

comment:23 Changed 4 years ago by bgamari

Ignore the commit in comment:22; I forgot that this Diff was not intended to be merged. I'll be reverting it shortly.

comment:24 Changed 3 years ago by simonpj

See #13536 for an example where it'd have been great to have this feature!

Last edited 3 years ago by simonpj (previous) (diff)
Note: See TracTickets for help on using tickets.