Version 9 (modified by MikolajKonarski, 6 years ago)


The analysis of the semantics of +RTS -s and the list of needed new events

A sample output of +RTS -s, annotated with the new required events. More concrete event proposals will follow.

     237,179,528 bytes allocated in the heap

We need an event for each memory allocation.

      52,785,584 bytes copied during GC

An event with a summary of all copying done, after the end of each GC,

      17,272,336 bytes maximum residency (5 sample(s))

Probably a separate event for that, instead of trying to calculate it from all events. Perhaps the event should be emitted during major GC (the docs explain the "n samples" saying "only checked during major garbage collections").

       6,493,328 bytes maximum slop

We also need an event for slop, probably emitted rarely.

              45 MB total memory in use

Can probably be calculated from allocation and deallocation.

                  (0 MB lost due to fragmentation)

Ask JaffaCake? the following formula means:

 (peak_mblocks_allocated * BLOCKS_PER_MBLOCK * BLOCK_SIZE_W - hw_alloc_blocks * BLOCK_SIZE_W) / (1024 * 1024 / sizeof(W_))

in particular what is the difference between peak_mblocks_allocated and hw_alloc_blocks.

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       448 colls,     0 par    0.04s    0.04s     0.0001s    0.0016s
  Gen  1         5 colls,     0 par    0.07s    0.07s     0.0149s    0.0386s

Split the current GC events into generations and into seq/par (if RequestParGC is not enough to tell seq/par). We don't want to report the CPU time, only the elapsed time, and that's fine. With the current GC events only summarized information is available.

  Parallel GC work balance: 1.00 (6391526 / 6375794, ideal 2)

This is quite convoluted: total words copied during parallel GCs divided by the average over all parallel GCs of maximal number of words copied by any thread in a single par GC. Events needed: the events added above suffice, but quite a bit of extra state will have to be maintained when reading the events. Perhaps +RTS -s could be modified to make this figure simpler?

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.04s    (  0.38s)       0.13s    (  0.17s)
  Task  1 (worker) :    0.00s    (  0.55s)       0.00s    (  0.00s)
  Task  2 (bound)  :    0.41s    (  0.41s)       0.13s    (  0.13s)
  Task  3 (worker) :    0.00s    (  0.55s)       0.00s    (  0.00s)

Ask JaffaCake? how the "tasks" relate to the "threads" for which we generate events. Perhaps in the existing GC evens add the info about which task does the job. BTW, is the time between event GCIdle and GCWork counted as GC time of the task? Generally, is GCIdle useful for us here in any way?

  SPARKS: 300001 (17739 converted, 282262 overflowed, 0 dud, 0 GC'd, 0 fizzled)

Tell JaffaCake? that the example and description for that at

needs updating (not sure for which GHC version, though). Otherwise, we have enough events for that (we calculate this using the SparkCounters? events, but we could also use the precise per-spark events).

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.42s  (  0.41s elapsed)
  GC      time    0.29s  (  0.13s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.71s  (  0.55s elapsed)

(Note that there may be more positions here, e.g., for profiling.) We can sum up the times from GC events. For other stats, we'd need the MUT figure, but it's not obvious if we manage to get it from all the thread (task) events that we have and will add above. It's also not clear if adding events needed to get the other times is worth it. After all the other events are added, let's see if we can get any more of these summary times, perhaps by adding a minor event emitted just once. Note that the INIT time is necessary for the Productivity figure below (it does not count as "productive time" in +RTS -s)..

%GC time      89.5%  (75.3% elapsed)

Ask JaffaCake? why this line is in the docs, but not in the output of +RTS -s I've got.

  Alloc rate    564,074,971 bytes per MUT second

The events added above should suffice, except that we use elapsed time, not CPU time.

  Productivity  59.0% of total user, 76.6% of total elapsed

The events added above should be enough. Again. we only do the elapsed case.