
How to add new graphs to TS

I'm afraid this is not understandable without the context of ThreadScope internals and some extra explanation. Sorry. Please ask questions.

To draw a graph from an eventlog, we need the data preprocessed; then we just pick a portion to show on-screen.
It's best if we can take the data from a validation profile and then just process it some more, as we do for Histogram (from the spark profile).
In that case we know the data makes sense, and we can use the finite automaton (FA) validation engine
for all the FA mangling of the list of events.
So the workflow for drawing a graph (and implementing a new one), as seen in Histogram, is (a sketch in code follows the list):

1. parse the events in ghc-events
2. optionally validate them in ghc-events
3. generate a profile using a validator-specific profile function
4. preprocess the profile some more and store the result until the eventlog is reloaded
5. select the data for the required interval (or zoom/pan factor)
6. process yet more
7. draw
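
To make the seven steps concrete, here is a minimal sketch of the pipeline as typed stages in Haskell. Every type and function name here is a hypothetical placeholder, not the actual ThreadScope or ghc-events API; steps 6 and 7 are collapsed into a single render function.

{{{
module GraphPipeline where

-- Raw events as parsed from the eventlog (step 1).
data Event = Event { evTime :: Double, evInfo :: String }
  deriving Show

-- A profile produced from (optionally validated) events (step 3).
newtype Profile = Profile [Event]

-- Preprocessed data, kept until the eventlog is reloaded (step 4).
newtype Prepared = Prepared [Event]

-- The interval selected by zoom/pan (step 5).
data Interval = Interval { ivFrom, ivTo :: Double }

parseEvents :: FilePath -> IO [Event]         -- step 1 (stubbed here)
parseEvents _ = return [Event 0.5 "demo"]

validate :: [Event] -> Either String [Event]  -- step 2, optional
validate = Right

profileOf :: [Event] -> Profile               -- step 3
profileOf = Profile

preprocess :: Profile -> Prepared             -- step 4
preprocess (Profile es) = Prepared es

select :: Interval -> Prepared -> [Event]     -- step 5
select (Interval a b) (Prepared es) =
  [e | e <- es, a <= evTime e, evTime e <= b]

render :: [Event] -> String                   -- steps 6 and 7, collapsed
render = unlines . map show

main :: IO ()
main = do
  es <- parseEvents "program.eventlog"
  case validate es of
    Left err -> putStrLn ("validation failed: " ++ err)
    Right ok ->
      putStr (render (select (Interval 0 1) (preprocess (profileOf ok))))
}}}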

If the data is very dense, storing it in step 4 needs a zoom tree of some kind (a sketch is below).
For speedup under certain usage scenarios (many redraws with only some parameters changing),
the data can be cached in step 5 (for user-defined graphs) or step 6 (for fixed graphs),
until the relevant interval, zoom/pan factor, or other parameter changes.
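
The text doesn't pin down what a zoom tree is, so the following is only a sketch of one plausible shape: a binary tree whose nodes cache an aggregate (here, the maximum) over their whole span, so that a query at a coarse zoom level can stop descending once a node's span falls below one pixel's worth of time. All names are made up.

{{{
-- A sample is a (time, value) pair; leaves hold samples, interior
-- nodes hold their span and a cached aggregate over it.
data ZoomTree
  = Leaf Double Double
  | Node Double Double Double ZoomTree ZoomTree
      -- span start, span end, cached max over the span

treeMax :: ZoomTree -> Double
treeMax (Leaf _ v)       = v
treeMax (Node _ _ m _ _) = m

build :: [(Double, Double)] -> ZoomTree
build []       = error "build: no samples"
build [(t, v)] = Leaf t v
build samples  =
  let (l, r) = splitAt (length samples `div` 2) samples
      lt = build l
      rt = build r
  in Node (fst (head samples)) (fst (last samples))
          (max (treeMax lt) (treeMax rt)) lt rt

-- Max over [a, b]: once a node's span is fully inside [a, b] and
-- narrower than the resolution `res`, use the cached aggregate
-- instead of descending further.
queryMax :: Double -> Double -> Double -> ZoomTree -> Double
queryMax a b _ (Leaf t v)
  | a <= t && t <= b = v
  | otherwise        = -1 / 0
queryMax a b res (Node s e m lt rt)
  | e < a || b < s                   = -1 / 0
  | a <= s && e <= b && e - s <= res = m
  | otherwise = max (queryMax a b res lt) (queryMax a b res rt)
}}}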

For simple events that do not require a lot of finite automaton mangling, we may skip steps 2 and 3.
If our spark graphs were built from the detailed spark events, we'd best use the validator profiles;
but instead we use the spark counters, so the profiling work is actually done in GHC and steps 2 and 3 are not needed.
For such simple graphs without an FA (allocation rate is an example), the existing zoom trees or the zoom-cache library suffice.
Allocation rate happens to be sampled as often as the spark counters,
so it actually fits best into the spark trees; no new tree kind is needed.
But generally, we may be best off setting up a single zoom tree with a very small sampling interval
and resampling all data (sparks included) into that tree, as sketched below.
We'd lose some data, unless the sampling interval is 1, but we'd gain flexibility,
and the accuracy of the visualization of spark rates would actually improve
(except at very high zoom levels, where the resampling noise outweighs
the more accurate rate-of-change calculation afforded by equal sample intervals).
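
A minimal sketch of such resampling, assuming samples are plain (time, value) pairs; the resample function and the "keep the last sample per bucket" policy are illustrative choices, not existing ThreadScope code.

{{{
import qualified Data.Map.Strict as M

-- Bucket (time, value) samples onto a fixed grid of width dt,
-- keeping the last sample in each bucket (Data.Map's fromList keeps
-- the last value for duplicate keys). Collapsing samples that share
-- a bucket is the data loss mentioned above, unless dt is as fine
-- as the original sampling interval.
resample :: Double -> [(Double, Double)] -> [(Double, Double)]
resample dt samples =
  [ (fromIntegral k * dt, v)
  | (k, v) <- M.toAscList (M.fromList
      [ (floor (t / dt) :: Integer, v) | (t, v) <- samples ]) ]
}}}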

GC is a borderline case: there's clearly an FA, but now that we don't have to track RequestParGC,
it has only 6 states, and the transitions are simple compared to the actual data processing that the transitions trigger (see the sketch below).
So if we don't want to validate GC events just for validation's sake,
it's IMHO not mandatory to encode the FA rules in the validator profile and rewrite the code to use that.
But if we already validate GC, then we should also make use of the validation profile, 
if only to ensure consistency between validation and visualization.
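
As an illustration of keeping the FA in ordinary code rather than in the validator profile, here is a sketch where each transition also performs the data processing (accumulating total GC time). The event constructors mirror real GHC eventlog event names, but the states and the transition table are simplified, not the exact 6 states referred to above.

{{{
data GcEvent = EvStartGC | EvGCWork | EvGCIdle | EvGCDone | EvEndGC

data GcState = SMutate | SInit | SWork | SIdle | SDone
  deriving (Show, Eq)

-- The accumulator updated by transitions: here, total time spent in GC.
data Acc = Acc { gcEnter :: Double, gcTotal :: Double }

step :: (GcState, Acc) -> (Double, GcEvent) -> (GcState, Acc)
step (SMutate, acc) (t, EvStartGC) = (SInit, acc { gcEnter = t })
step (SInit,   acc) (_, EvGCWork)  = (SWork, acc)
step (SWork,   acc) (_, EvGCIdle)  = (SIdle, acc)
step (SIdle,   acc) (_, EvGCWork)  = (SWork, acc)
step (SWork,   acc) (_, EvGCDone)  = (SDone, acc)
step (SIdle,   acc) (_, EvGCDone)  = (SDone, acc)
step (SDone,   acc) (t, EvEndGC)   =
  (SMutate, acc { gcTotal = gcTotal acc + (t - gcEnter acc) })
step (s, acc) _ = (s, acc)   -- unexpected event: ignore, no validation

totalGcTime :: [(Double, GcEvent)] -> Double
totalGcTime = gcTotal . snd . foldl step (SMutate, Acc 0 0)
}}}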

Totally new kinds of graphs for old events require changes from step 3 onward.
For user-defined graphs, if we gather enough data in an efficient format (zoom trees) in steps 3-5,
we may just recompute steps 6 and 7 for each redraw, based on the current user graph definitions, as in the sketch below.
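
A sketch of what that recomputation could look like: a user graph is just a function applied at draw time to the samples already selected for the visible interval. UserGraph, recompute and rateGraph are hypothetical names, not an existing API.

{{{
type Sample = (Double, Double)   -- (time, value)

-- A user graph definition: a function from the samples already
-- selected for the visible interval (steps 3-5) to drawable points.
newtype UserGraph = UserGraph ([Sample] -> [Sample])

-- Step 6, recomputed on every redraw from the cached samples.
recompute :: UserGraph -> [Sample] -> [Sample]
recompute (UserGraph f) = f

-- Example user definition: a rate-of-change graph.
rateGraph :: UserGraph
rateGraph = UserGraph $ \ss ->
  [ (t2, (v2 - v1) / (t2 - t1))
  | ((t1, v1), (t2, v2)) <- zip ss (drop 1 ss)
  , t2 /= t1 ]
}}}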