Changes between Version 27 and Version 28 of RTSsummaryEvents

Timestamp: 01/14/12 23:14:48
Author: MikolajKonarski
Comment: Update the discussion and create the section with the list of events

== The analysis of the semantics of +RTS -s ==

Here is a sample output of +RTS -s, annotated with a discussion of the new events required to simulate it in ThreadScope (for a user-selected time interval).
A list of the new required events is in the second part of this page.
Here is a [https://github.com/Mikolaj/ThreadScope/raw/working/SummaryPanelMockup.png screenshot]
of what we can already do using the current set of events. It so happens we can do as much for the whole runtime
as for the selected time intervals with the currently available events, but in general, intervals require more kinds of events and more samples. Similarly, when we visualise some of this as graphs, and especially graphs of rates of change of some values (e.g., memory usage), more frequent sampling will be required.

The first line of +RTS -s follows.
     
{{{
  ... bytes allocated in the heap
}}}

We'd need an extra event, emitted at each GC, with the allocation since the previous GC.
(We really don't want an event for every memory allocation, that would be impractical and very slow.)

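With such an event, the summary could total the allocation over a selected time interval by summing the per-GC figures. A minimal sketch in Haskell, assuming a hypothetical AllocSinceGC event (illustrative name and fields, not an actual ghc-events constructor):

{{{
import Data.Word (Word64)

type Timestamp = Word64  -- event timestamps, as in the GHC event log

-- Hypothetical event: bytes allocated on a capability since its
-- previous GC, emitted at each GC.
data Event = AllocSinceGC { evTime :: Timestamp, evBytes :: Word64 }
           | OtherEvent   { evTime :: Timestamp }

-- Total allocation in a user-selected interval: sum the per-GC
-- figures whose timestamps fall within the interval.
allocatedIn :: Timestamp -> Timestamp -> [Event] -> Word64
allocatedIn from to evs =
  sum [ b | AllocSinceGC t b <- evs, from <= t, t <= to ]
}}}
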
     

A separate event for that, perhaps emitted only after major GC, when we know how much memory
is really used by the program. The docs explain the "n samples" above, saying "only checked during major garbage collections".

{{{
       6,493,328 bytes maximum slop
}}}

We also need an extra event for slop, probably emitted rarely.

     
{{{
  ... MB total memory in use
                  (0 MB lost due to fragmentation)
}}}

Fragmentation is calculated in the RTS -s code as follows:

     
{{{
  ...
}}}

so it's the difference between the total memory allocated from the OS (peak_mblocks_allocated)
and the total memory in use by the GC and other parts of the RTS (hw_alloc_blocks).
Presumably, all the events needed so far are of the latter kind (really used),
so the former (allocated from the OS) may need a new event.

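A minimal sketch of how the figure could then be computed from events, assuming hypothetical samples carrying both quantities (the field names mirror the RTS variables above; this is not the literal RTS code):

{{{
import Data.Word (Word64)

-- Hypothetical memory sample (in bytes) that the new events would carry.
data MemSample = MemSample
  { allocatedFromOS :: Word64  -- cf. peak_mblocks_allocated
  , inUseByRTS      :: Word64  -- cf. hw_alloc_blocks
  }

-- Fragmentation is the difference between the two quantities.
fragmentation :: MemSample -> Word64
fragmentation s = allocatedFromOS s - inUseByRTS s
}}}
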
     
{{{
  Gen  0  ...
  Gen  1  ...
}}}

The current GC events (in particular RequestParGC) seem to be enough to distinguish
between seq and par GC. We'd need to split the current GC events into generations, though,
to report for every generation separately. We may end up with two tables for the same GC info:
one aggregated by cap, another by generations. Or, as long as there are only 2 generations,
one table with both caps and generations, with the following rows: cap0&gen0, cap0&gen1, cap1&gen0, etc.
Note that we don't want to report the CPU time, only the elapsed time, and that's fine.

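A minimal sketch of the combined table, assuming a hypothetical per-GC record that already carries the generation (the current GC events would have to be extended to provide it):

{{{
import qualified Data.Map as M
import Data.Word (Word64)

-- Hypothetical per-GC record: the cap it ran on, the generation
-- collected, and the elapsed time it took.
data GCRecord = GCRecord { gcCap :: Int, gcGen :: Int, gcElapsed :: Word64 }

-- One table keyed by (cap, generation): GC count and total elapsed
-- time, i.e. the cap0&gen0, cap0&gen1, ... rows mentioned above.
gcTable :: [GCRecord] -> M.Map (Int, Int) (Int, Word64)
gcTable = foldr add M.empty
  where
    add (GCRecord cap gen dt) = M.insertWith plus (cap, gen) (1, dt)
    plus (n1, t1) (n2, t2)    = (n1 + n2, t1 + t2)
}}}
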
     
{{{
  Parallel GC work balance: ...
}}}

Let's ignore that one for now.
JaffaCake says we probably don't care about work balance and that he thinks it is computed in the simplest way.
Details are in [http://community.haskell.org/~simonmar/papers/parallel-gc.pdf].
     
{{{
  TASKS: ...
}}}

JaffaCake says the task information has questionable usefulness, so let's ignore that one for now.
It's much more natural for us to present the same info per cap,
not per OS thread (which the tasks basically are). Actually, we do present the GC info per cap (not only the total, as in +RTS -s)
already, and the total activity time per cap (which includes the mutator time) is much better conveyed by the graphs in ThreadScope.

BTW, the time between the GCIdle and GCWork events is still counted as GC time, so we may ignore these events when calculating
the times spent on GC. OTOH, a summary of the GCIdle times, per HEC, then the total, also as a percentage of all GC time, could be useful.
Probably we can do that cheaply along the way, since we have to identify and sift out the GCIdle, GCDone and GCWork events anyway.

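For example, summing the GCIdle spans on a single cap could look like the following sketch (assuming a time-sorted stream of just these events for that cap; simplified types, not the actual ghc-events ones):

{{{
import Data.Word (Word64)

type Timestamp = Word64

-- Simplified view of the relevant events on a single capability.
data GCPhaseEvent = GCIdle | GCWork | GCDone

-- Total GC idle time on one cap: sum the spans from each GCIdle to
-- the next event (GCWork or GCDone).
idleTime :: [(Timestamp, GCPhaseEvent)] -> Word64
idleTime ((t0, GCIdle) : rest@((t1, _) : _)) = (t1 - t0) + idleTime rest
idleTime (_ : rest)                          = idleTime rest
idleTime []                                  = 0
}}}
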
     
{{{
  SPARKS: ...
}}}

Tell JaffaCake that the example and description for the SPARKS count at
{{{
http://www.haskell.org/ghc/docs/latest/html/users_guide/runtime-control.html#rts-options-gc
}}}
needs updating (not sure for which GHC version, though). Otherwise, we have enough
events for that (we calculate this using the SparkCounters events,
but we could also use the precise per-spark events).

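Since the spark counters are cumulative per cap, the figures for an interval are just the difference of two samples. A sketch, with a simplified stand-in for the SparkCounters payload (illustrative field names):

{{{
import Data.Word (Word64)

-- Simplified stand-in for the cumulative per-cap spark counters.
data SparkCounts = SparkCounts
  { created, converted, overflowed, dud, gcd', fizzled :: Word64 }

-- Figures for an interval: subtract the sample taken at (or just
-- before) the start from the sample taken at the end.
inInterval :: SparkCounts -> SparkCounts -> SparkCounts
inInterval atStart atEnd = SparkCounts
  { created    = created atEnd    - created atStart
  , converted  = converted atEnd  - converted atStart
  , overflowed = overflowed atEnd - overflowed atStart
  , dud        = dud atEnd        - dud atStart
  , gcd'       = gcd' atEnd       - gcd' atStart
  , fizzled    = fizzled atEnd    - fizzled atStart
  }
}}}
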
     

{{{
  INIT  time    ...s  (  ...s elapsed)
  MUT   time    ...s  (  ...s elapsed)
  GC    time    ...s  (  ...s elapsed)
  EXIT  time    ...s  (  ...s elapsed)
  Total time    ...s  (  ...s elapsed)
}}}

(Note that there may be more times listed above, e.g., the time overhead of profiling.)
We can sum up the GC time from the GC events. We get the total of GC and MUT time (and PROF, etc.) as the time from the start of the first event to (the end of) the last event, so from the total and the GC time we can normally compute MUT.
We can assume that INIT and EXIT are negligible (I wonder when they
are not) and anyway they don't make sense when we summarize an
interval. If we insist on them, a separate event for each would be required.

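A minimal sketch of that computation for an interval, assuming the GC spans have already been extracted from the GC events and clipped to the interval:

{{{
import Data.Word (Word64)

type Timestamp = Word64

-- Mutator time for an interval: total elapsed time minus the time
-- spent in the GC spans (start, end) falling within the interval.
mutatorTime :: Timestamp -> Timestamp -> [(Timestamp, Timestamp)] -> Word64
mutatorTime from to gcSpans = (to - from) - sum [ e - s | (s, e) <- gcSpans ]
}}}
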
{{{
  Productivity  ...% of total user, ...% of total elapsed
}}}
     
The events added above should be enough. Again, we only do the elapsed case, so we'd show elapsed/elapsed, while the figures above
are cpu/cpu and cpu/elapsed. JaffaCake thinks the latter mixture is OK. However, it mixes the productivity
of CPU mutation vs elapsed mutation with the productivity of mutation vs GC. In this light, our figure will not be that bad,
because it's consistent, even if not as accurate as the equally consistent first figure above.
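
The figure we would show is then simply the following (a sketch; times in event-log units):

{{{
import Data.Word (Word64)

-- Elapsed/elapsed productivity for an interval: mutator elapsed time
-- over total elapsed time.
productivity :: Word64 -> Word64 -> Double
productivity mutElapsed totalElapsed =
  fromIntegral mutElapsed / fromIntegral totalElapsed
}}}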

BTW, the fact that the second figure is higher (I can reproduce it) shows a problem with the clocks or some other problem.
I'd guess the elapsed time should always be higher than the CPU time, shouldn't it?

== The list of needed new events ==

TODO