Changes between Version 35 and Version 36 of RTSsummaryEvents

Show
Ignore:
Timestamp:
04/06/12 21:11:51 (15 months ago)
Author:
MikolajKonarski (IP: 95.160.111.162)
Comment:

Paste one of the emails about bad event ordering

Legend:

Unmodified
Added
Removed
Modified
  • RTSsummaryEvents

    v35 v36  
    269269I'd guess the elapsed time should always be higher than the CPU time, shouldn't it? 
    270270 
    271 == The list of considered new events or even parameters (not final) == 
     271== The list of considered new events (the implementation closely follows this draft, even though not all details match) == 
    272272 
    273273There is a wealth of statistics around heaps and GC. Some of the stats are reasonably common, shared by different implementations while many more are highly specific to a particular implementation. Even if we ignore RTSs other than GHC, we still have an issue of flexibility for future changes in GHC's RTS. 
     
    283283 * `EVENT_HEAP_LIVE (heap_capset, live_bytes)`: is the current amount of live/reachable data in the heap. This is almost certainly only known after a major GC. 
    284284 
    285  
    286285==== New GHC-specific general memory stats events ==== 
    287286 
     
    300299If in future we allow multiple independent heaps in the same OS process (e.g. separate RTS instances) then this scheme would let us cope by making a separate capset. Similarly it'd cope with implementations like GdH which use a global heap spanning multiple OS processes. 
    301300 
    302 ==== Tinkering ==== 
    303  
    304 While we tinker with these events, we could try to ensure 
    305 that RequestSeqGC and RequestParGC are emitted before their corresponding 
     301== Problems with the order of event in the eventlog == 
     302 
     303In the implemented version, we sidestep the issues with event order 
     304by tightly controlling the timestamps in the GC_START  
     305and GC_END events on the main GC cap (the one that requests GC) 
     306and by adding the GC_GLOBAL_SYNC event. 
     307We also treat the elapsed time delays needed to synchronise caps for parallel GC 
     308(both at GC start and GC end) as MUT time, instead of GC time, 
     309because while they are needed for GC, their cause is (slow)  
     310OS scheduling, just as for any other delays when switching threads. 
     311Below is a detailed discussion of the original GC event order 
     312and initial design choices considered. 
     313 
     314We can try to ensure that RequestSeqGC and RequestParGC are emitted before their corresponding 
    306315GC_START events, by using the following tip from JaffaCake:  
    307316"actually you could try moving the interruptAllCapabilities() call in Schedule.c:1500 down below the traceEvent calls".  
    308 Currently they can be emitted in arbitrary order and it complicates analysis,  
     317However, even with this implemented in GHC HEAD, the events can still 
     318be emitted in arbitrary order and it complicates analysis,  
    309319especially for selected intervals (and complicates and makes less strict 
    310 the future validation of the events). 
     320the future validation of events). 
     321 
     322The example of the event order issues below are on a system with 2 caps, on an eventlog 
     323with lots of small GCs. They show how GC times measured by "+RTS -s", 
     324that is from START_GC to the "GC stats" event, are different from TS times. 
     325The TS times are the average over all caps of the pause 
     326between START_GC, when the caps stop mutating, 
     327and END_GC, when they resume mutating. (Most of) the difference 
     328stems probably from the thread sleep/wake delays, some ending GC bookkeeping 
     329(including the time taken to emit GC and other events) 
     330and related overheads. 
     331 
     332It seems the TS times are more useful to the programmer that is mostly 
     333concerned in how much time is spend mutating and not how the wasted time 
     334is divided between the actual GC and the cap switching overhead. 
     335Moreover "+RTS -s" just adds the cap switching overhead to the MUT time 
     336hiding the fact that it's caused by GC. 
     337 
     338The differences accumulate and affect the "GC total elapsed time", 
     339"Avg pause" and "Max pause" columns of the GC table in "+RTS -s". 
     340They also influence the "Task" table (not reproduced in TS). 
     341They add up in total "GC time (elapsed)" and consequently in 
     342"MUT time (elapsed)" that is obtained by subtracting GC time from total time. 
     343In the result, they influence "Alloc rate" and "Productivity", too. 
     344The figures are seriously different between TS and "+RTS -s", 
     345except on small eventlogs. 
     346 
     347It seems for all these figures the TS version is the right one. 
     348In particular, if by "pause" we mean the time between 
     349GC_START and GC_END, by "Max pause" we neither mean 
     350the "maximum over all GCs of (minimal pause over all caps)", 
     351similarly as in +RTS -s (but not exactly the same, because 
     352even the minimal GC_END is usually a tick after "GC stats"), 
     353nor "maximum over all GCs of (maximal pause over all caps)", which 
     354we could easily show in TS and which could make sense for a real-time 
     355system with specialized, non-interchangeable caps, 
     356but "maximum over all GCs of (average pause over all caps)". 
     357 
     358Here's an example demonstrating END_GC 
     359for the previous GC following REQUEST_PAR_GC for the next GC on another cap. 
     360Note that the early END_GC on cap 1 still comes 1000 nanoseconds after 
     361the "GC stats" event. Even such small differences accumulate over all GCs: 
     362 
     363{{{ 
     3649366536000: cap 1: GC stats for heap capset 0: generation 0, 77552 
     365bytes copied, 226056 bytes slop, 696320 bytes fragmentation, 2 par 
     366threads, 71736 bytes max par copied, 77520 bytes total par copied 
     3679366537000: cap 1: finished GC 
     3689366538000: cap 1: spark stats: 1910358 created, 1228256 converted, 
     3691000 remaining (0 overflowed, 0 dud, 58306 GC'd, 144981 fizzled) 
     3709366541000: cap 1: running thread 3 
     3719367310000: cap 1: stopping thread 3 (heap overflow) 
     3729367312000: cap 1: requesting parallel GC 
     3739367316000: cap 0: finished GC 
     3749367317000: cap 0: spark stats: 1071418 created, 1292489 converted, 0 
     375remaining (0 overflowed, 0 dud, 41840 GC'd, 214904 fizzled) 
     3769367318000: cap 0: starting GC 
     3779368131000: cap 1: starting GC 
     378}}} 
     379 
     380Another example with END_GC on one of the caps coming very late: 
     381 
     382{{{ 
     3839378252000: cap 1: GC stats for heap capset 0: generation 0, 45160 
     384bytes copied, 58296 bytes slop, 1757184 bytes fragmentation, 2 par 
     385threads, 45080 bytes max par copied, 45112 bytes total par copied 
     3869378253000: cap 1: finished GC 
     3879378254000: cap 1: spark stats: 1913071 created, 1229807 converted, 
     388221 remaining (0 overflowed, 0 dud, 58923 GC'd, 144981 fizzled) 
     3899378255000: cap 1: waking up thread 3 on cap 1 
     3909378257000: cap 1: running thread 2885 
     3919378259000: cap 1: stopping thread 2885 (thread finished) 
     3929378268000: cap 1: running thread 3 
     3939378354000: cap 1: stopping thread 3 (blocked on black hole owned by 
     394thread 2884) 
     3959378355000: cap 1: creating thread 2886 
     3969378363000: cap 0: finished GC 
     397}}} 
     398 
     399Yet another example, this time there's a different event 
     400("allocated on heap capset") on the late cap, at the same time 
     401as "GC stats" an another cap and before END_GC finally arrives. 
     402Not sure how this can happen. The END_GC on the early cap comes here 
     403a bit later that usually (2000 nanoseconds after the "GC stats" event), too: 
     404 
     405{{{ 
     406836505000: cap 0: GC stats for heap capset 0: generation 0, 11584 
     407bytes copied, 205232 bytes slop, 1007616 bytes fragmentation, 2 par 
     408threads, 11472 bytes max par copied, 11536 bytes total par copied 
     409836505000: cap 1: allocated on heap capset 0: 180717856 total bytes till now 
     410836507000: cap 0: finished GC 
     411836507000: cap 0: spark stats: 5893 created, 114894 converted, 228 
     412remaining (0 overflowed, 0 dud, 13 GC'd, 22190 fizzled) 
     413836509000: cap 0: waking up thread 3 on cap 0 
     414836511000: cap 0: running thread 233 
     415836514000: cap 0: stopping thread 233 (thread finished) 
     416836521000: cap 1: finished GC 
     417}}} 
     418 
     419Finally an example where END_GC is quite late on both caps 
     420(3000 nanoseconds and 6000 nanoseconds): 
     421 
     422{{{ 
     42367582000: cap 1: GC stats for heap capset 0: generation 1, 58248 bytes 
     424copied, 40096 bytes slop, 1912832 bytes fragmentation, 2 par threads, 
     42551920 bytes max par copied, 58208 bytes total par copied 
     42667583000: cap 1: live data in heap capset 0: 82784 bytes 
     42767585000: cap 1: finished GC 
     42867585000: cap 1: spark stats: 15101 created, 2570 converted, 508 
     429remaining (0 overflowed, 0 dud, 879 GC'd, 12 fizzled) 
     43067588000: cap 0: finished GC 
     431}}}