Changes between Version 30 and Version 31 of RTSsummaryEvents

01/20/12 14:43:25 (2 years ago)
duncan (IP:

change (and add more detail) to new events


  • RTSsummaryEvents

    v30 v31  
    149149== The list of needed new events or even parameters == 
    151 A rough proposal, in particular the names are ad hoc and the units are provisional (e.g., words or blocks would be more natural for memory events, but TS does not know their sizes). 
    153151=== New memory stats events === 
    155 * MEM_ALLOCATED(number_of_bytes) emitted after (or just before?) each GC, with the number of bytes allocated in the heap since the previous GC, or a running total 
     153 * `EVENT_HEAP_ALLOCATED (bytes)`: the total number of bytes allocated over the whole run by this HEC. That is, we count allocation on a per-HEC basis. This event is in principle not tied to GC; it could be emitted at any time. 
    157 * MEM_COPIED(number_of_bytes) emitted after each GC, with the number of bytes copied during that GC (this one could also be counted as a GC_ event, not MEM_ event, because it's only about the particular memory operations of a copying GC) 
     155 * `EVENT_HEAP_SIZE (bytes)`: the number of bytes currently allocated from the OS for the heap. For the current GHC RTS these are the `MBlock`s, counted in the `mblocks_allocated` var. Again, this could in principle be emitted at any time. For maximum accuracy, the event would be emitted exactly when MBlocks are allocated or freed. 
    159 * MEM_RESIDENCY(number_of_bytes) emitted after each major GC (too rarely, but after minor GC the figure is too inaccurate, probably), with the total number of bytes of memory actually used (as opposed to allocated from the OS) by the program 
     157 * `EVENT_HEAP_LIVE (bytes)`: is the current amount of live/reachable data in the heap. This is almost certainly only known after a major GC. 
    161 * MEM_SLOP(number_of_bytes) emitted after each GC, with the current slop in bytes 
     159 * `EVENT_GC_STATS (copied, slop, fragmentation)`: various less used GC stats (probably GHC specific, and specific to current GC design) 
    163 * MEM_TOTAL(number_of_bytes) emitted after each GC, with the total number of mblocks allocated from the OS 
     161=== Identifying heaps in eventlogs === 
    165 * MEM_TOTAL_BLOCK(number_of_bytes) emitted after each GC, with the total memory of blocks allocated inside the mblocks allocated from the OS, used to calculate the memory lost due to fragmentation of mblocks 
     163In the above events, the "allocated since prog start" is done per-HEC, but the heap total size and live data size apply to the heap as a whole, not a particular HEC. 
     165For completeness / future-proofing it may be wise to explicitly identify heaps and to have the heap size/live events tag the heap to which they apply. Remember that we can merge event logs from multiple processes, so there is already no truly global notion of heap; implicitly, the heap would be the single one belonging to the HEC that emits the event. That would also require assuming a single heap per OS process (we can already identify which HECs belong to the same OS process). Alternatively, we can explicitly identify heaps using the existing capset (capability set) mechanism. We would add a new capset type: 
     169 * Capset type values for EVENT_CAPSET_CREATE 
     170 */ 
     171#define CAPSET_TYPE_CUSTOM      1  /* reserved for end-user applications */ 
     172#define CAPSET_TYPE_OSPROCESS   2  /* caps belong to the same OS process */ 
     173#define CAPSET_TYPE_CLOCKDOMAIN 3  /* caps share a local clock/time      */ 
     174#define CAPSET_TYPE_GCHEAP      4  /* caps share a GC'd heap */ 
     177We would then make a capset for the main heap and add all HECs to it. The `EVENT_HEAP_SIZE` and `EVENT_HEAP_LIVE` events would then have a capset argument to indicate the heap. 
     179If in future we allow multiple independent heaps in the same OS process (e.g. separate RTS instances) then this would let us cope. Similarly it'd cope with implementations like GdH which use a global heap spanning multiple OS processes. Would it be useful for talking about per-HEC local heaps? 
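To make the capset-tagged heap events concrete, here is a minimal sketch of how an `EVENT_HEAP_SIZE` or `EVENT_HEAP_LIVE` payload might be laid out and serialised. The struct name `HeapEventPayload`, the function `encode_heap_event`, the field widths (32-bit capset id, 64-bit byte count) and the big-endian encoding are all assumptions made for illustration; the proposal above does not fix any of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAPSET_TYPE_GCHEAP 4  /* caps share a GC'd heap (value proposed above) */

/* Hypothetical payload shared by EVENT_HEAP_SIZE and EVENT_HEAP_LIVE.
 * Field widths are assumptions, not part of the proposal. */
typedef struct {
    uint32_t heap_capset;  /* id of a CAPSET_TYPE_GCHEAP capset naming the heap */
    uint64_t bytes;        /* heap size or live data, in bytes */
} HeapEventPayload;

#define HEAP_EVENT_PAYLOAD_SIZE 12  /* 4 bytes capset id + 8 bytes count */

/* Write a 32-bit value most significant byte first. */
static void put_be32(uint8_t *out, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (24 - 8 * i));
}

/* Write a 64-bit value most significant byte first. */
static void put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Serialise the payload into `out`.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t encode_heap_event(const HeapEventPayload *p, uint8_t *out, size_t out_len)
{
    assert(p != NULL && out != NULL);
    if (out_len < HEAP_EVENT_PAYLOAD_SIZE)
        return 0;
    put_be32(out, p->heap_capset);
    put_be64(out + 4, p->bytes);
    return HEAP_EVENT_PAYLOAD_SIZE;
}
```

With this shape, a reader that merges eventlogs from several processes can group heap size and live data samples by capset id rather than by the emitting HEC.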
    167181=== New parameters for GC stats events === 
    169 Extra "generation" parameter is needed for one of GC_START or GC_END. 
    170 If we have the extra parameter in both, we can do more with partial eventlogs 
    171 that lack starts or ends of some GCs, and it's easier to calculate stats 
    172 for selected time intervals. When the RequestSeqGC and RequestParGC events 
    173 are emitted, it's not yet known if the GC will be major or minor, so no extra 
    174 parameters should be added to them. 
     183 * modify `EVENT_GC_START` to add a `(generation)` field. The generation number in a generational GC scheme. Use -1 if not applicable. 
     185When the RequestSeqGC and RequestParGC events are emitted, it's not yet known if the GC will be major or minor, so no extra parameters should be added to them. 
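As a rough illustration of how an eventlog reader could use the proposed `(generation)` field, the sketch below classifies a GC from the value carried by `EVENT_GC_START`. The names `GcKind`, `GC_GENERATION_NA` and `classify_gc`, the 16-bit field width, and the rule that a collection of the oldest generation counts as major are assumptions for this example only.

```c
#include <stdint.h>

/* Sentinel proposed above: the generation is not applicable. */
#define GC_GENERATION_NA (-1)

typedef enum {
    GC_KIND_UNKNOWN,  /* generation not applicable or not recorded */
    GC_KIND_MINOR,    /* a younger generation was collected */
    GC_KIND_MAJOR     /* the oldest generation was collected */
} GcKind;

/* Classify a GC from the hypothetical generation field of EVENT_GC_START.
 * `oldest_generation` is the index of the oldest generation, which the
 * reader is assumed to know from elsewhere (e.g. RTS flags). */
GcKind classify_gc(int16_t generation, int16_t oldest_generation)
{
    if (generation == GC_GENERATION_NA || generation < 0)
        return GC_KIND_UNKNOWN;
    return generation >= oldest_generation ? GC_KIND_MAJOR : GC_KIND_MINOR;
}
```

Because the classification depends only on `EVENT_GC_START`, the request events can stay unchanged, as argued above.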
    176187While we tinker with these events, we could try to ensure