| 310 | | the future validation of the events). |
| | 320 | the future validation of events). |
| | 321 | |
| | 322 | The example of the event order issues below are on a system with 2 caps, on an eventlog |
| | 323 | with lots of small GCs. They show how GC times measured by "+RTS -s", |
| | 324 | that is from START_GC to the "GC stats" event, are different from TS times. |
| | 325 | The TS times are the average over all caps of the pause |
| | 326 | between START_GC, when the caps stop mutating, |
| | 327 | and END_GC, when they resume mutating. (Most of) the difference |
| | 328 | stems probably from the thread sleep/wake delays, some ending GC bookkeeping |
| | 329 | (including the time taken to emit GC and other events) |
| | 330 | and related overheads. |
| | 331 | |
| | 332 | It seems the TS times are more useful to the programmer that is mostly |
| | 333 | concerned in how much time is spend mutating and not how the wasted time |
| | 334 | is divided between the actual GC and the cap switching overhead. |
| | 335 | Moreover "+RTS -s" just adds the cap switching overhead to the MUT time |
| | 336 | hiding the fact that it's caused by GC. |
| | 337 | |
| | 338 | The differences accumulate and affect the "GC total elapsed time", |
| | 339 | "Avg pause" and "Max pause" columns of the GC table in "+RTS -s". |
| | 340 | They also influence the "Task" table (not reproduced in TS). |
| | 341 | They add up in total "GC time (elapsed)" and consequently in |
| | 342 | "MUT time (elapsed)" that is obtained by subtracting GC time from total time. |
| | 343 | In the result, they influence "Alloc rate" and "Productivity", too. |
| | 344 | The figures are seriously different between TS and "+RTS -s", |
| | 345 | except on small eventlogs. |
| | 346 | |
| | 347 | It seems for all these figures the TS version is the right one. |
| | 348 | In particular, if by "pause" we mean the time between |
| | 349 | GC_START and GC_END, by "Max pause" we neither mean |
| | 350 | the "maximum over all GCs of (minimal pause over all caps)", |
| | 351 | similarly as in +RTS -s (but not exactly the same, because |
| | 352 | even the minimal GC_END is usually a tick after "GC stats"), |
| | 353 | nor "maximum over all GCs of (maximal pause over all caps)", which |
| | 354 | we could easily show in TS and which could make sense for a real-time |
| | 355 | system with specialized, non-interchangeable caps, |
| | 356 | but "maximum over all GCs of (average pause over all caps)". |
| | 357 | |
| | 358 | Here's an example demonstrating END_GC |
| | 359 | for the previous GC following REQUEST_PAR_GC for the next GC on another cap. |
| | 360 | Note that the early END_GC on cap 1 still comes 1000 nanoseconds after |
| | 361 | the "GC stats" event. Even such small differences accumulate over all GCs: |
| | 362 | |
| | 363 | {{{ |
| | 364 | 9366536000: cap 1: GC stats for heap capset 0: generation 0, 77552 |
| | 365 | bytes copied, 226056 bytes slop, 696320 bytes fragmentation, 2 par |
| | 366 | threads, 71736 bytes max par copied, 77520 bytes total par copied |
| | 367 | 9366537000: cap 1: finished GC |
| | 368 | 9366538000: cap 1: spark stats: 1910358 created, 1228256 converted, |
| | 369 | 1000 remaining (0 overflowed, 0 dud, 58306 GC'd, 144981 fizzled) |
| | 370 | 9366541000: cap 1: running thread 3 |
| | 371 | 9367310000: cap 1: stopping thread 3 (heap overflow) |
| | 372 | 9367312000: cap 1: requesting parallel GC |
| | 373 | 9367316000: cap 0: finished GC |
| | 374 | 9367317000: cap 0: spark stats: 1071418 created, 1292489 converted, 0 |
| | 375 | remaining (0 overflowed, 0 dud, 41840 GC'd, 214904 fizzled) |
| | 376 | 9367318000: cap 0: starting GC |
| | 377 | 9368131000: cap 1: starting GC |
| | 378 | }}} |
| | 379 | |
| | 380 | Another example with END_GC on one of the caps coming very late: |
| | 381 | |
| | 382 | {{{ |
| | 383 | 9378252000: cap 1: GC stats for heap capset 0: generation 0, 45160 |
| | 384 | bytes copied, 58296 bytes slop, 1757184 bytes fragmentation, 2 par |
| | 385 | threads, 45080 bytes max par copied, 45112 bytes total par copied |
| | 386 | 9378253000: cap 1: finished GC |
| | 387 | 9378254000: cap 1: spark stats: 1913071 created, 1229807 converted, |
| | 388 | 221 remaining (0 overflowed, 0 dud, 58923 GC'd, 144981 fizzled) |
| | 389 | 9378255000: cap 1: waking up thread 3 on cap 1 |
| | 390 | 9378257000: cap 1: running thread 2885 |
| | 391 | 9378259000: cap 1: stopping thread 2885 (thread finished) |
| | 392 | 9378268000: cap 1: running thread 3 |
| | 393 | 9378354000: cap 1: stopping thread 3 (blocked on black hole owned by |
| | 394 | thread 2884) |
| | 395 | 9378355000: cap 1: creating thread 2886 |
| | 396 | 9378363000: cap 0: finished GC |
| | 397 | }}} |
| | 398 | |
| | 399 | Yet another example, this time there's a different event |
| | 400 | ("allocated on heap capset") on the late cap, at the same time |
| | 401 | as "GC stats" an another cap and before END_GC finally arrives. |
| | 402 | Not sure how this can happen. The END_GC on the early cap comes here |
| | 403 | a bit later that usually (2000 nanoseconds after the "GC stats" event), too: |
| | 404 | |
| | 405 | {{{ |
| | 406 | 836505000: cap 0: GC stats for heap capset 0: generation 0, 11584 |
| | 407 | bytes copied, 205232 bytes slop, 1007616 bytes fragmentation, 2 par |
| | 408 | threads, 11472 bytes max par copied, 11536 bytes total par copied |
| | 409 | 836505000: cap 1: allocated on heap capset 0: 180717856 total bytes till now |
| | 410 | 836507000: cap 0: finished GC |
| | 411 | 836507000: cap 0: spark stats: 5893 created, 114894 converted, 228 |
| | 412 | remaining (0 overflowed, 0 dud, 13 GC'd, 22190 fizzled) |
| | 413 | 836509000: cap 0: waking up thread 3 on cap 0 |
| | 414 | 836511000: cap 0: running thread 233 |
| | 415 | 836514000: cap 0: stopping thread 233 (thread finished) |
| | 416 | 836521000: cap 1: finished GC |
| | 417 | }}} |
| | 418 | |
| | 419 | Finally an example where END_GC is quite late on both caps |
| | 420 | (3000 nanoseconds and 6000 nanoseconds): |
| | 421 | |
| | 422 | {{{ |
| | 423 | 67582000: cap 1: GC stats for heap capset 0: generation 1, 58248 bytes |
| | 424 | copied, 40096 bytes slop, 1912832 bytes fragmentation, 2 par threads, |
| | 425 | 51920 bytes max par copied, 58208 bytes total par copied |
| | 426 | 67583000: cap 1: live data in heap capset 0: 82784 bytes |
| | 427 | 67585000: cap 1: finished GC |
| | 428 | 67585000: cap 1: spark stats: 15101 created, 2570 converted, 508 |
| | 429 | remaining (0 overflowed, 0 dud, 879 GC'd, 12 fizzled) |
| | 430 | 67588000: cap 0: finished GC |
| | 431 | }}} |