Changes between Version 4 and Version 5 of Ticket #9476, comment 76


Timestamp: Jan 15, 2019 3:51:27 PM (8 months ago)
Author: sgraf
OK, it turns out that `paraffins` regresses by 3.9% in runtime when allowing arbitrary closure growth because of the very same function I pinned down as the reason for the alleged speed-up in comment:56. It seems like we don't want to lift the inner loop of such a list comprehension.

v4 (removed):

On the other hand, there's `gen_regexp`, which improves by 2-5% in runtime ([https://gist.github.com/sgraf812/ef99046ca713c214a53a669f1e69b27c here are the nofib results for `make mode=slow NoFibRuns=30`]) when we lift the next-to-inner function of a list comprehension (`go_s7zy` below):

{{{
let {
  go_s7zy =
      sat-only \r [x1_s7zz]
          case ># [x1_s7zz y_s7zx] of {
            __DEFAULT ->
                case chr# [x1_s7zz] of ds13_s7zB {
                  __DEFAULT ->
                      let { ds14_s7zC = CCCS C#! [ds13_s7zB]; } in
                      let { z_s7zD = \u [] case +# [x1_s7zz 1#] of sat_s7zE { __DEFAULT -> go_s7zy sat_s7zE; }; } in
                      let {
                        go1_s7zF =
                            sat-only \r [ds15_s7zG]
                                case ds15_s7zG of {
                                  [] -> z_s7zD;
                                  : y1_s7zI ys_s7zJ ->
                                      let { sat_s7zL = \u [] go1_s7zF ys_s7zJ; } in
                                      let { sat_s7zK = CCCS :! [ds14_s7zC y1_s7zI]; } in  : [sat_s7zK sat_s7zL];
                                };
                      } in  go1_s7zF lvl13_s7zw;
                };
            1# -> [] [];
          };
} in  case ord# [c1_s7za] of sat_s7zM { __DEFAULT -> go_s7zy sat_s7zM; };
}}}

Lifting `go_s7zy` leads to a very slight increase in allocation (so it would be disallowed from the perspective of closure growth), but a decrease in runtime. Maybe this is some low-level thing again, I don't know. The difference in Cmm is as expected and I wouldn't have expected a speed-up.
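For orientation, the STG above has the shape of a character-range expansion. Here is a hedged source-level sketch of what `go_s7zy` and `go1_s7zF` correspond to (my reconstruction, not the actual `gen_regexp` source; `expand` is a made-up name):

```haskell
import Data.Char (chr, ord)

-- Sketch: `go_s7zy` plays the role of the next-to-inner loop over the
-- character codes [ord lo .. hi], and `go1_s7zF` is the inner loop that
-- conses each character onto every suffix.
expand :: Char -> Int -> [String] -> [String]
expand lo hi suffixes =
  [ chr x : s | x <- [ord lo .. hi], s <- suffixes ]
```

Lifting the outer of the two nested loops here is what "lifting the next-to-inner function" refers to.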

v5 (added):

On the other hand, there's `gen_regexp`, which improves by 2-5% in runtime ([https://gist.github.com/sgraf812/ef99046ca713c214a53a669f1e69b27c here are the nofib results for `make mode=slow NoFibRuns=30`]) when we lift the next-to-inner function of a list comprehension. ''Edit: It turns out this was another case of #15999, which I rectified in [https://gitlab.haskell.org/ghc/nofib/merge_requests/12 nofib MR 12].''

A similar situation arises in `grep`, where lifting the inner function of [https://github.com/ghc/nofib/blob/8632268ad8405f0c01aaad3ad16e23c65771ba49/real/grep/Main.lhs#L141 this list comprehension] leads to a substantial 3.2% speed-up and a 7.2% improvement in allocations. I suspect that the allocation only happens on the cold code path (the case when a matching character was in the list), but it's hard to tell.
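To illustrate that suspicion with a hedged, made-up example (this is not the actual `grep` code; `matchesIn` and `pat` are my names): in a filtering comprehension like the one below, the cons cell for the result list is allocated only on the branch where the guard succeeds, so if matching characters are rare, the allocation sits on the rarely taken path.

```haskell
-- Hedged illustration: a cons is allocated only when `c` is found in `pat`;
-- the other branch allocates nothing.
matchesIn :: String -> String -> String
matchesIn pat line = [ c | c <- line, c `elem` pat ]
```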
[…]

- leads to an improvement of ~0.1% in runtime and counted instructions (though the two don't necessarily coincide).
v4 (removed):

- also leads to unpredictable regressions in total allocations, like +28.6% in `wheel-sieve1`. From the regressions I investigated (`gen_regexp`, for example), the bindings whose lifting decisions are responsible for the biggest regression in allocations (+10% due to the inner loop above, `go1_s7zF`) are not necessarily the same bindings which contribute the speed-ups (`go_s7zy` above).

v5 (added):

- also leads to unpredictable regressions in total allocations, like +28.6% in `wheel-sieve1`. From the regressions I investigated, the bindings whose lifting decisions are responsible for the biggest regression in allocations are not necessarily the same bindings which contribute the speed-ups.

I'm not sure how to come up with a criterion that separates the two classes of bindings (beneficial to lift vs. not), so I suggest we stick to the current closure-growth heuristic, which is a little more conservative than necessary but avoids arbitrary regressions in allocations at the STG level.
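To make the heuristic concrete, here is a hedged sketch of the closure-growth arithmetic (the names and the word-counting are my simplification of the idea, not GHC's actual implementation): lifting `f` frees `f`'s own closure, but every closure that previously captured `f` must now capture `f`'s free variables in place of its single reference to `f`.

```haskell
-- Simplified sketch, not GHC's code: net closure growth (in words) from
-- lifting a binding f to the top level.
--   fvs      - free variables of f that each capturing closure must now
--              store in place of its one reference to f
--   captures - number of closures that captured f
--   ownSize  - words saved because f's own closure disappears
closureGrowth :: Int -> Int -> Int -> Int
closureGrowth fvs captures ownSize = captures * (fvs - 1) - ownSize

-- The conservative criterion: lift only when closures cannot grow.
allowLift :: Int -> Int -> Int -> Bool
allowLift fvs captures ownSize = closureGrowth fvs captures ownSize <= 0
```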