Opened 8 years ago

Last modified 11 months ago

#6087 infoneeded bug

Join points need strictness analysis

Reported by: simonpj Owned by:
Priority: normal Milestone: 8.10.1
Component: Compiler Version: 7.4.1
Keywords: JoinPoints, DemandAnalysis Cc: stefan@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

I came across this code in the nofib/spectral progam knights, in the code for KnightHeuristic.possibleMoves:

let {
  $j_sU3
    :: GHC.Types.Int -> GHC.Types.Int -> [KnightHeuristic.Direction]
  [LclId, Arity=2]
  $j_sU3 =
    \ (ww2_sRl :: GHC.Types.Int) (ww3_sRm :: GHC.Types.Int) ->
      case ww2_sRl of ww4_XQY { GHC.Types.I# ww5_sQU ->
      case GHC.Prim.>=# ww5_sQU 1 of _ {
        GHC.Types.False -> go1_azF ys_azM;
        GHC.Types.True ->
          case ds1_azR of _ { GHC.Types.I# y1_azp ->
          case GHC.Prim.<=# ww5_sQU y1_azp of _ {
            GHC.Types.False -> go1_azF ys_azM;
            GHC.Types.True ->
              case ww3_sRm of wild6_XAB { GHC.Types.I# x_XAE ->
              case GHC.Prim.>=# x_XAE 1 of _ {
                GHC.Types.False -> go1_azF ys_azM;
                GHC.Types.True ->
                  case GHC.Prim.<=# x_XAE y1_azp of _ {
                    GHC.Types.False -> go1_azF ys_azM;
                    GHC.Types.True ->
                      case GHC.List.notElem
                             @ ChessSetList.Tile
                             ChessSetList.isSquareFree_$dEq
                             (ww4_XQY, wild6_XAB)
                             wild2_azW
                      of _ {
                        GHC.Types.False -> go1_azF ys_azM;
                        GHC.Types.True ->
                          GHC.Types.:
                            @ KnightHeuristic.Direction y_azL (go1_azF ys_azM)
                      }
                  }
              }
              }
          }
          }
      }
      } } in
case y_azL of _ {
  KnightHeuristic.UL ->
    $j_sU3
      (case ww_sQJ of _ { GHC.Types.I# x_aya ->
       GHC.Types.I# (GHC.Prim.-# x_aya 1)
       })
      (case ww1_sQK of _ { GHC.Types.I# x_aya ->
       GHC.Types.I# (GHC.Prim.-# x_aya 2)
       });
  KnightHeuristic.UR ->
    $j_sU3
      (case ww_sQJ of _ { GHC.Types.I# x_ayl ->
       GHC.Types.I# (GHC.Prim.+# x_ayl 1)
       })
      (case ww1_sQK of _ { GHC.Types.I# x_aya ->
       GHC.Types.I# (GHC.Prim.-# x_aya 2)
       });

...lots more similar case alternatives follow...

Just look at all those Ints getting boxed, and the immediately unboxed by the join point! The strictness analyser would spot this easily, but presumably the join point is constructed after strictness analysis.

There must be an opportunity here to run a later strictness analysis pass.

Attachments (3)

analyse (211.7 KB) - added by osa1 23 months ago.
nofib using GHC HEAD, compare default with -flate-dmd-anal
Main_stock.dump-stg (202.0 KB) - added by osa1 22 months ago.
cse
Main_late_dmd.dump-stg (203.0 KB) - added by osa1 22 months ago.
cse late demand analysis

Download all attachments as: .zip

Change History (39)

comment:1 Changed 8 years ago by stefan

Cc: stefan@… added

comment:2 Changed 7 years ago by igloo

Milestone: 7.6.17.6.2

comment:3 Changed 7 years ago by nfrisby

A second demand analysis at the end of the pipeline does help here.

                 let {
                    $j_sRv [Dmd=<C(C(S)),U>]
                      :: GHC.Types.Int
                         -> GHC.Types.Int
                         -> [KnightHeuristic.Direction]
    Entries     Allocs    Alloc'd Arity    Kinds      Function                      
--------------------------------------------------------------------------------
       6003      51444       1334      1    L          go{v s12q} (main:KnightHeuristic)
       5336      24190          0      2    SS         $j{v s12J} (main:KnightHeuristic)

becomes

       6003      24764       1334      1    L          go{v s187} (main:KnightHeuristic)
       5336      31464          0      2    iS         $w$j{v s18r} (main:KnightHeuristic)

This is ~ -7% overall change in allocation.

At O2 (and without my more precise ticky numbers), the opportunity for this improvement gets a bit clobbered, but others apparently show up.

        567      20851      1     0    L          go{v s1Ba} (main:KnightHeuristic)
        400       5200      2     0    ET         main:KnightHeuristic.move{v rfN}
        110       2274      5     0    SSiSL      main:KnightHeuristic.$wdescendents{v r1qa}
        168       1002      1     0    S          main:KnightHeuristic.descAndNo{v rfU}

becomes

        567      18204      1     0    L          go{v s1MK} (main:KnightHeuristic)

        111        872      5     0    SSiSL      main:KnightHeuristic.$wdescendents{v r1Cl}
        169        835      4     0    SSTL       main:KnightHeuristic.$wdescAndNo{v r1Cp}

For -3.8% allocation.

comment:4 Changed 5 years ago by thoughtpolice

Milestone: 7.6.27.10.1

Moving to 7.10.1.

comment:5 Changed 5 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:6 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:7 Changed 4 years ago by thomie

Milestone: 8.0.1

comment:8 Changed 3 years ago by simonpj

Keywords: JoinPoints added

comment:9 Changed 3 years ago by lukemaurer

This seems to have gone away; AFAICT, nowadays the join point is never created.

comment:10 Changed 3 years ago by dfeuer

This does not seem to be an issue for this case after applying Luke's patch either (no join point and no problem). Are there reasons to suspect this is still an issue? If so, we need a new test case; otherwise I think we can close this ticket.

comment:11 Changed 3 years ago by dfeuer

Milestone: 8.2.1
Status: newinfoneeded

comment:12 Changed 3 years ago by bgamari

Milestone: 8.2.18.4.1

Given that 8.2.1-rc1 is imminent, I'm bumping these off to the 8.4

comment:13 Changed 23 months ago by bgamari

Simon, do you know the status of this?

comment:14 Changed 23 months ago by bgamari

Milestone: 8.4.18.6.1

This ticket won't be resolved in 8.4; remilestoning for 8.6. Do holler if you are affected by this or would otherwise like to work on it.

comment:15 Changed 23 months ago by simonpj

Huh. We have a flag -flate-dmd-anal, but it does not seem to be enabled by -O or -O2.

Could someone do a nofib run to see if it makes a difference?

comment:16 Changed 23 months ago by bgamari

osa1, can you take care of this?

comment:17 Changed 23 months ago by osa1

GHC 8.2.2 does not generate such Int allocations with -O2. -flate-dmd-anal makes no difference in possibleMoves output. I think we can close this as fixed.

comment:18 Changed 23 months ago by simonpj

Hang on. So late-dmd-anal may make no difference in this particular, but I bet it makes a difference sometimes. And if so we should switch it on for -02.

Could you do a nofib run with and without late-dmd-anal, to see?

Changed 23 months ago by osa1

Attachment: analyse added

nofib using GHC HEAD, compare default with -flate-dmd-anal

comment:19 Changed 23 months ago by osa1

I've attached the nofib-analyse output. For reference, here's the wiki page on late demand analysis: https://ghc.haskell.org/trac/ghc/wiki/LateDmd.

comment:20 Changed 23 months ago by simonpj

Interesting.

  • These are -O vs -O + late-dmd-anal I assume. It'd be worth doing the same with -O2 vis -O2 + late-dmd-anal
  • Did you recompile all the libraries, or just nofib?
  • A 4.6% increase in compiler allocations and compile time.
  • We get some small wins.
  • Can you see what is happening with the 5% allocation increase in mate?
  • I think that HEAD has indeed abandoned the "clever .hi file scheme", so we don't need to worry about that part.

comment:21 in reply to:  20 Changed 23 months ago by osa1

These are -O vs -O + late-dmd-anal I assume. It'd be worth doing the same with -O2 vis -O2 + late-dmd-anal

nofib runs with -O2 by default. Here's an example command that nofib runs to compile of the benchmarks:

ghc-stage2 -O2 -Rghc-timing -H32m -hisuf hi -rtsopts -c Main.hs -o Main.o

with -flate-dmd-anal:

ghc-stage2 -O2 -Rghc-timing -H32m -hisuf hi -flate-dmd-anal -rtsopts -c Main.hs -o Main.o

comment:22 Changed 23 months ago by osa1

Did you recompile all the libraries, or just nofib?

Just nofib.

Can you see what is happening with the 5% allocation increase in mate?

Sure, I'll update.

comment:23 Changed 23 months ago by osa1

In mate with -flate-dmd-anal we get lots of new worker functions and the final program is quite different than the one without -flate-dmd-anal. I couldn't find yet what's causing the extra allocations as the changes are so big.

comment:24 Changed 22 months ago by simonpj

We agreed that Omer would invest a small amount of effort to see if any of the other benchmarks that have increased are easier to understand

comment:25 Changed 22 months ago by osa1

I looked at cse and circsim which are smaller programs with increased allocation.

Really the only difference I could find after a few hours of investigation is with -flate-dmd-anal we get new worker functions that return head and tail of a list in an unboxed tuple instead of returning a list. As far as I can see, the result is almost always packed back to a list at some point (somehwere in upper levels in the call stack). Overall the allocation should stay the same (because we allocate same number of cons cells, but allocate them in different places than we would without -flate-dmd-anal), so maybe this doesn't explain the allocation increase. Here's an example from cse, this is without late demand analysis:

$wgo1_r6PF :: [GHC.Types.Char] -> GHC.Prim.Int# -> [GHC.Types.Char]
[GblId, Arity=2, Str=<S,1*U><S,1*U>m2, Unf=OtherCon []] =
    sat-only [] \r [w_s6Uz ww_s6UA]
        case w_s6Uz of {
          [] -> $wxs1_r6PE ww_s6UA;
          : y_s6UC [Occ=Once*] ys_s6UD [Occ=Once] ->
              case ww_s6UA of ds11_s6UE {
                __DEFAULT ->
                    let {
                      sat_s6UG [Occ=Once] :: [GHC.Types.Char]
                      [LclId] =
                          [ys_s6UD ds11_s6UE] \u []
                              case -# [ds11_s6UE 1#] of sat_s6UF {
                                __DEFAULT -> $wgo1_r6PF ys_s6UD sat_s6UF;
                              };
                    } in  : [y_s6UC sat_s6UG];
                1# -> : [y_s6UC n_r6Py];
              };
        };

We generate this version with -flate-dmd-anal:

$w$wgo_r6Yf :: [GHC.Types.Char] -> GHC.Prim.Int# -> (# GHC.Types.Char, [GHC.Types.Char] #)
[GblId, Arity=2, Str=<S,1*U><S,1*U>, Unf=OtherCon []] =
    sat-only [] \r [w_s73c w1_s73d]
        case w_s73c of {
          [] -> $w$wxs_r6Ye w1_s73d;
          : y_s73f [Occ=Once*] ys_s73g [Occ=Once] ->
              case w1_s73d of ds11_s73h {
                __DEFAULT ->
                    let {
                      sat_s73m [Occ=Once] :: [GHC.Types.Char]
                      [LclId] =
                          [ys_s73g ds11_s73h] \u []
                              case -# [ds11_s73h 1#] of sat_s73i {
                                __DEFAULT ->
                                    case $w$wgo_r6Yf ys_s73g sat_s73i of {
                                      (#,#) ww1_s73k [Occ=Once] ww2_s73l [Occ=Once] ->
                                          : [ww1_s73k ww2_s73l];
                                    };
                              };
                    } in  (#,#) [y_s73f sat_s73m];
                1# -> (#,#) [y_s73f n_r6Y9];
              };
        };

Notice that in the recursive case we do the same thing in both cases, but with -flate-dmd-anal we pack the result in call site rather than returning packed result in the recursive call. The call sites look like this: (large code, pasting only the relevant parts)

let {
  sat_s757 [Occ=Once, Dmd=<L,1*U>] :: [GHC.Types.Char]
  [LclId] =
      [ww_s751] \s []
          case $w$wgo_r6Yf ww_s751 4# of {
            (#,#) ww3_s755 [Occ=Once] ww4_s756 [Occ=Once] ->
                : [ww3_s755 ww4_s756];
          };
} in  GHC.Base.++ ds10_r6Y7 sat_s757;

So we directly pack it again. Same transformation happens in circsim too. New worker:

$w$wxs_rb9n :: GHC.Prim.Int# -> (# Main.Boolean, [Main.Boolean] #)
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,1*U>, Unf=OtherCon []] =
    sat-only \r [w_sbe6]
        case w_sbe6 of ds1_sbe7 {
          __DEFAULT ->
              let {
                sat_sbec [Occ=Once] :: [Main.Boolean]
                [LclId] =
                    \u []
                        case -# [ds1_sbe7 1#] of sat_sbe8 {
                          __DEFAULT ->
                              case $w$wxs_rb9n sat_sbe8 of {
                                (#,#) ww1_sbea [Occ=Once] ww2_sbeb [Occ=Once] ->
                                    : [ww1_sbea ww2_sbeb];
                              };
                        };
              } in  (#,#) [Main.T sat_sbec];
          1# -> (#,#) [Main.T GHC.Types.[]];
        };

exactly the same thing, recursive case directly packs the result. Call site:

case
    $w$wxs_rb9n y1_sbA8
of
{ (#,#) ww1_sbAc [Occ=Once] ww2_sbAd [Occ=Once] ->
      : [ww1_sbAc ww2_sbAd];
};

again result is directly packed.

This happens in several places in both programs.

I compared outputs both by mapping STG functions in outputs to each other manually and comparing the definitions, and also by using a diff tool to see if there are any extra code blocks etc. other than the ones I've already checked manually. These changes seem to be all unless I'm missing anything.

I wonder what happens if I disable CPR in post-late-ww pass and only do worker-wrapper (not sure if it's currently possible but in theory it should be?). This should avoid the transformations described above, maybele still winning in the other cases?

comment:26 Changed 22 months ago by simonpj

Interesting. Some thoughts

  • CPR for sum types was introduced in #5075.
  • The initial motivation was for non-recursive types like Maybe and Either. Here it seems quite likely that the caller will case-analyse the result, so exposing the result constructor to the caller is good.
  • You point out that for recursive types, like lists, we typically don't case-analyse the result, so nothing is gained by the CPR. Maybe we should disable CPR for recursive sum types.
  • It's hard to come with a rigorous notion of what a "recursive" sum-type is, but any reasonable approximation will do.
  • Disabling CPR for late-dmd-anal seems unprincipled, and doesn't catch the cases described in #5075.
  • I note that comment:10 of #5075 identifies that the transformation is switched off for non-top-level definitions. See Note [CPR for sum types] in DmdAnal. But now that we have join points, the objection may no longer apply. It'd be interesting to try.
  • comment:12 of #5075 identifies three more tickets that claim to benefit from late-dmd-anal. Let's see if they do.
  • cse and circsim really do allocate (a little) more. But in your investigation you didn't find any allocation differences. Doesn't -ticky nail it for you?

comment:27 Changed 22 months ago by osa1

cse and circsim really do allocate (a little) more. But in your investigation you didn't find any allocation differences. Doesn't -ticky nail it for you?

Just compared original vs. late demand analysis version of the same function in mate with help from ticky profiler and found the source of allocation.

(the function is huge, copying the important bits. Both functions have same number of entries)

Original function, important bit is how the second argument (Int) used:

$warg_sciK [InlPrag=NOUSERINLINE[0],
            Occ=OnceL*!,
            Dmd=<L,C(C1(C1(C1(U))))>]
  :: Board.Kind
     -> GHC.Types.Int
     -> GHC.Types.Int
     -> GHC.Types.Bool
     -> GHC.Types.Bool
[LclId,
 Arity=4,
 Str=<S,1*U><S(S),U(U)><L,U(U)><L,1*U>,
 Unf=OtherCon []] =
    sat-only [w_scin
              ww_scio
              ww1_scip
              ds_sciq
              lvl17_scis
              lvl18_sciB] \r [ww2_sciL ww3_sciM ww4_sciN w1_sciO]
        let {
          lvl19_sciP [Occ=Once*!, Dmd=<L,1*U>] :: GHC.Types.Bool
          [LclId] =
              [ww_scio ww1_scip ds_sciq ww3_sciM] \s []
                  let {
                    lvl20_sciQ [Occ=OnceL*!, Dmd=<L,U(U)>] :: GHC.Types.Int
                    [LclId] =
                        [ds_sciq ww3_sciM] \u []
                            case ww3_sciM of wild_sciR {
                              GHC.Types.I# x1_sciS [Occ=Once] ->
                                  case ds_sciq of {
                                    (,) xk_sciU [Occ=Once!] _ [Occ=Dead] ->
                                        case xk_sciU of wild2_sciW {
                                          GHC.Types.I# y1_sciX [Occ=Once] ->
                                              case <=# [x1_sciS y1_sciX] of {
                                                __DEFAULT -> wild_sciR;
                                                1# -> wild2_sciW;
                                              };
                                        };
                                  };
                            }; } in
                  let {
                    lvl21_sciZ [Occ=OnceL*!, Dmd=<L,U(U)>] :: GHC.Types.Int
                    [LclId] =
                        [ds_sciq ww3_sciM] \u []
                            case ww3_sciM of wild_scj0 {
                              GHC.Types.I# x1_scj1 [Occ=Once] ->
                                  case ds_sciq of {
                                    (,) xk_scj3 [Occ=Once!] _ [Occ=Dead] ->
                                        case xk_scj3 of wild2_scj5 {
                                          GHC.Types.I# y1_scj6 [Occ=Once] ->
                                              case <=# [x1_scj1 y1_scj6] of {
                                                __DEFAULT -> wild2_scj5;
                                                1# -> wild_scj0;
                                              };
                                        };
                                  };
                            }; } in ...

It's used strictly, but also returned in boxed form in __DEFAULT case of first case expression and 1# case of the second.

Now with late demand analysis:

$w$warg_sd96 [InlPrag=NOUSERINLINE[0],
              Occ=OnceL*!,
              Dmd=<L,C(C1(C1(C1(U))))>]
  :: Board.Kind
     -> GHC.Prim.Int#
     -> GHC.Types.Int
     -> GHC.Types.Bool
     -> GHC.Types.Bool
[LclId,
 Arity=4,
 Str=<S,1*U><S,U><L,U(U)><L,1*U>,
 Unf=OtherCon []] =
    sat-only [w_sd8J
              ww_sd8K
              ww1_sd8L
              ds_sd8M
              lvl17_sd8O
              lvl18_sd8X] \r [w1_sd97 ww2_sd98 w2_sd99 w3_sd9a]
        let {
          ww3_sd9b [Dmd=<L,U(U)>] :: GHC.Types.Int
          [LclId, Unf=OtherCon []] =
              CCCS GHC.Types.I#! [ww2_sd98]; } in
        let {
          lvl19_sd9c [Occ=Once*!, Dmd=<L,1*U>] :: GHC.Types.Bool
          [LclId] =
              [ww_sd8K ww1_sd8L ds_sd8M ww2_sd98 ww3_sd9b] \s []
                  let {
                    lvl20_sd9d [Occ=OnceL*!, Dmd=<L,U(U)>] :: GHC.Types.Int
                    [LclId] =
                        [ds_sd8M ww2_sd98 ww3_sd9b] \u []
                            case ds_sd8M of {
                              (,) xk_sd9f [Occ=Once!] _ [Occ=Dead] ->
                                  case xk_sd9f of wild1_sd9h {
                                    GHC.Types.I# y1_sd9i [Occ=Once] ->
                                        case <=# [ww2_sd98 y1_sd9i] of {
                                          __DEFAULT -> ww3_sd9b;
                                          1# -> wild1_sd9h;
                                        };
                                  };
                            }; } in
                  let {
                    lvl21_sd9k [Occ=OnceL*!, Dmd=<L,U(U)>] :: GHC.Types.Int
                    [LclId] =
                        [ds_sd8M ww2_sd98 ww3_sd9b] \u []
                            case ds_sd8M of {
                              (,) xk_sd9m [Occ=Once!] _ [Occ=Dead] ->
                                  case xk_sd9m of wild1_sd9o {
                                    GHC.Types.I# y1_sd9p [Occ=Once] ->
                                        case <=# [ww2_sd98 y1_sd9p] of {
                                          __DEFAULT -> wild1_sd9o;
                                          1# -> ww3_sd9b;
                                        };
                                  };
                            }; } in ...

now the second argument is Int# instead of Int as before, but we pack it in the first let because we need to return it boxed (_DEFAULT case of first case expression and 1# case of the second case).

This is the only extra source of allocation in mate as far as I can see from the ticky output so this should be responsible for 5% increase in mate.

comment:28 Changed 22 months ago by osa1

Now that I understand ticky profiling better I also took another look at cse again.

The difference is is these two functions:

    Entries      Alloc    Alloc'd  Non-void Arguments      STG Name
--------------------------------------------------------------------------------
        524      12072          0   2 Li                   $wgo1{v r6T8} (main:Main) (fun)
        394       9984          0   2 LL                   $wdraw{v r6Tg} (main:Main) (fun)
total:  918      22056

after late demand analysis this becomes:

    Entries      Alloc    Alloc'd  Non-void Arguments      STG Name
--------------------------------------------------------------------------------
        524      11424          0   2 Li                   $w$wgo{v r71J} (main:Main) (fun)
        394      12768          0   2 LL                   $wdraw{v r71R} (main:Main) (fun)
total:  918      24192

Definitions: (common parts in wdraw functions removed)

$wgo1_r6T8 :: [GHC.Types.Char] -> GHC.Prim.Int# -> [GHC.Types.Char]
[GblId, Arity=2, Str=<S,1*U><S,1*U>m2, Unf=OtherCon []] =
    sat-only [] \r [w_s6XW ww_s6XX]
        case w_s6XW of {
          [] -> $wxs1_r6T7 ww_s6XX;
          : y_s6XZ [Occ=Once*] ys_s6Y0 [Occ=Once] ->
              case ww_s6XX of ds11_s6Y1 {
                __DEFAULT ->
                    let {
                      sat_s6Y3 [Occ=Once] :: [GHC.Types.Char]
                      [LclId] =
                          [ys_s6Y0 ds11_s6Y1] \u []
                              case -# [ds11_s6Y1 1#] of sat_s6Y2 {
                                __DEFAULT -> $wgo1_r6T8 ys_s6Y0 sat_s6Y2;
                              };
                    } in  : [y_s6XZ sat_s6Y3];
                1# -> : [y_s6XZ n_r6T1];
              };
        };

$wdraw_r6Tg
  :: [GHC.Types.Char]
     -> [Main.GenTree [GHC.Types.Char]] -> [[GHC.Types.Char]]
[GblId, Arity=2, Str=<L,1*U><S,1*U>, Unf=OtherCon []] =
    sat-only [] \r [ww_s6ZI ww1_s6ZJ]
        let {
          sat_s70m [Occ=Once, Dmd=<L,1*U>] :: [GHC.Types.Char]
          [LclId] =
              [ww_s6ZI] \s []
                  let {
                    sat_s70l [Occ=Once, Dmd=<L,1*U>] :: [GHC.Types.Char]
                    [LclId] =
                        [ww_s6ZI] \s [] $wgo1_r6T8 ww_s6ZI 4#;
                  } in  GHC.Base.++ ds10_r6SZ sat_s70l;
        } in
          case
              case ww1_s6ZJ of {
                [] -> lvl23_r6Tc;
                : t_s6ZL [Occ=Once*!] ds11_s6ZM [Occ=Once!] ->
                    case ds11_s6ZM of {
                      [] ->
                          case
                              case t_s6ZL of {
                                Main.Node ww3_s6ZP [Occ=Once] ww4_s6ZQ [Occ=Once] ->
                                    $wdraw_r6Tg ww3_s6ZP ww4_s6ZQ;
                              }
                          of
                          sat_s6ZR
                          { __DEFAULT -> $sgo1_r6SJ sat_s6ZR ds9_r6SX xs2_r6Tf;
                          };
                      : ipv_s6ZS [Occ=Once] ipv1_s6ZT [Occ=Once] ->
                          let {
                            z_s6ZU [Occ=OnceL] :: [[GHC.Types.Char]]
                            [LclId] =
                                [ipv_s6ZS ipv1_s6ZT] \u [] $srsLoop_r6Th ipv_s6ZS ipv1_s6ZT; } in
                          let {
                            z1_s6ZV :: [[GHC.Types.Char]]
                            [LclId, Unf=OtherCon []] =
                                CCCS :! [ds5_r6SS z_s6ZU]; } in
                          let {
                            go4_s6ZX [Occ=LoopBreaker]
                              :: [[GHC.Types.Char]] -> [[GHC.Types.Char]] -> [[GHC.Types.Char]]
                          } in
                          let {
                            $sgo6_s6ZW [Occ=Once!T[3]]
                              :: [[GHC.Types.Char]]
                                 -> [GHC.Types.Char] -> [[GHC.Types.Char]] -> [[GHC.Types.Char]]
                          } in
                            case
                                case t_s6ZL of {
                                  Main.Node ww3_s70h [Occ=Once] ww4_s70i [Occ=Once] ->
                                      $wdraw_r6Tg ww3_s70h ww4_s70i;
                                }
                            of
                            sat_s70j
                            { __DEFAULT -> $sgo6_s6ZW sat_s70j ds7_r6SV xs_r6ST;
                            };
                    };
              }
          of
          sat_s70k
          { __DEFAULT -> $sgo2_r6SH sat_s70k sat_s70m xs1_r6T5;
          };

after late demand analysis:

$w$wgo_r71J
  :: [GHC.Types.Char]
     -> GHC.Prim.Int# -> (# GHC.Types.Char, [GHC.Types.Char] #)
[GblId, Arity=2, Str=<S,1*U><S,1*U>, Unf=OtherCon []] =
    sat-only [] \r [w_s76A w1_s76B]
        case w_s76A of {
          [] -> $w$wxs_r71I w1_s76B;
          : y_s76D [Occ=Once*] ys_s76E [Occ=Once] ->
              case w1_s76B of ds11_s76F {
                __DEFAULT ->
                    let {
                      sat_s76K [Occ=Once] :: [GHC.Types.Char]
                      [LclId] =
                          [ys_s76E ds11_s76F] \u []
                              case -# [ds11_s76F 1#] of sat_s76G {
                                __DEFAULT ->
                                    case $w$wgo_r71J ys_s76E sat_s76G of {
                                      (#,#) ww1_s76I [Occ=Once] ww2_s76J [Occ=Once] ->
                                          : [ww1_s76I ww2_s76J];
                                    };
                              };
                    } in  (#,#) [y_s76D sat_s76K];
                1# -> (#,#) [y_s76D n_r71D];
              };
        };

$wdraw_r71R
  :: [GHC.Types.Char]
     -> [Main.GenTree [GHC.Types.Char]] -> [[GHC.Types.Char]]
[GblId, Arity=2, Str=<L,1*U><S,1*U>, Unf=OtherCon []] =
    sat-only [] \r [ww_s78p ww1_s78q]
        let {
          karg_s78r [Occ=Once*, Dmd=<L,1*U>] :: [GHC.Types.Char]
          [LclId] =
              [ww_s78p] \s []
                  let {
                    sat_s78v [Occ=Once, Dmd=<L,1*U>] :: [GHC.Types.Char]
                    [LclId] =
                        [ww_s78p] \s []
                            case $w$wgo_r71J ww_s78p 4# of {
                              (#,#) ww3_s78t [Occ=Once] ww4_s78u [Occ=Once] ->
                                  : [ww3_s78t ww4_s78u];
                            };
                  } in  GHC.Base.++ ds10_r71B sat_s78v;
        } in
          case ww1_s78q of {
            [] -> $sgo2_r71j lvl22_r71N karg_s78r xs1_r71H;
            : t_s78x [Occ=Once*!] ds11_s78y [Occ=Once!] ->
                case ds11_s78y of {
                  [] ->
                      case t_s78x of {
                        Main.Node ww3_s78B [Occ=Once] ww4_s78C [Occ=Once] ->
                            case $wdraw_r71R ww3_s78B ww4_s78C of sat_s78D {
                              __DEFAULT ->
                                  case $sgo1_r71l sat_s78D ds9_r71z xs2_r71Q of sat_s78E {
                                    __DEFAULT -> $sgo2_r71j sat_s78E karg_s78r xs1_r71H;
                                  };
                            };
                      };
                  : ipv_s78F [Occ=Once!] ipv1_s78G [Occ=Once] ->
                      let {
                        z_s78H [Occ=OnceL] :: [[GHC.Types.Char]]
                        [LclId] =
                            [ipv_s78F ipv1_s78G] \u []
                                case ipv_s78F of {
                                  Main.Node ww3_s78J [Occ=Once] ww4_s78K [Occ=Once] ->
                                      $w$srsLoop_r71S ww3_s78J ww4_s78K ipv1_s78G;
                                }; } in
                      let {
                        z1_s78L :: [[GHC.Types.Char]]
                        [LclId, Unf=OtherCon []] =
                            CCCS :! [ds5_r71u z_s78H]; } in
                      let {
                        go4_s78N [Occ=LoopBreaker]
                          :: [[GHC.Types.Char]] -> [[GHC.Types.Char]] -> [[GHC.Types.Char]]
                      } in
                      let {
                        $sgo6_s78M [Occ=Once!]
                          :: [[GHC.Types.Char]]
                             -> [GHC.Types.Char] -> [[GHC.Types.Char]] -> [[GHC.Types.Char]]
                      } in
                        case t_s78x of {
                          Main.Node ww3_s797 [Occ=Once] ww4_s798 [Occ=Once] ->
                              case $wdraw_r71R ww3_s797 ww4_s798 of sat_s799 {
                                __DEFAULT ->
                                    case $sgo6_s78M sat_s799 ds7_r71x xs_r71v of sat_s79a {
                                      __DEFAULT -> $sgo2_r71j sat_s79a karg_s78r xs1_r71H;
                                    };
                              };
                        };
                };
          };

Somehow new wdraw function should be doing more allocation but I couldn't figure how yet. Also attached the whole STG dumps.

Changed 22 months ago by osa1

Attachment: Main_stock.dump-stg added

cse

Changed 22 months ago by osa1

Attachment: Main_late_dmd.dump-stg added

cse late demand analysis

comment:29 Changed 22 months ago by osa1

comment:12 of #5075 identifies three more tickets that claim to benefit from late-dmd-anal. Let's see if they do.

One of those tickets is the current ticket. I updated #5302. I checked #6070 but late demand analysis didn't make any difference in the example functions h and c shown in the ticket. I don't know how to generated the CPR2.c_go2, the ticket is missing that detail.

comment:30 Changed 22 months ago by Ömer Sinan Ağacan <omeragacan@…>

In a032ff7/ghc:

Add references to #6087

[skip ci]

comment:31 Changed 22 months ago by osa1

We decided to put this on hold at the meeting. Just a small update: the case in comment:27 seems to be a limitation with the current demand analyser. It sometimes introduces reboxing that can't be eliminated by the simplifier. Example:

{-# NOINLINE check #-}
check :: Int -> Bool
check !n = True

f_ :: Int -> Int
f_ n = if check n then n else n * f_ (n - 1)

even after simplifications we rebox the argument:

Rec {
-- RHS size: {terms: 17, types: 3, coercions: 0, joins: 0/0}
$wf_
$wf_
  = \ ww_s28U ->
      case check (I# ww_s28U) of {
        False ->
          case $wf_ (-# ww_s28U 1#) of ww1_s28Y { __DEFAULT ->
          *# ww_s28U ww1_s28Y
          };
        True -> ww_s28U
      }
end Rec }

Update: Simon's example looks much similar to what I'm observing in comment:27:

f 0 = []
f n = n : f (n-1)

worker:

Rec {
-- RHS size: {terms: 13, types: 4, coercions: 0, joins: 0/0}
$wf
$wf
  = \ ww_s29x ->
      case ww_s29x of ds_X27s {
        __DEFAULT -> : (I# ds_X27s) ($wf (-# ds_X27s 1#));
        0# -> []
      }
end Rec }
Last edited 22 months ago by osa1 (previous) (diff)

comment:32 Changed 22 months ago by simonpj

Correct. Another example might be

f :: Int -> [Int]
f 0 = []
f n = n : f (n-1)

It's strict all right, but we need the boxed 'n' for the result.

"Boxity" analysis might try to figure out when the boxed version is needed, and pass that too. At one stage I had that but it seemed unprincipled.

Bottom line so far: late demand analysis can be a win, but only rarely so; and it can be a loss. It's a pity -- I'd like to make it part of -O2. But we probably have higher priorities for now.

Can you add the nofib numbers in a comment, please, for posterity?

comment:33 Changed 22 months ago by osa1

(the text below is also attached as file "analyse")

NoFib Results

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
             CS           0.0%      0.0%     0.156     0.156      0.0%
            CSD           0.0%      0.0%    -19.8%    -19.8%      0.0%
             FS           0.0%      0.0%     -7.0%     -7.0%      0.0%
              S           0.0%      0.0%    -11.6%    -11.6%      0.0%
             VS           0.0%      0.0%     -5.5%     -5.6%      0.0%
            VSD           0.0%      0.0%     0.009     0.009      0.0%
            VSM           0.0%      0.0%    -14.3%    -14.3%      0.0%
           anna          +0.2%     +0.0%     0.056     0.056      0.0%
           ansi           0.0%      0.0%     0.000     0.000      0.0%
           atom          +0.0%      0.0%     0.136     0.136      0.0%
         awards          +0.0%      0.0%     0.000     0.000      0.0%
         banner          +0.0%      0.0%     0.000     0.000      0.0%
     bernouilli          -0.0%      0.0%     0.095     0.095      0.0%
   binary-trees          +0.0%      0.0%     -8.5%     -8.5%      0.0%
          boyer          +0.0%     -0.0%     0.024     0.024      0.0%
         boyer2           0.0%     -2.1%     0.006     0.006      0.0%
           bspt          +0.1%     +0.2%     0.006     0.006    +50.0%
      cacheprof          +0.2%     -0.0%     0.191     0.191     -1.8%
       calendar          +0.1%     +0.0%     0.001     0.001      0.0%
       cichelli          -0.0%      0.0%     0.042     0.042     -2.7%
        circsim          +0.1%     +0.2%     -2.6%     -2.6%     -2.8%
       clausify          +0.1%     +0.0%     0.020     0.021      0.0%
  comp_lab_zift          -0.0%     -0.3%     0.097     0.097    -11.1%
       compress           0.0%      0.0%     0.076     0.076      0.0%
      compress2          +0.1%     -0.0%     0.078     0.078      0.0%
    constraints          +0.1%     +0.6%     +0.4%     +0.3%     -0.8%
   cryptarithm1           0.0%      0.0%     -0.9%     -0.7%      0.0%
   cryptarithm2          +0.0%      0.0%     0.005     0.005      0.0%
            cse          +0.0%     +0.2%     0.001     0.001      0.0%
   digits-of-e1           0.0%      0.0%     +2.0%     +2.0%      0.0%
   digits-of-e2          -0.1%     +0.0%     -3.2%     -3.2%      0.0%
          eliza          +0.0%     -0.0%     0.001     0.001      0.0%
          event          +0.1%     +0.0%     0.075     0.075      0.0%
    exact-reals          +0.2%      0.0%     +1.2%     +1.2%      0.0%
         exp3_8           0.0%      0.0%     0.125     0.125      0.0%
         expert          +0.1%     +0.1%     0.000     0.001      0.0%
 fannkuch-redux           0.0%      0.0%     -6.1%     -6.1%      0.0%
          fasta          +0.0%      0.0%    +54.1%    +53.9%      0.0%
            fem          +0.0%     -0.0%     0.015     0.015      0.0%
            fft          +0.0%     +0.0%     0.024     0.024      0.0%
           fft2          +0.0%      0.0%     0.045     0.045      0.0%
       fibheaps          +0.0%      0.0%     0.016     0.016      0.0%
           fish           0.0%      0.0%     0.006     0.006      0.0%
          fluid          +0.2%     +0.3%     0.005     0.005      0.0%
         fulsom          +0.1%      0.0%     0.136     0.136      0.0%
         gamteb          -0.1%     -0.1%     0.022     0.022      0.0%
            gcd           0.0%      0.0%     0.025     0.025      0.0%
    gen_regexps          -0.0%      0.0%     0.000     0.000      0.0%
         genfft          -0.0%     -0.2%     0.019     0.019      0.0%
             gg          +0.1%     +0.0%     0.008     0.008      0.0%
           grep          +0.0%      0.0%     0.000     0.000      0.0%
         hidden          -0.0%     -0.2%     0.193     0.193      0.0%
            hpg          +0.0%      0.0%     0.068     0.068      0.0%
            ida          +0.1%     +0.4%     0.048     0.048      0.0%
          infer          +0.0%      0.0%     0.033     0.033      0.0%
        integer           0.0%      0.0%     -1.0%     -1.0%      0.0%
      integrate           0.0%      0.0%     0.060     0.060      0.0%
   k-nucleotide          -0.0%      0.0%    -10.6%    -10.8%      0.0%
          kahan           0.0%      0.0%     0.172     0.172      0.0%
        knights           0.0%      0.0%     0.004     0.004      0.0%
         lambda           0.0%      0.0%     -1.4%     -1.4%      0.0%
     last-piece          -0.1%     -1.2%     -3.5%     -3.6%      0.0%
           lcss          +0.0%      0.0%     +0.3%     +0.4%      0.0%
           life          -0.0%      0.0%     0.124     0.124      0.0%
           lift          +0.0%      0.0%     0.001     0.001      0.0%
         linear          +0.1%      0.0%     +7.2%     +7.2%      0.0%
      listcompr          +0.0%     -0.0%     0.051     0.051      0.0%
       listcopy          +0.0%     -0.0%     0.052     0.052      0.0%
       maillist          +0.1%     +0.0%     0.034     0.034     +2.7%
         mandel           0.0%      0.0%     0.033     0.033      0.0%
        mandel2          +0.0%    -11.3%     0.004     0.004      0.0%
           mate          -0.2%     +5.0%    -17.9%    -17.9%      0.0%
        minimax           0.0%      0.0%     0.002     0.002      0.0%
        mkhprog           0.0%      0.0%     0.001     0.001      0.0%
     multiplier          +0.0%     -0.0%     0.052     0.052      0.0%
         n-body          -0.0%     -0.1%     -1.9%     -1.9%      0.0%
       nucleic2          -0.3%     +1.5%     0.035     0.035      0.0%
           para          +0.2%     +1.3%     0.148     0.148      0.0%
      paraffins          +0.0%      0.0%     0.065     0.065      0.0%
         parser          +0.0%      0.0%     0.017     0.017      0.0%
        parstof          +0.2%      0.0%     0.004     0.004      0.0%
            pic          -0.0%     -1.0%     0.005     0.005      0.0%
       pidigits          +0.0%      0.0%     -3.2%     -3.2%      0.0%
          power          +0.3%      0.0%     0.190     0.191      0.0%
         pretty           0.0%      0.0%     0.000     0.000      0.0%
         primes          +0.0%      0.0%     0.041     0.041      0.0%
      primetest          -0.0%     -0.2%     0.061     0.061      0.0%
         prolog          +0.0%     -0.0%     0.001     0.001      0.0%
         puzzle          +0.2%      0.0%     0.068     0.068      0.0%
         queens           0.0%      0.0%     0.012     0.012      0.0%
        reptile          +0.1%     +0.0%     0.007     0.007      0.0%
reverse-complem           0.0%      0.0%     0.078     0.078      0.0%
        rewrite          +0.1%     -0.0%     0.011     0.011      0.0%
           rfib           0.0%      0.0%     0.009     0.009      0.0%
            rsa          +0.0%     -0.0%     0.017     0.017      0.0%
            scc           0.0%      0.0%     0.000     0.000      0.0%
          sched          +0.1%     -0.0%     0.013     0.013      0.0%
            scs          -0.3%     +0.0%     -1.5%     -1.5%      0.0%
         simple          -0.1%     -0.7%     0.109     0.109      0.0%
          solid          +0.1%     -0.1%     0.063     0.063      0.0%
        sorting          +0.0%      0.0%     0.002     0.002      0.0%
  spectral-norm           0.0%      0.0%     -0.1%     -0.1%      0.0%
         sphere          +0.0%      0.0%     0.026     0.026      0.0%
         symalg          +0.0%      0.0%     0.007     0.007      0.0%
            tak           0.0%      0.0%     0.006     0.006      0.0%
      transform          -0.0%     -0.8%     0.165     0.165      0.0%
       treejoin           0.0%      0.0%     0.077     0.077      0.0%
      typecheck          +0.1%     -0.0%     0.131     0.131      0.0%
        veritas          +0.0%      0.0%     0.001     0.001      0.0%
           wang          +0.1%     -0.0%     0.053     0.053      0.0%
      wave4main          +0.0%      0.0%     0.125     0.125      0.0%
   wheel-sieve1           0.0%      0.0%     -0.2%     -0.2%      0.0%
   wheel-sieve2          +0.0%      0.0%     0.116     0.116      0.0%
           x2n1          +0.0%      0.0%     0.003     0.003      0.0%
--------------------------------------------------------------------------------
            Min          -0.3%    -11.3%    -19.8%    -19.8%    -11.1%
            Max          +0.3%     +5.0%    +54.1%    +53.9%    +50.0%
 Geometric Mean          +0.0%     -0.1%     -2.8%     -2.8%     +0.2%

comment:34 Changed 18 months ago by bgamari

Milestone: 8.6.18.8.1

This won't be fixed in 8.6. Bumping to 8.8.

comment:35 Changed 12 months ago by osa1

Milestone: 8.8.18.10.1

Bumping milestones of low-priority tickets.

comment:36 Changed 11 months ago by simonpj

Keywords: DemandAnalysis added
Note: See TracTickets for help on using tickets.