Opened 5 years ago

Last modified 3 years ago

#10626 new bug

Missed opportunity for SpecConstr

Reported by: simonpj Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.10.1
Keywords: SpecConstr Cc: nomeata, bgamari, dsc
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


Look at perf/should_run/T4830. After SpecConstr and optimisation we finally get

Rec {
Main.foo_$s$wfoo1 [Occ=LoopBreaker]
  :: Int# -> Double -> Double -> Double#
[GblId, Arity=3, Caf=NoCafRefs, Str=DmdType <L,1*U><L,U><L,U>]
Main.foo_$s$wfoo1 =
  \ (sc_s4eN :: Int#) (sc1_s4eO :: Double) (sc2_s4eP :: Double) ->
    case sc_s4eN of ds_X1SR {
      __DEFAULT ->
        case sc1_s4eO of wild_a1Tf { D# x_a1Th ->
        case sc2_s4eP of wild1_a1Tj { D# y_a1Tl ->
        case tagToEnum# @ Bool (<=## x_a1Th y_a1Tl) of _ [Occ=Dead] {
          False -> Main.foo_$s$wfoo1 (-# ds_X1SR 1#) wild1_a1Tj wild_a1Tf;
          True -> Main.foo_$s$wfoo1 (-# ds_X1SR 1#) wild_a1Tf wild1_a1Tj
      0# ->
        case sc1_s4eO of _ [Occ=Dead] { D# x_a1UL ->
        case sc2_s4eP of _ [Occ=Dead] { D# y_a1UP -> +## x_a1UL y_a1UP }
end Rec }

If we ran SpecConstr again we'd specialise this function, because the recursive calls both have boxed arguments.

I looked into why SpecConstr didn't catch it, and it's because SpecConstr's input looks like this

            case case tagToEnum# @ Bool (<=## x_a1Th y_a1Tl) of _ [Occ=Dead] {
                   False -> (wild1_a1Tj, wild_a1Tf);
                   True -> wild_X8
            of r_s4e0 { (ipv_s4e1, ipv_s4e2) ->
            $wfoo_s4db (I# (-# ds_X1SR 1#)) (Just @ (Double, Double) r_s4e0)

Notice the case-of-case which doesn't expose the Double boxes of arguments to $wfoo.

Why is that case-of-case still there? Because of Note [Single-alternative cases] in Simplify. Which is clearly a delicate spot so I don't want to meddle with it today. But it's intriguing. Maybe Sequent Core will do better.

Note that late demand analysis also catches this case, yielding the (rather good)

Rec {
Main.$w$s$wfoo [InlPrag=[0], Occ=LoopBreaker]
  :: Int# -> Double# -> Double# -> Double#
[GblId, Arity=3, Caf=NoCafRefs, Str=DmdType <S,1*U><L,U><L,U>]
Main.$w$s$wfoo =
  \ (w_s4gW :: Int#) (ww_s4h1 :: Double#) (ww1_s4h5 :: Double#) ->
    case w_s4gW of ds_X1SW {
      __DEFAULT ->
        case tagToEnum# @ Bool (<=## ww_s4h1 ww1_s4h5) of _ [Occ=Dead] {
          False -> Main.$w$s$wfoo (-# ds_X1SW 1#) ww1_s4h5 ww_s4h1;
          True -> Main.$w$s$wfoo (-# ds_X1SW 1#) ww_s4h1 ww1_s4h5
      0# -> +## ww_s4h1 ww1_s4h5
end Rec }

I'm pretty sure that late-demand-analysis should be on with -O2 but someone should do a nofib run to check.

Attachments (4)

Input.hs (11.8 KB) - added by dsc 3 years ago.
Input.dump-simpl (14.2 KB) - added by dsc 3 years ago.
ReHaskelledCore.hs (3.0 KB) - added by dsc 3 years ago.
ReHaskelledCore.dump-simpl (8.9 KB) - added by dsc 3 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 Changed 4 years ago by thomie

Type of failure: None/UnknownRuntime performance bug

comment:2 Changed 3 years ago by bgamari

Cc: nomeata bgamari added

comment:3 Changed 3 years ago by dsc

Cc: dsc added

(Not sure if my problem is different enough from this one to warrant a separate ticket.)

I also have a problem where SpecConstr apparently gets stuck and doesn't converge to a fixed point: If I translate the Core back into Haskell and compile that again, I get great code with all state constructors eliminated.

The code in question is TH-generated stream fusion code. I suppose that only the dump-splices result is relevant here, so I made it compilable and attached that (Input.hs).

The Core (also attached) contains lots of seemingly obvious SpecConstr opportunities:

test_$s$wgfold_loop =
  \ (sc
       :: FlattenState
            Int (ParamAndState Int (ConcatState YieldState () YieldState)))
    (sc1 :: Int)
    (sc2 :: Int#) ->
    case sc1 of _ [Occ=Dead] { I# y ->
    case sc of _ [Occ=Dead] {


      Flatten_RunningInner so_agux si_aguy ->

                     so_agux (ParamAndState a_aguz (CS_First AfterYielding)))
                  (+# (+# sc2 y) 42#);


                     so_agux (ParamAndState a_aguz (CS_Second () AfterYielding)))
                  (+# (+# sc2 y) 42#)


This is the second-round Core after retranslating the first Core into Haskell (hopefully without changing semantics) and compiling again:

test_$s$wgfold_loop =
  \ (sc :: Int#)
    (sc1 :: Int#)
    (sc2 :: Int#)
    _ [Occ=Dead]
    _ [Occ=Dead] ->
    case tagToEnum# (># sc2 1000000#) of _ [Occ=Dead] {
      False ->
          (+# (+# (+# (+# sc sc1) 42#) sc2) 42#) sc2 (+# sc2 1#) sc2 ();
      True -> +# sc sc1

I noticed that in the first core, GHC thinks that test_$s$wgfold_loop is lazy (Str=DmdType <L,U><L,U><L,U>). -flate-dmd-anal fixed that but doesn't seem to change much else (it unboxes one of the Int args, but doesn't affect the state type).

I compiled with stack ghc -- -O2 -ddump-simpl -ddump-to-file -dsuppress-module-prefixes -dsuppress-coercions -dsuppress-type-applications -dsuppress-uniques -dsuppress-unfoldings -fforce-recomp Input.hs using GHC 8.0.1 (without -dsuppress-uniques for producing the TH splices).

I also tried adding -fmax-simplifier-iterations=999999 -fspec-constr-count=999999 -fspec-constr-threshold=999999 -fspec-constr-recursive=999999 -flate-dmd-anal -funfolding-use-threshold=999999 to no effect (the -fspec-constr flags are redundant since I'm using the SPEC type in the loop, correct?).

Changed 3 years ago by dsc

Attachment: Input.hs added

Changed 3 years ago by dsc

Attachment: Input.dump-simpl added

Changed 3 years ago by dsc

Attachment: ReHaskelledCore.hs added

Changed 3 years ago by dsc

Attachment: ReHaskelledCore.dump-simpl added

comment:4 Changed 3 years ago by dsc

Just tried with GHC 8.0.2 and 7.10.3; similar problem.

By the way, the original source looks like this:

intersperseTest =
  $$(maybeSimplify $

        S.enumFromToNum [|| 1 ||] [|| 1000000 :: Int ||]
      & S.concatMap (\a -> yield a S.++ yield a)
      & S.intersperse [|| 42 ||]
      & S.concatMap yield
      & S.sum_
      & runIdentityE_static

(WIP library/experiment, trying for something like the streaming package with robust fusion using typed TH :))

If I change the first concatMap to just concatMap yield or remove either the intersperse or the second concatMap, the problem is not present.

I've had similar instances where the problem disappears after removing a random part of the pipeline (with that part fusing correctly if I remove a different part). Feels like I'm hitting some size limit rather than a fundamental problem (?) But as mentioned above, I already tried setting various GHC flags to large numbers.

comment:5 Changed 3 years ago by simonpj

I had a look. There are two things.

Problem 1. With the grotesque (GHC's fault not yours) SPEC argument to force specialisation, GHC does not want to generate an infinite number of specialisations; e.g.

f (a:b) = f (a::b)

This is limited by the (probably un-documented) flag -fspec-constr-recurisive=N flag. Its default value is far too low: 3. Set it to 20.

The relevant bit in SpecConstr is

is_too_recursive :: ScEnv -> (CallPat, ValueEnv) -> Bool
    -- Count the number of recursive constructors in a call pattern,
    -- filter out if there are more than the maximum.
    -- This is only necessary if ForceSpecConstr is in effect:
    -- otherwise specConstrCount will cause specialisation to terminate.
    -- See Note [Limit recursive specialisation]
-- TODO: make me more accurate

And indeed it simply counts data constructors, not even recursive ones. That will seriously limit specialisation in precisely the case where you wanted a lot!

At least we could count nesting depth rather than just counting constructors in total.

Problem 2. Your code has

                                (InterspersesState_Running (snd stored_fs_aguH))

That use of snd is fatal, because it's not inlined before SpecConstr (since it is applied to an uninformative variable). So SpecConstr doesn't "see" the case inside snd and that means it generate too few specialisations. If you instead write

--                      InterspersesState_YieldedInterspersee stored_fs_aguH
                      InterspersesState_YieldedInterspersee (fs1,fs2)
                        -> gfold_loop
--                                (InterspersesState_Running (snd stored_fs_aguH))
                                (InterspersesState_Running fs2)
--                                (ParamAndState (fst stored_fs_aguH) BeforeYielding))
                                (ParamAndState fs1 BeforeYielding))

where the old line is commented out, and replaced by the line below, then good things happen, and a single run gives

Rec {
-- RHS size: {terms: 33, types: 6, coercions: 0, joins: 0/0}
Input.test_$s$wgfold_loop [Occ=LoopBreaker]
  :: Int# -> Int# -> Int# -> Int# -> Int#
[GblId, Arity=4, Caf=NoCafRefs, Str=<S,U><S,U><S,U><S,U>]
Input.test_$s$wgfold_loop =
  \ (sc_s5hq :: Int#)
    (sc1_s5hr :: Int#)
    (sc2_s5hs :: Int#)
    (sc3_s5hp :: Int#) ->
    case tagToEnum# @ Bool (># sc_s5hq 1000000#) of {
      False ->
          (+# sc_s5hq 1#)
          (+# (+# (+# (+# sc3_s5hp sc2_s5hs) 42#) sc1_s5hr) 42#);
      True -> +# (+# (+# sc3_s5hp sc2_s5hs) 42#) sc1_s5hr
end Rec }

comment:6 Changed 3 years ago by dsc

Thank you very much! Avoiding the snd indeed fixes it in this case, and I'll also keep problem 1 in mind.

comment:7 Changed 3 years ago by simonpj

Keywords: SpecConstr added
Note: See TracTickets for help on using tickets.