Opened 9 years ago

Last modified 4 years ago

#4081 new bug

Strict constructor fields inspected in loop

Reported by: rl Owned by: benl
Priority: low Milestone:
Component: Test Suite Version: 6.13
Keywords: Cc: benl@…, nightski@…, bjornbm, chr.andreetta@…, choener@…, batterseapower, rl, bgamari@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Here is a small example to illustrate the problem:

module T where

data S a b = S !a !b

class C a where
 make :: a -> S a a

instance C Int where
 {-# NOINLINE make #-}
 make n = S n n

foo :: (C a, Num a) => a -> Int -> a
{-# INLINE foo #-}
foo x k = k `seq` m `seq` go k 0
 where
   S m n = make x

   go 0 i = i
   go k i = go (k-1) (i + m)
module U where

import T

bar :: Int -> Int -> Int
bar s k = foo s k + 1

Relying on LiberateCase seems to be the only way to unbox m outside of the loop in bar. The seq in foo doesn't help because it gets eliminated immediately.

GHC does have enough information to do this:

U.bar =
 \ (s_aaw [Dmd=Just S(A)] :: GHC.Types.Int)
   (k_aax [Dmd=Just U(L)] :: GHC.Types.Int) ->
   case k_aax
   of k1_ajh [Dmd=Just U(L)] { GHC.Types.I# ipv_ajj [Dmd=Just A] ->
   case T.$fCInt_$cmake s_aaw of _ { T.S m_ajy [Dmd=Just U(T)] _ ->
   ...

Note the demand on m. If it was an argument instead of a local binding, it would be unboxed by w/w.

Also, the seq does help if we use lazy pairs instead of strict ones.

Change History (27)

comment:1 Changed 9 years ago by simonpj

We could arrange that w/w transformed the case to add

  case m_ajy of I# m' -> ...

just inside the T.S match. As things stand today, this case would be eliminated by the "drop redundant seqs" transformation. But perhaps that transformation is too eager. It drops a case that looks like

   case x of { C y1..yn -> body }

where the y1..yn are not mentioned in body, and x is known to be evaluated. See

--      2. Eliminate the case if scrutinee is evaluated

in Simplify.

This is correct, but it might be better to 'retain' the case if there are any binders y1..yn, at least until the end of simplification (CorePrep perhaps). Then we can drop it. That would deal with this particular case at least.

Another opportunity that we do not exploit right now is strictness in a free variable. Consier

f x = letrec g y = if x>0 then ..g y'....
                   else ...g y''...
      in (g y1, g y2)

If we lambda-lifted, we'd evaluate 'x' just once; as things stand we do it each time. A modification to the w/w transform might handle this.

comment:2 Changed 9 years ago by igloo

Milestone: 6.14.1

comment:3 Changed 9 years ago by igloo

Milestone: 7.0.17.0.2

comment:4 Changed 9 years ago by simonpj

Here's a simpler example:

data T a = T !a 

foo :: T Int -> Int -> Int 
foo (T x) y = letrec blah 0 = 0
                     blah n = x + blah (n-1)
              in blah y

However Roman says that although doing the unboxing correctly is obviously the "right thing", it's no longer important for DPH.

comment:5 Changed 9 years ago by simonpj

Here's another example that Ben was looking at:

{-# LANGUAGE BangPatterns #-}
module Foo(foo) where

-- Library Code ---------------------------------------------------------------
data Thing = Manifest !Int | None

getManifestThing :: Thing -> Int
getManifestThing (Manifest t)	= t
getManifestThing _		= error "sorry"

loopIt :: (Int -> Int) -> Int -> Int
{-# INLINE loopIt #-}
loopIt f iters
 = loopOuter iters
 where	loopOuter 0	= 0
	loopOuter n	= loopInner iters + loopOuter (n - 1)

	loopInner 0	= 0
	loopInner n	= f n             + loopInner (n - 1)


-- Client Code ----------------------------------------------------------------
foo :: Thing -> Int -> Int
foo t1@(Manifest i) count 
 = i `seq` go count
 where	go 0	= 0
	go n	= loopIt (worker t1 n) count + go (n - 1)
	
worker :: Thing -> Int -> Int -> Int
worker t x n = getManifestThing t + n + x

Here we get a loop like this:

$wfoo :: Thing -> Int# -> Int#
$wfoo =
  \ (w_so2 :: Thing) (ww_so5 :: Int#) ->
    case w_so2 of _ {
      Manifest i_aaX ->
        letrec {
          $wgo_soj :: Int# -> Int#
          $wgo_soj =
            \ (ww1_snU :: Int#) ->
              case ww1_snU of ds_Xma {
                __DEFAULT ->
                  letrec {
                    $wloopOuter_son :: Int# -> Int#
                    $wloopOuter_son =
                      \ (ww2_snL :: Int#) ->
                        case ww2_snL of wild1_Xi {
                          __DEFAULT ->
                            case ww_so5 of ds1_XlU {
                              __DEFAULT ->
************************        case i_aaX of _ { I# x_amh ->
                                letrec {
                                  $wloopInner_sol :: Int# -> Int#
                                  $wloopInner_sol =
                                    \ (ww3_Xo5 :: Int#) ->
                                      case ww3_Xo5 of ds2_Xmo {
                                        __DEFAULT ->
                                          case $wloopInner_sol (-# ds2_Xmo 1)
                                          of ww4_snG { __DEFAULT ->
                                          +# (+# (+# x_amh ds2_Xmo) ds_Xma) ww4_snG
                                          };
                                        0 -> 0
                                      }; } in
                                case $wloopInner_sol (-# ds1_XlU 1) of ww3_snG { __DEFAULT ->
                                case $wloopOuter_son (-# wild1_Xi 1) of ww4_snP { __DEFAULT ->
                                +# (+# (+# (+# x_amh ds1_XlU) ds_Xma) ww3_snG) ww4_snP
                                }
                                }
                                };
                              0 -> $wloopOuter_son (-# wild1_Xi 1)
                            };
                          0 -> 0
                        }; } in
                  case $wloopOuter_son ww_so5 of ww2_snP { __DEFAULT ->
                  case $wgo_soj (-# ds_Xma 1) of ww3_snY { __DEFAULT ->
                  +# ww2_snP ww3_snY
                  }
                  };
                0 -> 0
              }; } in
        $wgo_soj ww_so5;
      None -> lvl_roB `cast` (CoUnsafe Int Int# :: Int ~ Int#)
    }

The "*" line inspects i_aaX inside the loop, but that same case could safely occur right when we unpack the constructor. I think this is the same issue as the much smaller example above, but I wanted to capture the example.

comment:6 Changed 9 years ago by rl

It turns out that this still is quite critical for DPH after all. We implement a parallel map on vectors more or less like this:

splitD :: Vector a -> Dist (Vector a)
joinD :: Dist (Vector a) -> Vector a
mapD :: (a -> b) -> Dist a -> Dist b

mapPar :: (a -> b) -> Vector a -> Vector b
mapPar f = joinD . mapD (map f) . splitD

Here, map f is a loop which is applied in parallel to chunks of the vector (which are themselves vectors). Although mapD will seq on the vector before passing it to the loop, this isn't enough for, say, vectors of pairs:

data instance Vector (a,b) = V_2 !Int !(Vector a) !(Vector b)

There is no way to have the inspection happen outside of the loop at the moment. LiberateCase does catch this but it duplicates huge amounts of code so relying on it isn't a good idea.

Fusion sometimes gets rid of this, too, but not always.

comment:7 Changed 9 years ago by simonpj

I'm pretty certain we can fix this. My plan is simply to extend the let-floater to float out (case x of I# y -> ...), where x is known to be evaluated. (Of course, for any product type, not just Int.)

How might x be known to be evaluated? The usual way is by an enclosing 'case', but that won't happen here because the inner case would simply vanish. No, it'll be because you pattern match on a strict constructor

  case v of
    C x -> ....(case x of I# y -> ...) ...

where

  data C a = C !a

I believe that these strict constructors are the cases you are concerned about, correct?

Simon

comment:8 Changed 9 years ago by rl

That sounds like a good plan. Yes, it's precisely the strict constructors I'm interested in.

comment:9 Changed 9 years ago by igloo

Milestone: 7.0.27.2.1

comment:10 Changed 8 years ago by benl

Cc: benl@… added

comment:11 Changed 8 years ago by nightski

Cc: nightski@… added

comment:12 Changed 8 years ago by bjornbm

Cc: bjornbm added

comment:13 Changed 8 years ago by simonpj

Owner: set to benl

Right this is done, I think. Give it a try. Ben or Roman do you think you might think of a way to test this? I can think of two possible ways:

  • Find a case where there is a big runtime difference, and measure that. But that is fragile to which system you are running on.
  • Dump the Core and grep for something or other. Perhaps in your example all the primops should be together, rather than separated by unboxing?

I'd just like a test that'll trip if this optimisation stops happening. Thanks.

Two main patches:

commit 9cb20b488d4986c122b0461a54bc5c970f9d8502
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Mon Jun 27 08:54:29 2011 +0100

    Add case-floating to the float-out pass
    
    There are two things in this patch. First, a new feature.
    Given     (case x of I# y -> ...)
    where 'x' is known to be evaluated, the float-out pass
    will float the case outwards towards x's binding.  Of
    course this doesn't happen if 'x' is evaluated because
    of an enclosing case (becuase then the inner case would
    be eliminated) but it *does* happen when x is bound by
    a constructor with a strict field.  This happens in DPH.
    Trac #4081.
    
    The second change is a significant refactoring of the
    way the let-floater works.  Now SetLevels makes a decision
    about whether the let (or case) will move, and records
    that decision in the FloatSpec flag.  This change makes
    the whole caboodle much easier to think about.

 compiler/simplCore/FloatOut.lhs  |  297 +++++++++++++++++++++----------------
 compiler/simplCore/SetLevels.lhs |  302 ++++++++++++++++++++++----------------
 2 files changed, 343 insertions(+), 256 deletions(-)

and a follow-up

commit a347cd7c384eb255b5507a40840205d052f137c6
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Thu Jun 30 14:48:16 2011 +0100

    A second bite at the case-floating patch
    
    When floating a case outwards we must be careful to clone
    the binders, since their scope is widening.
    
    Plus lots of tidying up.

 compiler/coreSyn/CoreSubst.lhs   |   20 +++++++-
 compiler/simplCore/SetLevels.lhs |   94 ++++++++++++++++++++++---------------
 compiler/types/Type.lhs          |   13 ++++-
 3 files changed, 85 insertions(+), 42 deletions(-)

This work tickled a scoping bug in CSE, which I fixed too

commit 3acc4683f128641a93d53a0d4e9d50e10e5e4ff0
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Thu Jun 30 14:40:25 2011 +0100

    Fix CSE to do substitution properly
    
    It was inconsistent before, now it's right

 compiler/simplCore/CSE.lhs |  130 +++++++++++++++++++++++---------------------
 1 files changed, 68 insertions(+), 62 deletions(-)

Now the code you get for $wfoo in the "example that Ben was looking at" looks better

T4081.$wfoo =
  \ (w_sr0 :: T4081.Thing) (ww_sr3 :: GHC.Prim.Int#) ->
    case w_sr0 of _ {
      T4081.Manifest i_ab6 ->
        case i_ab6 of _ { GHC.Types.I# x_sri ->            -- <---- Int unboxed here!
        letrec {
          $wgo_sra [Occ=LoopBreaker] :: GHC.Prim.Int# -> GHC.Prim.Int#
          [LclId, Arity=1, Str=DmdType L]
          $wgo_sra =
            \ (ww1_sqS :: GHC.Prim.Int#) ->
              case ww1_sqS of ds_Xpg {
                __DEFAULT ->
                  letrec {
                    $wloopInner_srb [Occ=LoopBreaker] :: GHC.Prim.Int# -> GHC.Prim.Int#
                    [LclId, Arity=1, Str=DmdType L]
                    $wloopInner_srb =
                      \ (ww2_sqA :: GHC.Prim.Int#) ->
                        case ww2_sqA of ds1_XoZ {
                          __DEFAULT ->
                            case $wloopInner_srb (GHC.Prim.-# ds1_XoZ 1)
                            of ww3_sqE { __DEFAULT ->
                            GHC.Prim.+#
                              (GHC.Prim.+# (GHC.Prim.+# x_sri ds1_XoZ) ds_Xpg) ww3_sqE
                            };
                          0 -> 0
                        }; } in
                  letrec {
                    $wloopOuter_src [Occ=LoopBreaker] :: GHC.Prim.Int# -> GHC.Prim.Int#
                    [LclId, Arity=1, Str=DmdType L]
                    $wloopOuter_src =
                      \ (ww2_sqJ :: GHC.Prim.Int#) ->
                        case ww2_sqJ of wild2_Xj {
                          __DEFAULT ->
                            case $wloopInner_srb ww_sr3 of ww3_sqE { __DEFAULT ->
                            case $wloopOuter_src (GHC.Prim.-# wild2_Xj 1)
                            of ww4_sqN { __DEFAULT ->
                            GHC.Prim.+# ww3_sqE ww4_sqN
                            }
                            };
                          0 -> 0
                        }; } in
                  case $wloopOuter_src ww_sr3 of ww2_sqN { __DEFAULT ->
                  case $wgo_sra (GHC.Prim.-# ds_Xpg 1) of ww3_sqW { __DEFAULT ->
                  GHC.Prim.+# ww2_sqN ww3_sqW
                  }
                  };
                0 -> 0
              }; } in
        $wgo_sra ww_sr3
        };
      T4081.None ->
        T4081.foo1
        `cast` (UnsafeCo GHC.Types.Int GHC.Prim.Int#
                :: GHC.Types.Int ~ GHC.Prim.Int#)
    }

comment:14 Changed 8 years ago by simonpj

Here are a couple more examples Max suggested, which I want to capture in this ticket.

module T4081a where

----------------
data S1 = S1 !Product
data Product = Product !Int

foo :: S1 -> Int
foo (S1 x) = go 0 10
  where
    go acc 0 = acc
    go acc y = case x of Product x -> go (acc + (y * x)) (y - 1)

---------------------
data S2 = S2 !Int

bar :: S2 -> Int
bar (S2 x) = go 0 10
  where
    go acc 0 = acc
    go acc y = go (acc + (x * y)) (y - 1)

comment:15 Changed 8 years ago by igloo

Milestone: 7.2.17.4.1

comment:16 Changed 8 years ago by igloo

Milestone: 7.4.17.6.1
Priority: normallow

comment:17 Changed 8 years ago by chr.andr

Cc: chr.andreetta@… added

comment:18 Changed 7 years ago by choenerzs

Cc: choener@… added

comment:19 Changed 7 years ago by simonpj

Cc: batterseapower rl added
difficulty: Unknown

This bug is actually fixed; it's only still open because we lack a regression test. Ben, Roman, Max, any chance of coming up with one? See my comment above.

Simon

comment:20 Changed 7 years ago by igloo

Milestone: 7.6.17.6.2

comment:21 Changed 7 years ago by bgamari

Cc: bgamari@… added
Version: 6.13

Any progress on the regression test?

comment:22 Changed 7 years ago by bgamari

Version: 6.13

comment:23 Changed 5 years ago by thoughtpolice

Milestone: 7.6.27.10.1

Moving to 7.10.1.

comment:24 Changed 5 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:25 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:26 Changed 4 years ago by thomie

Component: CompilerTest Suite

Bug is fixed. Just needs a test.

comment:27 Changed 4 years ago by thomie

Milestone: 8.0.1
Note: See TracTickets for help on using tickets.