Opened 9 years ago

Closed 9 years ago

#4908 closed bug (fixed)

Easy SpecConstr opportunity that is nonetheless missed

Reported by: batterseapower Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.0.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case: simplCore/should_compile/T4908
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by simonpj)

I was looking at the code using that uses unsafe indexing into STUArrays from http://www.lix.polytechnique.fr/~kaustuv/expo/incr_uarray/

One of the reasons that this code runs so much more slowly than his C version is that the inner loop is not fully unboxed. It turns out that a simple SpecConstr opportunity is being missed, and I'm not sure why.

There is a local recursive function function that looks like this:

letrec {
  $wa_X1yE [Occ=LoopBreaker]
    :: forall s_aCz.
       Data.Array.Base.STUArray
         s_aCz GHC.Types.Int GHC.Types.Int
       -> GHC.Prim.Int#
       -> GHC.Prim.State# s_aCz
       -> (# GHC.Prim.State# s_aCz, () #)
  [LclId,
   Arity=3,
   Str=DmdType LLL,
   Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=3,
           Value=True, ConLike=True, Cheap=True,
           Expandable=True,
           Guidance=IF_ARGS [2 0 0] 33 3}]
  $wa_X1yE =
    \ (@ s_XDW)
      (w_X1yn
         :: Data.Array.Base.STUArray
              s_XDW GHC.Types.Int GHC.Types.Int)
      (ww_X1yr :: GHC.Prim.Int#)
      (w_X1yu :: GHC.Prim.State# s_XDW) ->
      case GHC.Prim.># ww_X1yr y_aJN of _ {
        GHC.Bool.False ->
          case w_X1yn
          of wild_XKN [Dmd=Just L]
          { Data.Array.Base.STUArray ds1_XKQ [Dmd=Just U]
                                     ds2_XKT [Dmd=Just U]
                                     n_XKW [Dmd=Just U(L)]
                                     ds3_XKZ [Dmd=Just L] ->
          case n_XKW
          of _ { GHC.Types.I# x_XNb [Dmd=Just L] ->
          letrec {
            $wa_X1z0 [Occ=LoopBreaker]
              :: GHC.Prim.Int#
                 -> GHC.Prim.Int#
                 -> GHC.Prim.State# s_XDW
                 -> (# GHC.Prim.State# s_XDW, () #)
            [LclId,
             Arity=3,
             Str=DmdType LLL,
             Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=3,
                     Value=True, ConLike=True, Cheap=True,
                     Expandable=True,
                     Guidance=IF_ARGS [0 0 0] 13 3}]
            $wa_X1z0 =
              \ (ww_s1wO :: GHC.Prim.Int#)
                (ww_s1wS :: GHC.Prim.Int#)
                (w_s1wU :: GHC.Prim.State# s_XDW) ->
                case GHC.Prim.># ww_s1wO ww_s1wS of _ {
                  GHC.Bool.False ->
                    case GHC.Prim.readIntArray#
                           @ s_XDW ds3_XKZ ww_s1wO w_s1wU
                    of _
                    { (# s2#_aK9 [Dmd=Just L], e#_aKa [Dmd=Just L] #) ->
                    case GHC.Prim.writeIntArray#
                           @ s_XDW
                           ds3_XKZ
                           ww_s1wO
                           (GHC.Prim.+# e#_aKa ww_X1yr)
                           s2#_aK9
                    of s2#_aKF [Dmd=Just L] { __DEFAULT ->
                    $wa_X1z0
                      (GHC.Prim.+# ww_s1wO 1)
                      ww_s1wS
                      s2#_aKF
                    }
                    };
                  GHC.Bool.True ->
                    (# w_s1wU, GHC.Unit.() #)
                }; } in
          case $wa_X1z0 0 (GHC.Prim.-# x_XNb 1) w_X1yu
          of _ { (# new_s_XMM [Dmd=Just L], _ #) ->
          $wa_X1yE
            @ s_XDW
            wild_XKN
            (GHC.Prim.+# ww_X1yr 1)
            new_s_XMM
          }
          }
          };
        GHC.Bool.True -> (# w_X1yu, GHC.Unit.() #)
      }; } in
$wa_X1yE
  @ s_aCz wild_aJj (GHC.Prim.+# ww_s1x2 1) new_s_aL3
}
}
};

Is being specialised with a rule like:

[LclId,
 Arity=3,
 Str=DmdType LLL,
 RULES: "SC:$wa0" [ALWAYS]
            forall {sc_s1yQ
                      :: GHC.Prim.State#
                           GHC.Prim.RealWorld
                    sc_s1yR :: GHC.Prim.Int#
                    sc_s1yS :: GHC.Types.Int
                    sc_s1yT
                      :: GHC.Prim.MutableByteArray#
                           GHC.Prim.RealWorld}
              $wa_X1yE @ GHC.Prim.RealWorld
                       (Data.Array.Base.STUArray
                          @ GHC.Prim.RealWorld
                          @ GHC.Types.Int
                          @ GHC.Types.Int
                          lvl_sSK
                          ww4_a1ss
                          sc_s1yS
                          sc_s1yT)
                       sc_s1yR
                       sc_s1yQ
              = $s$wa_s1zl
                  sc_s1yQ sc_s1yR sc_s1yS sc_s1yT]

To the final code:

letrec {
  $s$wa_s1zl
    :: GHC.Prim.State# GHC.Prim.RealWorld
       -> GHC.Prim.Int#
       -> GHC.Types.Int
       -> GHC.Prim.MutableByteArray# GHC.Prim.RealWorld
       -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
  [LclId, Arity=4, Str=DmdType LLLL]
  $s$wa_s1zl =
    \ (sc_s1yQ :: GHC.Prim.State# GHC.Prim.RealWorld)
      (sc_s1yR :: GHC.Prim.Int#)
      (sc_s1yS :: GHC.Types.Int)
      (sc_s1yT
         :: GHC.Prim.MutableByteArray#
              GHC.Prim.RealWorld) ->
      case GHC.Prim.># sc_s1yR y_aJN of _ {
        GHC.Bool.False ->
          case sc_s1yS
          of _ { GHC.Types.I# x_XNb [Dmd=Just L] ->
          letrec {
            $wa_X1z0 [Occ=LoopBreaker]
              :: GHC.Prim.Int#
                 -> GHC.Prim.Int#
                 -> GHC.Prim.State# GHC.Prim.RealWorld
                 -> (# GHC.Prim.State# GHC.Prim.RealWorld,
                       () #)
            [LclId, Arity=3, Str=DmdType LLL]
            $wa_X1z0 =
              \ (ww_s1wO :: GHC.Prim.Int#)
                (ww_s1wS :: GHC.Prim.Int#)
                (w_s1wU
                   :: GHC.Prim.State#
                        GHC.Prim.RealWorld) ->
                case GHC.Prim.># ww_s1wO ww_s1wS of _ {
                  GHC.Bool.False ->
                    case GHC.Prim.readIntArray#
                           @ GHC.Prim.RealWorld
                           sc_s1yT
                           ww_s1wO
                           w_s1wU
                    of _
                    { (# s2#_aK9 [Dmd=Just L], e#_aKa [Dmd=Just L] #) ->
                    case GHC.Prim.writeIntArray#
                           @ GHC.Prim.RealWorld
                           sc_s1yT
                           ww_s1wO
                           (GHC.Prim.+# e#_aKa sc_s1yR)
                           s2#_aK9
                    of s2#_aKF [Dmd=Just L] { __DEFAULT ->
                    $wa_X1z0
                      (GHC.Prim.+# ww_s1wO 1)
                      ww_s1wS
                      s2#_aKF
                    }
                    };
                  GHC.Bool.True ->
                    (# w_s1wU, GHC.Unit.() #)
                }; } in
          case $wa_X1z0 0 (GHC.Prim.-# x_XNb 1) sc_s1yQ
          of _ { (# new_s_XMM [Dmd=Just L], _ #) ->
          $wa_X1yE
            @ GHC.Prim.RealWorld
            (Data.Array.Base.STUArray
               @ GHC.Prim.RealWorld
               @ GHC.Types.Int
               @ GHC.Types.Int
               lvl_sSK
               ww4_a1ss
               sc_s1yS
               sc_s1yT)
            (GHC.Prim.+# sc_s1yR 1)
            new_s_XMM
          }
          };
        GHC.Bool.True -> (# sc_s1yQ, GHC.Unit.() #)
      };

But this is daft! We can see from $wa_X1yE that the third component of the STUArray is always (I# x_XNb). Why not unpack the constructor in the specialisation too?

(In fact, exactly the same pattern occurs at the original call site of the local recursive function, so this problem isn't because the specialisations are being seeded from the call site rather than the loop body).

To reproduce, compile the attached code with:

ghc -O2 -fforce-recomp --make STUArray.hs -ddump-simpl

Attachments (1)

STUArray.hs (836 bytes) - added by batterseapower 9 years ago.

Download all attachments as: .zip

Change History (4)

Changed 9 years ago by batterseapower

Attachment: STUArray.hs added

comment:1 Changed 9 years ago by simonpj

Description: modified (diff)

comment:2 Changed 9 years ago by simonpj

Status: newmerge
Test Case: simplCore/should_compile/T4908

Thanks for identifying this. It's a long-standing bug; and there was a corresponding one in the simplifier, so a good one to find.

Mon Jan 31 11:35:29 GMT 2011  simonpj@microsoft.com
  * Improve Simplifier and SpecConstr behaviour
  
  Trac #4908 identified a case where SpecConstr wasn't "seeing" a
  specialisation it should easily get.  The solution was simple: see
  Note [Add scrutinee to ValueEnv too] in SpecConstr.
  
  Then it turned out that there was an exactly analogous infelicity in
  the mighty Simplifer too; see Note [Add unfolding for scrutinee] in
  Simplify. This fix is good for Simplify even in the absence of the
  SpecConstr change.  (It arose when I moved the binder- swap stuff to
  OccAnall, not realising that it *remains* valuable to record info
  about the scrutinee of a case expression.  The Note says why.
  
  Together these two changes are unconditionally good.  Better
  simplification, better specialisation. Thank you Max.

    M ./compiler/simplCore/Simplify.lhs -38 +57
    M ./compiler/specialise/SpecConstr.lhs -18 +39

Please merge

Simon

comment:3 Changed 9 years ago by igloo

Resolution: fixed
Status: mergeclosed

Merged

Note: See TracTickets for help on using tickets.