Opened 9 years ago
Closed 9 years ago
#4908 closed bug (fixed)
Easy SpecConstr opportunity that is nonetheless missed
Reported by: | batterseapower | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Compiler | Version: | 7.0.1 |
Keywords: | Cc: | ||
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Runtime performance bug | Test Case: | simplCore/should_compile/T4908 |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | ||
Wiki Page: |
Description (last modified by )
I was looking at the code using that uses unsafe indexing into STUArrays from http://www.lix.polytechnique.fr/~kaustuv/expo/incr_uarray/
One of the reasons that this code runs so much more slowly than his C version is that the inner loop is not fully unboxed. It turns out that a simple SpecConstr opportunity is being missed, and I'm not sure why.
There is a local recursive function function that looks like this:
letrec { $wa_X1yE [Occ=LoopBreaker] :: forall s_aCz. Data.Array.Base.STUArray s_aCz GHC.Types.Int GHC.Types.Int -> GHC.Prim.Int# -> GHC.Prim.State# s_aCz -> (# GHC.Prim.State# s_aCz, () #) [LclId, Arity=3, Str=DmdType LLL, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=3, Value=True, ConLike=True, Cheap=True, Expandable=True, Guidance=IF_ARGS [2 0 0] 33 3}] $wa_X1yE = \ (@ s_XDW) (w_X1yn :: Data.Array.Base.STUArray s_XDW GHC.Types.Int GHC.Types.Int) (ww_X1yr :: GHC.Prim.Int#) (w_X1yu :: GHC.Prim.State# s_XDW) -> case GHC.Prim.># ww_X1yr y_aJN of _ { GHC.Bool.False -> case w_X1yn of wild_XKN [Dmd=Just L] { Data.Array.Base.STUArray ds1_XKQ [Dmd=Just U] ds2_XKT [Dmd=Just U] n_XKW [Dmd=Just U(L)] ds3_XKZ [Dmd=Just L] -> case n_XKW of _ { GHC.Types.I# x_XNb [Dmd=Just L] -> letrec { $wa_X1z0 [Occ=LoopBreaker] :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.State# s_XDW -> (# GHC.Prim.State# s_XDW, () #) [LclId, Arity=3, Str=DmdType LLL, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=3, Value=True, ConLike=True, Cheap=True, Expandable=True, Guidance=IF_ARGS [0 0 0] 13 3}] $wa_X1z0 = \ (ww_s1wO :: GHC.Prim.Int#) (ww_s1wS :: GHC.Prim.Int#) (w_s1wU :: GHC.Prim.State# s_XDW) -> case GHC.Prim.># ww_s1wO ww_s1wS of _ { GHC.Bool.False -> case GHC.Prim.readIntArray# @ s_XDW ds3_XKZ ww_s1wO w_s1wU of _ { (# s2#_aK9 [Dmd=Just L], e#_aKa [Dmd=Just L] #) -> case GHC.Prim.writeIntArray# @ s_XDW ds3_XKZ ww_s1wO (GHC.Prim.+# e#_aKa ww_X1yr) s2#_aK9 of s2#_aKF [Dmd=Just L] { __DEFAULT -> $wa_X1z0 (GHC.Prim.+# ww_s1wO 1) ww_s1wS s2#_aKF } }; GHC.Bool.True -> (# w_s1wU, GHC.Unit.() #) }; } in case $wa_X1z0 0 (GHC.Prim.-# x_XNb 1) w_X1yu of _ { (# new_s_XMM [Dmd=Just L], _ #) -> $wa_X1yE @ s_XDW wild_XKN (GHC.Prim.+# ww_X1yr 1) new_s_XMM } } }; GHC.Bool.True -> (# w_X1yu, GHC.Unit.() #) }; } in $wa_X1yE @ s_aCz wild_aJj (GHC.Prim.+# ww_s1x2 1) new_s_aL3 } } };
Is being specialised with a rule like:
[LclId, Arity=3, Str=DmdType LLL, RULES: "SC:$wa0" [ALWAYS] forall {sc_s1yQ :: GHC.Prim.State# GHC.Prim.RealWorld sc_s1yR :: GHC.Prim.Int# sc_s1yS :: GHC.Types.Int sc_s1yT :: GHC.Prim.MutableByteArray# GHC.Prim.RealWorld} $wa_X1yE @ GHC.Prim.RealWorld (Data.Array.Base.STUArray @ GHC.Prim.RealWorld @ GHC.Types.Int @ GHC.Types.Int lvl_sSK ww4_a1ss sc_s1yS sc_s1yT) sc_s1yR sc_s1yQ = $s$wa_s1zl sc_s1yQ sc_s1yR sc_s1yS sc_s1yT]
To the final code:
letrec { $s$wa_s1zl :: GHC.Prim.State# GHC.Prim.RealWorld -> GHC.Prim.Int# -> GHC.Types.Int -> GHC.Prim.MutableByteArray# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #) [LclId, Arity=4, Str=DmdType LLLL] $s$wa_s1zl = \ (sc_s1yQ :: GHC.Prim.State# GHC.Prim.RealWorld) (sc_s1yR :: GHC.Prim.Int#) (sc_s1yS :: GHC.Types.Int) (sc_s1yT :: GHC.Prim.MutableByteArray# GHC.Prim.RealWorld) -> case GHC.Prim.># sc_s1yR y_aJN of _ { GHC.Bool.False -> case sc_s1yS of _ { GHC.Types.I# x_XNb [Dmd=Just L] -> letrec { $wa_X1z0 [Occ=LoopBreaker] :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #) [LclId, Arity=3, Str=DmdType LLL] $wa_X1z0 = \ (ww_s1wO :: GHC.Prim.Int#) (ww_s1wS :: GHC.Prim.Int#) (w_s1wU :: GHC.Prim.State# GHC.Prim.RealWorld) -> case GHC.Prim.># ww_s1wO ww_s1wS of _ { GHC.Bool.False -> case GHC.Prim.readIntArray# @ GHC.Prim.RealWorld sc_s1yT ww_s1wO w_s1wU of _ { (# s2#_aK9 [Dmd=Just L], e#_aKa [Dmd=Just L] #) -> case GHC.Prim.writeIntArray# @ GHC.Prim.RealWorld sc_s1yT ww_s1wO (GHC.Prim.+# e#_aKa sc_s1yR) s2#_aK9 of s2#_aKF [Dmd=Just L] { __DEFAULT -> $wa_X1z0 (GHC.Prim.+# ww_s1wO 1) ww_s1wS s2#_aKF } }; GHC.Bool.True -> (# w_s1wU, GHC.Unit.() #) }; } in case $wa_X1z0 0 (GHC.Prim.-# x_XNb 1) sc_s1yQ of _ { (# new_s_XMM [Dmd=Just L], _ #) -> $wa_X1yE @ GHC.Prim.RealWorld (Data.Array.Base.STUArray @ GHC.Prim.RealWorld @ GHC.Types.Int @ GHC.Types.Int lvl_sSK ww4_a1ss sc_s1yS sc_s1yT) (GHC.Prim.+# sc_s1yR 1) new_s_XMM } }; GHC.Bool.True -> (# sc_s1yQ, GHC.Unit.() #) };
But this is daft! We can see from $wa_X1yE
that the third component of the STUArray is always (I# x_XNb)
. Why not unpack the constructor in the specialisation too?
(In fact, exactly the same pattern occurs at the original call site of the local recursive function, so this problem isn't because the specialisations are being seeded from the call site rather than the loop body).
To reproduce, compile the attached code with:
ghc -O2 -fforce-recomp --make STUArray.hs -ddump-simpl
Attachments (1)
Change History (4)
Changed 9 years ago by
Attachment: | STUArray.hs added |
---|
comment:1 Changed 9 years ago by
Description: | modified (diff) |
---|
comment:2 Changed 9 years ago by
Status: | new → merge |
---|---|
Test Case: | → simplCore/should_compile/T4908 |
Thanks for identifying this. It's a long-standing bug; and there was a corresponding one in the simplifier, so a good one to find.
Please merge
Simon