Opened 9 years ago

Closed 6 years ago

Strictness analyser is to conservative about passing a boxed parameter

Reported by: Owned by: tibbe nomeata low 7.6.2 Compiler 6.13 DemandAnalysis Unknown/Multiple Unknown/Multiple None/Unknown

Description

Given the following two modules:

Fold.hs:

module Fold (Tree, fold') where

data Tree a = Leaf | Node a !(Tree a) !(Tree a)

-- Strict, pre-order fold.
fold' :: (a -> b -> a) -> a -> Tree b -> a
fold' f = go
where
go z Leaf = z
go z (Node a l r) = let z'  = go z l
z'' = f z' a
in z' `seq` z'' `seq` go z'' r
{-# INLINE fold' #-}

FoldTest.hs:

module FoldTest (sumTree) where

import Fold

sumTree :: Tree Int -> Int
sumTree = fold' (+) 0

I'd expect that the accumulator z used in go to be an unboxed Int#. However, it's boxed:

sumTree1 :: Int
sumTree1 = I# 0

sumTree_go :: Int -> Fold.Tree Int -> Int
sumTree_go =
\ (z :: Int) (ds_ddX :: Fold.Tree Int) ->
case ds_ddX of _ {
Fold.Leaf -> z;
Fold.Node a l r ->
case sumTree_go z l of _ { I# z' ->
case a of _ { I# a# ->
sumTree_go (I# (+# z' a#)) r
}
}
}

sumTree :: Fold.Tree Int -> Int
sumTree =
\ (eta1_B1 :: Fold.Tree Int) ->
sumTree_go sumTree1 eta1_B1

Given this definition of fold'

fold' :: (a -> b -> a) -> a -> Tree b -> a
fold' f = go
where
go z _ | z `seq` False = undefined
go z Leaf = z
go z (Node a l r) = go (f (go z l) a) r
{-# INLINE fold' #-}

I get the core I want. However, this version isn't explicit in that the left branch (i.e. go z l) should be evaluated before f is called on the result. In other words, I think my first definition is the one that correctly expresses the evaluation order, yet it results in worse core.

comment:1 Changed 9 years ago by tibbe

Version: 6.12.1 → 6.13

The problem is present in HEAD as well.

comment:2 follow-up:  3 Changed 9 years ago by simonmar

We established (during a conversation on IRC) that the reason for the lack of unboxing was that the function wasn't inlined until after the worker-wrapper transformation. The question is therefore, why wasn't it inlined earlier?

comment:3 in reply to:  2 Changed 9 years ago by simonmar

We established (during a conversation on IRC) that the reason for the lack of unboxing was that the function wasn't inlined until after the worker-wrapper transformation. The question is therefore, why wasn't it inlined earlier?

I think I must have misunderstood the IRC conversation in question, tibbe told me today that the function does indeed get inlined before strictness analysis and worker-wrapper. tibbe: if you could attach the output from -dverbose-core2core to the ticket, that might help.

comment:4 Changed 9 years ago by tibbe

Here are the relevant parts of -dverbose-core2core. First, inlining happens:

==================== Simplifier Phase 2 [main] max-iterations=4 ====================
a_soD :: GHC.Types.Int
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [] 1 2}]
a_soD = GHC.Types.I# 0

FoldTest.sumTree :: Fold.Tree GHC.Types.Int -> GHC.Types.Int
[LclIdX,
Arity=1,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS  18 0}]
FoldTest.sumTree =
\ (eta1_B1 :: Fold.Tree GHC.Types.Int) ->
letrec {
go_aaE [Occ=LoopBreaker]
:: GHC.Types.Int -> Fold.Tree GHC.Types.Int -> GHC.Types.Int
[LclId,
Arity=2,
Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=2, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [0 3] 12 0}]
go_aaE =
\ (z_aaF :: GHC.Types.Int) (ds_dbh :: Fold.Tree GHC.Types.Int) ->
case ds_dbh of _ {
Fold.Leaf -> z_aaF;
Fold.Node a_aaH l_aaI r_aaJ ->
case go_aaE z_aaF l_aaI of _ { GHC.Types.I# ipv_soJ ->
case a_aaH of _ { GHC.Types.I# y_aox ->
go_aaE (GHC.Types.I# (GHC.Prim.+# ipv_soJ y_aox)) r_aaJ
}
}
}; } in
go_aaE a_soD eta1_B1

And some time later demand analysis:

==================== Demand analysis ====================
a_soD :: GHC.Types.Int
[LclId,
Str=DmdType m,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [] 1 2}]
a_soD = GHC.Types.I# 0

FoldTest.sumTree :: Fold.Tree GHC.Types.Int -> GHC.Types.Int
[LclIdX,
Arity=1,
Str=DmdType S,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS  18 0}]
FoldTest.sumTree =
\ (eta1_B1 [Dmd=Just S] :: Fold.Tree GHC.Types.Int) ->
letrec {
go_aaE [Occ=LoopBreaker]
:: GHC.Types.Int -> Fold.Tree GHC.Types.Int -> GHC.Types.Int
[LclId,
Arity=2,
Str=DmdType SS,
Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=2, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [0 3] 12 0}]
go_aaE =
\ (z_aaF [Dmd=Just S] :: GHC.Types.Int)
(ds_dbh [Dmd=Just S] :: Fold.Tree GHC.Types.Int) ->
case ds_dbh of _ {
Fold.Leaf -> z_aaF;
Fold.Node a_aaH [Dmd=Just U(L)]
l_aaI [Dmd=Just S]
r_aaJ [Dmd=Just S] ->
case go_aaE z_aaF l_aaI of _ { GHC.Types.I# ipv_soJ [Dmd=Just L] ->
case a_aaH of _ { GHC.Types.I# y_aox [Dmd=Just L] ->
go_aaE (GHC.Types.I# (GHC.Prim.+# ipv_soJ y_aox)) r_aaJ
}
}
}; } in
go_aaE a_soD eta1_B1

and after that worker/wrapper

==================== Worker Wrapper binds ====================
a_soD :: GHC.Types.Int
[LclId,
Str=DmdType m,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [] 1 2}]
a_soD = GHC.Types.I# 0

FoldTest.sumTree :: Fold.Tree GHC.Types.Int -> GHC.Types.Int
[LclIdX,
Arity=1,
Str=DmdType S,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS  18 0}]
FoldTest.sumTree =
\ (eta1_B1 [Dmd=Just S] :: Fold.Tree GHC.Types.Int) ->
letrec {
go_aaE [Occ=LoopBreaker]
:: GHC.Types.Int -> Fold.Tree GHC.Types.Int -> GHC.Types.Int
[LclId,
Arity=2,
Str=DmdType SS,
Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=2, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [0 3] 12 0}]
go_aaE =
\ (z_aaF [Dmd=Just S] :: GHC.Types.Int)
(ds_dbh [Dmd=Just S] :: Fold.Tree GHC.Types.Int) ->
case ds_dbh of _ {
Fold.Leaf -> z_aaF;
Fold.Node a_aaH [Dmd=Just U(L)]
l_aaI [Dmd=Just S]
r_aaJ [Dmd=Just S] ->
case go_aaE z_aaF l_aaI of _ { GHC.Types.I# ipv_soJ [Dmd=Just L] ->
case a_aaH of _ { GHC.Types.I# y_aox [Dmd=Just L] ->
go_aaE (GHC.Types.I# (GHC.Prim.+# ipv_soJ y_aox)) r_aaJ
}
}
}; } in
go_aaE a_soD eta1_B1

comment:5 Changed 9 years ago by tibbe

To add to the confusion, using HEAD, this definition of foldl' gets an unboxed accumulator:

fold' :: (a -> b -> a) -> a -> Tree b -> a
fold' f = go
where
go z Leaf = z
go z (Node a l r) = z `seq` go (f (go z l) a) r
{-# INLINE fold' #-}

but this one doesn't

fold' :: (a -> b -> a) -> a -> Tree b -> a
fold' f = go
where
go z Leaf = z
go z (Node a l r) = z `seq` go (f (go z l) a) r

Note the missing INLINE pragma.

The strange thing is that GHC happily inlines both versions (since they're not recursive) but the former gets an unboxed accumulator for some reason I don't understand.

comment:6 Changed 9 years ago by igloo

Milestone: → 7.2.1

comment:7 Changed 8 years ago by simonpj

Summary: Missing unboxing in pre-order fold over binary tree → Strictness analyser is to conservative about passing a boxed parameter

I took a little look at this. As you say, you get the loop

sumTree_go :: Int -> Fold.Tree Int -> Int
sumTree_go =
\ (z :: Int) (ds_ddX :: Fold.Tree Int) ->
case ds_ddX of _ {
Fold.Leaf -> z;
Fold.Node a l r ->
case sumTree_go z l of _ { I# z' ->
case a of _ { I# a# ->
sumTree_go (I# (+# z' a#)) r
} } }

Notice that this loop is strict in z, but does not actually unbox z. The strictness analyser conservatively passes the boxed version under such circumstances, to avoid the possiblity of unboxing it, passing it to the function, which immediately reboxes it.

It turns out that adding a "!" to the defn of go is enough to fix this:

go z Leaf = z
go !z (Node a l r)   -- NOTE THE "!"
= let z'  = go z l
z'' = f z' a
in z' `seq` z'' `seq` go z'' r

In fact the seqs are redundant because go is strict in both args, so this gives the same resulting code

go z Leaf = z
go !z (Node a l r)   -- NOTE THE "!"
= go (f z' a) (go z l)

The code (in the sumTree call) is lovely

T4267.\$wgo :: GHC.Prim.Int# -> T4267.Tree GHC.Types.Int -> GHC.Prim.Int#
T4267.\$wgo =
\ (ww_sps :: GHC.Prim.Int#) (w_spu :: T4267.Tree GHC.Types.Int) ->
case w_spu of _ {
T4267.Leaf -> ww_sps;
T4267.Node ipv_soP ipv1_soQ ipv2_soR ->
case T4267.\$wgo ww_sps ipv1_soQ of ww1_spx { __DEFAULT ->
case ipv_soP of _ { GHC.Types.I# y_ap5 ->
T4267.\$wgo (GHC.Prim.+# ww1_spx y_ap5) ipv2_soR
} } }

So that's a workaround.

Meanwhile I think the strictness analyser should be a bit cleverer, so I'll keep the ticket open for that reason.

comment:8 Changed 8 years ago by tibbe

Aside: Does it matter if the bang pattern goes on the first or second equation of go? The first equation is already strict in z so putting the bang pattern on the second equation might be clearer. As a rule I always put bang patterns on the first equation if I want a function to be strict in some parameter.

comment:9 Changed 8 years ago by igloo

Milestone: 7.4.1 → 7.6.1 normal → low

comment:10 Changed 7 years ago by igloo

Milestone: 7.6.1 → 7.6.2

comment:11 Changed 6 years ago by simonpj

difficulty: → Unknown set to nomeata

All versions give good code now, happily! Worth adding a regression test.

comment:12 Changed 6 years ago by Joachim Breitner <mail@…>

Add testcase for #4267

comment:13 Changed 6 years ago by nomeata

Resolution: → fixed new → closed

comment:14 Changed 6 years ago by nomeata

Unfortunately, #7994 would break this test ;-)

bytes allocated value is too low:
(If this is because you have improved GHC, please
update the test so that GHC doesn't regress again)
Expected    bytes allocated: 130000 +/-10%
Lower bound bytes allocated: 117000
Upper bound bytes allocated: 143000
Actual      bytes allocated:  40992
*** unexpected failure for T4267(normal)

comment:15 Changed 6 years ago by Joachim Breitner <mail@…>

Update test cases due to call arity

Some nice improvements on already succeeding test cases (#876, #7954
and #4267)

Test #149 needed a little change, lest call arity causes a allocation
change that we do not want to test here.