#13536 closed bug (fixed)
Program which terminates instantly in GHC 8.0.2 runs for minutes with 8.2.1
Reported by: | RyanGlScott | Owned by: | |
---|---|---|---|
Priority: | highest | Milestone: | 8.2.1 |
Component: | Compiler | Version: | 8.1 |
Keywords: | Cc: | nomeata, george | |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Runtime performance bug | Test Case: | T13536 |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | Phab:D3437 | |
Wiki Page: |
Description
This currently causes the vector
test suite to loop forever (see here). I've reproduced this with GHC 8.2.1 and HEAD. Unfortunately, it's not easy to isolate down to a file with no dependencies, so for now this requires vector
and QuickCheck
to reproduce. First, install them:
$ cabal install vector QuickCheck --allow-newer -w /opt/ghc/8.2.1/bin/ghc
Then take this file:
{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE TypeFamilies #-} module Main (main) where import qualified Data.Vector.Generic as V import qualified Data.Vector.Unboxed as DVU import Test.QuickCheck import Text.Show.Functions () main :: IO () main = do verboseCheck ((\f (i, b) v -> V.foldl f (i, b) v == foldl (\x -> f (unmodel x)) (i, b) ( DVU.toList v)) :: ((Int, Bool) -> (Int, Bool) -> (Int, Bool)) -> (Int, Bool) -> DVU.Vector (Int, Bool) -> Bool) instance (Arbitrary a, DVU.Unbox a) => Arbitrary (DVU.Vector a) where arbitrary = fmap DVU.fromList arbitrary class TestData a where type Model a unmodel :: Model a -> a instance TestData Bool where type Model Bool = Bool unmodel = id instance TestData Int where type Model Int = Int unmodel = id instance (Eq a, Eq b, TestData a, TestData b) => TestData (a,b) where type Model (a,b) = (Model a, Model b) unmodel (a,b) = (unmodel a, unmodel b)
Then compile it with /opt/ghc/8.2.1/bin/ghc -O2 Main.hs
(the -O2
part is important). Observe that running it never terminates.
However, the same program does terminate when compiled with 8.0.2!
Change History (31)
comment:1 Changed 2 years ago by
comment:2 Changed 2 years ago by
Actually, vector
isn't needed here. Here's an example that only requires QuickCheck
:
{-# LANGUAGE TypeFamilies #-} module Main (main) where import Test.QuickCheck import Text.Show.Functions () main :: IO () main = verboseCheck foldlTest type FoldlTest a = (a -> a -> a) -> a -> [a] -> Bool foldlTest :: FoldlTest (Int, Int) foldlTest f (i, b) v = foldl f (i, b) v == foldl (\x -> f (unmodel x)) (i, b) v class TestData a where type Model a unmodel :: Model a -> a instance TestData Int where type Model Int = Int unmodel = id instance (Eq a, Eq b, TestData a, TestData b) => TestData (a,b) where type Model (a,b) = (Model a, Model b) unmodel (a,b) = (unmodel a, unmodel b)
Another observation is that the type (Int, Int)
is crucial for triggering the infinite loop. If you use, say, FoldlTest Int
instead of FoldlTest (Int, Int)
, then it terminates again.
comment:3 Changed 2 years ago by
More progress. Here is a program which deterministically exhibits the issue (i.e., it doesn't rely on system-generated pseudorandomness):
{-# LANGUAGE TypeFamilies #-} module Main (main) where import System.Random.TF.Gen (seedTFGen) import Test.QuickCheck (arbitrary) import Test.QuickCheck.Gen (Gen(..)) import Test.QuickCheck.Random (QCGen(..)) import Text.Show.Functions () main :: IO () main = do let tfGen = seedTFGen ( 543863073959529591 , 14453565003432405558 , 3036645681517334938 , 17781306407512891751 ) qcGen = QCGen tfGen (f, (i, b), v) = case arbitrary of MkGen g -> g qcGen 30 print $ foldlTest f (i, b) v type FoldlTest a = (a -> a -> a) -> a -> [a] -> Bool foldlTest :: FoldlTest (Int, Int) foldlTest f (i, b) v = foldl f (i, b) v == foldl (\x -> f (unmodel x)) (i, b) v class TestData a where type Model a unmodel :: Model a -> a instance TestData Int where type Model Int = Int unmodel = id instance (Eq a, Eq b, TestData a, TestData b) => TestData (a,b) where type Model (a,b) = (Model a, Model b) unmodel (a,b) = (unmodel a, unmodel b)
I'd also like to retract my claim that this program is looping forever. I timed this program when compiled with -O2
on GHC 8.2.1:
137.01user 0.47system 2:17.43elapsed 100%CPU (0avgtext+0avgdata 7476maxresident)k 0inputs+0outputs (0major+936minor)pagefaults 0swaps
As opposed to GHC 8.0.2:
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 4768maxresident)k 0inputs+0outputs (0major+289minor)pagefaults 0swaps
So there's still a bug, but I don't think it's infinite in nature.
comment:4 Changed 2 years ago by
Summary: | Program which terminated in GHC 8.0.2 loops with 8.2.1 → Program which terminates instantly in GHC 8.0.2 runs for minutes with 8.2.1 |
---|
comment:5 Changed 2 years ago by
Here's a version with no dependencies:
{-# LANGUAGE TypeFamilies #-} module Main where import Control.Monad (ap, liftM, liftM2, liftM3, replicateM) import Data.Int (Int32) main :: IO () main = do let stdGen = StdGen 1523085842 1207612140 qcGen = QCGen stdGen (f, (i, b), v) = case arbitrary of MkGen g -> g qcGen 30 print $ foldlTest f (i, b) v type FoldlTest a = (a -> a -> a) -> a -> [a] -> Bool foldlTest :: FoldlTest (Bool, Bool) foldlTest f (i, b) v = foldl f (i, b) v == foldl (\x -> f (unmodel x)) (i, b) v class TestData a where type Model a unmodel :: Model a -> a instance TestData Bool where type Model Bool = Bool unmodel = id instance (Eq a, Eq b, TestData a, TestData b) => TestData (a,b) where type Model (a,b) = (Model a, Model b) unmodel (a,b) = (unmodel a, unmodel b) ------------------------------------------------------------------------------- -- random stuff data StdGen = StdGen !Int32 !Int32 stdNext :: StdGen -> (Int, StdGen) stdNext (StdGen s1 s2) = (fromIntegral z', StdGen s1'' s2'') where z' = if z < 1 then z + 2147483562 else z z = s1'' - s2'' k = s1 `quot` 53668 s1' = 40014 * (s1 - k * 53668) - k * 12211 s1'' = if s1' < 0 then s1' + 2147483563 else s1' k' = s2 `quot` 52774 s2' = 40692 * (s2 - k' * 52774) - k' * 3791 s2'' = if s2' < 0 then s2' + 2147483399 else s2' stdRange :: StdGen -> (Int,Int) stdRange _ = (1, 2147483562) stdSplit :: StdGen -> (StdGen, StdGen) stdSplit std@(StdGen s1 s2) = (left, right) where left = StdGen new_s1 t2 right = StdGen t1 new_s2 new_s1 | s1 == 2147483562 = 1 | otherwise = s1 + 1 new_s2 | s2 == 1 = 2147483398 | otherwise = s2 - 1 StdGen t1 t2 = snd (stdNext std) ------------------------------------------------------------------------------- -- QuickCheck newtype QCGen = QCGen StdGen newtype Gen a = MkGen{ unGen :: QCGen -> Int -> a } variant :: Integral n => n -> Gen a -> Gen a variant k (MkGen g) = MkGen (\r n -> g (variantQCGen k r) n) bigNatVariant :: Integer -> StdGen -> StdGen bigNatVariant n g | g `seq` stop n = chip True (fromInteger n) g | otherwise = (bigNatVariant $! chop n) $! chip False (fromInteger n) g {-# INLINE natVariant #-} natVariant :: Integral a => a -> StdGen -> StdGen natVariant n g | g `seq` stop n = chip True (fromIntegral n) g | otherwise = bigNatVariant (toInteger n) g {-# INLINE variantTheGen #-} variantTheGen :: Integral a => a -> StdGen -> StdGen variantTheGen n g | n >= 1 = natVariant (n-1) (boolVariant False g) | n == 0 = natVariant (0 `asTypeOf` n) (boolVariant True g) | otherwise = bigNatVariant (negate (toInteger n)) (boolVariant True g) boolVariant :: Bool -> StdGen -> StdGen boolVariant False = fst . stdSplit boolVariant True = snd . stdSplit variantQCGen :: Integral a => a -> QCGen -> QCGen variantQCGen n (QCGen g) = QCGen (variantTheGen n g) chip :: Bool -> Int -> StdGen -> StdGen chip finished n = boolVariant finished . boolVariant (even n) chop :: Integer -> Integer chop n = n `div` 2 stop :: Integral a => a -> Bool stop n = n <= 1 instance Functor Gen where fmap f (MkGen h) = MkGen (\r n -> f (h r n)) instance Applicative Gen where pure = return (<*>) = ap instance Monad Gen where return x = MkGen (\_ _ -> x) MkGen m >>= k = MkGen (\(QCGen r) n -> let (r1,r2) = case stdSplit r of (g1, g2) -> (QCGen g1, QCGen g2) MkGen m' = k (m r1 n) in m' r2 n ) promote :: Monad m => m (Gen a) -> Gen (m a) promote m = do eval <- delay return (liftM eval m) delay :: Gen (Gen a -> a) delay = MkGen (\r n g -> unGen g r n) listOf :: Gen a -> Gen [a] listOf gen = sized $ \n -> do k <- chooseInt (0,n) vectorOf k gen vectorOf :: Int -> Gen a -> Gen [a] vectorOf = replicateM sized :: (Int -> Gen a) -> Gen a sized f = MkGen (\r n -> let MkGen m = f n in m r n) chooseInt :: (Int, Int) -> Gen Int chooseInt rng = MkGen (\r _ -> let (x,_) = randomIvalIntegral rng r in x) qcGenRange :: QCGen -> (Int, Int) qcGenRange (QCGen g) = stdRange g qcGenNext :: QCGen -> (Int, QCGen) qcGenNext (QCGen g) = case stdNext g of (x, g') -> (x, QCGen g') randomIvalIntegral :: (Integral a) => (a, a) -> QCGen -> (a, QCGen) randomIvalIntegral (l,h) = randomIvalInteger (toInteger l, toInteger h) randomIvalInteger :: (Num a) => (Integer, Integer) -> QCGen -> (a, QCGen) randomIvalInteger (l,h) rng | l > h = randomIvalInteger (h,l) rng | otherwise = case (f 1 0 rng) of (v, rng') -> (fromInteger (l + v `mod` k), rng') where (genlo, genhi) = qcGenRange rng b = fromIntegral genhi - fromIntegral genlo + 1 q = 1000 k = h - l + 1 magtgt = k * q f mag v g | mag >= magtgt = (v, g) | otherwise = v' `seq`f (mag*b) v' g' where (x,g') = qcGenNext g v' = (v * b + (fromIntegral x - fromIntegral genlo)) chooseBool :: (Bool, Bool) -> Gen Bool chooseBool rng = MkGen (\r _ -> let (x,_) = randomRBool rng r in x) randomRBool :: (Bool, Bool) -> QCGen -> (Bool, QCGen) randomRBool (a,b) g = case (randomIvalInteger (bool2Int a, bool2Int b) g) of (x, g') -> (int2Bool x, g') where bool2Int :: Bool -> Integer bool2Int False = 0 bool2Int True = 1 int2Bool :: Int -> Bool int2Bool 0 = False int2Bool _ = True class Arbitrary a where arbitrary :: Gen a instance Arbitrary Bool where arbitrary = chooseBool (False, True) instance Arbitrary a => Arbitrary [a] where arbitrary = listOf arbitrary instance (Arbitrary a, Arbitrary b) => Arbitrary (a, b) where arbitrary = liftM2 (,) arbitrary arbitrary instance (Arbitrary a, Arbitrary b, Arbitrary c) => Arbitrary (a,b,c) where arbitrary = liftM3 (,,) arbitrary arbitrary arbitrary instance (CoArbitrary a, Arbitrary b) => Arbitrary (a -> b) where arbitrary = promote (`coarbitrary` arbitrary) class CoArbitrary a where coarbitrary :: a -> Gen b -> Gen b instance (CoArbitrary a, CoArbitrary b) => CoArbitrary (a,b) where coarbitrary (x,y) = coarbitrary x . coarbitrary y instance CoArbitrary Bool where coarbitrary False = variant (0 :: Int) coarbitrary True = variant (1 :: Int)
comment:6 Changed 2 years ago by
Here is a version without any of the random or quickcheck stuff. (I used the actual i
, b
, v
values from the test and wrote down an arbitrary strict function f
.)
{-# LANGUAGE TypeFamilies #-} module Main where main :: IO () main = do let f :: (Bool, Bool) -> (Bool, Bool) -> (Bool, Bool) f (True, False) (False, False) = (False, True) f _ _ = (True, False) ((i, b), v) = ((False,True),[(False,True),(False,False),(True,True),(True,False),(False,False),(False,True),(True,True),(True,True),(False,True),(True,False),(False,False),(True,True),(True,True),(False,False),(False,False),(False,True),(True,False),(True,False),(True,True),(True,True),(False,True),(True,False),(True,False),(True,True),(False,False),(True,True),(False,False),(True,False),(False,True),(True,True)]) print $ foldlTest f (i, b) v type FoldlTest a = (a -> a -> a) -> a -> [a] -> Bool foldlTest :: FoldlTest (Bool, Bool) foldlTest f (i, b) v = foldl f (i, b) v == foldl (\x -> f (unmodel x)) (i, b) v class TestData a where type Model a unmodel :: Model a -> a instance TestData Bool where type Model Bool = Bool unmodel = id instance (Eq a, Eq b, TestData a, TestData b) => TestData (a,b) where type Model (a,b) = (Model a, Model b) unmodel (a,b) = (unmodel a, unmodel b)
Observations so far:
- Making the match in
unmodel
lazy (unmodel ~(a,b) = ...
) makes the program fast again.
- Adding an explicit export list
module Main (main) where
also makes the program fast again.
comment:8 Changed 2 years ago by
Well this is quite interesting. The culprit is apparently the new STG CSE pass. It's fast with -fno-stg-cse
and slow with -fstg-cse
, and the only change in the STG is
case eta_s5Di of { (,) a_s5Dn [Occ=Once] b_s5Do [Occ=Once] -> - (,) [a_s5Dn b_s5Do]; + eta_s5Di; };
which is the body of
unmodel (a,b) = (unmodel a, unmodel b)
It certainly makes sense that STG CSE would do that and that it couldn't be done in Core-level CSE, because the types Model Bool
and Bool
are different there; but I don't yet understand why it is bad in this program.
comment:9 Changed 2 years ago by
Zooming out a bit:
let { go1_s5D8 [Occ=LoopBreaker] :: [(GHC.Types.Bool, GHC.Types.Bool)] -> (GHC.Types.Bool, GHC.Types.Bool) -> (GHC.Types.Bool, GHC.Types.Bool) [LclId, Arity=2, Str=<S,1*U><L,1*U(U,U)>, Unf=OtherCon []] = sat-only \r [ds_s5Dh eta_s5Di] case ds_s5Dh of { [] -> eta_s5Di; : y_s5Dk [Occ=Once] ys_s5Dl [Occ=Once] -> let { sat_s5Dq [Occ=Once, Dmd=<L,1*U(U,U)>] :: (GHC.Types.Bool, GHC.Types.Bool) [LclId] = \s [] let { sat_s5Dp [Occ=Once] :: (GHC.Types.Bool, GHC.Types.Bool) [LclId] = \u [] case eta_s5Di of { (,) a_s5Dn [Occ=Once] b_s5Do [Occ=Once] -> eta_s5Di; }; } in w_s5CS sat_s5Dp y_s5Dk; } in go1_s5D8 ys_s5Dl sat_s5Dq; }; } in
Is it okay for the sat_s5Dq
thunk to be marked as single-entry? It is passed as the second argument of the recursive call to go1
but after the CSE go1
's second argument eta_s5Di
actually appears twice (as the scrutinee of the case
, and then again as the body of the case
).
comment:10 Changed 2 years ago by
Maybe CSE here should have replaced the pair (a,b)
by the case binder, not the scrutinee? Would that work in general (in the sense of not making the occurrence analysis incorrect)?
comment:11 Changed 2 years ago by
Cc: | nomeata added |
---|
The culprit is apparently the new STG CSE pass
Ah yes! We very carefully run a final demand-analysis just before tidy-core, precisely to ensure that the used-once info (which can get invalidated) is correct before code gen. But the STG CSE pass is undoing that goodness. Total disaster.
I'll think about how best to do this. Copying Joachim who added CSE for STG>
comment:12 Changed 2 years ago by
OK, I've had a bit of a look. First, if we'd done this:
case eta_s5Di of wild1 { (,) a b -> (,) a b ====> case eta_s5Di of wild1 { (,) a b -> wild1
we'd have been fine. Because eta_s5Di
points to a single-entry thunk (as comment:9 so accurately points out) the thunk won't be updated. But wild1
will be bound to the heap-allocated pair returned from evaluating eta_s5Di
, not to the eta_s5Di
thunk, so all would be well. In fact it's better even if eta_s5Di
is updated, because if we use eta_s5Di
in the case alternative we have to save it across the eval, whereas if we use wild1
we just use the returned pair directly. Better all round.
So why are we using eta_s5Di
? Because of this code:
cse_bndr | StgApp trivial_scrut [] <- scrut' = trivial_scrut -- See Note [Trivial case scrutinee] | otherwise = bndr'
The reason for this is explained in the Note, but means that we use eta_s5Di
instead of wild1
, with exponentially worse cost! This is very bad.
Short term fix (Reid): just say
cse_bndr = bndr'
and it'll all work fine.
One side point. Binders in STG have occurrence info attached, and wild1
is marked as dead. If we use it, it'll suddenly become un-dead; it'd make me uneasy to have lying occurrence info. (Apart from anything else, the pretty printer doesn't print a dead binder, which is confusing if it is then mentioned.) Why do we need occurrence info on binders? Search for isDeadBinder
in codeGen
. However, I don't think it ever matters for case binders, so we could safely drop occurrence info for them algoteher.
Back to the main point. Why do we need that special case in cse_bndr
? Reason: consider
case x of r1 Just a -> case a of r2 Just b -> let v = Just b in Just v
We want ultimately to get
case x of r1 Just a -> case a of r2 Just b -> r1
What actually happens is this. Suppose we didn't have the special case, and always used bndr'
(as in "Short term fix" above). Then
- In the
Just a ->
alternative, we'd extendce_conAppMap
withce_conAppMap = Just a :-> r1
- Now in the
Just b ->
alternative, we further extend it thusce_conAppMap = Just a :-> r1 Just b :-> r2
- Now when we see
let v = Just b
, we'll add the substitutionv :-> r2
, and drop the let-binding (good). - But now when we see the
Just v
we'll substitute to getJust r2
. But alas! There is no entryJust r2 :-> r1
in thece_conAppMap
, onlyJust a :-> r
. (Of course,a
andr2
are synonymous here.)
So that's the problem that Note [Trivial case scrutinee]
is supposed to fix. With the cse_bndr
fix, the ce_conAppMap
looks like
ce_conAppMap = Just a :-> x Just b :-> a
And now we'll end up with
case x of r1 Just a -> case a of r2 Just b -> x
which does collapse the nested allocation, but at the expense of introducing the exponential performance bug.
But it's so unnecesary! All we need do is to use r1
instad of x
in the final result and all will be well. The crucial point is this we must only add extra references to variables (like r1
and r2
) bound to data constructors, not to variables (like x
, a
, and b
) bound to thunks.
How can we get the best of both worlds? Here's my idea
- Ensure that the range of
ce_conAppMap
mentions only variables bound to constructors; so do NOT do thecse_bndr
fix above.
- Instead, add a
ce_bndrMap
that maps a case-binder to the scrutinee. Thus, in our examplece_bndrMap = r1 :-> x r2 :-> a
- Now, just before looking up in the
ce_conAppMap
, apply thece_bndrMap
to the thing you are looking up. So just before looking upJust r2
, apply thece_bndrMap
to getJust a
and look that up. Do not do anything else with the result of applying thece_bndrMap
... it's just used to transform a key before looking it up ince_conAppMap
.
Bingo.
Now, do we really need THREE maps in CseEnv
? No: it is easy to combine ce_renaming
and ce_subst
, which is what we do in CSE.hs
.
Finally, a bug in the comments. Here:
, ce_subst :: IdEnv OutId -- ^ This substitution contains CSE-specific entries. The domain are -- OutIds, so ce_renaming has to be applied first. -- It has an entry x ↦ y when a let-binding `let x = Con y` is -- removed because `let y = Con z` is in scope.
In the second-last line, that Con y
should be Con z
.
Joachim: would you like to work on this?
comment:13 Changed 2 years ago by
comment:14 Changed 2 years ago by
Great analysis, Simon. Should be straight-forward to implement, I hope. My brain is currently somewhere else, but I can have a look over the weekend, unless someone beats me to it.
comment:15 Changed 2 years ago by
Before I forget, there's another opportunity here because of the special form of sat_s5Dp
after CSE, namely optimizing it to
sat_s5Dp = eta_s5Di
We already do this at the Core level and I imagine it wouldn't be hard to do here also. I don't know whether this pattern arises frequently in practice. But consider for example
newtype T a = MkT a f (x, y) = (MkT x, y)
(Without the MkT
the Core simplifier can turn this into f z@(_, _) = z
, which turns into much simpler code.)
comment:16 Changed 2 years ago by
Yes. It's really
case e of w { p1 -> w; ...; pn -> w } ===> e
(Core has a more general version involving strictness on x
.)
I don't know how much it happens in practice. It would not be hard to include this in StgCse
though.
comment:17 Changed 2 years ago by
Cc: | george added |
---|
comment:18 Changed 2 years ago by
Differential Rev(s): | → Phab:D3437 |
---|---|
Status: | new → patch |
Ok, did it, and also added a test case for the problem we fixed. I put it up at Phab:D3437.
I did not implement what’s suggested in comment:15. Can you open a new ticket for that?
comment:20 Changed 2 years ago by
Resolution: | → fixed |
---|---|
Status: | patch → closed |
comment:23 Changed 2 years ago by
Also, does b55f310d06b8d3988d40aaccc0ff13601ee52b84 fix the program in comment:6?
comment:25 Changed 2 years ago by
With my fix the program in comment:6 runs instantly.
So please let's add it as a regression test! (To HEAD at least.) Thanks.
comment:26 Changed 2 years ago by
Test Case: | → T13536 |
---|
So please let's add it as a regression test! (To HEAD at least.) Thanks.
How do we robustly test whether a program is “fast”? I already did add a regression test that checks for the particular problem much more targetedly.
comment:29 Changed 2 years ago by
Allright. I doubt it will pull its weight (allocation tests tend to slowly nudge out of the window, then require manual adjustment of the numbers), given that the existing test is more precise anyways, but here it is.
comment:30 Changed 2 years ago by
Status: | merge → closed |
---|
comment:19 merged to ghc-8.2
as 687e79fdf3d192cdc16bccb8b28eaec60ebb8abb.
Even stranger, that call to
unmodel
withinmain
is required to trigger the infinite loop—removingunmodel
causes the program to terminate again.