Opened 3 years ago

Closed 3 years ago

#12425 closed bug (fixed)

With -O1 and above causes ghc to use all available memory before being killed by OOM killer

Reported by: erikd Owned by:
Priority: high Milestone: 8.2.1
Component: Compiler Version: 8.0.1
Keywords: Inlining Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case: perf/compiler/T12425
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

I've just taken over an existing haskell package and in the process of getting it to work with ghc 8.0 discovered that when compiling one file ghc tires to use all my ram. It basically ends up using 40% of 16 Gig before being killed by the OOM (out-of-memory) killer.

The project is at: https://github.com/erikd/conduit-find.git

The problem can be reproduced by:

git clone https://github.com/erikd/conduit-find.git
cd conduit-find
git checkout test/ghc-8.0 
cabal sandbox init
cabal install --dependencies-only
cabal build

which on two of my machines fails building the Data.Cond module.

This is regression, because this code compiles fine with ghc-7.10.3 and ghc-7.8.4.

Change History (21)

comment:1 Changed 3 years ago by erikd

@carter suggested that adding the Safe language pragma might prevent this, but it didn't help.

comment:2 Changed 3 years ago by erikd

Maybe not too surprisingly, removing all the INLINE pragmas (51 in 500 lines of code) allows the code to compile even with -O2. (suggested by @mpickering).

Last edited 3 years ago by erikd (previous) (diff)

comment:3 Changed 3 years ago by thomie

Priority: normalhigh

Reproducible with HEAD. Here is a testcase that doesn't depend on any packages that aren't in the GHC tree.

module T12425 where

import Control.Applicative
import Control.Monad
import Control.Monad.Trans.State.Lazy (StateT(..))

data Result a m b = RecurseOnly (Maybe (CondT a m b))
                  | KeepAndRecurse b (Maybe (CondT a m b))

instance Monad m => Functor (Result a m) where
    fmap f (RecurseOnly l)      = RecurseOnly (liftM (fmap f) l)
    fmap f (KeepAndRecurse a l) = KeepAndRecurse (f a) (liftM (fmap f) l)
    {-# INLINE fmap #-}

newtype CondT a m b = CondT { getCondT :: StateT a m (Result a m b) }

instance Monad m => Functor (CondT a m) where
    fmap f (CondT g) = CondT (liftM (fmap f) g)
    {-# INLINE fmap #-}

instance Monad m => Applicative (CondT a m) where
    pure  = undefined
    (<*>) = undefined

instance Monad m => Monad (CondT a m) where
    return = undefined
    (>>=) = undefined

@erikd: the following change fixes the problem.

instance Monad m => Functor (CondT a m) where
-    fmap f (CondT g) = CondT (liftM (fmap f) g)
+    fmap f (CondT g) = CondT (liftA (fmap f) g)
    {-# INLINE fmap #-}

Tested with GHC 8 and HEAD. To compile conduit-find with HEAD, you need to make the following other changes:

comment:4 Changed 3 years ago by erikd

@thomie Oh wow, that change you suggested (switching from liftM to liftA) prevents this problem in conduit-find even with stock 8.0.1.

Edit: And a bit of CPP lets me use liftA for ghc >= 7.10 and liftM for earlier.

Last edited 3 years ago by erikd (previous) (diff)

comment:5 Changed 3 years ago by nomeata

A guess would be that in 8.0’s library version, liftM is implemented in a way that mentions fmap, and liftM is inlined to expose that. Then all your unconditinally marked inline functions become effectively recursive, but still keep inlining. But didn’t investigate deeper.

comment:6 Changed 3 years ago by erikd

Thats seems to be a possible explanation. Is there no loop checker for the inliner?

comment:7 Changed 3 years ago by mpickering

Keywords: Inlining added

comment:8 Changed 3 years ago by simonpj

Type of failure: OtherCompile-time performance bug

This looks bad. Inlining should not go on for ever ... there's a tick counter as a last-ditch hard stop. It could just possibly be too many INLINE pragmas, but it's quite difficult to eat 16G.

Does anyone feel able to investigate further?

Simon

comment:9 Changed 3 years ago by bgamari

Indeed there does appear to be some sort of loop here. -ddump-inlinings shows long repeated runs of,

...
Inlining done: Control.Monad.Trans.State.Lazy.$fMonadStateT1
Inlining done: Control.Monad.Trans.State.Lazy.$fMonadStateT_$c>>=
Inlining done: GHC.Base.$
Inlining done: Control.Monad.Trans.State.Lazy.runStateT
Inlining done: Control.Monad.Trans.State.Lazy.runStateT1
Inlining done: Control.Monad.Trans.State.Lazy.runStateT
Inlining done: Control.Monad.Trans.State.Lazy.runStateT1
Inlining done: GHC.Base.liftM
Inlining done:
    Control.Monad.Trans.State.Lazy.$fMonadStateT_$creturn
...

These runs are punctuated by runs of,

...
Inlining done: GHC.Base.$fApplicativeMaybe_$sliftM
...

The runs appear to get longer as the simplifier progresses. It indeed seems plausible that all of these bindings have INLINE pragmas attached.

comment:10 Changed 3 years ago by bgamari

Milestone: 8.0.28.0.3

It doesn't look like this will get fixed for 8.0.2.

comment:11 Changed 3 years ago by simonpj

I believe this is not fixed by my fix to #12776, sadly

comment:12 Changed 3 years ago by simonpj

Thomie, THANK YOU for the small repro case in comment:3. With its help I rapidly nailed the bug. Here's what is happening.

Suppose we have

let rec { f = ...g...g...
        ; g = ...f...f... }
in
case x of
  True  -> ...f..
  False -> ..f...

In each iteration of the simplifier the occurrence analyser OccAnal chooses a loop breaker. Suppose in iteration 1 it choose g as the loop breaker. That means it is free to inline f.

Suppose that GHC decides to inline f in the branches of the case, but (for some reason; eg it is not satureated) in the rhs of g. So we get

let rec { f = ...g...g...
        ; g = ...f...f... }
in
case x of
  True  -> ...g...g.....
  False -> ..g..g....

Now suppose that, for some reason, in the next iteraion the occurrence analyser chooses f as the loop breaker, so it can freely inling g. And again for some reason the simplifer inlines g at its calls in the case branches, but not in the RHS of f. Then we get

let rec { f = ...g...g...
        ; g = ...f...f... }
in
case x of
  True  -> ...(...f...f...)...(...f..f..).....
  False -> ..(...f...f...)...(..f..f...)....

You can see where this is going! Each iteration of the simplifier doubles the number of calls to f or g. No wonder GHC is slow!

(In the particular example in comment:3, f and g are the two mutually recursive fmap instances for CondT and Result. They are both marked INLINE which, oddly, is why they don't inline in each other's RHS, because the call there is not saturated.)

The root cause is that we flip-flop on our choice of loop breaker. I always thought it didn't matter, and indeed for any single iteration to terminate, it doesn't matter. But when we iterate, it matters a lot!!

I think this was not happening before because, by a fluke, we tended to always choose the same one. But, perhaps as a resul of Bartosz's work on making GHC deterministic, the occurrence analyser now reliably flip-flops in each iteration, as above.

Trac #12234 turns out to be exactly the same issue


What to do? We want the occurrence analyser to choose a loop breaker based on stable criteria, not arbitrary things. Simple idea: if two or more potential loop breakers look equally plausible, choose the one that was a loop breaker in the previous iteration. That should make it stable.

Stay tuned. But I thought I'd explain what is happening in case I get distracted.

comment:13 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

In 517d03e/ghc:

Fix an asymptotic bug in the occurrence analyser

Trac #12425 and #12234 showed up a major and long-standing
bug in the occurrence analyser, whereby it could generate
explonentially large program!

There's a lot of commentary on #12425; and it's all described
in Note [Loop breakers, node scoring, and stability]

I did quite a lot of refactoring to make the code comprehensibe
again (its structure had bit-rotted rather), so the patch
looks bigger than it really is.

Hurrah!

I did a nofib run to check that I hadn't inadertently ruined
anything:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
          fluid          -0.3%     -1.5%      0.01      0.01     +0.0%
         parser          -0.9%     +0.6%      0.04      0.04     +0.0%
         prolog          -0.1%     +1.2%      0.00      0.00     +0.0%

--------------------------------------------------------------------------------
            Min          -0.9%     -1.5%     -8.6%     -8.7%     +0.0%
            Max          +0.1%     +1.2%     +7.7%     +7.8%     +2.4%
 Geometric Mean          -0.2%     -0.0%     -0.2%     -0.3%     +0.0%

I checked what happened in 'prolog'.  It seems that we have a
recursive data structure something like this

   f :: [blah]
   f x = build (\cn.  ...g...  )

   g :: [blah2]
   g y = ....(foldr k z (f y))....

If we inline 'f' into 'g' we get better fusion than the other
way round, but we don't have any way to spot that at the moment.
(I wonder if we could do worker/wrapper for functions returning
a 'build'?)  It was happening before by a fluke.

Anyway I decided to accept this; it's relatively rare I think.

comment:14 Changed 3 years ago by simonpj

Status: newmerge
Test Case: perf/compiler/T12425

I think I nailed this one, in HEAD at least. Can you see if it fixes your original library?

Probably worth merging this to 8.0.3 if we produce such a thing.

comment:15 Changed 3 years ago by erikd

I built a compiler from git HEAD, but that wouldn't build the dependencies for my library (numerous problems with the primitve and vector libraries).

I tried checking out the ghc-8.0 branch and then cherry-picking commit d03e41b4f5 but that failed due to merge conflicts in compiler/simplCore/OccurAnal.hs that seem rather tricky to resolve.

comment:16 Changed 3 years ago by simonpj

Thanks for trying.

I think this would be retro-fittable to 8.0.3 with a bit of work -- if it really matters.

comment:17 Changed 3 years ago by RyanGlScott

Sorry erikd, several of those dependencies not building on GHC HEAD were probably my fault. After adjusting some template-haskell upper version bounds on dependent packages on Hackage, I was able to successfully install conduit-find with GHC HEAD, and I did not observe any noticeable slowness.

comment:18 Changed 3 years ago by erikd

Thanks @RyanGlScott . As long it compiled with -O1 or above, then its fixed. The original problem was GHC chewing up all memory when *compiling* conduit-find.

comment:19 Changed 3 years ago by uznx

Just for the record, this bug can be reproduced by compiling Hackage's find-conduit 0.4.4 with GHC 8.0.1. find-conduit is the old version of conduit-find prior to a maintainer change, and prior to the workarounds to avoid this bug that have been incorporated into conduit-find.

comment:20 Changed 3 years ago by bgamari

Milestone: 8.0.38.2.1

At this point it is rather unlikely that there will be an 8.0.3. Re-milestoning.

comment:21 Changed 3 years ago by bgamari

Resolution: fixed
Status: mergeclosed
Note: See TracTickets for help on using tickets.