Opened 5 years ago

Last modified 3 years ago

#9349 new bug

excessive inlining due to state hack

Reported by: rwbarton Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.3
Keywords: Cc: klao@…, edsko
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

The program at https://gist.github.com/jorendorff/a3005968adc8f054baf7 runs very slowly when compiled with -O or higher. It seems that arr and/or rangeMap is being inlined into the do block on lines 89-91 and recomputed for each iteration of the loop. The program runs nearly instantly when compiled with -O0 or with -O -fno-pre-inlining. (Of course, this does not mean -fpre-inlining is necessary the culprit; it could be enabling some subsequent misoptimization.)

Change History (10)

comment:1 Changed 5 years ago by klao

Cc: klao@… added

comment:2 Changed 5 years ago by rwbarton

Aha, it is the state hack marking the argument to replicateM_ as one-shot, followed by pre-inlining.

What was rather surprising to me as an end user was that {-# NOINLINE rangeMap #-} had no effect. I guess the pre-inlining does not care about such pragmas.

Simon, any comments?

comment:3 Changed 5 years ago by klao

Here's a simplified example for this bug:

import Control.Monad
import Control.Concurrent.MVar

expensive :: Int -> Int -> Int
expensive n0 a = go n0 a
  where
    go 0 x = x
    go n x = go (n-1) ((x * 37 + 1973) `rem` 177)

main :: IO ()
main = do
  mv <- newMVar (10 :: Int)
  let ex = expensive 100000 3
  replicateM_ 30000 $ do
    x <- readMVar mv
    print $ x + ex

You could probably simplify it further. The important thing is that ex is the result of some expensive computation that incorrectly gets evaluated over and over again. And you have to put some "real" IO into the replicateM_ for the bug to kick in (in the original example it was reading from the stdin, here I just read an MVar).

Unlike the original example, here marking ex as NOINLINE does make the issue disappear. (Otherwise it's the same as the original example: nearly instantaneous with -O0 or -O -fno-pre-inlining, but takes a looong time with -O.)

comment:4 Changed 5 years ago by simonpj

First, I believe that -fno-state-hack would be better than -fno-pre-inlining, because it's less drastic. You might want to check that it does indeed cure the problem.

I understand why NOINLINE is not working. The work is being duplicated by float-in, not by inlining! Consider

   let {-# NOINLINE rangeMap #-}
       rangeMap = expensive
   in 
   replicateM_ n (\s -> ...rangeMap...)

The float-in pass picks up let-bindings and floats them inward as far as possible, but not inside a lambda (unless it's a one-shot lambda). So float-in does this:

   replicateM n (\s -> ....(let rangeMap = expensive in rangeMap)...) ...

The real culprit is, of course that the \s isn't one-shot. But that's why NOINLINE doesn't help.

I suppose we could say that float-in should not move a NOINLINE binding; or at least should not move it in past a lambda, whether one-shot or not. I suppose that might be a finer-grained way of allowing you to fix the thing, a little less brutal than -fno-state-hack.

The other thing that occurs to me is that for this \s we can see (via cardinality analysis) that it is called more than once. So maybe that can override the state-hack nonsense. Needs more looking at.

Anyway I thought I'd write down what is going on. Is this a show-stopper for anyone?

Simon

comment:5 Changed 5 years ago by klao

I can confirm that -fno-state-hack does cure the problem; both in the original example and my simplified test case.

comment:6 Changed 5 years ago by rwbarton

Summary: excessive inlining with -fpre-inliningexcessive inlining due to state hack

Is this a show-stopper for anyone?

No, I don't believe so. Should I close it as a duplicate of #1168?

comment:7 Changed 5 years ago by klao

I'd like to point out before we close this issue, that it is (at least in part) a 7.8-specific regression.

My little example works perfectly fine with 7.6.3 regardless of optimization levels and flags. So, in the least, it has become easier to trigger this bug (even if some similar instances were present before).

comment:8 Changed 5 years ago by simonpj

I'm afraid I have no idea why it's become easier to trigger, but it's definitely very sensitive to how much inlining is done.

Although Trac lets you close as duplicate, I tend to leave these things open (but linked) to remind us that there are a cloud of related examples. Closed things tend to imply "no need to look at me any more". But we are inconsistent about this at the moment.

Simon

comment:9 Changed 4 years ago by nomeata

comment:10 Changed 3 years ago by edsko

Cc: edsko added
Note: See TracTickets for help on using tickets.