Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#4184 closed bug (invalid)

Squirrelly inliner behaviour leads to 80x slowdown

Reported by: bos Owned by:
Priority: normal Milestone: 7.4.1
Component: Compiler Version: 6.12.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by igloo)

Recently, I switched the mwc-random package ( over from running in the ST monad to using Roman's primitive package. I didn't notice initially, but this caused a huge performance regression.

mwc-random uses ST internally, and runs the benchmarks/Quickie benchmark in 0.017 seconds.

mwc-random uses PrimMonad internally, and takes 1.328 seconds to run the same benchmark.

That's about a factor of 80 slowdown. This causes a massive knock-on performance loss in packages such as criterion that need fast PRNGs.

The problem is very easy to reproduce:

cabal install mwc-random-
cabal install mwc-random-
darcs get
cd mwc-random/benchmarks
ghc -fforce-recomp -O  -package mwc-random- --make Quickie -o quickie-411
ghc -fforce-recomp -O  -package mwc-random- --make Quickie -o quickie-513

time ./quickie-411
time ./quickie-513

Attachments (2)

Slowdown.hs (421 bytes) - added by bos 10 years ago.
Randomish.hs (2.8 KB) - added by bos 10 years ago.

Download all attachments as: .zip

Change History (13)

Changed 10 years ago by bos

Attachment: Slowdown.hs added

Changed 10 years ago by bos

Attachment: Randomish.hs added

comment:1 Changed 10 years ago by bos

I've attached a standalone program that demonstrates the problem. To compile, simply download both source files, and then run the following:

ghc -package vector- --make -O Slowdown

comment:2 Changed 10 years ago by bos

While writing the repro, I found two distinct ways to eliminate the slowdown.

First, you can replace all uses of the PrimMonad class with ST. If you look at the code in Randomish.hs, you can see that the necessary signatures are already present, but commented out.

Alternatively, look for the comment marked "{- INLINE #-}" in the same file, and turn it into a proper INLINE pragma. The original INLINE pragma (on the local name f) worked with GHC 6.10 and 6.12 when I was using the plain ST monad, but stopped working with 6.12 when I switched to PrimMonad.

It seems that fragility in the inliner might be at the root of this problem. Dan Doel notes that he saw a similar problem, and that it's disappeared with the rewritten inliner in GHC HEAD, but this should be useful to keep around as a historical artifact and a bulwark against performance regressions.

comment:3 Changed 10 years ago by bos

Alas, my claimed workaround works in my little toy application, but not in a library.

In my statistics library, there's a resample function with the following signature:

resample :: (PrimMonad m) => Gen (PrimState m) -> [Estimator] -> Int -> Sample -> m [Resample]

It calls the uniform function detailed in the examples above, and thus triggers the slowdown. It doesn't seem to matter whether or not I inline the resample function, I still get my 80x slowdown :-(

comment:5 Changed 10 years ago by rl

Brian, the performance of your example seems to hinge on this:

class M.Unbox a => Variate a where
    uniform :: (PrimMonad m) => Gen (PrimState m) -> m a

instance Variate Double where
    uniform = f where f = uniform2 wordsToDouble
                      {-# INLINE f #-}
    {- INLINE uniform #-}

You only get good performance if you mark uniform as INLINE. This is actually expected. All code parametrised over PrimMonad should ultimately be inlined into a context where the monad is known. Otherwise, GHC can't resolve and inline (>>=) and return which is very bad.

comment:6 Changed 9 years ago by igloo

Description: modified (diff)

comment:7 Changed 9 years ago by igloo

Milestone: 6.16.1

comment:8 Changed 9 years ago by simonpj

Resolution: invalid
Status: newclosed

OK I looked at this. After an hour I realised that you have written

{- INLINE uniform #-}

rather than

{-# INLINE uniform #-}

so your alleged INLINE pragma is being ignored as a comment. As Roman says, this kills your performance.

Incidentally, you don't need that weird f any more, if you ever did. It's enough to say

    uniform = uniform2 wordsToDouble
    {-# INLINE uniform #-}

So I'll close this. But please re-open if it doesn't work.

It's unfortunate that a mis-typed pragma just looks like a comment!

Oh, take note of #4310, which describes a workaround to package vector that will be needed for a while.


comment:9 Changed 9 years ago by bos

Actually, in comment #2 above, I described the procedure for looking for the mistyped INLINE pragma, so that was quite deliberate and intended to be helpful in reproducing the issue. I apologise for not making that part of the report more visible.

I believe that the report is still valid. Thanks for pointing me at #4310 though.

comment:10 Changed 9 years ago by simonpj

So now I'm lost. You need that INLINE pragma. Why do you think it should work without it?


comment:11 Changed 9 years ago by bos

Under GHC 6.12, as comment #3 mentions, the INLINE doesn't really help when enabled. It works for a trivial test case, so a tiny benchmark gets sped back up, but in an application, the slowdown does not go away (i.e. it looks like the inlining doesn't actually take place).

comment:12 Changed 9 years ago by simonpj

Well, GHC 6.12 is water under the bridge. INLINE pragamas are completely re-implemented in HEAD, and are far far more robust. If you say INLINE, then exactly what you write is inlined, no more and no less.

So I claim this ticket is done. You need your INLINE pragma on 'uniform', and then everything works nicely. Yell if you disagree


Note: See TracTickets for help on using tickets.