Opened 5 years ago

Last modified 3 years ago

#9320 new bug

Inlining regression/strangeness in 7.8

Reported by: dolio Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.2
Keywords: Inlining Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

A couple days ago, it was reported to me that vector-algorithms had a significant performance regression (~20x) on GHC 7.8.2. The problem stems from a lack of inlining and specialization of some of the functions that were previously handled in 7.6 and earlier. The following is a reduced test case (the vector and primitive packages are required):

module A (test) where

import Control.Monad.ST
import Control.Monad
import Control.Monad.Primitive
import Data.Vector.Generic.Mutable as U

test :: (PrimMonad m, MVector v a, Num a) => Int -> v (PrimState m) a -> m a
-- test :: (MVector v a, Num a) => Int -> v s a -> ST s a
test 0 v = liftM (+1) $ unsafeRead v 0
test n v = do
  long1 v
  test (n-1) v
{-# INLINABLE test #-}

long1, long2, long3, long4 :: (PrimMonad m, MVector v a) => v (PrimState m) a -> m ()

long1 v = long2 v >> long2 v >> long2 v >> long2 v
long2 v = long3 v >> long3 v >> long3 v >> long3 v
long3 v = long4 v >> long4 v >> long4 v >> long4 v
long4 v = unsafeRead v 0 >>= unsafeWrite v 0

{-# INLINE long1 #-}
{-# INLINE long2 #-}
{-# INLINE long3 #-}
{-# INLINE long4 #-}
module Main (main) where

import Control.Monad.ST
import Data.Vector.Unboxed.Mutable as U hiding (read)
import System.Environment
import Unsafe.Coerce
import GHC.Prim

import A

test0 :: Int -> MVector s Int -> ST s Int
test0 n v = test n v
{-# NOINLINE test0 #-}

test1' :: Int -> MVector Any Int -> ST Any Int
test1' n v = test n v
{-# NOINLINE test1 #-}

test1 :: Int -> MVector a Int -> ST a Int
test1 = unsafeCoerce test1'

main = getArgs >>= \(n:b:_) ->
  print $ runST $ do
    v <- new 1
    write v 0 0
    (if read b then test0 else test1) (read n) v

Module A exports a single function, test. This function is engineered to be quite large, by inlining several other functions into it, and it is itself marked INLINABLE. Then the Main module uses this function in two different ways:

  • test0 uses test at a type that is compatible with runST
  • test1' uses test at a completely monomorphic type, which is then coerced to a runST compatible type in test1

On 7.6 I believe (though have not checked) that there will be little or no performance difference between test0 and test1. However, on 7.8.2 (and, I have been assured, 7.8.3), there is a massive speed pentalty for test0; about 70x on my machine. This seems to be due to no inining or specialization for its use of test, which can be seen from -ddump-simpl.

However, if one changes the type of test in A to be specific to ST s rather than using PrimMonad, there is no performance difference, even on 7.8.2. So, the choice to inline and specialize seems to hinge on the instantiation of all the class constraints to monomorphic types containing no variables, rather than just types that resolve all overloading. I myself did not notice this problem, because my benchmark suite uses IO, which is a concrete instantiation of the type, and doesn't exhibit this problem.

I have temporarily 'fixed' vector-algorithms by moving back to INLINE pragmas, but INLINABLE is actually preferable in that case, because it generates faster code than INLINE when the optimizations actually fire. My test case here does not illustrate that well, though.

Is it safe to assume that this was not an intentional change? It's a rather weird rule (to me) if it was.

Change History (5)

comment:1 Changed 5 years ago by simonpj

Are you certain that this ever worked, in any version of GHC?

What's happening is that we see

test0 = /\s. \n v. test @ Data.Vector.Unboxed.Base.MVector
                        @ (GHC.ST.ST s)
                        @ GHC.Types.Int
                       ($dPrimMonad_s1Mw @ s)
                       ...

We run over the program gathering up specialised cals to test, but this one is discarded again because the /\s means that the s type variable in the call isn't visible at top level.

To fix this we need to make the specialiser a bit cleverer, so that it'll generate a RULE like

RULE forall s (d:forall s. PrimMonad (ST s)).
  test @ MVector @ (ST s) @ Int (d @ s)
     = $stest s

with accompanying definition

$stest = /\s. <body-of-test> @ MVector @ (ST s) @ Int ($dPrimMonad_s1Mw @ s)

GHC has never done this. It could do, but it's a new thing.

Simon

comment:2 Changed 5 years ago by dolio

Apparently I am not certain. I've conferred with some people, and my example displays the same behavior on 7.6.3.

The intention was to simulate the behavior of the following test:

module Main (main) where

import qualified Data.Vector.Algorithms.Intro as I
import qualified Data.Vector.Unboxed as U

main = do
  let a = U.fromList [0..10000000::Int]
  print (U.length a)
  let k = U.modify (\v -> I.sort v) a
  print (U.length k)

which is an order of magnitude slower on 7.8 than on 7.6 (note, if you choose to test this, you must use vector-algorithms 0.6.0.1; I have semi-fixed the issue in 0.6.0.2). Running with -ddump-simpl yields 640 lines on 7.8.{2,3}, and ~4,600 lines on 7.6.3. So clearly the difference involves some variety of inlining or specialization.

The type of sort here is:

(PrimMonad m, MVector v e, Ord e) => v (PrimState m) e -> m ()

and the type of modify is:

Unbox a => (forall s. MVector s a -> ST s ()) -> Vector a -> Vector a

So my aim was to replicate the kind of specialization as the sorting example. But clearly I've failed. However, if your description is correct, I have no idea why 7.6.3 was inlining/specializing the sort function in this example. It has the same open s, necessarily, due to the higher rank type of modify. Can you think of anything that might be the difference?


In another direction, it might be interesting to do the kind of specialization you outline above. The PrimMonad dictionary is completely determined even though the function is polymorphic in s, and specializing the function is key to performance, even if some parts are still polymorphic (as they necessarily are for ST). Failing to unbox all the Int operations and so on merely because we are still abstracting over s makes PrimMonad a rather unusable abstraction.

I actually tried (earlier) adding a pragma:

{-# SPECIALIZE sort :: MVector s Int -> ST s () #-}

but GHC complained about it. So it seems that it can't even handle what that sort of thing generates, ignoring the fact that it can't generate it itself.

comment:3 Changed 5 years ago by carter

I believe that SPECIALIZE works better in 7.9 onwards ( or at least, the fix for my own specialize pragmas failing was sorted out there)

comment:4 Changed 3 years ago by mpickering

Keywords: Inlining added

comment:5 Changed 3 years ago by nfrisby

FYI, with GHC 7.10.2, both -O1 and -O2 give code where test0 takes 80x as much time as does test1. So it seems something here has worsened from 20x to 80x since GHC 7.8.2.

$ uname -a
Linux sci-host-a-i-bd1bec66 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.2
$ ghc -fforce-recomp -O1 --make Main.hs   # ~same times with -O2
$ time ./Main 500000 True; time ./Main 500000 False
1

real	0m4.703s
user	0m4.682s
sys	0m0.020s
1

real	0m0.055s
user	0m0.051s
sys	0m0.004s
$ echo '4.7 / 0.055' | bc -l
85.45454545454545454545

Edit: the following shows that the lack of "higher-rank specialization" remains the primary suspect.

$ ghc -fforce-recomp -O1 -c Main.hs -ddump-spec -ddump-rule-firings | grep -e 'SPEC/' | sort -u
"SPEC/Main $fApplicativeST @ Any" [ALWAYS]
"SPEC/Main $fApplicativeST_$cpure @ Any" [0]
"SPEC/Main $fMonadST @ Any" [ALWAYS]
"SPEC/Main $fMonadST_$cfail @ Any" [ALWAYS]
"SPEC/Main $fPrimMonadST @ Any" [ALWAYS]
"SPEC/Main $wtest @ MVector @ (ST Any) @ Int" [0]
"SPEC/Main read @ Bool" [ALWAYS]
"SPEC/Main read @ Int" [ALWAYS]
"SPEC/Main test @ MVector @ (ST Any) @ Int" [0]
"SPEC/Main unsafeWrite @ MVector @ (ST Any) @ Int" [ALWAYS]
Rule fired: SPEC/Main $fApplicativeST @ Any
Rule fired: SPEC/Main $fMonadST @ Any
Rule fired: SPEC/Main $fMonadST_$cfail @ Any
Rule fired: SPEC/Main $fPrimMonadST @ Any
Rule fired: SPEC/Main $wtest @ MVector @ (ST Any) @ Int
Rule fired: SPEC/Main unsafeWrite @ MVector @ (ST Any) @ Int
Last edited 3 years ago by nfrisby (previous) (diff)
Note: See TracTickets for help on using tickets.