#14936 closed bug (fixed)

GHC 8.4 performance regressions when using newtypes

Reported by: danilo2 Owned by:
Priority: highest Milestone: 8.4.2
Component: Compiler Version: 8.4.1
Keywords: SpecConstr Cc: sjakobi, simonpj
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case: perf/should_run/T14936
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by bgamari)

Here is some serious performance regression in the following code:

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleContexts #-}

module Main where

import Prelude
import qualified Foreign.Storable as Storable
import qualified Control.Monad.State.Strict as S
import Control.Monad.IO.Class
import Foreign.Marshal.Alloc       (mallocBytes)
import Criterion.Main

newtype Foo a = Foo a

intSize :: Int
intSize = Storable.sizeOf (undefined :: Int)

slow :: Int -> IO ()
slow i = do
    ptr <- mallocBytes $ 2 * intSize
    Storable.pokeByteOff ptr intSize (0 :: Int)
    let go 0 = pure ()
        go j = do
            Foo (!_, !off) <- S.get
            !(x :: Int) <- liftIO $ Storable.peekByteOff ptr off
            liftIO $ Storable.pokeByteOff ptr off $! (x + 1)
            go (j - 1)
    S.evalStateT (go i) (Foo ((0::Int),(intSize::Int)))


fast :: Int -> IO ()
fast i = do
    ptr <- mallocBytes $ 2 * intSize
    Storable.pokeByteOff ptr intSize (0 :: Int)
    let go 0 = pure ()
        go j = do
            (!_, !off) <- S.get
            !(x :: Int) <- liftIO $ Storable.peekByteOff ptr off
            liftIO $ Storable.pokeByteOff ptr off $! (x + 1)
            go (j - 1)
    S.evalStateT (go i) ((0::Int),(intSize::Int))


main :: IO ()
main = defaultMain
    [ bgroup "slow"
        $ (\(i :: Int) -> bench ("10e" <> show i)
        $ perRunEnv (return ())
        $ \v -> slow (10 ^ i))  <$> [7..8]
    
    , bgroup "fast"
        $ (\(i :: Int) -> bench ("10e" <> show i)
        $ perRunEnv (return ())
        $ \v -> fast (10 ^ i))  <$> [7..8]
    ]

Compiled with flags: -threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -fexcess-precision -fexpose-all-unfoldings -flate-dmd-anal -fspec-constr-keen -fspecialise-aggressively -fstatic-argument-transformation -fmax-worker-args=200

The slow function executes 2 times slower than the fast one. The only difference is that the state is wrapped in a newtype. It was working properly in GHC 8.2 (both functions were equally fast - as fast as the current fast function).

Change History (13)

comment:1 Changed 18 months ago by danilo2

Priority: highhighest

comment:2 Changed 18 months ago by RyanGlScott

Here is a slighter smaller example to demonstrate the issue:

{-# LANGUAGE BangPatterns #-}
module Bug2 where

import Control.Monad.Trans.State.Strict

newtype Foo a = Foo a

slowGo :: Int -> StateT (Foo (Int, Int)) IO ()
slowGo 0 = pure ()
slowGo j = do
  Foo (!_, !off) <- get
  slowGo (j - 1)

fastGo :: Int -> StateT (Int, Int) IO ()
fastGo 0 = pure ()
fastGo j = do
  (!_, !off) <- get
  fastGo (j - 1)

In GHC 8.2.2, if you compare the Core between these two functions (in the _$s_$w functions that perform most of the work):

$ /opt/ghc/8.2.2/bin/ghc Bug2.hs -O2 -fforce-recomp -ddump-simpl -dsuppress-idinfo -dsuppress-uniques -dsuppress-module-prefixes -dsuppress-coercions
[1 of 1] Compiling Bug2             ( Bug2.hs, Bug2.o )

==================== Tidy Core ====================
Result size of Tidy Core
  = {terms: 190, types: 298, coercions: 60, joins: 0/0}

...

Rec {
-- RHS size: {terms: 24, types: 23, coercions: 0, joins: 0/0}
fastGo_$s$wfastGo
  :: State# RealWorld
     -> Int# -> Int# -> Int# -> (# State# RealWorld, ((), (Int, Int)) #)
fastGo_$s$wfastGo
  = \ (sc :: State# RealWorld)
      (sc1 :: Int#)
      (sc2 :: Int#)
      (sc3 :: Int#) ->
      case sc3 of ds {
        __DEFAULT -> fastGo_$s$wfastGo sc sc1 sc2 (-# ds 1#);
        0# -> (# sc, ((), (I# sc1, I# sc2)) #)
      }
end Rec }

...

Rec {
-- RHS size: {terms: 25, types: 37, coercions: 6, joins: 0/0}
slowGo_$s$wslowGo
  :: State# RealWorld
     -> Int#
     -> Int#
     -> ((Int, Int) :: *) ~R# (Foo (Int, Int) :: *) =>
        Int# -> (# State# RealWorld, ((), Foo (Int, Int)) #)
slowGo_$s$wslowGo
  = \ (sc :: State# RealWorld)
      (sc1 :: Int#)
      (sc2 :: Int#)
      (sg :: ((Int, Int) :: *) ~R# (Foo (Int, Int) :: *))
      (sc3 :: Int#) ->
      case sc3 of ds {
        __DEFAULT -> slowGo_$s$wslowGo sc sc1 sc2 @~ <Co:5> (-# ds 1#);
        0# -> (# sc, ((), (I# sc1, I# sc2) `cast` <Co:1>) #)
      }
end Rec }

Then they are essentially identical (the slowGo one has an extra argument of type ((Int, Int) :: *) ~R# (Foo (Int, Int) :: *), but that is zero-width, so it shouldn't have any effect at runtime).

On the other hand, in GHC 8.4.1:

$ ~/Software/ghc-8.4.1/bin/ghc Bug2.hs -O2 -fforce-recomp -ddump-simpl -dsuppress-idinfo -dsuppress-uniques -dsuppress-module-prefixes -dsuppress-coercions                         
[1 of 1] Compiling Bug2             ( Bug2.hs, Bug2.o )

==================== Tidy Core ====================
Result size of Tidy Core
  = {terms: 163, types: 231, coercions: 54, joins: 0/0}

...

Rec {
-- RHS size: {terms: 24, types: 23, coercions: 0, joins: 0/0}
fastGo_$s$wfastGo
  :: State# RealWorld
     -> Int# -> Int# -> Int# -> (# State# RealWorld, ((), (Int, Int)) #)
fastGo_$s$wfastGo
  = \ (sc :: State# RealWorld)
      (sc1 :: Int#)
      (sc2 :: Int#)
      (sc3 :: Int#) ->
      case sc3 of ds {
        __DEFAULT -> fastGo_$s$wfastGo sc sc1 sc2 (-# ds 1#);
        0# -> (# sc, ((), (I# sc1, I# sc2)) #)
      }
end Rec }

...

Rec {
-- RHS size: {terms: 27, types: 34, coercions: 9, joins: 0/0}
$wslowGo
  :: Int#
     -> Foo (Int, Int)
     -> State# RealWorld
     -> (# State# RealWorld, ((), Foo (Int, Int)) #)
$wslowGo
  = \ (ww :: Int#) (w :: Foo (Int, Int)) (w1 :: State# RealWorld) ->
      case ww of ds {
        __DEFAULT ->
          case w `cast` <Co:4> of wild { (ds1, off) ->
          case ds1 of { I# ipv ->
          case off of { I# ipv1 ->
          $wslowGo (-# ds 1#) (wild `cast` <Co:5>) w1
          }
          }
          };
        0# -> (# w1, ((), w) #)
      }
end Rec }

This time, slowGo doesn't have something akin to slowGo_$s$wslowGo. Instead, it performs the body of the loop in $wslowGo, which uses Foo (Int, Int) instead of two unboxed Int# arguments. I could imagine that this alone contributes to the slowdown.

comment:3 Changed 18 months ago by sjakobi

Cc: sjakobi added

comment:4 Changed 18 months ago by RyanGlScott

Cc: simonpj added
Milestone: 8.4.2

This regression was introduced in commit fb050a330ad202c1eb43038dc18cca2a5be26f4a (Do not bind coercion variables in SpecConstr rules).

comment:5 Changed 18 months ago by RyanGlScott

Keywords: SpecConstr added

comment:6 Changed 18 months ago by bgamari

Description: modified (diff)

comment:7 Changed 18 months ago by bgamari

RyanGlScott, did you happen to see whether the WARN mentioned in that patch fired in the case of this program? I suppose it likely did.

comment:8 Changed 18 months ago by RyanGlScott

I wasn't using a build with assertions enabled, so I couldn't tell. I'll check shortly.

comment:9 Changed 18 months ago by RyanGlScott

bgamari, your suspicions have been confirmed:

$ ghc2/inplace/bin/ghc-stage2 -O2 -fforce-recomp -DDEBUG Bug.hs
[1 of 1] Compiling Bug2             ( Bug.hs, Bug.o )
WARNING: file compiler/specialise/SpecConstr.hs, line 2078
  SpecConstr: bad covars: [sg_s2Ba]
  $wslowGo_s2A8 (-# ds_X2vv 1#)
                (wild_Xi
                 `cast` (Sym (N:Foo[0] <(Int, Int)>_R)
                         :: ((Int, Int) :: *) ~R# (Foo (Int, Int) :: *)))
                w_s2A3

comment:10 Changed 18 months ago by Simon Peyton Jones <simonpj@…>

In 5ab8094/ghc:

SpecConstr: accommodate casts in value arguments

This commit:

  commit fb050a330ad202c1eb43038dc18cca2a5be26f4a
  Author: Simon Peyton Jones <simonpj@microsoft.com>
  Date:   Thu Oct 12 11:00:19 2017 +0100

  Do not bind coercion variables in SpecConstr rules

arranged to reject any SpecConstr call pattern that mentioned
a coercion in the pattern.

There was a good reason for that
-- see Note [SpecConstr and casts] --
but I didn't realise how important it was to accept patterns
that mention casts in /terms/.  Trac #14936 showed this up.

This patch just narrows the restriction to discard only
the cases where the coercion is mentioned only in types.
Fortunately that was pretty easy to do.

comment:11 Changed 18 months ago by simonpj

Resolution: fixed
Status: newclosed
Test Case: perf/should_run/T14936

comment:12 Changed 18 months ago by RyanGlScott

Status: closedmerge

I'll optimistically mark this as "merge", since this seems like an important performance win.

comment:13 Changed 17 months ago by bgamari

Status: mergeclosed

I was on the fence about this one but ended up merging it since it looks fairly low-risk.

Note: See TracTickets for help on using tickets.