#14790 closed bug (fixed)

eqTypeRep does not inline

Reported by: dfeuer Owned by:
Priority: normal Milestone: 8.6.1
Component: Compiler Version: 8.4.1-alpha2
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D4405
Wiki Page:

Description (last modified by dfeuer)

GHC never seems to inline eqTypeRep. That's no good! It produces a Maybe, which users will almost certainly want to match on immediately.

I really don't understand why it doesn't inline. It's a tiny function, none of the functions it calls are themselves marked INLINE, and the advantage to inlining seems obvious. Does the unsafeCoerce get in the way somehow? If so, how can we fix it?

Example:

{-# language GADTs, ScopedTypeVariables, TypeApplications,
       AllowAmbiguousTypes #-}
module Foo where
import Type.Reflection

foo :: forall a. Typeable a => Bool
foo = case eqTypeRep (typeRep @a) (typeRep @Int) of
          Just _ -> True
          Nothing -> False

compiles (with -O2) to

foo [InlPrag=NOINLINE] :: forall a. Typeable a => Bool
[GblId, Arity=1, Str=<S,1*U>]
foo
  = \ (@ a_a5un) ($dTypeable_a5up :: Typeable a_a5un) ->
      case eqTypeRep
             @ *
             @ *
             @ a_a5un
             @ Int
             ($dTypeable_a5up
              `cast` (base-4.10.1.0:Data.Typeable.Internal.N:Typeable[0] <*>_N <a_a5un>_N
                      :: (Typeable a_a5un :: Constraint) ~R# (TypeRep a_a5un :: *)))
             lvl4_r69g
      of {
        Nothing -> GHC.Types.False;
        Just ds_d5CQ -> GHC.Types.True
      }

For reference, eqTypeRep is defined like this:

eqTypeRep :: forall k1 k2 (a :: k1) (b :: k2).
             TypeRep a -> TypeRep b -> Maybe (a :~~: b)
eqTypeRep a b
  | typeRepFingerprint a == typeRepFingerprint b = Just (unsafeCoerce HRefl)
  | otherwise                                    = Nothing

Change History (18)

comment:1 Changed 21 months ago by dfeuer

In case you have any doubt that eqTypeRep performance is important, remember that it's used in every single exception handler.

comment:2 Changed 21 months ago by dfeuer

Description: modified (diff)

comment:3 Changed 21 months ago by dfeuer

Description: modified (diff)

comment:4 Changed 21 months ago by dfeuer

Using an explicit TypeRep instead of a Typeable constraint makes no difference.

comment:5 Changed 21 months ago by mpickering

Compile with -dverbose-core2core and -ddump-inlinings and it will tell you exactly why it doesn’t get inlined.

comment:6 Changed 21 months ago by dfeuer

mpickering, I get a bunch of things that look like this:

Considering inlining: eqTypeRep
  arg infos [TrivArg, TrivArg]
  interesting continuation CaseCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [52 128] 210 0
  discounted size = 180
  ANSWER = NO

The continuation, some_benefit, and work-free lines all look good. I don't know how to read the guidance line. Has GHC allowed eqTypeRep to get too large to inline in most cases? If its decisions seem reasonable to you and other experienced folks, perhaps we should do something like

typeRepsSame :: forall k1 k2 (a :: k1) (b :: k2).
             TypeRep a -> TypeRep b -> Bool
typeRepsSame a b = typeRepFingerprint a == typeRepFingerprint b

eqTypeRep :: forall k1 k2 (a :: k1) (b :: k2).
             TypeRep a -> TypeRep b -> Maybe (a :~~: b)
eqTypeRep a b
  | typeRepsSame a b = Just (unsafeCoerce# HRefl)
  | otherwise        = Nothing
{-# INLINE eqTypeRep #-}

inlining a very thin wrapper, but I was surprised enough by the choice that I figured it might point to a more general inlining heuristic problem.

comment:7 Changed 21 months ago by mpickering

I think the proposed patch is the wrong idea. What does the core for eqTypeRep look like? that will surely resolve any confusion.

Looking at the line,

  guidance IF_ARGS [52 128] 210 0
  discounted size = 180
  ANSWER = NO

That says that the discount for being applied to known arguments i 52 and 128 for each argument respectively. The 210 is the overall size of the function. If a function is size 80 or less, it is inlined.

What I suspect is happening is that typeRepFingerprint is inlined into eqTypeRep which makes it big. This has some potential benefit as typeRepFingerprint is defined as a case but nothing further after that.

I think the correct solution is to mark typeRepFingerprint as {-# NOINLINE[0] typeRepFingerprint #-} so that it might only be inlined in the last phase.

It seems that the fact that eqWord64 is marked as INLINE could be part of the problem as there is really no hope that we will resolve the equality of two type fingerprints computed by very complicated expressions. Another solution could be to stop deriving Eq for Fingerprint and instead hand writing the instance and adding a similar pragma as I suggested above.

Could you test how these would work?

comment:8 Changed 21 months ago by mpickering

I looked at the interface file for Data.Typeable.Internal where eqTypeRep is defined. Here is the unfolding for eqTypeRep.

855825a70951c01dfa45bc3067373660
  eqTypeRep ::
    forall k1 k2 (a :: k1) (b :: k2).
    TypeRep a -> TypeRep b -> Maybe (a :~~: b)
  {- Arity: 2, Strictness: <S,1*U><S,1*U>,
     Unfolding: (\ @ k1
                   @ k2
                   @ a :: k1
                   @ b :: k2
                   (a1 :: TypeRep a)
                   (b1 :: TypeRep b) ->
                 let {
                   $j :: Word# -> Word# -> Maybe (a :~~: b)
                     <join 2> {- Arity: 2, Strictness: <S,U><L,U> -}
                   = \ (dt :: Word#)[OneShot] (dt1 :: Word#)[OneShot] ->
                     case b1 of wild {
                       TrType co co1
                       -> case fpTYPELiftedRep of wild1 { Fingerprint dt2 dt3 ->
                          case eqWord# dt dt2 of lwild {
                            DEFAULT -> Nothing @ (a :~~: b)
                            1#
                            -> case eqWord# dt1 dt3 of lwild1 {
                                 DEFAULT -> Nothing @ (a :~~: b)
                                 1# -> eqTypeRep1 @ k1 @ a @ k2 @ b } } }
                       TrTyCon dt2 dt3 ds ds1 ds2
                       -> case eqWord# dt dt2 of lwild {
                            DEFAULT -> Nothing @ (a :~~: b)
                            1#
                            -> case eqWord# dt1 dt3 of lwild1 {
                                 DEFAULT -> Nothing @ (a :~~: b)
                                 1# -> eqTypeRep1 @ k1 @ a @ k2 @ b } }
                       TrApp k4 a2 b2 co dt2 dt3 ds ds1 ds2
                       -> case eqWord# dt dt2 of lwild {
                            DEFAULT -> Nothing @ (a :~~: b)
                            1#
                            -> case eqWord# dt1 dt3 of lwild1 {
                                 DEFAULT -> Nothing @ (a :~~: b)
                                 1# -> eqTypeRep1 @ k1 @ a @ k2 @ b } }
                       TrFun r1 r2 a2 b2 co co1 dt2 dt3 ds ds1
                       -> case eqWord# dt dt2 of lwild {
                            DEFAULT -> Nothing @ (a :~~: b)
                            1#
                            -> case eqWord# dt1 dt3 of lwild1 {
                                 DEFAULT -> Nothing @ (a :~~: b)
                                 1# -> eqTypeRep1 @ k1 @ a @ k2 @ b } } }
                 } in
                 case a1 of wild {
                   TrType co co1
                   -> case fpTYPELiftedRep of wild1 { Fingerprint dt dt1 ->
                      $j dt dt1 }
                   TrTyCon dt dt1 ds ds1 ds2 -> $j dt dt1
                   TrApp k4 a2 b2 co dt dt1 ds ds1 ds2 -> $j dt dt1
TrFun r1 r2 a2 b2 co co1 dt dt1 ds ds1 -> $j dt dt1 }) -}

It appears that both functions I mentioned above are inlined (to no great effect) which make the definition quite large.

comment:9 Changed 21 months ago by dfeuer

mpickering, I'm happy to try it other ways, but there are a few things I don't understand:

  1. What exactly do you think is wrong with the way I did it? I know it's not perfect (see point 3) but I don't know what you dislike about it.
  1. How will the phased NOINLINE(s) you suggest reduce the size of the eqTypeRep unfolding?
  1. While we don't generally'. want to inline the case expressions scrutinizing the arguments, we presumably do want to optimize the common case where one of those arguments is completely known at compile time (such as the typeRep @Int in my example). How does this desire fit into the rest of the program?

comment:10 Changed 21 months ago by mpickering

Without a proper analysis of why the problem occurred in the first place, it can't be evaluated why the fix works at all or whether it is correct.

Adding an INLINE pragma is a very big commitment when there is already a reason that the function is not inlined. Why is the wrapper necessary in addition to the pragma?

I think you are right thought that the NOINLINE[0] pragmas I suggested won't make a difference to the unfolding. So perhaps just adding an INLINABLE pragma is the correct solution?

I don't understand your 3rd point. There doesn't appear to be much advantage at all of knowing either argument as the quality condition will seemingly never simplify further?

comment:11 Changed 21 months ago by dfeuer

Adding an INLINE pragma to a wrapper is exactly what GHC does in its worker/wrapper transformation, so I don't think it's too big a commitment. My third point is that if we know one of the arguments at compile time, we only need to inspect the other one at run time. There's no real need to follow a pointer to get the fingerprint of Int; we already know it! It's just constant folding.

comment:12 Changed 21 months ago by dfeuer

Hmmm... Actually, that constant folding notion is complicated by the fact that the TypeRep builders won't inline to expose the fingerprints anyway, thanks to recursion. Should we INLINE the wrapper function as I proposed, and just NOINLINE the worker?

comment:13 Changed 21 months ago by mpickering

I still don't understand why you need the wrapper. Adding an INLINABLE pragma seems to be the least invasive way to fix this issue? Did you try that yet?

comment:14 Changed 21 months ago by dfeuer

No, not yet. I'll do that later today if you like. But I also don't really understand what that will accomplish. INLINABLE tells GHC to make an unfolding. But GHC already makes an unfolding; it just doesn't use it. Does INLINABLE change what unfolding it makes? The purpose of the wrapper is to use the fact that we know more about what information will be available/useful at typical eqTypeRep call sites than the compiler does; we therefore have a better sense of which part will actually be useful to inline. If we inline the whole thing as it stands, then the call site will get two calls to typeRepFingerprint and a call to ==. That's certainly larger than a call to sameTypeRep; is it also better in enough cases to matter? I don't know. On the other hand, I know that in virtually all realistic cases we want to use inlining to eliminate the Maybe business.

comment:15 Changed 21 months ago by dfeuer

Differential Rev(s): Phab:D4405
Status: newpatch

mpickering, it looks like writing the wrapper and marking it INLINABLE does the trick. I wish I had a better sense of why INLINABLE does that, but okay. I also don't really understand why this wrapper should be INLINABLE, whereas the wrappers GHC manufactures are INLINE. What difference merits that distinction? Nevertheless, I've updated the differential to use INLINABLE instead, and added some comments.

comment:16 Changed 21 months ago by mpickering

When a definition is marked INLINABLE the unfolding created is the same as the source definition and so it will be smaller.

In the worker wrapper transformation (which only applies to recursive functions) the point of inlining the wrapper is to eliminate the mutual recursion in the worker which creates a nice loop working just with unboxed values. None of that is going on here.

comment:17 Changed 21 months ago by David Feuer <David.Feuer@…>

In 8529fbba/ghc:

Get eqTypeRep to inline

GHC didn't inline `eqTypeRep`, presumably because it ended up
being too big. This was unfortunate because it produces a
`Maybe`, which will almost always be scrutinized immediately.

Split `eqTypeRep` into a worker and a tiny wrapper, and mark the
wrapper `INLINABLE`. This change actually seems to reduce Core size,
at least in a small test.

Reviewers: hvr, bgamari, mpickering

Reviewed By: mpickering

Subscribers: mpickering, rwbarton, thomie, carter

GHC Trac Issues: #14790

Differential Revision: https://phabricator.haskell.org/D4405

comment:18 Changed 21 months ago by dfeuer

Resolution: fixed
Status: patchclosed
Note: See TracTickets for help on using tickets.