Opened 5 years ago

Closed 3 years ago

Last modified 21 months ago

#9291 closed feature request (fixed)

Don't reconstruct sum types if the type subtly changes

Reported by: schyler Owned by: nomeata
Priority: normal Milestone:
Component: Compiler Version: 7.8.2
Keywords: Cc: Iceland_jack, ekmett, heisenbug, rwbarton, osa1, slyfox, RyanGlScott, maoe
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D2871
Wiki Page:

Description

Consider this test case:

module Main (main) where

fun :: Either Int String -> Either String String
fun x = case x of
    Left int -> Left (show int)
    Right str -> Right str
{-# NOINLINE fun #-}

main :: IO ()
main = do
    l <- getLine
    print $ fun (Right l)

The core we get for fun looks like:

Main.fun [InlPrag=NOINLINE]
  :: Data.Either.Either GHC.Types.Int GHC.Base.String
     -> Data.Either.Either GHC.Base.String GHC.Base.String
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType <S,1*U>]
Main.fun =
  \ (x_aur :: Data.Either.Either GHC.Types.Int GHC.Base.String) ->
    case x_aur of _ [Occ=Dead] {
      Data.Either.Left int_aus ->
        Data.Either.Left
          @ GHC.Base.String
          @ GHC.Base.String
          (GHC.Show.$fShowInt_$cshow int_aus);
      Data.Either.Right str_aCG ->
        Data.Either.Right @ GHC.Base.String @ GHC.Base.String str_aCG
    }

There would be less allocations and probably perform better if we just coerced x into the new type. Because the coercion is common code, this also means that if hypothetically some other sum type which had 15 constructors, and only 3 had subtle type changes, 12 of the case branches could be CSE'd into the single coerce greatly reducing code generated and also hinting the inliner better.

Change History (51)

comment:1 Changed 5 years ago by schyler

Clarification, the ticket is talking about performing the coercion in the Right -> branch, where the reconstruction *seems* pointless.

comment:2 Changed 5 years ago by nomeata

Are Core’s coercions expressible enough for this? I don’t think so. So one would have to resort to unsafeCoerce here, which wouldn’t be nice.

Or this can be done at the STG stage, but then we might miss further optimizations.

Tricky.

comment:3 Changed 5 years ago by schyler

Why would the compiler inserting unsafeCoerce be bad? I would have thought that's something the compiler should be able to do a lot of, given we have such a better type system than other languages which can't prove it's safe.

And I don't think you would want to do it at cmm, because this would improve CSE and inlining at the stg level a lot in some programs.

Last edited 5 years ago by schyler (previous) (diff)

comment:4 Changed 5 years ago by nomeata

The type system of Core is actually _better_ than that of Haskell (for some value of better). And having a typed intermediate language has many benefits, e.g. it is easier to spot bugs in the core-to-core transformations.

An example for what happens when you use unsafeCoerce, even internally: GeneralizedNewtypeDeriving used to be implemented this way, and together with Type Families or GADTS it was suddenly possible to crash a Haskell program.

comment:5 Changed 5 years ago by goldfire

The way that we have proved Haskell's type system safe is precisely because Core is proved safe. Haskell itself, from a formal standpoint, is defined solely by its transformation into Core. If the compiler used unsafeCoerce internally, then any claims of safety would be bogus.

What is seems you're asking for is a strange new beast in Core... something of a dependently-typed coercion, that requires a proof that the term it is coercing is built with the constructors whose type is not changing. I don't think this is impossible, but I don't envy the person that does the proof!

This is somewhat disappointing, because I agree that it would be great to have this optimization! Perhaps others see the theory differently...

comment:6 Changed 5 years ago by rwbarton

I think the lens package uses this pattern extensively, or at least something closely related.

In discussion with "napping" on IRC the idea arose to extend Core-level case with the power to generalize the type of its scrutinee. For example, if x :: Either Int String, then (using a syntax inspired by as-patterns, and omitting some noisy details)

case x of _ {
  xLeft@(Left int) -> {- here xLeft :: Either Int b (we don't use it) -}
    Left @ String @ String (show @ Int int)
  xRight@(Right str) -> {- here xRight :: Either a String -}
    xRight @ String
}

The appropriate amount of type generalization would depend on the type variables that appear in the fields of the constructor that was matched, after translating GADT constructors into type equality constraints. For example given the type data X a b where MkX :: b -> X Int b, if we match x :: X Int Char with MkX Char, we cannot generalize the type of x to X a Char, since the translated type of MkX is MkX :: (a ~ Int) => b -> X a b which does mention a on the left-hand side (in a constraint). This is all rather similar to role inference for data types, I imagine, except done on a per-constructor basis.

comment:7 Changed 5 years ago by goldfire

That's an interesting idea, perhaps somewhat saner than dependently-typed coercions. But, proving such a thing type-safe would be quite a bear, I think. It seems to hinge on parametricity, which is not yet proven for Haskell or Core.

comment:8 Changed 5 years ago by schyler

@rwbarton, does that still make it simple for the CSE to eliminate extra branches?

Edit: Ah, I see it does. Good idea!

Last edited 5 years ago by schyler (previous) (diff)

comment:9 Changed 5 years ago by nomeata

(Even more crazy idea: One can even take this further. What about case x of Right str -> Just str. Right and Just can be used interchangeable at runtime (same constructor number, same number of arguments of the same shape). The compiler could create only one constructor for each such shape, and optimize the code above to case x of Right _ -> x. Furthermore we could have representational coercions between datatypes that are α-equal... Although I doubt that the run-time benefit of that will be high, and we’d lose all hopes to implement something like vacuum or ghc-vis.)

comment:10 Changed 5 years ago by schyler

I would be less interested in the cost of the operation as much as I would be interested in the possibility that the CSE could merge multiple branches, which means loops can be tighter. Your idea is absolutely glorious.

Last edited 5 years ago by schyler (previous) (diff)

comment:11 Changed 5 years ago by nomeata

I would be less interested in the cost of the operation as much as I would be interested in the possibility that the CSE could merge multiple branches, which means loops can be tighter. Your idea is absolutely glorious.

If you worry about small resulting code, then doing the transformation in STG (which is untyped) might be feasible.

comment:12 in reply to:  8 Changed 5 years ago by rwbarton

Replying to schyler:

@rwbarton, does that still make it simple for the CSE to eliminate extra branches?

Good question. I don't think it would get CSEd at the Core level, but hopefully it would in a later pass (STG or Cmm).

comment:13 Changed 5 years ago by simonpj

I hate the fact that a type-system question forces us to lose a potential optimisation. But I do not know any well-typed way around this particular one.

I'm absolutely not willing for the compiler to add unsafeCoerces to the code! One of GHC's most unique and distinctive features is its commitment to a solidly-typed intermediate language, and I'm not willing to compromise that property. There is a long tradition in Haskell of doing things the Right Way and using the pain to drive innovation. (Monads, System FC, kind polymorphism, etc etc.)

All that said, STG code is un-typed and I would be quite happy with STG-level optimisations that transformed

  case e of y { 
    ...
    C a b -> C a b
    ...
  }

=====>

  case e of y { 
    ...
    C a b -> y
    ...
  }

and then commoned identical branches. (The above transformation is done in Core too, but not when the types don't match. In STG that doesn't matter.)

Simon

comment:14 Changed 5 years ago by nomeata

Isn’t the first part of that transformation just a case of CSE, just as done on Core? If we do this, we’d probably also want

  case e of y { 
    ...
    C a b -> f (C a b)
    ...
  }

=====>

  case e of y { 
    ...
    C a b -> f y
    ...
  }

comment:15 Changed 5 years ago by simonpj

Indeed.

Simon

comment:16 in reply to:  9 Changed 5 years ago by rwbarton

Replying to nomeata:

(Even more crazy idea: One can even take this further. What about case x of Right str -> Just str. Right and Just can be used interchangeable at runtime (same constructor number, same number of arguments of the same shape). The compiler could create only one constructor for each such shape, and optimize the code above to case x of Right _ -> x. Furthermore we could have representational coercions between datatypes that are α-equal... Although I doubt that the run-time benefit of that will be high, and we’d lose all hopes to implement something like vacuum or ghc-vis.)

This could make toInteger :: Int -> Integer a no-op, for example.

comment:17 Changed 5 years ago by carter

could this be done ethically at the STG or CMM layer if we were able to keep enough information lying around at those layers?

comment:18 in reply to:  17 Changed 5 years ago by simonpj

Replying to carter:

could this be done ethically at the STG or CMM layer if we were able to keep enough information lying around at those layers?

Yes, I think it could; both are much less strongly typed than Core. STG would be much better than CMM. There isn't currently an STG optimisation pass, but you could add one.

Simon

comment:19 Changed 5 years ago by heisenbug

Cc: heisenbug added

comment:20 Changed 4 years ago by rwbarton

Cc: rwbarton added

comment:21 Changed 4 years ago by osa1

Cc: osa1 added

comment:22 Changed 4 years ago by slyfox

Cc: slyfox added

comment:23 Changed 4 years ago by thomie

Type of failure: None/UnknownRuntime performance bug

comment:24 Changed 3 years ago by RyanGlScott

Cc: RyanGlScott added

comment:25 Changed 3 years ago by ekmett

Cc: ekmett added

comment:26 Changed 3 years ago by maoe

Cc: maoe added

comment:27 in reply to:  13 Changed 3 years ago by nomeata

I was considering this problem a bit more today, and my best shot at this so far turned out to be quite similar to what rwbarton already suggested above. Richard thinks this might be interesting to get right (with type safety proofs and all that).

Replying to simonpj:

I hate the fact that a type-system question forces us to lose a potential optimisation.

Me too! Although any solution will likely incur some changes to Core, and that probably needs to be justified by more than the unhappy feeling I have about Core’s type system getting in the way of producing the obviously right code.

Does anyone have a practical example where this bug bytes a lot? Maybe something with lenses, or with generic programs?

It seems to affect, for example, every Functor instance for a sum type where the type parameter is not used in at least one of the branches.

Anyways, I started writing down a possible approach with more detail at CaseBinderGeneralisation. Comments welcome.

comment:28 Changed 3 years ago by simonpj

The wiki page looks plausible... but rather a lot of work for a modest gain.

I wonder if a better approach would be a STG-level CSE pass? (No types there!)

comment:29 in reply to:  28 Changed 3 years ago by goldfire

Replying to simonpj:

The wiki page looks plausible... but rather a lot of work for a modest gain.

As Joachim hinted at above, this is also my stance. There might be some interesting theory here and some fun proofs, but I'm unconvinced that all the heavy lifting will be worth it.

comment:30 Changed 3 years ago by Iceland_jack

Cc: Iceland_jack added

comment:31 Changed 3 years ago by osa1

CSE implementation in STG is one of the things I'm hoping to try when I have some time. I have a branch here that allows STG-to-STG plugins: https://github.com/osa1/ghc/tree/stg_plugins, this may help. We just need a version of TrieMap for STG expressions and then the pass should be trivial.

comment:32 Changed 3 years ago by nomeata

I wonder if a better approach would be a STG-level CSE pass? (No types there!)

Is general CSE on the STG level safe? We would have to pay attention not to CSE two invocations of newSTRef, for example! Consider

foo :: Integer -> Integer
foo n = runST (newSTRef 1 >>= \r -> modifySTRef r (+n)   >> readSTRef r)
      + runST (newSTRef 1 >>= \r -> modifySTRef r (+2*n) >> readSTRef r)

which generated this STG

STGCSE.foo :: GHC.Integer.Type.Integer -> GHC.Integer.Type.Integer
[GblId, Arity=1, Str=<L,U>, Unf=OtherCon []] =
    \r [n_sQX]
        case
            case newMutVar# [STGCSE.foo2 GHC.Prim.realWorld#] of {
              Unit# ipv1_sR0 ->
…
            }
        of
        { Unit# ipv1_sR8 [Occ=Once] ->
              case
                  case newMutVar# [STGCSE.foo2 GHC.Prim.realWorld#] of {
                    Unit# ipv3_sRb ->
…

The two calls to newMutVar# look identical! But if we CSE them up, we change the semantics of the program.

So it would require some form of „sideeffect-freeness“ analysis on STG.

A simple form might be to CSE only constructor applications, and it would be sufficient for this issue. I guess it is good that newMutVar# does not get “inlined” to be just a constructor application at the STG stage. I wonder if there are constructor applications that we nevertheless do not want to CSE.

The problem does not exist in Core, where GHC.Magic.runRW# is not inlined, and the types of newMutVar# do not match up.

comment:33 Changed 3 years ago by simonpj

Ha ha. Excellent point!

comment:34 Changed 3 years ago by nomeata

There is value in working with a typed intermediate language, isn’t it ;-)

comment:35 Changed 3 years ago by osa1

Is this related with typed-ness of the IR? I think if we do some of the transformations we do during coreToStg we would have the same problem in Core too.

I'm wondering if a solution could be as simple as treating state tokens as different from each other when comparing expressions...

comment:36 Changed 3 years ago by nomeata

Is this related with typed-ness of the IR?

Yes! The type system of Core ensures that we cannot get the ST stuff wrong (to some extent. It still requires us to also not inline runRW#, for example).

I think if we do some of the transformations we do during coreToStg we would have the same problem in Core too.

You mean inlining runRW#? Right, we should not do that. I tried that once, and it broke :-)

I'm wondering if a solution could be as simple as treating state tokens as different from each other when comparing expressions...

Sure, such reasoning will be required. Is the only such case?

Also, in this case, CSE’ing the state token isn’t the problem per se (the state token is a zero-width value, so operationally, they are all equal). The real problem is CSE’ing newMutRef# which has a side-effect. So maybe only CSE’ing constructor applications is a safer start.

comment:37 in reply to:  36 Changed 3 years ago by rwbarton

Replying to nomeata:

Is this related with typed-ness of the IR?

Yes! The type system of Core ensures that we cannot get the ST stuff wrong (to some extent. It still requires us to also not inline runRW#, for example).

Is this really true? Say I write

f :: Bool -> IORef Int -> IO ()
f b v = v `seq` when b (writeIORef v 3)

GHC produces

f1 =
  \ b_auQ v_auR eta_B1 ->
    case v_auR `cast` ... of _ { STRef ipv_sPi ->
    case b_auQ of _ {
      False -> (# eta_B1, () #);
      True ->
        case writeMutVar# ipv_sPi f2 eta_B1 of s2#_aQd { __DEFAULT ->
        (# s2#_aQd, () #)
        }
    }
    }

What stops me from rewriting this via (er, not case-of-case but whatever it is called) to the inequivalent

f1 =
  \ b_auQ v_auR eta_B1 ->
    case v_auR `cast` ... of _ { STRef ipv_sPi ->
    case writeMutVar# ipv_sPi f2 eta_B1 of s2#_aQd { __DEFAULT ->
      case b_auQ of _ {
        False -> (# eta_B1, () #);   -- Haha, don't actually use s2#_aQd here.
        True -> (# s2#_aQd, () #)
      }
    }
    }

I think it is just the flag has_side_effects = True on writeMutVar#, and doesn't have to do with types.

Last edited 3 years ago by rwbarton (previous) (diff)

comment:38 Changed 3 years ago by nomeata

Uh. Good point. I guess we’d need a linear type system to prevent your illegal double-use of eta_B1 here :-) (It is not really “case-of-case” that you are doing there, but your point still holds.)

Note that this would be an illegal transformation not only for writeMutVar# but for anything that is not provabley total, even a pure but non-terminating function.

comment:39 Changed 3 years ago by rwbarton

Right, so it's not a very good example. But I think correctness of IO mostly relies on the state threading discipline (which doesn't need types), combined I guess with the need to preserve the semantics of possibly-diverging-but-otherwise-pure programs.

comment:40 Changed 3 years ago by nomeata

Differential Rev(s): Phab:D2871
Owner: set to nomeata

I started working on this (at the STG stage, CSE’ing only constructors, and not across lambdas) at Phab:D2871. Not merge-worthy yet, but preliminary review is welcome.

comment:41 Changed 3 years ago by nomeata

Status: newpatch

Ok, should be ready for review.

comment:42 Changed 3 years ago by Joachim Breitner <mail@…>

In 19d5c73/ghc:

Add a CSE pass to Stg (#9291)

This CSE pass only targets data constructor applications. This is
probably the best we can do, as function calls and primitive operations
might have side-effects.

Introduces the flag -fstg-cse, enabled by default with -O for now. It
might also be a good candiate for -O2.

Differential Revision: https://phabricator.haskell.org/D2871

comment:43 Changed 3 years ago by nomeata

Resolution: fixed
Status: patchclosed

Done. Would still be fun to be able to do this (and stuff like #5344) in Core, but that would require serious work on the type level.

comment:44 in reply to:  42 Changed 3 years ago by heisenbug

Replying to Joachim Breitner <mail@…>:

In 19d5c73/ghc:

Add a CSE pass to Stg (#9291)

This CSE pass only targets data constructor applications. This is
probably the best we can do, as function calls and primitive operations
might have side-effects.

Introduces the flag -fstg-cse, enabled by default with -O for now. It
might also be a good candiate for -O2.

Differential Revision: https://phabricator.haskell.org/D2871

Will you revisit the -O2 inclusion sometime later? /me was always under the impression that all performance-increasing optimisations from -O would be included in -O2.

Differential Revision: https://phabricator.haskell.org/D2871 }}}

comment:45 Changed 3 years ago by nomeata

Will you revisit the -O2 inclusion sometime later? /me was always under the impression that all performance-increasing optimisations from -O would be included in -O2.

Oh, of course it is in -O2! The question is whether it should only be in -O.

comment:46 Changed 3 years ago by heisenbug

Looking at the tests, I do not see Either x y <-> Maybe y coercions. Are these supposed to be done?

comment:47 Changed 3 years ago by nomeata

Looking at the tests, I do not see Either x y <-> Maybe y coercions. Are these supposed to be done?

Are you referring to treating a Just x directly Right x, i.e. using similarly looking constructors at a different type? That is not included here, and it is not clear whether we actually want it done. If you think we do, please open a new ticket for it.

comment:48 in reply to:  47 ; Changed 3 years ago by heisenbug

Replying to nomeata:

Looking at the tests, I do not see Either x y <-> Maybe y coercions. Are these supposed to be done?

Are you referring to treating a Just x directly Right x, i.e. using similarly looking constructors at a different type? That is not included here, and it is not clear whether we actually want it done. If you think we do, please open a new ticket for it.

Yes, that is what I mean. They would be memory-layout (and content) compatible. I'll ponder a bit more, and if I still think it's a good idea, I'll follow up with a ticket. Thanks for doing this!

Last edited 3 years ago by heisenbug (previous) (diff)

comment:49 in reply to:  48 Changed 2 years ago by heisenbug

Replying to heisenbug: I followed up with #13861 on this. Let's see what people think :-)

comment:50 Changed 21 months ago by heisenbug

@nomeata, I have

data Boo = Tru | Fal

quux True = Fal
quux False = Tru
{-# NOINLINE quux #-}

which compiles to STG

quux_rCA :: GHC.Types.Bool -> Main.Boo
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,1*U>, Unf=OtherCon []] =
    sat-only [] \r [ds_s48C] ds_s48C;

I may be mistaken, but does this preserve the original strictness? I looks like a regular lambda to me (non-strict).

The optimization is implemented by mkStgCase.

comment:51 Changed 21 months ago by nomeata

Yes, it preserves the strictness: id = \x. x is strict in x , since id ⊥ = ⊥.

Note: See TracTickets for help on using tickets.