Opened 12 months ago

Closed 11 months ago

Last modified 8 months ago

#15627 closed task (fixed)

Absent unlifted bindings

Reported by: sgraf Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.4.3
Keywords: DemandAnalysis Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case: stranal/should_compile/T15627
Blocked By: Blocking:
Related Tickets: #9279 #4328 #11126 Differential Rev(s): Phab:D5153
Wiki Page:

Description (last modified by simonpj)

Transferred from ticket:9279#comment:20:

For a long time, the worker/wrapper splitter has given up on absent arguments of certain unlifted types: see Literal.absentLiteralOf and Note [Absent errors] in WwLib. This is very annoying because it means that we get left with functions that take a bunch of arguments they do not use, as in this ticket (#9279).

For lifted types T we build an absent value as a thunk of form

  aBSENT_ERROR_ID @T "Used absent value"

This does two things

  1. It gives us something, of the right type, to use in place of the value

we aren't passing any more.

  1. It gives an extra sanity check: if that value is ever used (a compiler

bug) we'll get a runtime error message.

For unlifted types we don't have thunks, so we can't do this. As you can see in absentLiteralOf, for some types we just make up a silly value: e.g. for Char# we use 'x#'; for Int# we use 0#.

Note, however that

  • Substituting a particular value serves purpose (A) but not purpose (B). A compiler bug would go undetected. This is sad: e.g. #11126 is a real bug that was detected by (B). But I see no way out.
  • It doesn't work for Array#, MutVar#, TVar# etc because we have no available literal values of those types.

So Sebastian is suggesting that we add a new literal value -- call it a rubbish value -- which can work for any (unlifted type), extending Literal something like this

data Literal = ...
  | RubbishLit Type

We need to store the type so we can still do literalType.

Now

  • Maybe we could get rid of MachNullAddr in favour of this new literal.
  • I think -- but I am not sure -- that this literal should never occur in code generation. For example, we should never pass a rubbish value to a function. Before then dead-code elimination should have got rid of it I'm not 100% certain, but if this was true, it'd be a great sanity check.
  • Yes, Literal has Eq and Ord -- but I'm not sure why. Try removing them and seeing what happens! (Generally I think it'd be better to define eqLit and cmpLit and cal them, rather than use == and >; so much easier to grep for!

And in fact, we do have eqType and cmpType.

  • Do we need to spit out a RubbishLit in the Binary instance. This seems more likely, because perhaps these rubbish values can occur in unfoldings, which are serialised as their parse tree. But the we can just serialise the Type. It won't happen much.

Change History (15)

comment:1 Changed 12 months ago by simonpj

Description: modified (diff)

comment:2 Changed 12 months ago by sgraf

I just remember posting on ticket:11126#comment:17. While having the absentError mechanism around clearly is a sanity check, why only crash when it's entered? That's far too late to be a useful mechanism to debug the reason it crashed!

So B) is just saying "If we messed up, at least crash with a marginally more descriptive error". Which isn't bad, but not nearly enough to debug this kind of crash across module boundaries.

I think -- but I am not sure -- that this literal should never occur in code generation. For example, we should never pass a rubbish value to a function. Before then dead-code elimination should have got rid of it I'm not 100% certain, but if this was true, it'd be a great sanity check.

I'm thinking the same thing. If DCE didn't get rid of it, the demand analyser probably didn't agree with the occurrence analyser (who I presume is the final authority here), which is a bug that should be caught early to detect cross module symptoms like #11126 early.

Yes, Literal has Eq and Ord -- but I'm not sure why

Actually, the very problems I had occurred in cmpLit, to which both seem to delegate. Regardless, I removed them. Let's try to see how far I get.

Do we need to spit out a RubbishLit in the Binary instance. This seems more likely, because perhaps these rubbish values can occur in unfoldings, which are serialised as their parse tree. But the we can just serialise the Type. It won't happen much.

I'd like this, but there is no Binary instance for Type. I'm pretty much stuck here.

I can see a hacky alternative here, namely to give RubbishLit the levity polymorphic forall (r :: RuntimeRep) (a :: TYPE r) type. Which is an unsafe lie again, because we only actually allow AddrRep and UnliftedRep. But this would allow to move the type application out of the literal.

Or, looking at https://hackage.haskell.org/package/ghc-8.4.3/docs/IfaceType.html#t:IfaceType, maybe serialise that instead?

Or add a new type IfaceLiteral to https://hackage.haskell.org/package/ghc-8.4.3/docs/IfaceSyn.html?

comment:3 Changed 12 months ago by simonpj

Ah I see. The problem is that we are using Literal in IfaceSyn, and that is biting us here; it would be more consistent to have IfaceLiteral. If we did we could have

data IfaceLiteral
 = ...
 | RubbishLit IfaceType
 | LitNumber LitNumType Integer   -- No Type here; we reconstruct it in tcIfaceLit
                                  -- Avoids the smelly error-thunk in the Binary get
 ...

It's annoying that Literal and IfaceLiteral would be almost the same; some boilerplate converting to and fro. But it's simple boilerplate, and uniform with everything else. And it avoids that nasty error thunk in the binary instance of Literal.

I suppose that a polymorphic literal as you suggest would be OK. You could avoid the levity polymorphism by retaining MachNullAddr.

I'm not sure which I prefer. The IfaceLiteral story is less clever, and thus perhaps preferable.

comment:4 Changed 12 months ago by sgraf

We need the Eq instance of Literal when deriving Eq AltCon. That's OK, we have eqType for that. However, there's also an instance Ord AltCon that is a little shady (sic):

-- This instance is a bit shady. It can only be used to compare AltCons for
-- a single type constructor. Fortunately, it seems quite unlikely that we'll
-- ever need to compare AltCons for different type constructors.
-- The instance adheres to the order described in [CoreSyn case invariants]
instance Ord AltCon where
  compare (DataAlt con1) (DataAlt con2) =
    ASSERT( dataConTyCon con1 == dataConTyCon con2 )
    compare (dataConTag con1) (dataConTag con2)
  compare (DataAlt _) _ = GT
  compare _ (DataAlt _) = LT
  compare (LitAlt l1) (LitAlt l2) = compare l1 l2
  compare (LitAlt _) DEFAULT = GT
  compare DEFAULT DEFAULT = EQ
  compare DEFAULT _ = LT

I will not try and remove that instance, I think it's probably needed somewhere. Questions:

  1. How would I implement cmpType, which I'd need for a faithful cmpLit function?
  2. In the likely case the answer to 1) is "don't", then does sound ignoring the MachNull type in comparisons OK? E.g. cmpType (MachNull _) (MachNull_) = EQ. Makes Ord AltCon a little more shady than it already is, but still seems in line with the invariant "only use on AltCons of same type constructor". Also, cmpType and eqType might then disagree.

comment:5 Changed 12 months ago by sgraf

Going with 2., we again get stuck in CoreMap, which requires an Ord Literal instance and seems exactly like the reason that we have it in the first place.

Question is: Would CoreMap still be OK with 2.)? E.g., can we guarantee that we never put things of different types in the same CoreMap?

comment:6 Changed 12 months ago by simonpj

I will not try and remove that instance, I think it's probably needed somewhere

But I'd like to know where! I'd remove it and replace with calls to cmpAltCon. We can't begin to answer the question about whether an imprecise comparison is ok without knowing where it is used.

comment:7 Changed 12 months ago by simonpj

Going with 2., we again get stuck in CoreMap, which requires an Ord Literal instance

Not at all! It just requires you to build a LiteralMap just like CoreMap, CoercionMap, etc in CoreMap. Using an Ord instance is very much a short-cut.

But that's a pain -- more boilerplate. So that pushes me a bit more towards the polymorphic literal solution that you propose (retaining MachNullAddr as I suggested above. Would you like to try that?

(Incidentally that will mean that you couldn't put a RubbishLit in a LitCon (because there is no facility for type application in a LitCon. But that's ok: we don't want to check for equality with rubbish values!)

comment:8 Changed 12 months ago by sgraf

Edit: Ignore this, I was just confused. literalType (RubbishLit mutVarPrimTyCon) == forall s a. MutVar# s a seems perfectly fine.

There's something weird about this polymorphic literal thing. Assuming we are talking about something like | RubbishLit TyCon, take the constructor RubbishLit mutVarPrimTyCon as an example: It's a value living in (an alleged type of) higher kind Type -> Type -> TYPE UnliftedRep.

I'm not sure this works for literalType?!

Last edited 12 months ago by sgraf (previous) (diff)

comment:9 Changed 12 months ago by simonpj

No -- don't put a TyCon in RubbishLit. Instead:

data Literal = RubbishAddr   -- Replaces MachNullAddr
             | RubbishLit 
             | ...as before...

literalType RubbishAddr = Addr#
literalType RubbishLit  = forall (a :: TYPE UnliftedRep). a

So now, the core term

(Literal RubbishLit) `App` (Type (TyConApp MutVar# [Int]))

We can't use (RubbishLit @Addr#) for RubbishAddr because Addr# :: TYPE AddrRep, which is ill-kinded.

Actually, I suppose that some C APIs might want the null addr (zero, I think?) specifically, rather than "any old rubbish value", so maybe we want to retain MachNullAddr rather than replace it with RubbishAddr.

Yet another variant would be to give RubbishLit the type forall r (a::TYPE r). a. Operationally, we simply don't initialise RubbishLit values; whereas using 0 for Int# and 'x' for Char# still forces us to initialise.

I'd be inclined to stick with the non-levity-polymorphic version initially.

comment:10 Changed 12 months ago by sgraf

Ah, of course! Much simpler. I think we have to stick with MachNullAddr, because that's how nullAddr# https://hackage.haskell.org/package/base-4.11.1.0/docs/GHC-Exts.html#v:nullAddr-35- is defined.

comment:11 Changed 12 months ago by sgraf

Differential Rev(s): Phab:D5153
Status: newpatch

comment:12 Changed 11 months ago by Krzysztof Gogolewski <krz.gogolewski@…>

In 448b77b9/ghc:

Add RubbishLit for absent bindings of UnliftedRep

Summary:
Trac #9279 reminded us that the worker wrapper transformation copes
really badly with absent unlifted boxed bindings.

As `Note [Absent errors]` in WwLib.hs points out, we can't just use
`absentError` for unlifted bindings because there is no bottom to hide
the error in.
So instead, we synthesise a new `RubbishLit` of type
`forall (a :: TYPE 'UnliftedRep). a`, which code-gen may subsitute for
any boxed value. We choose `()`, so that there is a good chance that
the program crashes instead instead of leading to corrupt data, should
absence analysis have been too optimistic (#11126).

Reviewers: simonpj, hvr, goldfire, bgamari, simonmar

Reviewed By: simonpj

Subscribers: osa1, rwbarton, carter

GHC Trac Issues: #15627, #9279, #4306, #11126

Differential Revision: https://phabricator.haskell.org/D5153

comment:13 Changed 11 months ago by monoidal

Resolution: fixed
Status: patchclosed

comment:14 Changed 11 months ago by simonpj

Test Case: stranal/should_compile/T15627

comment:15 Changed 8 months ago by simonpj

Keywords: DemandAnalysis added
Note: See TracTickets for help on using tickets.