Opened 4 years ago

Last modified 4 years ago

#11475 new bug

Lint should check for inexhaustive alternatives

Reported by: simonpj Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.10.3
Keywords: Cc: anselm.scholl@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by simonpj)

GHC's simplifier prunes inaccessible alternatives from case expressions. E.g.

case x of
  A y -> e1
  _   -> ....(case x of
                A y -> r1
                B -> r2
                C -> r3) ...

It'll prune the A alternative from the inner case to get

case x of
  A y -> e1
  _   -> ....(case x of
                B -> r2
                C -> r3) ...

BUT, there is no independent check from Lint that this pruning does the Right Thing. Yet, lacking such a check, a program can pass Lint and then seg-fault, which is Very Bad.

Trac #11172 is an example. It turned out to be a bug in the pruning, which led to something like

case x of
  Left y -> e1

but in fact x ended up being bound to (Right v) for some value v. But because there is only a single alternative left, GHC does not test the tag, but instead goes ahead and accesses it the Right value as if it was a Left value. And because of pointer-tagging, it'll assume the wrong tag on the pointer, and very soon you are in seg-fault land.

So this ticket is to suggest: make Core Lint do it's best to check that all cases are in fact exhaustive. There are two ways in which we prune cases:

  • Enclosing pattern matches (as above)
  • GADT pattern matches may be exhaustive even when they don't mention all constructors.

For the latter, it's impossible for Lint to reproduce all the reasoning of the type checker, but I bet that in 99.5% of the cases the existing function dataConCantMatch will do the job.

So if Lit

  • Maintains an environment saying what each variable can't match (as the simplifier does)
  • Does a simple-minded dataConCantMatch test based on the typ of the scrutinee

I think we'd be a long way forward. It's possible that a valid program could trigger a Lint report, but let's see.

Any volunteers?

Change History (6)

comment:1 Changed 4 years ago by simonpj

Description: modified (diff)

comment:2 Changed 4 years ago by simonpj

Transferred from ghc-defs list I am currently giving #11475 a try. However I hit a few problems and wanted to ask for a bit of advice about how to proceed.

I already can handle the following:

case x of
  A y -> e1
  _   -> ....(case x of
                B -> r2
                C -> r3) ...

My first problem then was the following construct:

case (case ... of ... -> Left x) of
  Left y -> ...

Or more simple:

case Left x of
  Left y -> ...

So I have to check which constructors the scrutinee may return. I know the simplifier has to implement such logic somewhere but did not want to use the implementation for two reasons:

  • I didn't know where to find it
  • If GHC uses the same test to prune impossible alternatives and to check if they were pruned correctly, extending the linter is useless. A bug in the test code will cause also cause a bug in the linter.

So I implemented such a test myself, but now I am stuck at something like this:

foo a b = a : b

bar = ... case foo a b of
  (:) x y -> ...

I cannot see through foo, so I assume it may return (:) and []. The simplifier on the other hand is able to see through foo and did remove the [] case, causing my current implementation to throw an incorrect lint error.

How can I get a list/set of all possible AltCons returned by a function/constant? Or a list of the impossible ones? I tried to get some information from the unfolding of a variable, but it seems like unfoldings are only attached to top-level names.

Jonas

comment:3 Changed 4 years ago by simonpj

I suggest being less ambitious! Do the simple things first.

case (case ... of ... -> Left x) of
   Left y -> ...

GHC itself is unlikely to prune the alternative in the outer case, precisely because it's not so obvious that the inner case can only return Left.

  bar = ... case foo a b of
    (:) x y -> ...

Same here.

In short, dealing with in-scope pattern matches and GADTs is worth doing first.

Then when you get a false positive (which you say you are) we can think about them.

(For foo, the CprResult part of the strictness signature will tell that foo only returns (:) but I'm a bit surprised that GHC actually uses this to prune the case alternatives.

comment:4 Changed 4 years ago by simonpj

Cc: anselm.scholl@… added

comment:5 Changed 4 years ago by jscholl

Using the CRPResult did indeed help, I can now compile a few more modules. However, the next false positive is a little bit more complicated as there is no evidence I could use attached to it as far as I can tell.

Suppose there is an expression

foo x = case x of
    A -> ...
    B -> ...
    y -> error $ "foo" ++ (case y of
        C -> "C"
        D -> "D")

This is fine and works as expected. But now GHC decides to float out the expression, yielding:

lvl y = error $ "foo" ++ (case y of
    C -> "C"
    D -> "D")

foo x = case x of
    A -> ...
    B -> ...
    y -> lvl y

I don't see where GHC would record that y can only match C and D. I think it doesn't do it anywhere, most likely because no one needed that information until today.

A possible solution might be to look for unexported local functions which are non-exhaustive and record the set of allowed constructors for their arguments. All calls to such a function would then be required to only included the correct possible constructors. Or is there a better way?

comment:6 Changed 4 years ago by simonpj

Ah yes, that is tricky. I'm really not sure how far it is profitable to go here. That is, when does keeping Lint happy become burdensome?

But let me suggest a possible approach for this particular case.

  • Ids have Unfoldings inside them. One such is OtherCon [AltCon] (see CoreSyn) which says "this Id does not match any of these constructors".
  • Since they are Ids even lambda-bound variables may have unfoldings. I'm in two minds about whether this is a good idea, but you could give the \y in the defn of lvl above a OtherCon[A,B] unfolding. Then you wouldn't get a Lint error from the case in lvl.
  • But in exchange you'd have a new thing to check: that lvl was indeed applied to arguments that were not A or B. So I'm not sure we are any further forward.

Maybe this can only be done with a better flow analysis... a "super-Lint" if you like.

Note: See TracTickets for help on using tickets.