Opened 4 years ago

Last modified 3 years ago

#10417 new bug

Rule matching not "seeing through" floating and type lambda (and maybe cast)

Reported by: duncan Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.10.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by simonpj)

The nub of the problem looks like this. We have a RULE:

{-# RULES "Sequence/Sequence"
    forall a b c. Sequence (Sequence a b) c
                = Sequence a (Sequence b c)
  #-}

and some core code that looks like:

failing_example =
  \ @ s_azo ->
    Sequence ((failing_example3) `cast` ...) (...)

failing_example3 =
  \ @ s_azo ->
    Sequence (...) (...)

We would hope that the rule would match here but clearly it does not. There's obviously a few things in the way of a straightforward match:

  1. the key subterm is floated out
  2. there's a \ @ -> type lambda in the way in the subterm
  3. there's a cast in the way in the subterm

It appears that the problem is not the cast but the combination of 1 & 2. A reduced example shows the problem occurs with just the first two. That said, the real code would also rely on it working with the cast.


The rest of this is the code needed to construct the example. The code is also attached to the ticket.

{-# LANGUAGE GADTs, RankNTypes, TypeOperators, ScopedTypeVariables,
             TypeFamilies, KindSignatures, DataKinds, UndecidableInstances #-}

module RulesExample (working_example2) where

import Prelude hiding (pure, (<*>))

data Decoder s s' where

  -- explicit encoding of sequencing

  Done     :: Decoder s s
  Sequence :: Decoder a b -> Decoder b c -> Decoder a c

  -- primitive operation in "chained" style

  ConsumeWord :: Decoder (Word :*: s) s' -> Decoder s s'
  AlterStack  :: (a -> b) -> Decoder b s' -> Decoder a s'

data a :*: b = a :*: b
infixr 5 :*:

We compose actions using the explicit encoding of sequencing

a >>> b = Sequence a b

Aside: in the real thing we go via the category class:

        instance Category Decoder where
          id    = Done
          a . b = Sequence b a
          {-# INLINE (.) #-}

The Category module defines >>> as flip (.)

Now our primitives are in the chained style, and we turn them into the explicit sequencing style by chaining them with Done:

consumeWord :: Decoder s (Word :*: s)
consumeWord = ConsumeWord Done

alter :: (a -> b) -> Decoder a b
alter f = AlterStack f Done

example of explicit sequencing:

consumeWord >>> consumeWord >>> ...

but naively this will give us things like:

ConsumeWord Done `Sequence` ConsumeWord Done `Sequence` ...

and we can interpret this much faster if it's in the "chained" style

ConsumeWord (ConsumeWord (... ) )

Aside: there are good reasons not to just try and construct this chained style in the first place. We can't do it everywhere, and where it would fail we would get bad code. Rather we want to start with something ok and optimise it locally.

So, we use RULES to reassociate things and eliminate the Sequence and Done where possible:

{-# RULES "Sequence/Done"
    forall sa. Sequence Done sa = sa
  #-}

{-# RULES "Sequence/Sequence"
    forall a b c. Sequence (Sequence a b) c
                = Sequence a (Sequence b c)
  #-}

{-# RULES "Sequence/ConsumeWord"
    forall sa sa'. Sequence (ConsumeWord sa) sa'
                 = ConsumeWord (Sequence sa sa')
  #-}

{-# RULES "Sequence/AlterStack"
    forall f sa sa'. Sequence (AlterStack f sa) sa'
                   = AlterStack f (Sequence sa sa')
  #-}

Now for starters, ghc produces warnings for all three RULES like:

Rule "Sequence/Done" may never fire
      because ‘RulesExample.Sequence’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘RulesExample.Sequence’

but it's currently syntactically invalid to write such a pragma:

{-# INLINE [0] Done #-}
{-# INLINE [0] Sequence #-}

and also, the warning isn't wholly correct, the rule can fire:

working_example = consumeWord >>> consumeWord
              >>> alter (\(a :*: b :*: s) -> (a,b) :*: s)

actually, this isn't yet quite enough because we cannot persuade ghc to inline consumeWord, because it's just a static composition of constructors, not even if we hit it with the INLINE pragma:

{-# INLINE consumeWord #-}

we have to resort to the bigger hammer of an inline rule:

{-# RULES "inline consumeWord" consumeWord = ConsumeWord Done #-}

now this is good, we get exactly the core that we want:

$WDone = \ @ s_anC -> Done @~ <s_anC>_N

working_example3 =
  \ @ b_aw2 ds_dAY ->
    case ds_dAY of _ { :*: a_aoc ds1_dAZ ->
    case ds1_dAZ of _ { :*: b1_aod s_aoe -> :*: (a_aoc, b1_aod) s_aoe }
    }

working_example2 =
  \ @ b_aw2 -> AlterStack (working_example3) ($WDone)

working_example1 = \ @ b_aw2 -> ConsumeWord (working_example2)

working_example = \ @ b_aw2 -> ConsumeWord (working_example1)

This is perfect. Each constructor is statically allocated and refers to the next one in the chain. This allows the interpreter to walk over this without triggering any allocation, and the chain style means that in the usual case it doesn't have to do control flow call/return work for the Sequence/Done.

Note already that we may have an issue with rule matching on Done, since it does get changed later into $WDone. So it could actually be useful to be able to write:

{-# INLINE [0] Done #-}

Ok, so, what doesn't work...

Floating things out is not itself a problem, this still works fine, the rules fire. So apparently the \ @ b_aw2 -> type abstraction isn't a problem for the rule matching.

second_example = consumeWord >>> part2

part2 = consumeWord >>> part3

part3 = alter (\(a :*: b :*: s) -> (a,b) :*: s)

However the following fails to work. This example is actually extracted from the core of a real example that doesn't work.

artificial_example = Sequence failing_example3 failing_example1
  where
    failing_example3 = Sequence failing_example5 failing_example4
    failing_example5 = Sequence Done failing_example6
    failing_example6 = ConsumeWord Done
    failing_example4 = ConsumeWord Done
    failing_example1 = AlterStack failing_example2 Done
    failing_example2 (a :*: b :*: s) = (a,b) :*: s

here's a specific example of a rule not matching:

ailing_example3 =
  \ @ a_ax3 ->
    Sequence
      (failing_example4) (failing_example4)

failing_example4 =
  \ @ s_awp -> ConsumeWord ($WDone)

This should match the rule

    Sequence (ConsumeWord sa) sa'
  = ConsumeWord (Sequence sa sa')

but apparently does not. The combo of the floating out and the type lambda seem to get in the way. Note that in this example ghc has also CSE'd the failing_example4 with failing_example5 and so failing_example4 is used twice.

But if that's the problem here, there also equivalent instances of the Sequence/Sequence rule not matching. Also, ConsumeWord is certainly CONLIKE so we'd hope even if ghc had already CSEd that it'd allow the duplication for the rule matching. In fact looking at the simpl iterations I think the rule would have gotten a chance to fire before the CSE occured.

In the real example that the above was extracted from we also have casts in the way.

Unfortunately we need more infrastructure to illustrate the full example. I don't think the details of this infrastructure is important, except that it ends up giving us cast in the core.

We're building an abstraction on top of Decoder to let us use Applicative style.

data Fun a = Id
           | Cons a
           | O (Fun a) (Fun a)

-- this requires UndecidableInstances
type family Apply (f :: Fun *) (a :: *) :: *
type instance Apply Id        a = a
type instance Apply (Cons  x) a = x :*: a
type instance Apply (O f g)   a = Apply f (Apply g a)

we pair up stack changes and their inverses

data DecoderA0 s (f :: Fun *) a
   = DecoderA0  (Decoder s (Apply f s))  -- change the stack from   s -> f s
                (Apply f s -> (s, a))    -- change stack back from  f s -> s


{-# INLINE pure0 #-}
pure0 :: forall a s. a -> DecoderA0 s Id a
pure0 a = DecoderA0 Done (\s -> (s, a))

{-# INLINE app0 #-}
app0 :: DecoderA0 s f (a -> b) -> DecoderA0 (Apply f s) g a -> DecoderA0 s (g `O` f) b
app0 (DecoderA0 d f) (DecoderA0 d' x) =
    DecoderA0 (d >>> d') appStack
  where
    -- just flipped <*> for state monad
    appStack = \s -> case x s of
                       (s', x') -> case f s' of
                                     (s'', f') -> (s'', f' x')

{-# INLINE close0 #-}
close0 :: forall s f a. DecoderA0 s f a -> Decoder s (a :*: s)
close0 (DecoderA0 d se) = d >>> alter se'
  where
    se' :: Apply f s -> (a :*: s)
    se' s = case se s of (s', a) -> a :*: s'

now we can abstract over the stack change, because it always takes us back to the same initial stack state

data DecoderA a where
  DecoderA :: (forall s. DecoderA0 s f a) ->  DecoderA a

in the real thing we use Functor and Applicative instances, but we only need bits

pure a                    = DecoderA (pure0 a)
DecoderA f <*> DecoderA x = DecoderA (app0 f x)

infixl 4 <*>

now conversions functions:

{-# INLINE embedDecoderA #-}
embedDecoderA :: DecoderA a -> Decoder s (a :*: s)
embedDecoderA (DecoderA ap) = close0 ap

{-# INLINE[1] embedDecoder #-}
embedDecoder :: forall a. (forall s. Decoder s (a :*: s)) -> DecoderA a
embedDecoder dec =
    DecoderA popResultAp
  where
    popResultAp :: DecoderA0 s (Cons a) a
    popResultAp = DecoderA0 dec (\(a :*: s) -> (s, a))

ok, we can finally build our example...

failing_example =
    embedDecoderA $
      pure (,) <*> embedDecoder consumeWord
               <*> embedDecoder consumeWord

So all the applicative stuff gets inlined away, but we end up with this core:

$WDone = \ @ s_anC -> Done @~ <s_anC>_N

failing_example6 = \ @ s_azo -> ConsumeWord ($WDone)

failing_example4 = \ @ s_azo -> ConsumeWord ($WDone)

failing_example5 =
  \ @ s_azo ->
    Sequence (($WDone) `cast` ...) ((failing_example6) `cast` ...)

failing_example3 =
  \ @ s_azo ->
    Sequence
      ((failing_example5) `cast` ...) ((failing_example4) `cast` ...)

failing_example2 =
  \ @ s_azo s1_aox ->
    case s1_aox `cast` ... of _ { :*: a_aoH s2_aoI ->
    case s2_aoI `cast` ... of _ { :*: a1_Xpa s3_Xpc ->
    :*: (a1_Xpa, a_aoH) (s3_Xpc `cast` ...)
    }
    }

failing_example1 =
  \ @ s_azo -> AlterStack (failing_example2) ($WDone)

failing_example =
  \ @ s_azo ->
    Sequence ((failing_example3) `cast` ...) (failing_example1)

We have two examples of RULES not mathing where we would have hoped.

Firstly, we have:

Sequence (($WDone) `cast` ...) (...)

We would of course hope that the rule Sequence Done sa = sa would match here

It's unclear if this is due to the $WDone or the cast. Certainly the conversion from Done to $WDone can be worked around by using a function

{-# INLINE [0] done #-}
done = Done

and changing the code and rules to use that.

So the main problem is the one mentioned at the top of the ticket:

failing_example =
  \ @ s_azo ->
    Sequence ((failing_example3) `cast` ...) (...)

failing_example3 =
  \ @ s_azo ->
    Sequence (...) (...)

We would hope that this rule would match

  Sequence (Sequence a b) c = Sequence a (Sequence b c)

So this is the same issue as in the artificial example but now additionally there's a cast in the way in the subterm.

Attachments (1)

RulesExample.hs (9.8 KB) - added by duncan 4 years ago.
full code example

Download all attachments as: .zip

Change History (6)

Changed 4 years ago by duncan

Attachment: RulesExample.hs added

full code example

comment:1 Changed 4 years ago by duncan

The issue with the rule/constructor warning is in a separate ticket #10418 for better tracking.

comment:2 Changed 4 years ago by simonpj

Description: modified (diff)

comment:3 Changed 4 years ago by thomie

Type of failure: None/UnknownRuntime performance bug

comment:4 Changed 3 years ago by nomeata

JFTR: Still present in 8.0.1.

comment:5 Changed 3 years ago by nomeata

From looking hard at these examples, I would say it “just” floating out getting into the way for applying rules, which is a general problem. At least for the artificial_example. Hard to predict if the casts would get in the way if the float out would not prevent the rule from firing in the simple case, but one step at a time.

Note: See TracTickets for help on using tickets.