Opened 18 months ago

Last modified 18 months ago

#14826 new feature request

Flatten data types extending other data types in STG

Reported by: nomeata Owned by:
Priority: low Milestone:
Component: Compiler Version: 8.5
Keywords: Cc: sgraf
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

This idea was triggered by https://mail.haskell.org/pipermail/haskell-cafe/2018-February/128570.html although it does not solve that particular case. Hence I don’t know if there is any practical use for it, but I wanted to jot it down.

Consider a data type

data Result = Ok !Foo | NotOK Error

where Foo is a concrete algebraic data type with $n$ constructors.

Then the following transformation should be safe:

  • Do not generate an info table for Ok
  • Give the NotOK the constructor number $n+1$.
  • Replace calls to Ok with id
  • Replace
    case r as b of Of f    -> E[b,f] | NotOk e -> E[b,e]
    
    with
    case r as b of DEFAULT -> E[b,b] | NotOk e -> E[b,e]
    

This effectively makes every constructor or Foo a constructor of Ok, and eliminates the pointer indirection introduced by Foo. Checking if a Result is Ok is now a simple check of the pointer tag (if Foo and Result do not have too many constructors).

Note that Result could have additional constructors, but only one can be eliminated. This one constructor needs to

  • have one argument
  • be strict in that argument
  • that argument must be an algebraic data type (even after this flattening) does not have NotOk as a constructor.

We currently cannot do this in Core, but we could if our safe coercions were directed.

Do you think you have a case where this could give a performance boost? Maybe some parser library? You can try this now using pattern synonyms and unsafeCoerce. If you think there is something to be gained here, then we can consider implementing this.

Change History (7)

comment:1 Changed 18 months ago by nomeata

This might be useful to be able to do

data IntSet = NonEmpty !NonEmptyIntSet | Tip
data NonEmptyIntSet = Bin … | Leaf …

and get more expressive types than the current

data IntSet = Bin … | Leaf … | Tip

without having to pay a performance penalty for it.

Or even

data [a] = Nil | NonEmpty !(NEList a)
data (NEList a) = Cons a [a]

(no, I am not proposing to change the Prelude :-))

comment:2 Changed 18 months ago by sgraf

As I understand your proposal, this will re-use info pointers. Is that right? If so, is that a requirement?

I imagine we could encode something more general by lifting any data constructor consisting purely of an unboxed sum of arity n to n distinct constructor tags. That way, we can drop the 'only once' requirement.

Well, I guess this still involves non-id coercions... Still, we could encode the 'only once' thing, where we re-use constructor tags, as a special case to get erasable coercions.

comment:3 Changed 18 months ago by nomeata

One might be able to state this in terms of unboxed sums, yes.

That way, we can drop the 'only once' requirement. Well, I guess this still involves non-id coercions...

You mean the requirement that only one Foo can be flattened into a Result? Yes, if you do this with multiple ones you either get non-id coercions (which party defeat the purpose). Or, of all types are defined in the same module, you can maybe to a smart constructor tag allocation scheme where Foo gets tags 1…3, Bar gets tags 4…6, and data FooBar = Foo !Foo | Bar !Bar can now be flattened simply to a pointer to a Foo or a Bar value.

But let’s focus on the simple case first that does not require a global constructor pointer allocation pass :-)

comment:4 Changed 18 months ago by nomeata

comment:5 Changed 18 months ago by simonpj

Interesting.

Consider what happens today with the Haskell source

data T = MkT {-# UNPACK #-} !Int Bool

f (MkT n b) = MkT (n+1) b

We get the following Core

data T = MkT Int# Bool

$WMkT :: Int -> Bool -> T
$WMkT n = case n of I# n# -> MkT n# b

f x = case x of
        MkT n# b -> let n = I# n#
                    in $WMkT (n+1) b

That is

  • The "real" Core data constructor MkT has an Int# field.
  • The wrapper $WMkT is an ordinary function that unboxes the field and calls the "real" data constructor
  • Pattern matching is desugared to rebox the field

You could do the same thing here. With your example source

data Foo = A Int | B Bool | C
data Result = Ok !Foo | NotOK Error

f :: Result -> Int
f (Ok x) = wimwam x True x
f (NotOk _) = 0

you might desugar to

data Result = Ok_A Int | OK_B Bool | OK_C C | NotOK Error

f r = join j x = wimwam x True x
      in case r of
         OK_A n  -> jump j (A n)
         OK_B b  -> jump j (B b)
         OK_C    -> jump j C
         NOtOK _ -> 0

Note that I am NOT re-using the data constructor for Foo (which would be tricky and confusing). I'm simply generating new ones.


I think this would be do-able. It uses basically the same concepts as now, but generalises a bit. I wonder if we could leverage pattern synonyms rather than have more built-in stuff.


A concern: it could blow up

data T = MkT !(Maybe Int) !(Maybe Int)

Do we generate four constructors? In general a multiplicative number? You suggest just one field, but it'd be a shame of your perf went down the tubes because you added an innocuous field.

And actually two or more is fine, provide only one expands to a multiplicity:

data T = MkT !Char !(Maybe Int) Bool

is absolutely ok, desugaring to

data T = MkT_Just Char# Int Bool
       | MkT_Nothing Char# Bool

comment:6 Changed 18 months ago by nomeata

Note that I am NOT re-using the data constructor for Foo (which would be tricky and confusing).

Why not? In fact, with your desguaring, and #13861, we’d get that:

data Result = Ok_A Int | OK_B Bool | OK_C C | NotOK Error

f r = join j x = wimwam x True x
      in case r as b  of
         OK_A n  -> jump j b
         OK_B b  -> jump j b
         OK_C    -> jump j b
         NOtOK _ -> 0

I wonder if we could leverage pattern synonyms rather than have more built-in stuff.

Yes, your encoding is all doable with pattern synonyms already:

data Foo = A Int | B Bool | C
-- we want: data Result = Ok !Foo | NotOK Error
data Result = Ok_A Int | OK_B Bool | OK_C | NotOK Error

pattern Ok :: Foo -> Result
pattern Ok f <- ((\case Ok_A i -> Just (A i); OK_B b -> Just (B b); OK_C -> Just C; _ ->          Nothing) -> Just f)
  where Ok = \case A i -> Ok_A i; B b -> OK_B b; C -> OK_C

or, if people feel bold (and until #13861 does this automatically)

pattern Ok :: Foo -> Result
pattern Ok f <- ((\case NotOK _ -> Nothing; foo -> Just (unsafeCoerce foo)) -> Just f)
  where Ok = unsafeCoerce

So since this is somewhat nicely possible, it makes sense for people to play around with either of these encodings using pattern synonyms, and let us know if they notice any gains.

comment:7 Changed 18 months ago by sgraf

Cc: sgraf added
Note: See TracTickets for help on using tickets.