Opened 3 years ago

Last modified 9 months ago

#12364 new task

Demand analysis for sum types

Reported by: nomeata Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.0.1
Keywords: DemandAnalysis CPRAnalysis Cc: ilya, osa1
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

While working on #10613 it crossed my mind that it might be worthwhile to expand the demand type also onto sum types.

So instead of

  | UProd [ArgUse]     -- Product

we would have

  | UData [[ArgUse]]   -- Data type

and a function like

fromMaybe :: a -> Maybe a -> b

would have a signature of

<1*U><1*U(;1*U)>

which indicates that the second argument of fromMaybe is evaluated at most once; the first constructor of the result has no arguments, the second (separated by ;) has one argument which is also used at most once.

I could imagine that this gives a lot of useful information with parsers and other code that repeatedly retuns stuff wrapped in a Maybe or similar type.

Now, sum types are often recursive ([]…), and we probably want to be smarter about recursion here. But note that this is not a new problem. If you write

data Stream = Stream Int Stream Stream Stream

foo (Stream 0 x y z) = 0
foo (Stream 1 x y z) = foo x
foo (Stream 2 x y z) = foo y
foo (Stream _ x y z) = foo z

you already get huge demand signatures.

Change History (13)

comment:1 Changed 3 years ago by nomeata

Or maybe simply UData [ArgUse] and then have a flat list of all the constructor’s arguments. Implementation detail.

comment:2 Changed 3 years ago by simonpj

Cc: osa1 added

comment:3 Changed 3 years ago by nomeata

Work in progress on branch wip/T12354, just a quick experiment that might be able to tell us if this is worth pursuing.

But it is possibly less useful than I thought:

let foo a b = ... either return Left foobar .... or return Right foobaz
in ... case foo a b of Left x [Demand=1*U] -> ... ; Right y [Demand=1*U] ->

Because the demand analyzer is a backwards analysis, the information on the usage of foo does not propagate into the definition of foo and into foobar and foobaz.

For functions returning a product, a product demand is simply assumed [Product demands for function body] (but even there, not with single-use information). So I doubt that this is a direction worth pursuing.

comment:4 Changed 3 years ago by simonpj

Expanding demand analysis for sums will get info for

f :: Maybe Int -> Int
f Nothing = 3
f (Just x) = x + 1

Currently if a call site looks like f (Just (p+q)) we'll build a thunk for p+q. But f always evaluates it, so that thunk is useless; we could use call-by-value.

When we have unboxed sums, we can do worker/wrapper for f too.

In contrast, you seem to be thinking about the benefits for nested CPR perhaps, in comment:3

comment:5 Changed 3 years ago by nomeata

Currently if a call site looks like f (Just (p+q)) we'll build a thunk for p+q. But f always evaluates it, so that thunk is useless; we could use call-by-value.

Actually inlining would require strictness analysis, wouldn’t it? Extending that to sums might work, but I was looking into demand analysis right now. There, we would be able to turn the thunk into a non-updateable one.

I’m sure I was not confusing it with CPR, but I guess in comment:3 I was thinking about the cases where Just (p+q) is not passed to a function f, but rather returned from a function. In that case, there’d be no useful demand to feed into Just (p+q).

Last edited 3 years ago by nomeata (previous) (diff)

comment:6 Changed 3 years ago by nomeata

osa1 mentioned that they actually have a working implementation of this in https://github.com/osa1/ghc/blob/rebase-second-try/compiler/basicTypes/Demand.hs#L188, so I’ll stop working on my copy. My code is still in wip/T12364, but probably not much use.

comment:7 Changed 3 years ago by osa1

Yes, we implemented it with @jmct and it worked fine in our tests. Only problem is: We don't have a recursive type check, so for lists etc. it takes really long time (because it iterates 20 times as that's the hard-coded hard-limit for demand analysis iterations, specified in DmdAnal.hs).

We also implemented CPR if you need that...

comment:8 Changed 3 years ago by osa1

Our branch is big because we do worker/wrapper for sums, CPR, unpacking sum fields in DataCons etc. If you only need demand analysis I suggest just copying Demand.hs and DmdAnal.hs. You may then need to fix worker/wrapper pass so that it won't crash when it sees a sum demand or cpr etc.

comment:9 Changed 3 years ago by jmct

Hey everyone, I hope it's okay if I chime in for a second. I was also originally writting this comment on #12368 but I think it's a bit more relevant here.

Because of what Simon says on ticket:12368#comment:1, fixpoint convergence should not be used to perform strictness analysis on recursive sum types. I could probably dig up some examples of where it produces incorrect results if you'd like.

The standard way of handling recursive types is to ensure they're regular types, which allows you to assume a 'uniform' demand on the type. The frustrating thing when trying to apply this to Haskell is polymorphic recursion which would allow for non-uniform demands on a recursive type.

For the unboxing of sum-types we wanted to get around this by checking whether we were analysing a recursive sum type, but that turns out to be difficult in GHC at the moment: https://mail.haskell.org/pipermail/ghc-devs/2016-March/011526.html However, that only works for us because we only want to unbox non-recursive things anyway.

If you definitely want to analyse recursive types then some new theory is going to have to be worked out. The unexplored parts are strictness on nested types and the higher-kinded polymorphism possibly allowing for introduction of loops. I've thought a bit about how to do it but haven't had a serious go at it.

During my thesis defense Prof. Mycoft suggested the problem might be solvable using a PER-based analysis, which has been shown to generalize projection-based analyses. I haven't looked at that approach yet, but I did scribble down a particular thesis he told me to read.

For some background, Hinze's thesis is (IMO) the best this topic. I've read and re-read his thesis several times in the past few years and I still get new insight from it each time. You can find it on his site here: http://www.cs.ox.ac.uk/ralf.hinze/publications/#D2

comment:10 Changed 3 years ago by nomeata

I could probably dig up some examples of where it produces incorrect results if you'd like.

Isn’t it like that similar example will work for recursive product types, and hence be (likely obscure and rare, but still real) bugs in the current code?

Version 0, edited 3 years ago by nomeata (next)

comment:11 Changed 3 years ago by jmct

Possibly, but GHC's much better about detecting recursiveness for product types, see the large section titled "Deciding which type constructors are recursive" in compiler/typecheck/TcTyDecls.hs. In particular there's this section:

The "recursive" flag for algebraic data types is irrelevant (never consulted)
for types with more than one constructor.


An algebraic data type M.T is "recursive" iff
        it has just one constructor, and
        (a) it is declared in an hi-boot file (see RdrHsSyn.hsIfaceDecl)
        (b) it is declared in a source file, but that source file has a
            companion hi-boot file which declares the type
 or     (c) one can get from its arg types to T via type synonyms,
            or by non-recursive newtypes or non-recursive product types in M
             e.g.  data T = MkT (T -> Int) Bool

I'm going to speculate wildly and say that it just hasn't been that important for GHC to detect recursiveness in sum types in the past, and therefore it just hasn't gotten very much attention. There probably shouldn't be a difference in theory, but GHC as it is today is better at avoiding recursiveness in product types and therefore would be better at avoiding the issue in that case. Though you're right in principle.

Tomorrow morning I'll dig up my old notes and comment on #12368 giving an example. I don't want to confuse the two issues too much, though they're intimately related. #12368 is really about recursion in the _functions_ though, which is an important distinction (and I definitely have examples where the current behavior of the 'cunning plan' is necessary).

comment:12 Changed 3 years ago by nomeata

I was about to say “But this flag is not checked anywhere in the demand analyzer” (because it is) and did some git archeology, and I found commit changeset:3e7e5ba8333d318c38b4cfc538a97fdca0aed5b1/ghc which replaced the use of isRecursiveTyCon with one using RecTcChecker.

From my rough reading is that we look one level deep on every iteration, so we still get deeply nested result. I was able to produce an unsound result this way, but it belongs to #12368, so I added it there as a comment.

comment:13 Changed 9 months ago by sgraf

Keywords: DemandAnalysis CPRAnalysis added
Note: See TracTickets for help on using tickets.