Opened 2 years ago

Closed 11 months ago

#14152 closed task (fixed)

Float exit paths out of recursive functions

Reported by: nomeata
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 8.2.1
Keywords: JoinPoints
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets: #14137 #10918
Differential Rev(s):
Wiki Page:

Description (last modified by nomeata)

This is a spin-off of a discussion from #14137.

The problem

We generally avoid inlining or floating a binding into a recursive function, because we do not want to duplicate work or allocations.

But sometimes the binding is only used inside the recursive function on the “exit path”, in which case it would be good to inline it. Example:

  let x = f x0
  in let go 10 = h x
         go i = go (i+1)
     in go (0::Int) + 100 

It would be beneficial to inline x.
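
That is, we would like to end up with

  let go 10 = h (f x0)
      go i = go (i+1)
  in go (0::Int) + 100

where no thunk for x is allocated at all.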

The problem is that it is not very obvious that this occurrence of x is ok for inlining. In particular, it is not syntactically visible.

Proposed solution

If we apply loopification (#14068), this would turn into

  let x = f x0
  in let go n =
       joinrec jgo 10 = h x
               jgo i = call jgo (i+1)
       in jump jgo n
     in go (0::Int) + 100 

I’d like to call this first joinrec normal form, defined as “every recursive function where all recursive calls are tail-recursive is a recursive join point”.

This ticket proposes to transform this even further and float out, as non-recursive join points, all case alternatives that do not mention jgo, like so:

  let x = f x0
  in let go n =
       join jexit = h x
       joinrec jgo 10 = call jexit
               jgo i = call jgo (i+1)
       in jump jgo n
     in go (0::Int) + 100 

I’d like to call this second joinrec normal form, defined as “in first joinrec normal form, and all subexpressions of a recursive join point j that are in tail-call position and do not mention j are join calls”.

If the floated expression has free variables that are bound inside the joinrec, they turn into parameters of the newly created join point.
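
For instance (an illustrative sketch; the extra accumulator parameter acc is made up):

  joinrec jgo 10 acc = h x acc
          jgo i  acc = call jgo (i+1) (acc+i)
  in jump jgo n 0

would become

  join jexit acc = h x acc
  joinrec jgo 10 acc = call jexit acc
          jgo i  acc = call jgo (i+1) (acc+i)
  in jump jgo n 0

where acc, bound by jgo, has turned into a parameter of jexit.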

At this point, GHC can tell that go is called at most once, and will therefore happily inline x into the right hand side of jexit.

Alternative solutions

Ticket #10918 uses Call Arity results to learn that x is one-shot, and inlines it even in the original code. This works, but the problem is that float-out will undo this. See 10918#comment:5.

Limitation

It only works for recursive functions that are join points, or can be turned into join points by loopification (#14068). It does not work, for example, for

let x = f x0
let go 0 = h x
    go n = go (n-1) + 1
in go 10

although it would be equally desirable to float h x out of go so that x can be inlined.

Preservation

A remaining tricky point is that we need to stop one of these carefully-constructed non-recursive join points from being inlined into a recursive join point, even if it is invoked at just one place. That should not be hard. And in a final run of the simplifier (or in CorePrep) we could switch off that restriction and let it inline. (Ticket #14137 is about inlining more join points into recursive join points, so it is the antithesis to the present ticket.)
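
Concretely: if the simplifier were to inline jexit back (it is called from just one place, after all), we would be back at

  joinrec jgo 10 = h x
          jgo i = call jgo (i+1)

with the occurrence of x once again hidden inside the recursive function.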

Change History (45)

comment:1 Changed 2 years ago by simonpj

Description: modified
Keywords: JoinPoints added

comment:2 Changed 2 years ago by nomeata

I am wondering about the best place to perform this transformation. I could do it relatively easily as part of loopification (#14068), which I currently inserted between the occurrence analyser and the simplifier. But we also want to do it for bindings that are already joinrecs, don't we?

The float out phase might also be a suitable place, since it – well – floats out expressions.

Or maybe the simplifier itself (which has a notion of floating expressions, doesn't it) can do it?

Or should it be a pass of its own?

Simon, what is your intuition here?

comment:3 Changed 2 years ago by simonpj

Definitely not in the simplifier.

FloatOut is already jolly complicated and I want to make it do NO FLOATING for join points. (There is currently a mess.)

I'd suggest doing it as part of loopification.

And I'd suggest NOT doing loopification with every occurrence analysis; that feels far too often to me.

comment:4 Changed 2 years ago by nomeata

And I'd suggest NOT doing loopification with every occurrence analysis; that feels far too often to me.

The check of whether loopification can be done is pretty cheap, and falls out of Occurrence Analysis anyways: I just check whether the occurrence info indicates that this is a recursive function that admits loopification:

    RecursiveTailCalled join_arity <- tailCallInfo occ

So I don't see a problem with checking this every time. Once a binding is loopified, it is a recursive join point, and loopification does not look at it any more.
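
Spelled out, the whole check might look something like this (a sketch with a made-up name canLoopify; RecursiveTailCalled comes from the loopification branch for #14068 and is not in stock GHC, whose TailCallInfo has only AlwaysTailCalled):

    -- Sketch: does the occurrence info say that this binder is a
    -- recursive function all of whose recursive calls are tail calls?
    canLoopify :: Id -> Bool
    canLoopify bndr
      | RecursiveTailCalled _arity <- tailCallInfo (idOccInfo bndr) = True
      | otherwise                                                   = False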

comment:5 Changed 2 years ago by nomeata

Note to myself:

Common up identical exit paths

I expect much code to look like this:

let x = f x0
joinrec go 0 = h x
        go 10 = h x
        go 30 = if something then h x     else call go (n-1)
        go 40 = if something then h (x+1) else call go (n-1)
        go n = call go (n-1)
in call go 10

where a recursive function checks a bunch of conditions and has the same exit code multiple times. It would be silly to create three join points with h x as the right hand side. So when floating out these exit expressions, I plan to check whether the same expression is already being floated out and reuse that join point, to get

let x = f x0
join exit1 = h x
join exit2 = h (x+1)
joinrec go 0 = call exit1
        go 10 = call exit1
        go 30 = if something then call exit1 else call go (n-1)
        go 40 = if something then call exit2 else call go (n-1)
        go n = call go (n-1)
in call go 10

comment:6 Changed 2 years ago by simonpj

same exit code multiple times

Or you could just rely on CSE to do this; it's set up to do precisely that. I think it works correctly for join points. (Can you check the latter point?)
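
For instance, if it does, it should notice that exit1 and exit2 in

join exit1 = h x
join exit2 = h x
joinrec go ... = ... call exit1 ... call exit2 ...

have identical right-hand sides, and redirect every call of exit2 to exit1 (a made-up illustration, not output of the actual CSE pass).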

comment:7 Changed 2 years ago by simonpj

The check of whether loopification can be done is pretty cheap, and falls out of Occurence Analysis anyways

Hmm. I just want to avoid an extra traversal on every single iteration. I suppose you could just do it as part of occurrence analysis? The occurrence analyser doesn't generally change the structure of the code; but it does do so in one situation: it drops dead code to improve the occurrence info on other things.
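
For instance, in

let x = y + 1   -- x is dead
in ...y...

dropping the dead binding for x turns y from two occurrences into one, which may in turn allow y to be inlined.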

comment:8 Changed 2 years ago by nomeata

I implemented this transformation; pushed to wip/T14152 for storage. Don’t review the code too harshly yet, in particular feeding a list of fresh uniques to maybeLoopify is a temporary crutch.

Observations so far:

  • Right now I am running this only after loopification. But that is unsatisfactory:

    If we want this, we certainly also want it for functions that are joinrec to start with… but where?

    Note that we need access to fresh uniques; this rules out the occurrence analyzer, which is currently pure.
  • Currently, we get the desired transformation, but the next iteration of the simplifier gets rid of it again, in three different ways:
    • preInlineUnconditionally inlines them because they are values, so canInlineInLam is true.
    • postInlineUnconditionally inlines them because they are marked as “work free”.
    • completeCall does inlining at the call-site, again because they are marked as “work free”.
    It is not clear to me how to fix it. It is not false that they are work-free, and we certainly want to inline these join-points into non-recursive uses (should they for some reason appear).

I agree with Simon that floating these things out and keeping them out feels “wrong” in the sense that it fights against the usual working of the simplifier.

So maybe back to the drawing board and thinking harder how we can tell the simplifier that it is safe to inline something into the exit path inside the recursive function, and keep it there? Could we somehow syntactically mark exit paths using a new form of ticks?

comment:9 Changed 2 years ago by simonpj

I agree with Simon that floating these things out and keeping them out feels “wrong” in the sense that it fights against the usual working of the simplifier.

Well, now I understand better, I do quite like this approach. I'm not ready to give it up yet!

  • preInlineUnconditionally. Yes, canInlineInLam is true, but int_cxt (short for "interesting context") is probably false.
  • completeCall: I think we may be saying "never inline a join point that appears on the RHS of a recursive join point". We can make that happen by giving its unfolding an UnfoldingGuidance of UnfNever. We can spot that it appears on the RHS of a recursive join point because its occ-info will have occ_in_lam = True.
  • postInlineUnconditionally. As the comments say, the main goal here is to solve this
            --      let x = f y in
            --      case v of
            --         True  -> case x of ...
            --         False -> case x of ...
    
    Here we want to inline x, otherwise it allocates a thunk. But join points do not allocate thunks; perhaps we can simply not do postInlineUnconditionally on the OneOcc branch?

And in fact giving it an UnfoldingGuidance of UnfNever will also stop it inlining.

NB: right at the end (in CorePrep perhaps) we may want to inline them after all, just to reduce clutter and jumps.

comment:10 Changed 2 years ago by nomeata

I do quite like this approach. I'm not ready to give it up yet!

I like this game of bad-cop/good-cop with switching roles. Certainly a productive way of investigating an issue :-)

  • preInlineUnconditionally: Well, int_cxt certainly is True – otherwise int_cxt && canInlineInLam rhs would be False and inlining would not happen. Are you saying that it should be made False?
  • completeCall: Let me try to make sure I understand the reasoning: Normally a join point will not have occ_in_lam set, because if it occurred under a “normal” lambda, it wouldn't be a tail call. The exception is the lambdas on the RHS of a join point, as these are ignored when calculating occ_in_lam. But there is an exception to that: inside a recursive join point we do set occ_in_lam.

So the occurrence analyzer should detect this? I’ll try.

NB: right at the end (in CorePrep perhaps) we may want to inline them after all, just to reduce clutter and jumps.

Right, I keep that in the back of my mind, but first I need it to actually _not_ inline :-)

comment:11 Changed 2 years ago by simonpj

Are you saying that it should be made False?

Ah... no, for functions int_cxt is true if it is applied.

Perhaps for join-points preInlineUnconditionally should never be True if in_lam is true. That validates what we are trying here.

But there is an exception to that: Inside a recursive join point we do set occ_in_lam.

Yes: occ_in_lam is true for a join point precisely when that join point is the exit of a recursive loop! That's just what we want.
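
Pictorially:

join exit = <e>
in joinrec j x = ... jump exit ...
   in jump j v

The sole occurrence of exit sits under the \x of the recursive join point j; the lambdas of a recursive join point do count when computing occ_in_lam, so exit gets occ_in_lam = True.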

comment:12 Changed 2 years ago by nomeata

Thanks for the guidance, that nailed it:

  • The code isJoinId bndr, isOneOcc (idOccInfo bndr), occ_in_lam (idOccInfo bndr) reliably detects an exit join point now, it seems. (I guess this deserves its own function isExitJoinPoint)
  • When I create the join point, I add the correct OccInfo, so that this is there even before OccurAnal runs the next time.
  • In simplLetUnfolding, when I detect an exit join point, I do not create an unfolding. This prevents inlining in completeCall and postInlineUnconditionally.
  • In preInlineUnconditionally I check for exit join points explicitly.

Maybe it would be cleaner to leave the unfolding alone and simply check for isExitJoinPoint in all three places… but no, the simplifier seems to throw away the idOccInfo. So better stick to not creating an unfolding.

comment:13 Changed 2 years ago by simonpj

the simplifier seems to throw away the idOccInfo.

That's by design. The occ-info after simplification isn't necessarily valid. E.g.

x = e
f = \y. ...x...
y = ...(f 1)...(f 2)...

Currently x occurs once, but after inlining f twice, x occurs twice. So the simplifier carefully deletes it. (Except for robust info like "this lambda binder is dead".)

comment:14 Changed 2 years ago by nomeata

Ok, so wip/T14152 has an implementation that works (judging from looking at the Core output). It's not polished, but good enough to play around with to see what happens. It's also inefficient (it should probably work on an fv-annotated syntax tree instead of recomputing the free variables repeatedly).

But it still applies “exitification” only together with “loopification”. This makes it hard to evaluate the merits of this ticket independently of loopification. Also, if we want “exitification”, then we want it for all joinrecs.

Is there a better place to do it? Should it simply be a pass of its own that we run, maybe after the first simplifier run (which introduces most joinrecs)?

comment:15 Changed 2 years ago by simonpj

You could make two separate passes. Or one that (controlled by a flag) does either or both of loopification and exitification.

Of course, if it does both, it'd better do exitification to prior joinrecs and to the ones introduced by loopification.

My guess is that we'll want to do both at once, and do so once or at most twice in the pipeline.

comment:16 Changed 2 years ago by nomeata

Note that loopification is not really a pass: The information of whether we can do it falls out of the simplifier, and the change to the AST is a simple local change and O(1).

I will turn exitification into a pass of its own.

comment:17 Changed 2 years ago by nomeata

Wow, the results are great!

Nofib allocations
Benchmark name                 previous     change      now
nofib/allocs/fannkuch-redux     870976696   - 99.99%       64728  bytes
nofib/allocs/k-nucleotide      1089567552   - 91.74%    89963760  bytes

Nofib runtimes
Benchmark name             previous   change     now
nofib/time/CSD             0.536      - 6.72%    0.5    seconds
nofib/time/FS              0.413      - 3.15%    0.4    seconds
nofib/time/cryptarithm1    0.529      - 4.91%    0.503  seconds
nofib/time/integer         1.562      + 3.01%    1.609  seconds
nofib/time/last-piece      0.425      - 3.06%    0.412  seconds

Almost too good to be true. I wonder where the bug is… :-)

comment:18 Changed 2 years ago by nomeata

Inlining the exit join points in the "final" simplifier iteration is not a clear win:

Nofib runtimes
Benchmark name             previous   change      now
nofib/time/VS              0.372      + 16.13%    0.432  seconds
nofib/time/cryptarithm1    0.503      + 4.17%     0.524  seconds
nofib/time/k-nucleotide    5.558      - 7.72%     5.129  seconds

My theory is that floating exit paths out of tight loops makes them very small, which is beneficial for *mumble mumble* weird hardware reasons.

(I know, someone needs to look at the code instead of speculating…)

comment:19 Changed 2 years ago by simonpj

Wow that's great. I would like to understand what's going on in comment:18. Inlining should simply remove a jump -- and the Cmm passes may well, I believe, do that anyway. As you say, actual insight is needed here!

comment:20 Changed 2 years ago by nomeata

As you say, actual insight is needed here!

Sigh. It seems that VS has identical Core. So the difference must be in the libraries, or be completely spurious. But tracking that down requires two freshly built checkouts and more time, and it might just be spurious…

comment:21 Changed 2 years ago by nomeata

Some of these measurements seem to be unreliable, in particular time/VS.

Right now the whole branch, including inlining them back in at the end, yields this:

Nofib allocations
Benchmark name                 previous     change      now
nofib/allocs/fannkuch-redux     870976696   - 99.99%       64700  bytes
nofib/allocs/k-nucleotide      1089567552   - 91.78%    89603872  bytes

Nofib runtimes
Benchmark name               previous   change     now
nofib/time/CSD               0.536      - 8.02%    0.493  seconds
nofib/time/cryptarithm1      0.529      - 4.73%    0.504  seconds
nofib/time/digits-of-e1      0.704      - 3.27%    0.681  seconds
nofib/time/fannkuch-redux    4.401      + 3.84%    4.57   seconds
nofib/time/integer           1.562      + 3.59%    1.618  seconds
nofib/time/k-nucleotide      5.426      - 6.28%    5.085  seconds

The effect of just the inlining patch is

Nofib runtimes
Benchmark name               previous   change      now
nofib/time/VS                0.442      - 17.19%    0.366  seconds
nofib/time/digits-of-e1      0.704      - 3.27%     0.681  seconds
nofib/time/k-nucleotide      5.585      - 8.95%     5.085  seconds

Note that the inlining patch has not changed over the version in comment:18; it was just rebased onto a slightly improved exitification patch, yet the runtime of VS now improves by 17% rather than regressing by 16%. I draw this conclusion: there is some sensitivity to code layout in VS that seems to be tickled by my changes in a non-systematic way. I might have a closer look at the regressions (and gains), but I am not sure if ICFP is the right time to do that.

comment:22 Changed 2 years ago by nomeata

I looked at some of the observed performance changes. The two benchmarks with allocation changes show differences in Core, as expected. (Thunks previously bound outside a joinrec are now inlined.)

About those runtime changes: All of these have unchanged Core! So the change must be in the libraries… Did not dig deeper yet.

comment:23 Changed 2 years ago by nomeata

Digging deeper: for CSD, -ticky, even with all of base instrumented, does not show anything. Same for cryptarithm1.

Judging from the module sizes, in base only Data.Semigroup, GHC.Arr and GHC.Float are affected (but maybe a let binding can be floated without changing the binary size). A quick look at the Core difference shows that some thunks are floated inside a joinrec, or even inlined completely, e.g. in Arr's (//) and accum.

None of these look as if they’d affect a benchmark like CSD.

It must be the weather, or the constellation of the stars.

comment:24 Changed 2 years ago by nomeata

Status: new → patch

Anyways, this is ready for review, either on https://phabricator.haskell.org/D3903 or in git, branch wip/T14152. The latter has the advantage that the basic patch can be inspected independently of the on-the-fly-CSE addition (which may be nice due to smaller Core, but may not be worth the slight complication of the code).

comment:25 Changed 22 months ago by nomeata

Description: modified

comment:26 Changed 22 months ago by Joachim Breitner <mail@…>

In 0e953da1/ghc:

Implement a dedicated exitfication pass #14152

The idea is described in #14152, and can be summarized: Float the exit
path out of a joinrec, so that the simplifier can do more with it.
See the test case for a nice example.

The floating goes against what the simplifier usually does, hence we
need to be careful not to inline them back.

The position of exitification in the pipeline was chosen after a small
amount of experimentation, but may need to be improved. For example,
exitification can allow rewrite rules to fire, but for that it would
have to happen before the `simpl_phases`.

Perf.haskell.org reports these nice performance wins:

    Nofib allocations
    fannkuch-redux    78446640  - 99.92%      64560
    k-nucleotide     109466384  - 91.32%    9502040
    simple            72424696  -  5.96%   68109560

    Nofib instruction counts
    fannkuch-redux  1744331636  -  3.86% 1676999519
    k-nucleotide    2318221965  -  6.30% 2172067260
    scs             1978470869  -  3.35% 1912263779
    simple           669858104  -  3.38%  647206739
    spectral-norm    186423292  -  5.37%  176411536

Differential Revision: https://phabricator.haskell.org/D3903

comment:27 Changed 22 months ago by simonpj

Great! Is the final "inline them back in" pass there, or not?

comment:28 Changed 22 months ago by nomeata

I left it out for now, because instruction counts increase. I tried to analyse it, and here is what I found:

I read through a lot of core, comparing -dverbose-core2core between the version without inlining exit join points at the end, and the version with. In compress2, I don’t spot a difference in Core. In fem (a big program), there are a fair number of exit join points, but I don’t see anything fishy going on…

What I do, however, see is exit join points that are called from two positions in the code, where inlining duplicates the code. (Common block elimination might help with that, but #14226 seems to get in the way.)

I also see calls to stg_gc_noregs() turn into calls to stg_gc_unpt_r1(R1), but I am not sure what that means, or if that is a good thing or a bad thing.

I expect that the “useless” unconditional jump to a non-inlined exit join point, which was the motivation for the final inlining, will be eliminated on the Cmm stage without much ado.

So the performance measurement says “don't inline”, and looking at the Core, not inlining seems to be fine, and the Cmm also looks better. So in the light of that I think I’ll conclude that the final inlining is neither necessary nor useful.

Anyways, the final inline patch still lives in wip/T14152 for anyone to play with.

comment:29 Changed 22 months ago by simonpj

I left it out for now, because instruction counts increase

So, just to be sure: allocation counts did not change?

What I do, however, see is exit join points that are called from two positions in the code, where inlining duplicates the code.

What if you inline only join points that are called once? So that no code duplication is involved.

I'm having a hard time seeing why that should do anything bad. And it can do something good... if you have

join j x = ...y...
in 
  ...jump j v...

then y will get pushed onto the stack at the join, so that its RHS knows where to find it. If it's inlined that may not happen. So may be more than just eliminating a jump.

Still, I accept that it's not top priority.

comment:30 in reply to:  29 Changed 22 months ago by nomeata

Replying to simonpj:

I left it out for now, because instruction counts increase

So, just to be sure: allocation counts did not change?

Nope. Byte-for-byte identical.

What I do, however, see is exit join points that are called from two positions in the code, where inlining duplicates the code.

What if you inline only join points that are called once? So that no code duplication is involved.

I doubt that the code duplication is the real culprit here. More likely it is something like a different order of blocks leading to different decisions later in code generation, e.g. in the register allocator, leading to more or less register pressure and different registers being saved. Do you know the difference between stg_gc_noregs() and stg_gc_unpt_r1(R1)?

I'm having a hard time seeing why that should do anything bad. And it can do something good... if you have

join j x = ...y...
in 
  ...jump j v...

then y will get pushed onto the stack at the join, so that its RHS knows where to find it. If it's inlined that may not happen. So may be more than just eliminating a jump.

Well, according to the numbers it's not happening.

Remember that the jump to an exit point is almost always going to the right-hand side of a case alternative, which is, as far as I can tell, already a jump target, so the y in your example would have to be pushed onto the stack just the same, wouldn't it?

I understand that this is all a bit unsatisfying intellectually, but I don’t want to sink more time into this, at least not without concrete examples or other evidence that show that what we are doing now is indeed bad.

comment:31 Changed 22 months ago by simonpj

I understand that this is all a bit unsatisfying intellectually, but I don’t want to sink more time into this, at least not without concrete examples or other evidence that show that what we are doing now is indeed bad.

Yes that's fine. If we have 100 Joachim cycles I'd rather they were spent on the fattest target, and this doesn't look like one.

Back to loopification next?

comment:32 Changed 22 months ago by nomeata

Back to loopification next?

Aye!

comment:33 Changed 17 months ago by simonpj

Joachim, I've just been looking at Exitify.hs in pursuit of #15005. It's not easy going.

Could you add some careful Notes? For example, here are the things I stumbled over immediately:

  • In the top-level exitifyProgram I think we make no attempt to exitify top level recursive functions. That's fine, but say so.
  • Then in go of exitifyProgram I think we are looking at the RHS of a top-level function, and applying goBind to every binding within it. (NB: goBind calls go to deal with the RHS of the bindings; I'd be inclined to do that in go so that goBind gets the post-exitify bindings.)
  • Mysteriously we do not exitify the RHS of a case alternative (goAlts). This exception seems highly mysterious, possibly a bug?
  • Then for recursive join-point bindings, we apply exitify to them (after having recursively processed the RHSs).
  • There is some magic in go in exitify to pick out the ones we want to exitify. Explaining these conditions in a note would be good.

comment:34 Changed 17 months ago by simonpj

isExitJoinId :: Var -> Bool
isExitJoinId id = isJoinId id && isOneOcc (idOccInfo id) && occ_in_lam (idOccInfo id)

Isn't the isOneOcc redundant? All join points should have OneOcc. Maybe it'd be clearer to say

isExitJoinId id 
  | isJoinId id
  = case idOccInfo id of
      IAmDead -> False
      IAmALoopBreaker {} -> False  -- A joinrec is never an exit join-id
      OneOcc { occ_in_lam = in_lam } -> in_lam
      ManyOccs {} -> pprPanic "isExitJoinId" (ppr id)
  | otherwise
  = False

Also I'd put this function in SimplUtils or something. Occurrence info is only valid on InIds, so it's crucial that isExitJoinId is only called on freshly occ-analysed code. It's not a generic function you can call anywhere.

comment:35 Changed 17 months ago by simonpj

What's this (in Exitify):

  where
    exit_id_tmpl = mkSysLocal (fsLit "exit") initExitJoinUnique ty
                    `asJoinId` join_arity
                    `setIdOccInfo` exit_occ_info

    -- See Note [Do not inline exit join points]
    exit_occ_info =
        OneOcc { occ_in_lam = True
               , occ_one_br = True
               , occ_int_cxt = False
               , occ_tail = AlwaysTailCalled join_arity }

The occurrence analyser sets occ-info. No one else should. What happens if you leave this out?

comment:36 Changed 17 months ago by simonpj

Also in Exitify:

        -- We cannot abstract over join points
        captures_join_points = any isJoinId args

Isn't this vacuously false? Join points are never arguments -- or they would not be join points!

Ah, no... args is (confusingly) not the arguments of the call (although you use args for this purpose in the collectArgs call... aargh). Can we rename args to abs_vars, the variables over which we are going to abstract?

Ah! You probably need to topologically sort those abs_vars; see the sortQuantVars call in SetLevels.abstractVars. (If the captured variables include a type variable a and a term variable x :: a, then a must be bound before x.) A bug waiting to happen.

Returning to captures_join_points: this can only be true of a call of the form j e1 .. en, which you deal with in an earlier case... so the test would be better done there.

comment:37 Changed 17 months ago by simonpj

Consider

joinrec j x y z = case x of
                   True -> <big>
                   False -> ...j...

where <big> does not mention j. It looks to me that you don't even attempt to turn the whole of <big> into an exit point. Instead you recurse into it. But why don't you? The bigger the better!

join exit y = <big> in
joinrec j x y z = case x of
                    True -> exit y
                    False -> ...j...

comment:38 Changed 17 months ago by nomeata

Wow, thanks for the thorough code review. I am not sure how I got that goAlts thing wrong. That ought to cripple the analysis significantly! Maybe part of some late refactoring before the merge? Anyways, I will go through all your comments in detail and implement what I can, maybe this afternoon.

comment:39 Changed 17 months ago by nomeata

In the top-level exitifyProgram I think we make no attempt to exitify top level recursive functions. That's fine, but say so.

There are no top-level join points. Note that exitification does not make new join points.

But I added a one-line comment to remind us of this fact.

Then in go of exitifyProgram I think we are looking at the RHS of a top-level function, and applying goBind to every binding within it. (NB: goBind calls go to deal with the RHS of the bindings; I'd be inlined do that in go so that goBind gets the post-exitify bindings.)

Not sure I follow the NB. Isn’t it the common idiom to have a goExpr (here go), and a goBind that are mutually recursive?

Mysteriously we do not exitify the RHS of a case alternative (goAlts). This exception seems highly mysterious, possibly a bug?

Not mysterious at all: I exitify join points, i.e. recursive binders. The RHS of a case alternative is just an expression – nothing to exitify there. And goAlts calls go just fine…

I am actually not sure what is confusing about exitifyProgram. Its comment says

Traverses the AST, simply to find all joinrecs and call 'exitify' on them.

and it does precisely that.

There is some magic in go in exitify to pick out the ones we want to exitify. Explaining these conditions in a note would be good

Each of the three conditions has a comment, and the first two point to notes…

Maybe it'd be clearer to say … I'd put this function in SimplUtils or something.

done

The occurrence analyser sets occ-info. No one else should. What happens if you leave this out?

If I leave this out, then until the next run of the occ analyzer, isExitJoinId will not detect an exit join point as such. From your clarification about that function I conclude that it should only be used by the simplifier, which runs the occ analyzer beforehand, so it looks like I can leave this out. Will try.

Can we rename args to abs_vars, the variables over which we are going to abstract?

Of course, sorry for that.

You probably need to topologically sort those abs_vars

Thanks, done.

Returning to captures_join_points this can only be true of a call of form j e1 .. en

I don’t think that’s true. What if I exitify something like

  case b of True -> jump j 1; False -> jump j 2

This is not of this form, and I need to check whether I can actually abstract over this?

It looks to me that you don't even attempt to turn the whole of <big> into an exit point. Instead you recurse into it. But why don't you?

Sure I do: The case

        -- We have something to float out!

applies to all expressions. Only when we decide not to float e do we actually destruct it and recurse into it.

Last edited 17 months ago by nomeata

comment:40 Changed 17 months ago by nomeata

All join points should have OneOcc

Not true, it seems, as witnessed by testsuite/tests/codeGen/should_compile/jmp_tbl.hs (already with -O, so without exitification). Your proposed code for isExitJoinPoint panics, complaining about $j_s3v8:

./inplace/bin/ghc-stage2 -dcore-lint testsuite/**/jmp_tbl.hs -O -dverbose-core2core -ddump-simpl-iterations -fforce-recomp
…
      join {
        $j_s3v8
          :: (# GHC.Prim.State# GHC.Prim.RealWorld, (PipeState, b_a2YY) #)
        [LclId[JoinId(0)]]
        $j_s3v8 = case lvl_s3te of wild_00 { } } in
      join {
        $j_s3ve
          :: (# GHC.Prim.State# GHC.Prim.RealWorld, (PipeState, b_a2YY) #)
        [LclId[JoinId(0)]]
        $j_s3ve
          = join {
              $j_s3vc
                :: (# GHC.Prim.State# GHC.Prim.RealWorld, (PipeState, b_a2YY) #)
              [LclId[JoinId(0)]]
              $j_s3vc
                = join {
                    $j_s3va
                      :: (# GHC.Prim.State# GHC.Prim.RealWorld, (PipeState, b_a2YY) #)
                    [LclId[JoinId(0)]]
                    $j_s3va = case phase_X1s7 of { __DEFAULT -> jump $j_s3v7 } } in
                  case phase_X1s7 of {
                    __DEFAULT -> jump $j_s3va;
                    HCc -> jump $j_s3v8
                  } } in
            case phase_X1s7 of {
              __DEFAULT -> jump $j_s3vc;
              Ccpp -> jump $j_s3v8
            } } in
      case phase_X1s7 of {
        __DEFAULT -> jump $j_s3ve;
        Cc -> jump $j_s3v8
Last edited 17 months ago by nomeata

comment:41 Changed 17 months ago by nomeata

Those changes that I was able to do so far are now in https://phabricator.haskell.org/D4576. I will push this as soon as Harbormaster has confirmed that it does not regress anything.

comment:42 Changed 17 months ago by nomeata

Ah! You probably need to topologically sort those abs_vars; see the sortQuantVars call in SetLevels.abstractVars. A bug waiting to happen.

While sleeping over it, I was wondering: Did I really get that wrong? And I checked: No, I did not! The abs_vars are derived from captured, and that is always in dependency order by construction. So no need for sortQuantVars.

So I reverted the (unnecessary and hence misleading) use of sortQuantVars in 270e3e9bbaabad3d9a1348cea9e46a9ecf1e5ec2, but left a comment that should prevent me from forgetting this again.

comment:43 Changed 16 months ago by simonpj

All join points should have OneOcc

Aha. They should, but indeed they don't. I've created Trac #15091 to fix this.

comment:44 Changed 16 months ago by Simon Peyton Jones <simonpj@…>

In 61b245a0/ghc:

Small refactoring in Exitify

This refactoring was provoked by our conversation on
Trac #14152.  No change in behaviour.

comment:45 Changed 11 months ago by simonpj

Resolution: fixed
Status: patch → closed

Joachim and I think we are all done here... closing.
