Opened 20 months ago

Last modified 9 months ago

#14684 new bug

combineIdenticalAlts is only partially implemented

Reported by: mpickering Owned by: sjakobi
Priority: normal Milestone: 8.10.1
Component: Compiler Version: 8.2.2
Keywords: newcomer, CSE Cc: AndreasK, dfeuer
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D4542
Wiki Page:

Description

combineIdenticalAlts commons up branches of case expressions which have the same RHS.

However, it is not fully implemented so opportunities to common up branches are missed. For very big case expressions in might impact compilation time but it could be something which is enabled by -O2.

For example, the C and D case for foo are not commened up but the A and B case in foo2 are.

module Foo where

data T = A | B | C | D


foo x = case x of
          A -> 0
          B -> 1
          C -> 2
          D -> 2

foo2 x = case x of
          A -> 2
          B -> 2
          C -> 0
          D -> 1

Change History (20)

comment:1 Changed 20 months ago by AndreasK

Cc: AndreasK added
Last edited 20 months ago by AndreasK (previous) (diff)

comment:2 Changed 20 months ago by simonpj

In CoreUtils, Note [Combine identical alternatives] acknowledges this problem, saying

To avoid an expensive test, we just merge branches equal to the *first*
alternative; this picks up the common cases
     a) all branches equal
     b) some branches equal to the DEFAULT (which occurs first)

And indeed you are right that it'd be better to combine

foo x = case x of   ==>   foo x = case x of
          A -> 0                    DEFAULT -> 2
          B -> 1                    A -> 0
          C -> 2                    B -> 1
          D -> 2

(Reminder: in Core the DEFAULT alternative always comes first, if it exists.)

Fortunately, we now have TrieMap.CoreMap, an efficient finite map whose keys are CoreExprs. Using this I think we can efficiently ask (as we iterate oover the alternatives) "have I seen this RHS in an earlier alternative?". More advanced: find the RHS that occurs most often. Take care that in the common case the RHSs are (a) large and (b) different, then the test does not exhaustively traverse the RHSs; just looks far enough to establish they are different.

This a nice well-contained task, if someone would like to have a go.

comment:3 Changed 20 months ago by mpickering

Keywords: newcomer added

I'll mark it for a newcomer as it is well-specified now and should be a good introduction to the code base.

If anyone wants to take it on then feel free to ask for help on #ghc.

comment:4 Changed 19 months ago by sjakobi

I'm giving this ticket a try.

comment:5 Changed 19 months ago by sjakobi

Owner: set to sjakobi

comment:6 Changed 19 months ago by bgamari

Differential Rev(s): Phab:D4419
Milestone: 8.6.1

comment:7 Changed 19 months ago by Ben Gamari <ben@…>

In fe98cd75/ghc:

Combine the CoreAlts with the most common RHS

Unless there already is a DEFAULT alternative, look for the most common
RHS and create a new DEFAULT alt.

Previously, only the very first RHS was considered.

Test Plan: make test TEST="T7360 T14684"

Reviewers: bgamari

Subscribers: AndreasK, mpickering, rwbarton, thomie, carter

GHC Trac Issues: #14684

Differential Revision: https://phabricator.haskell.org/D4419

comment:8 Changed 19 months ago by simonpj

It's invisible in the message above, but I think this commit is only on branch wip/T14684. Is that what you intended? Or did you mean to commit to HEAD?

comment:9 in reply to:  8 Changed 19 months ago by sjakobi

Replying to simonpj:

It's invisible in the message above, but I think this commit is only on branch wip/T14684. Is that what you intended? Or did you mean to commit to HEAD?

We agreed that the new most-common-RHS code should be moved into -O2 before it can land in master.

BTW, regarding your comment above:

Fortunately, we now have TrieMap.CoreMap, an efficient finite map whose keys are CoreExprs. Using this I think we can efficiently ask (as we iterate oover the alternatives) "have I seen this RHS in an earlier alternative?". More advanced: find the RHS that occurs most often. Take care that in the common case the RHSs are (a) large and (b) different, then the test does not exhaustively traverse the RHSs; just looks far enough to establish they are different.

In the common case you describe, my code will still insert every alternative into the CoreMap. I haven't been able to find a less expensive way to establish that the alternatives are all different, but I'd love to hear about any ideas!

I have also found CSE.combineAlts:

{- Note [Combine case alternatives]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
combineAlts is just a more heavyweight version of the use of
combineIdenticalAlts in SimplUtils.prepareAlts.  The basic idea is
to transform

    DEFAULT -> e1
    K x     -> e1
    W y z   -> e2
===>
   DEFAULT -> e1
   W y z   -> e2

In the simplifier we use cheapEqExpr, because it is called a lot.
But here in CSE we use the full eqExpr.  After all, two alternatives usually
differ near the root, so it probably isn't expensive to compare the full
alternative.  It seems like the same kind of thing that CSE is supposed
to be doing, which is why I put it here.

Should combineAlts get the same optimization as combineIdenticalAlts? Might combineAlts even be the better place to apply the optimization?

comment:10 Changed 18 months ago by sjakobi

Differential Rev(s): Phab:D4419Phab:D4542

comment:11 Changed 18 months ago by simonpj

Should combineAlts get the same optimization as combineIdenticalAlts? Might combineAlts even be the better place to apply the optimization?

Actually I think you are right -- CSE probably would be a better place. And then (because it doesn't happen all the time), you can probably do this stuff regardless of -O2. (Having a flag might still be good just so you can switch it on and off at will.)

I added some comments to Phab which will be useful either way.

comment:12 in reply to:  11 Changed 18 months ago by sjakobi

Cc: dfeuer added

Replying to simonpj:

Thanks for your comments, Simon!

I realized that if I apply the combine-most-common-alts optimization only in CSE.combineAlts, we'd get "suboptimal" results in cases like this:

case x of
  A -> 1
  B -> 1
  C -> 2
  D -> 2
  E -> 2

Here, combineIdenticalAlts (which in my understanding runs first) would combine the A and B alts, defeating CSE.combineAlts which could combine C, D and E.

I'm not sure if this warrants doing the optimization in both places with -O2.


I also noticed that unlike combineIdenticalAlts CSE.combineAlts doesn't combine the ticks of the identical alts. Would you, David, happen to know why this is permissible in this case?

comment:13 Changed 18 months ago by simonpj

Here, combineIdenticalAlts (which in my understanding runs first) would combine the A and B alts, defeating CSE.combineAlts which could combine C, D and E.

But that would introduce a nasty phase-ordering problem. Instead, maybe CSE.combineAlts should transform

case x of
  DEFAULT -> 1
  C -> 2
  D -> 2
  E -> 2

(which the user might have written) into

case x of
  DEFAULT -> 2
  A -> 1
  B -> 1

comment:14 in reply to:  13 Changed 18 months ago by sjakobi

Replying to simonpj:

Expanding the DEFAULT case before recombining the alts is an interesting idea. I'm wondering though if we'd recreate alts that may have been excluded (by the programmer or optimization steps) based on information that isn't available in CSE. At worst we'd still just recreate the same DEFAULT alt, so I believe no pessimisation is possible.

The design I currently have in mind involves largely imitating SimplUtils.prepareAlts to

  1. detect which constructors are impossible and should not be considered when expanding the DEFAULT.
  2. use a variant of refineDefaultAlt to expand the DEFAULT alt into alts using the remaining possible constructors.

This also requires threading an additional UniqSupply argument through CSE which we need to create binders when expanding the DEFAULT case.

Does this sound right?

Last edited 18 months ago by sjakobi (previous) (diff)

comment:15 Changed 18 months ago by simonpj

I'm wondering though if we'd recreate alts that may have been excluded (by the programmer or optimization steps) based on information that isn't available in CSE.

Good point.

Let's not be over-elaborate.

My suggestion:

  • Try doing it all in CSE.combineAlts, as you suggest in comment:9
  • Switch off combineIdenticalAlts altogether
  • Do not try the clever stuff suggested in comment:13 for the reasons you mention

If we can get away with doing this only in CSE, that'd be an advantage, I think. Less duplication of effort.

comment:16 Changed 18 months ago by sjakobi

Keywords: CSE added

comment:17 Changed 17 months ago by sjakobi

As I haven't made any progress with this in a while, and as I'm not sure I'll make any soon, I'm recording where I'm currently stuck:

While combineIdenticalAlts uses

cheapEqExpr' :: (Tickish Id -> Bool) -> Expr b -> Expr b -> Bool

CSE.combineAlts uses

eqExpr :: InScopeSet -> CoreExpr -> CoreExpr -> Bool

What I haven't figured out, is how to translate the use of eqExpr into using a CoreMap where CoreExprs end up as the same key if they are equal according to eqExpr in_scope.

I wanted to look at how the rest of CSE uses CoreMaps, but I didn't get around to that yet.

comment:18 Changed 15 months ago by bgamari

Milestone: 8.6.18.8.1

These won't be addressed in GHC 8.6.

comment:19 Changed 9 months ago by osa1

Milestone: 8.8.18.10.1

Bumping milestones of low-priority tickets.

comment:20 Changed 9 months ago by osa1

Bumping milestones of low-priority tickets.

Note: See TracTickets for help on using tickets.