Opened 5 years ago

Closed 14 months ago

Last modified 11 months ago

#9136 closed bug (fixed)

Constant folding in Core could be better

Reported by: simonpj Owned by:
Priority: normal Milestone: 8.6.1
Component: Compiler Version: 7.8.2
Keywords: Cc: akio, maoe
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D2858
Wiki Page:

Description

Constant folding in Core is still a bit feeble. For example, when staring at the code for nofib/shootout/k-nucleotide I saw this expression:

 (x +# 8) -# 1

which would be relatively easy to optimise.

The difficulty is knowing where to stop. What about (8 +# x) -# 1?

Anyway, this ticket is to record the issue. Currently constant folding is done by the built-in RULES in compiler/prelude/PrelRules.

Change History (31)

comment:1 Changed 5 years ago by nomeata

That reminded me of something, so I filed #9137. If we had that we could specify RULES like

forall x y z. (isLiteral y, isLiteral z) => (x +# y) -# z = x +# (y -# z)
forall x y z. (isLiteral y, isLiteral z) => (x -# y) -# z = x -# (y +# z)
...

that would rewrite the terms to have the literals next to each other.

Last edited 5 years ago by nomeata (previous) (diff)

comment:2 Changed 5 years ago by simonpj

You could do that in PrelRules too. Yes, being able to do it in source code would be cool, but it all requires more syntax, a story about what predicates are allowed, etc etc, and it is far from obvious where to stop. The advantage of PrelRules is that you can do anything!

comment:3 Changed 5 years ago by schyler

I'd be against the thought of this kind of optimization being a rewrite rule.

comment:4 Changed 5 years ago by nomeata

Care to explain?

(x +# y) -# z = x +# (y -# z) looks very much like rewriting to me, and rewrite rules are a powerful mechanism for that. And if we were able to specify that declaratively in source code, instead of letting the compiler grow and grow, even better, I’d say.

Also note that constant folding is also done with (built-in, more powerful) RULES already.

comment:5 Changed 5 years ago by nomeata

The difficulty is knowing where to stop. What about (8 +# x) -# 1?

Surely there must be standard solutions in (non-functional) compilers. I asked the compiler guys next door and they pointed me to Muchnick’s Advanced Compiler Design & Implementation, where Fig 12.6 lists 20 transformation rules which should move constants in an expression involving +, - and * together and combine the constants. I don’t understand them yet, but something along these lines would work well.

comment:6 Changed 5 years ago by nomeata

Owner: set to nomeata

I couldn’t resist to get started on this, so branch wip/T9136 contains a test case (d4081919/ghc) and a first implementation for + (115fd8b043/ghc). Not polished and complete yet, and not yet validated, but it goes in the right direction I think.

comment:7 Changed 5 years ago by nomeata

Ok, implemented also for Word.

The code does a transformation from x -# (Lit n) to x +# (Lit -n) and then only works on tress of additions. I didn’t do that for Word yet, because foo +# 4294967295 is not very readable in Core. Should I?

comment:8 Changed 5 years ago by carter

One concern if have off hand: couldn't this sort of optimization change how a program under/overflows? I can't think of an example off hand, but could that happen?

comment:9 Changed 5 years ago by nomeata

As long as you do it on types that wrap around (like Int# and Word#) or are unbounded (Integer): No, I don’t think so. Also, if there are any issues, I assume they would already be present in the existing constant folding code.

comment:10 Changed 5 years ago by nomeata

Refactored the code; no more transformation x -# (Lit n) to x +# (Lit -n) unless it is required to combine some literals. This yields prettier Core and this way it validates without having to update test result core output.

comment:11 Changed 5 years ago by nomeata

Status: newpatch

Nofib shows that this optimization yields no measurable gain – most likely because later stages (maybe gcc) perform them anyways. So do we still want it, for the sake of pleasing Core, and maybe for the benefit of other backends like GHCJS?

If yes, this can go in, so marking this as ready for review.

comment:12 Changed 5 years ago by carter

@nomeata .... so I thought a bit about your reply... and part of my concern stems from the fact that the wrapping behavior of Int in GHC is actually a choice of "undefined" behavior by the various Haskell language standards. Point being, while wrap around is "folklore behavior", its actually deemed undefined by the standard. And it constitutes moving that undefined "folklore" into being a strong assumption in ghc.

how would GCC come into play? theres no peephole optimization by the system assembler level in -fasm to my knowledge.

most of the examples i've seen where arithmetic optimization makes a HUGE win are in the context of heavy bit fiddling sequences (I have some example that are 40 instructions long in -fasm, but optimized to NOOP on -fllvm), and various sophisticated array indexing computations.

If nofib doesn't show a measurable difference and theres no motivating example program that shows a strong win.. I'd argue its not worth adding the rules.

I'm happy to share some more interesting examples, but I'm not sure if they actually can be handled by a rewrite rules discipline.

comment:13 Changed 5 years ago by nomeata

If you have examples where such constant folding can be a win I’d of course be interested.

comment:14 Changed 5 years ago by nomeata

@SPJ: Since you opened the ticket I guess it’s your call whether you think this is worth it, even if nofib shows no measurable improvement.

comment:15 Changed 5 years ago by skeuchel

I would like to further work on this, i.e. add multiplication. Is there still interest in having this?

comment:16 Changed 5 years ago by simonpj

I had a look and the patch seemed broadly OK, through rather under-commented. Notably the parameters to the key function, whose name I now forget, and a user-readable version of the rewrite that each block of code performs.

Care is needed to be sure that the rewrites really are semantics preserving, given that overflow may happen. It's probably ok because the types involved do wrap-around, but that very point needs a careful Note in the code.

By all means extend it further, but please do document it well.

One thing to bear in mind is that these rewrite rules are tried for every single expression (op a b), where op is (say) Int# addition, and every single iteration of the simplifier. So it needs to fail fast if it ends up doing nothing. An alternative would be an entirely separate pass, which can then be as expensive as you like.

comment:17 Changed 5 years ago by thoughtpolice

Status: patchinfoneeded

@Joachim: is this patchset still in limbo? If there's no measurable performance impact, but it improves code and doesn't hurt performance, I don't see why you cannot merge it.

comment:18 Changed 5 years ago by nomeata

@Joachim: is this patchset still in limbo? If there's no measurable performance impact, but it improves code and doesn't hurt performance, I don't see why you cannot merge it.

Compiler code complexity would be one reason to not merge it. Other than that I mostly forgot about this... and so I’m not sure about the completeness of my work (and it needs more comments, as always). I’ll eventually return to it, but without high priority. Just leave it assigned to me.

comment:19 Changed 5 years ago by simonpj

I too am concerned about code complexity that does not buy performance; and also on the effect on compile time. That's not "no", just caution.

Simon

comment:20 Changed 5 years ago by nomeata

Owner: nomeata deleted

Given the general doubt about these changes, I’ll unassign this ticket.

comment:21 Changed 3 years ago by akio

Cc: akio added

I just came across the core output ((1 +# ((x -# 1) +# 1)) +# 1) +# 1 from some array indexing code, and thought that this kind of constant folding would be nice. However I don't have a standalone benchmark (yet) and consequently I don't know how much speed improvement could be achieved by the optimization.

comment:22 Changed 3 years ago by maoe

Cc: maoe added

comment:23 Changed 3 years ago by hsyl20

Differential Rev(s): Phab:D2858

comment:24 Changed 16 months ago by Ben Gamari <ben@…>

In fea04de/ghc:

Enhanced constant folding

Until now GHC only supported basic constant folding (lit op lit, expr op
0, etc.).

This patch uses laws of +/-/* (associativity, commutativity,
distributivity) to support some constant folding into nested
expressions.

Examples of new transformations:

   - simple nesting: (10 + x) + 10 becomes 20 + x
   - deep nesting: 5 + x + (y + (z + (t + 5))) becomes 10 + (x + (y + (z + t)))
   - distribution: (5 + x) * 6 becomes 30 + 6*x
   - simple factorization: 5 + x + (x + (x + (x + 5))) becomes 10 + (4 *x)
   - siblings: (5 + 4*x) - (3*x + 2) becomes 3 + x

Test Plan: validate

Reviewers: simonpj, austin, bgamari

Reviewed By: bgamari

Subscribers: thomie

GHC Trac Issues: #9136

Differential Revision: https://phabricator.haskell.org/D2858

comment:26 Changed 14 months ago by bgamari

Milestone: 8.6.1
Status: infoneededpatch

Oh dear, yes, excellent point; I dropped the ball on this. I'll re-merge for 8.6.

comment:27 Changed 14 months ago by Ben Gamari <ben@…>

In 60e4bb4d/ghc:

Enhanced constant folding

Until now GHC only supported basic constant folding (lit op lit, expr op
0, etc.).

This patch uses laws of +/-/* (associativity, commutativity,
distributivity) to support some constant folding into nested
expressions.

Examples of new transformations:

   - simple nesting: (10 + x) + 10 becomes 20 + x
   - deep nesting: 5 + x + (y + (z + (t + 5))) becomes 10 + (x + (y + (z + t)))
   - distribution: (5 + x) * 6 becomes 30 + 6*x
   - simple factorization: 5 + x + (x + (x + (x + 5))) becomes 10 + (4 *x)
   - siblings: (5 + 4*x) - (3*x + 2) becomes 3 + x

Test Plan: validate

Reviewers: simonpj, austin, bgamari

Reviewed By: bgamari

Subscribers: thomie

GHC Trac Issues: #9136

Differential Revision: https://phabricator.haskell.org/D2858

(cherry picked from commit fea04defa64871caab6339ff3fc5511a272f37c7)

comment:28 Changed 14 months ago by bgamari

Merged for 8.6.

comment:29 Changed 14 months ago by bgamari

Resolution: fixed
Status: patchclosed

comment:30 Changed 11 months ago by spacekitteh

Do these rules fire for floating point operations? Because they are neither associative nor distributive.

comment:31 in reply to:  30 Changed 11 months ago by hsyl20

Replying to spacekitteh:

Do these rules fire for floating point operations? Because they are neither associative nor distributive.

No, only for Int# and Word#.

Note: See TracTickets for help on using tickets.