Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#5237 closed bug (fixed)

Inefficient code generated for x^2

Reported by: scpmw Owned by: daniel.is.fischer
Priority: normal Milestone: 7.4.1
Component: libraries/base Version: 7.0.3
Keywords: Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Okay, this one is quite surprising to me. Suppose the following trivial Haskell program:

main = print (2.0 ^ 2)

This innocent example actually ends up generating multiple pages of Core! To see what's going on we just have to look at a few type signatures:

$wg1_r11f  :: Double# -> Type.Integer -> Double# -> Double#
$wf1 :: Double# -> Type.Integer -> Double#

The function $wf1 is what is called to perform the calculation. First thing to note is that it actually compiles ^2 into a loop (that's what $wg1_r11f is for). Second, to make matters worse, it ends up using Integer of all things as the index type (defaulting at work, probably).

This has obviously drastic performance implications. A more complicated example program actually got about 4 times faster just by replacing all instances of x^2 with x*x.

I guess this probably is not easy to fix in the general case (speculative loop unrolling?). But couldn't we have, say, a rule that catches the worst cases?

Attachments (2)

0001-Fix-5237.patch (3.6 KB) - added by daniel.is.fischer 8 years ago.
0001-Add-rules-for-powers-with-small-exponents-fixes-5237.patch (2.6 KB) - added by daniel.is.fischer 8 years ago.

Download all attachments as: .zip

Change History (25)

Changed 8 years ago by daniel.is.fischer

Attachment: 0001-Fix-5237.patch added

comment:1 Changed 8 years ago by daniel.is.fischer

Some rules for small exponents. Maybe rules for more exponents and exponent types would be good, but I didn't want to increase the number of rules too much.

it ends up using Integer of all things as the index type (defaulting at work, probably)

Yes, if the type isn't otherwise restricted, that's what happens.

comment:2 Changed 8 years ago by igloo

Milestone: 7.4.1
Status: newpatch

comment:3 Changed 8 years ago by igloo

Component: Compilerlibraries/base

comment:4 Changed 8 years ago by daniel.is.fischer

I've tested the rules in the last days, with mixed success. For Int and Integer, where the rules are in the same place as (^), they fire okay, unless specialisation gets in the way (so far, that affects Integer -> Integer -> Integer, Integer -> Int -> Integer and Int -> Int -> Int). To get the rules for Word exponents to fire, I had to specify the exponent as W# n##, so those rules seem pretty useless.

For the types we have specialisations for, specialising is done before rule-rewriting, apparently. I think it might be good to have a rule-rewriting pass before any other optimisations are tried, based on the assumption that rules generally are written to give large benefits, while general optimisations often give only a small benefit. If it's easy enough to add a rule-scanning pass before other simplifications, it's worth thinking about it.

comment:5 Changed 8 years ago by simonpj

The usual approach is to say {-# NOINLINE [1] #-} on functions that you want to match on the LHS of rules. Did you try that? Or, can you be more specific about what isn't firing?

Thanks for looking at this.

Simon

comment:6 Changed 8 years ago by daniel.is.fischer

Not sure I understand correctly. What we want is that occurrences of expr ^ 2 or expr ^ (2 :: Type) in source code are rewritten to expr * expr (similarly for exponents 3 and 4).

With the rules, that works fine if the exponent isn't given an explicit type or the type is Int or Integer - unless the type of expr is Integer (or Int if the exponent is specified as Int). For those types, (^) is specialised, and in the core replaced by the specialised version, then GHC.Real.^_^2 expr 2 doesn't match the rule anymore1.

Making (^) {-# NOINLINE [1] #-} instead of {-# INLINABLE #-} would cost a lot in other places, so that's not a good option; out of sheer morbid curiosity I tried it anyway, it didn't help. But if INLINABLE supports phase control (that isn't mentioned in the users guide, the syntax is accepted, however), perhaps (^) should become {-# INLINABLE [1] #-}?

With Word exponents, the problem seems to be that the exponent is moved to a top-level binding and the rule-matcher sees expr ^ lvl_fooBar instead of expr ^ (3 :: Word). But I don't think specifying the exponent as Word is common, mostly the exponent will not be given an explicit type and defaulted to Integer or specified as Int for better performance. So I'd just leave out the rules for Word.

1 Perhaps the compiler just considers the specialisation more specific than the rules and hence that wins? Compiling with extra rules for the specialised types to check.

comment:7 Changed 8 years ago by daniel.is.fischer

Indded, it seems the specialisations were considered a better match at those types. Extra rules for the specialised types fire.

comment:8 Changed 8 years ago by simonpj

I'm still unclear what the problem is. Could you boil out a standalone case that illustrates the problem you are trying to solve. The numeric system is complicated and my brain is small. Thanks.

Simon

comment:9 Changed 8 years ago by daniel.is.fischer

The problem is that if you write expr ^ 2 in the source, that's exactly what you get. But it's not what you want.

Consider the programme

{-# LANGUAGE BangPatterns #-}
module Main (main) where

import System.Environment (getArgs)

fun :: Double -> Double
fun x = go 0 0.5
  where
    go !acc z
      | x < z   = acc
      | otherwise = go (acc + z^2) (z+0.25)

main :: IO ()
main = getArgs >>= mapM_ (print . fun . read)

Compiling it with ghc-7.2.1, I get nearly 32K of Core and an executable delivering

dafis@schwartz:~/Haskell/BeginnersTesting> ./squareTest721 +RTS -s -RTS 1.2e7
2.3040000720013998e21
   2,304,132,792 bytes allocated in the heap
         186,544 bytes copied during GC
          28,992 bytes maximum residency (1 sample(s))
          26,288 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4411 colls,     0 par    0.01s    0.01s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0004s    0.0004s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.53s  (  2.53s elapsed)
  GC      time    0.02s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    2.54s  (  2.54s elapsed)

  %GC     time       0.6%  (0.6% elapsed)

  Alloc rate    910,944,778 bytes per MUT second

  Productivity  99.4% of total user, 99.4% of total elapsed

Compiling it with ghc-7.3.20110926 with rewrite rules, I get 6.6K of Core and an executable delivering

dafis@schwartz:~/Haskell/BeginnersTesting> ./squareTest73R +RTS -s -RTS 1.2e7
2.3040000720013998e21
         132,184 bytes allocated in the heap
           3,304 bytes copied during GC
          44,200 bytes maximum residency (1 sample(s))
          17,240 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.07s  (  0.07s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.07s  (  0.07s elapsed)

  %GC     time       0.1%  (0.1% elapsed)

  Alloc rate    1,767,189 bytes per MUT second

  Productivity  99.5% of total user, 103.1% of total elapsed

Since (^) is {-# INLINABLE #-}, GHC will most of the time create a type-specialised version of the exponentiation-by-repeated-squaring algorithm (a wrapper to test that the exponent isn't negative and to unpack the arguments if applicable, and two loops for the work). That function is then called with an exponent of 2. This creates a lot of code, much of which is never used (one loop runs twice, the other not at all). And it is an out-of-line function call, which can cost a lot of time in a loop.

Having GHC rewrite expr ^ 2 into expr * expr, a) no code for (^) has to be generated (or linked), b) for many types you get an inline multiplication instead of a function call.

Theoretically, a compiler could in such a situation, when one argument is statically known at compile time, try to evaluate the function a few steps to see what gives (the speculative loop unrolling mentioned by scpmw in the ticket). In this case, it'd find

2 < 0 ? No => calculate f x 2
even 2 ? Yes => calculate f (x*x) 1
even 1 ? No => 1 == 1 ? Yes => result is (x*x)

I would expect it to be tremendously hard to implement such a speculative evaluation in a way that would often yield useful results and not unduly increase compile times, though. So while no true magic is available, let's add a few rewrite rules to catch the cases where using (^) hurts most:

{-# RULES
"^2/Integer"  forall x. x ^ (2 :: Integer) = x*x
"^3/Integer"  forall x. x ^ (3 :: Integer) = (x*x)*x
"^4/Integer"  forall x. x ^ (4 :: Integer) = let y = x*x in y*y
  #-}

Fine. Now scpmw's 2.0 ^ 2 gets rewritten to 2.0 * 2.0 (and that is then evaluated to 4.0), generally, occurrences of expr ^ 2 get rewritten as desired.

Unless the type of expr is Integer, because in GHC.Real, (^) is specialised for the type Integer -> Integer -> Integer. Then GHC has two rules to choose from, ^2/Integer with matching exponent (and type), and the specialisation with matching argument and exponent type. Specialisation wins in this case. Since Integer multiplication isn't a primop but a function call, the effect isn't nearly as dramatic, but for the analogue of the above programme, I find a reduction of allocation and running time by a factor of roughly 2.3.

The more specific rewrite rule with type of base and exponent specified as Integer fires, though, since it's more specific than the specialisation.

Now, if you're compiling with -Wall, GHC will tell you

    Warning: Defaulting the following constraint(s) to type `Integer'
               (Integral b0) arising from a use of `^' at squareTest.hs:11:32
               (Num b0) arising from the literal `2' at squareTest.hs:11:33

and you might make it ^(2 :: Int) to have warning-free code and besides, Int is more efficient than Integer, isn't it? At least I have done that.

But now the above rules don't match and you get the expensive (^) again. So add rules for Int exponents. Since there are also specialisations of (^) for the types Integer -> Int -> Integer and Int -> Int -> Int, we also need extra rules for these types to win over specialisation.

Finally, what if the defaulting is resolved by making it ^(2 :: Word), after all, the exponent mustn't be negative, so why use a type with negative values? Therefore the first patch included rules for Word exponents. Since Word isn't available in GHC.Real, I added them to GHC.Word, but when testing, the only way to get them to fire was giving the exponent in the form W# n##, so I left them out of the second patch.

Afterthought: What about (^^)?

comment:10 Changed 8 years ago by igloo

Status: patchnew

So if I understand correctly, the problem is that with q.hs:

{-# RULES "^^^2/Integer" forall x. x ^^^ (2 :: Integer) = x * x #-}
{-# SPECIALISE (^^^) :: Integer -> Integer -> Integer #-}

{-# NOINLINE (^^^) #-}
(^^^) :: (Num a, Integral b) => a -> b -> a
x ^^^ y = 1

v :: Integer
v = 8 ^^^ 2

main :: IO ()
main = print v

we get the specialisation matching and not the rule, and thus the result is 1 rather than 64:

$ ghc -O -ddump-rule-firings q.hs
[1 of 1] Compiling Main             ( q.hs, q.o )
Rule fired: Class op fromInteger
Rule fired: SPEC Main.^^^
Rule fired: Class op show
Linking q ...
$ ./q
1

If we comment out the specialisation then we get the desired:

$ ghc -O -ddump-rule-firings q.hs
[1 of 1] Compiling Main             ( q.hs, q.o )
Rule fired: ^^^2/Integer
Rule fired: Class op *
Rule fired: timesInteger
Rule fired: Class op show
Linking q ...
$ ./q
64

(as a sidenote, I think the rule should be

{-# RULES "^^^2/Integer" forall x. x ^^^ (2 :: Integer) = let v = x in v * v #-}

in case x is a large expression)

comment:11 Changed 8 years ago by daniel.is.fischer

Yes. And with

{-# RULES "^^^2/Both"    forall (x :: Integer). x ^^^ (2 :: Integer) = x #-}

alongside the two others, that fires.

Re sidenote: yes, it should be for safety. That occurred to me today too. I've done a few tests and so far GHC always was clever enough to share the expression, but it's better not to rely on it.

With regard to the rule/specialisation competition, what if the {-# SPECIALISE #-} pragma is removed? Due to the {-# INLINABLE #-}, with optimisations, specialisations will (probably) be created at the call sites, won't they? So then we wouldn't need the extra rules. However, I think that the specialisations would be created in each module and not once per programme/package, and that would mean code-bloat. So what would be worse, more rules or the code-bloat?

comment:12 Changed 8 years ago by daniel.is.fischer

Patch now with let-binding to ensure sharing of x.

comment:13 Changed 8 years ago by igloo

Is adding the more specific rules the right thing to do?

Or would it make more sense for the optimiser to always choose rules over specialisations?

comment:14 Changed 8 years ago by daniel.is.fischer

I think it would be better if the optimiser chose rules over specialisations. If that's easily enough done, great. Otherwise adding the more specific rules is the right thing.

comment:15 Changed 8 years ago by simonpj

Actually I realise that we already have a good mechanism for prioritisation: namely the simplifer's "phases".

  • If you want a RULE to match on a function, you had better make sure the function doesn't inline too early. Typically use a NOINLINE pragma:
    {-# RULE forall x.  f [x] = blah #-}
    {-# NOINLINE [1] f #-}
    f x = ...blah2...
    
  • The simplifier works in phases, currently phase 2, then 1, then 0. The RULE is always active. The NOINLINE stops f inlining until phase 1. So the rule gets a chance to fire in the earlier phase 2.
  • You can attach phases to rules too, which say in which phases they are active. Ditto SPECIALISE pragmas
  • GHC automatically generates specialisations for overloaded functions. The consequent rules are active precisely when the original function's INLINE pragma (if any) is active. So if you say NONLINE [1] f, then any automatically generated specialisations of f will be active only in phase 1 and later.

Does that give you enough to go on?

Simon

comment:16 Changed 8 years ago by daniel.is.fischer

I'm afraid no. Take Ian's programme from above, we have a {-# RULES #-}, {-# SPECIALISE #-} and {-# NOINLINE #-} pragma. Both, rule and specialisation, match. We want the rule to fire, but it's always SPEC that fires. If adding phase control to specialise pragmas works without making it {-# SPECIALISE [NO]INLINE [k] #-}, I haven't found out how. Every variation of that I tried, together with {-# NOINLINE [k] #-} or without, still had the specialisation firing.

The only way to get the rule to fire was to remove the SPECIALISE pragma. But if we remove that from GHC.Real, we get code bloat because a specialisation will be generated in every module using (^).

Interestingly, adding a phase to the RULES pragma prevented that from firing even without the SPECIALISE.

comment:17 Changed 8 years ago by simonpj

I finally get it. What we want to say is

{-# RULES "^^^2/Integer" forall x. x ^^^ (2 :: Integer) = x * x #-}
{-# SPECIALISE [1] (^^^) :: Integer -> Integer -> Integer #-}

Note the [1] for SPECIALISE, saying "only run the specialise rule in phase 1 and later. So the RULE gets a chance to run in the earlier phase 2.

This is the right way to solve this, not by making rules override specialisation; that's far too fragile.

But on looking at GHC I see that SPECIALISE pragmas don't let you specify a phase in which they become active. That's just a stupid oversight; like any RULE or INLINE pragma, it should definitely have a phase control. That would let you solve the problem, right?

Simon

comment:18 Changed 8 years ago by daniel.is.fischer

I suppose it would.

comment:19 Changed 8 years ago by simonpj@…

commit 814d864125bdd03d8bc6c3fc551f393b21942c6c

Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Thu Nov 24 12:35:33 2011 +0000

    Support "phase control" for SPECIALISE pragmas
    
    This featurelet allows Trac #5237 to be fixed.
    The idea is to allow SPECIALISE pragmas to specify
    the phases in which the RULE is active, just as you can
    do with RULES themselves.
      {-# SPECIALISE [1] foo :: Int -> Int #-}
    
    This feature is so obvious that not having it is really a bug.
    There are, needless to say, a few wrinkles.  See
       Note [Activation pragmas for SPECIALISE]
    in DsBinds

 compiler/deSugar/DsBinds.lhs      |   60 +++++++++++++++++++++++++++++++++++-
 compiler/parser/Parser.y.pp       |   11 ++++---
 compiler/parser/RdrHsSyn.lhs      |    3 +-
 docs/users_guide/glasgow_exts.xml |   40 ++++++++++++++++++++++++
 4 files changed, 106 insertions(+), 8 deletions(-)

comment:20 Changed 8 years ago by simonpj

Owner: set to daniel.is.fischer

Daniel: right, it's done. Can you now fix the original bug using the new phase-control on specialisation? Thanks.

comment:21 Changed 8 years ago by daniel.is.fischer

I'm going to test how it works right away.

comment:22 Changed 8 years ago by daniel.is.fischer

Resolution: fixed
Status: newclosed

Works great.

Fixed by

commit 088c9661321de044b4092756e2b563ffd23c6f96
Author: Daniel Fischer <daniel.is.fischer@googlemail.com>
Date:   Fri Nov 25 03:46:46 2011 +0100

    Rules for powers with small exponents (fixes #5237)
    
    Calculating small powers by direct multiplication is more efficient
    than using (^). For small exponents known at compile time, there are
    now rewrite rules.

Regression test: perf/should_run/T5237.hs

comment:23 Changed 8 years ago by simonpj

Thank you Daniel!

Note: See TracTickets for help on using tickets.