Opened 8 years ago

Closed 3 years ago

#5615 closed bug (fixed)

ghc produces poor code for `div` with constant powers of 2.

Reported by: Lennart Owned by: akio
Priority: normal Milestone: 8.2.1
Component: Compiler Version: 7.4.1-rc1
Keywords: Cc: dterei, carter.schonwald@…, akio, tibbe, wren@…, michalt, maoe, liyang
Operating System: Unknown/Multiple Architecture: x86
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D2486
Wiki Page:

Description

The code for Int (div x c) where c is a constant of the form 2n should be implemented with an arithmetic right shift, currently it's a call to a function. This optimization should be done for all signed types.

Change History (32)

comment:1 Changed 8 years ago by simonpj

Owner: set to daniel.is.fischer

Daniel, might you have time to look at this? (Or Ian.)

Simon

comment:2 Changed 8 years ago by daniel.is.fischer

I'll see if I can find the relevant modules of the compiler.

comment:3 Changed 8 years ago by Lennart

Also, mod of powers of 2 should be an AND instruction.

comment:4 Changed 8 years ago by simonmar

From this input:

module Foo where

foo :: Int -> Int
foo x = x `div` 64

With -O2 we get as far as a call to GHC.Base.divInt#:

Foo.foo =
  \ (x_abl :: GHC.Types.Int) ->
    case x_abl of _ { GHC.Types.I# ww_awV ->
    case GHC.Base.divInt# ww_awV 64 of ww1_ax3 { __DEFAULT ->
    GHC.Types.I# ww1_ax3
    }
    }

but divInt# wasn't inlined:

Considering inlining: GHC.Base.divInt#
    arg infos [TrivArg, ValueArg]
    uf arity 2
    interesting continuation ArgCtxt False
    some_benefit True
    is exp: True
    is cheap: True
    guidance IF_ARGS [0 0] 139 0
    discounted size = 109
    ANSWER = NO

And divInt# has this unfolding:

435aa24ba2ad252f2b1992da3e8faa90
  divInt# :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
    {- Arity: 2, HasNoCafRefs, Strictness: LL,
       Unfolding: (\ x# :: GHC.Prim.Int# y# :: GHC.Prim.Int# ->
                   case GHC.Prim.># x# 0 of wild {
                     GHC.Types.False
                     -> case GHC.Prim.<# x# 0 of wild1 {
                          GHC.Types.False -> GHC.Prim.quotInt# x# y#
                          GHC.Types.True
                          -> case GHC.Prim.># y# 0 of wild2 {
                               GHC.Types.False -> GHC.Prim.quotInt# x# y#
                               GHC.Types.True
                               -> case GHC.Prim.quotInt#
                                         (GHC.Prim.+# x# 1)
                                         y# of wild3 { DEFAULT ->
                                  GHC.Prim.-# wild3 1 } } }
                     GHC.Types.True
                     -> case GHC.Prim.<# y# 0 of wild1 {
                          GHC.Types.False
                          -> case GHC.Prim.<# x# 0 of wild2 {
                               GHC.Types.False -> GHC.Prim.quotInt# x# y#
                               GHC.Types.True
                               -> case GHC.Prim.># y# 0 of wild3 {
                                    GHC.Types.False -> GHC.Prim.quotInt# x# y#
                                    GHC.Types.True
                                    -> case GHC.Prim.quotInt#
                                              (GHC.Prim.+# x# 1)
                                              y# of wild4 { DEFAULT ->
                                       GHC.Prim.-# wild4 1 } } }
                          GHC.Types.True
                          -> case GHC.Prim.quotInt#
                                    (GHC.Prim.-# x# 1)
                                    y# of wild2 { DEFAULT ->
                             GHC.Prim.-# wild2 1 } } }) -}

I'm rather surprised it has such a large size, since it is mostly primops and inline cases. I suspect we're counting too much for those cases, they ought to look cheap because they're inline (not evals). And there ought to be a big discount for that constant argument.

comment:5 Changed 8 years ago by igloo

Milestone: 7.6.1

Other than the unpleasant solution of adding rules for

x `div` 2
x `div` 4
x `div` 8
...
x `div` 9223372036854775808

I think we would need a rule in prelude/PrelRules.lhs to do this. But I think this would get us back to the problem of needing to get an Id for unsafeShiftR from somewhere.

Would we be better off moving rule application to a different point in the compiler, when we are in a MonadThings monad?

comment:6 Changed 8 years ago by dterei

Cc: dterei added

comment:7 Changed 8 years ago by simonpj

difficulty: Unknown

I don't see the problem here. We can add a rule for divInt# in PrelRules. The rule would replace (x `divInt#` 8) with (x `uncheckedIShiftRA#` 3); except that obviously instead of 8 we'd have any power of 2. We can get the Id for any primop easily.

Daniel might you look at this?

Simon

comment:8 Changed 8 years ago by simonmar

I think I was hoping that we'd be able to do it without a built-in rule, since the backends already turn constant divisions into shifts (and LLVM probably has a range of more sophisticated translations). If only divInt# was inlined, it would just work.

comment:9 Changed 8 years ago by simonpj

The trouble is that, as you can see above divInt# has a pretty big unfolding, and inlining that at every call site is perhaps overkill. Mind you, if the denominator is constant, at least some of that unfolding is dead code. But there's also a test on the numerator too; quotInt# is the primop I think (rightly or wrongly).

I wonder whether this numerator test stuff affects the correctness or otherwise of replacing divInt with a shift?

Anyway, adding a RULE woudl be fine I think.

comment:10 Changed 8 years ago by simonmar

Well, that's a good point. Integer right shift is equivalent to division truncated to negative infinity (i.e. div), so the test on the numerator would not be necessary. That's a good argument for having a built in rule.

For quot some additional checks would be necessary in the rule. I just checked what gcc does, and for x / 2 it generates the clever sequence

        movl    %edi, %eax
        shrl    $31, %eax
        addl    %edi, %eax
        sarl    %eax

which is basically the same as x < 0 ? (x+1) >> 1 : x >> 1. Dividing by 4 instead gives this:

        leal    3(%rdi), %eax
        testl   %edi, %edi
        cmovns  %edi, %eax
        sarl    $2, %eax

which is x < 0 ? (x + 3) >> 2 : x >> 2.

We could do the same tricks with a built-in rule for quot.

comment:11 Changed 8 years ago by simonmar

Version: 7.2.17.4.1-rc1

One more thing to add: the backend does have some clever rewriting of quot# to shift, this comment is from cmm/CmmOpt.hs:

                -- shift right is not the same as quot, because it rounds
                -- to minus infinity, whereasq quot rounds toward zero.
                -- To fix this up, we add one less than the divisor to the
                -- dividend if it is a negative number.
                --
                -- to avoid a test/jump, we use the following sequence:
                --      x1 = x >> word_size-1  (all 1s if -ve, all 0s if +ve)
                --      x2 = y & (divisor-1)
                --      result = (x+x2) >>= log2(divisor)
                -- this could be done a bit more simply using conditional moves,
                -- but we're processor independent here.
                --
                -- we optimise the divide by 2 case slightly, generating
                --      x1 = x >> word_size-1  (unsigned)
                --      return = (x + x1) >>= log2(divisor)

comment:12 Changed 8 years ago by augustss

Does the backend optimization of quot interact nicely with how div is encoded using quot? I want div (with a power of 2) to turn into a shift, because div (unlike quot) is equivalent to a shift.

comment:13 Changed 8 years ago by simonmar

Unfortunately not, which is why this ticket is still open. I'm just pointing out related pieces we already have in place in the compiler.

comment:14 Changed 7 years ago by igloo

Milestone: 7.6.17.6.2

comment:15 Changed 6 years ago by carter

Cc: carter.schonwald@… added

comment:16 Changed 5 years ago by tibbe

Cc: tibbe added

comment:17 Changed 5 years ago by thomie

How about implementing (and optimizing) Euclidean division?

Division and Modulus for Computer Scientists (6 pages pdf) by Daan Leijen.

"Boute argues that Euclidean division is superior to the other ones in terms of regularity and useful mathematical properties, allthough floored division, promoted by Knuth, is also a good definition. Despite its widespread use, truncated division is shown to be inferior to the other definitions.

An interesting mathematical property that is only satisfied by Euclidean division is the shift-rule. A compiler can use this to optimize divisions by a power of two into an arithmetical shift or a bitwise-and operation."

D divE (2n) = D asr n D modE (2n) = D and (2n - 1)

Note: Dart's modulo operator (%) uses Euclidean division.

Link to the original paper by Raymond T. Boute: The Euclidean Definition of the Functions div and mod (1992) (18 pages pdf).

Edit: replace dead link.

Last edited 5 years ago by thomie (previous) (diff)

comment:18 Changed 5 years ago by thoughtpolice

Milestone: 7.6.27.10.1

Moving to 7.10.1.

comment:19 in reply to:  17 Changed 5 years ago by dfeuer

Replying to thomie:

Here's an updated link to Daan Leijen's paper. Euclidean div/mod, along with other benefits, works much better than Haskell div/mod with the idea I've been playing with of implementing div-like things and mod-like things using divMod-like things. In particular, CSE can do some really nifty stuff with either quotRem or divModEuclidean, but it has no hope with divMod (at least as it's currently implemented) in the general case, because the latter has complicated code that gets all torn up before it gets to CSE. I don't know if there's a way to fix that. It does work out if one or both of the operands is known and there's sufficient inlining.

comment:20 Changed 5 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:21 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:22 Changed 4 years ago by WrenThornton

Cc: wren@… added

comment:23 Changed 4 years ago by michalt

Cc: michalt added

comment:24 Changed 4 years ago by maoe

Cc: maoe added

comment:25 Changed 4 years ago by bgamari

Type of failure: None/UnknownRuntime performance bug

comment:26 Changed 4 years ago by thomie

Milestone: 8.0.1

comment:27 Changed 4 years ago by liyang

Cc: liyang added

comment:28 Changed 3 years ago by akio

Cc: akio added

Am I right that for div and mod this is just waiting to be implemented (in PrelRules)?

comment:29 Changed 3 years ago by simonpj

Looks like it!

I'd look carefully at comment:19 and its links.

comment:30 Changed 3 years ago by akio

Differential Rev(s): Phab:D2486
Owner: changed from daniel.is.fischer to akio

I implemented rules for div and mod. I decided not to do anything with Euclidean division because (1) the rules I added are valid for the current semantics (floor division) as well as for Euclidean division, and (2) I believe that changing the semantics of div and mod is a much bigger change and should be discussed separately.

comment:31 Changed 3 years ago by Ben Gamari <ben@…>

In 6ea6242/ghc:

Turn divInt# and modInt# into bitwise operations when possible

This implements #5615 for divInt# and modInt#.

I also included rules to do constant-folding when the both arguments
are known.

Test Plan: validate

Reviewers: austin, simonmar, bgamari

Reviewed By: bgamari

Subscribers: hvr, thomie

Differential Revision: https://phabricator.haskell.org/D2486

GHC Trac Issues: #5615

comment:32 Changed 3 years ago by bgamari

Milestone: 8.2.1
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.