Opened 8 years ago
Closed 3 years ago
#5615 closed bug (fixed)
ghc produces poor code for `div` with constant powers of 2.
Reported by: | Lennart | Owned by: | akio |
---|---|---|---|
Priority: | normal | Milestone: | 8.2.1 |
Component: | Compiler | Version: | 7.4.1-rc1 |
Keywords: | Cc: | dterei, carter.schonwald@…, akio, tibbe, wren@…, michalt, maoe, liyang | |
Operating System: | Unknown/Multiple | Architecture: | x86 |
Type of failure: | Runtime performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | Phab:D2486 | |
Wiki Page: |
Description
The code for Int (div x c)
where c
is a constant of the form 2^{n} should be implemented with an arithmetic right shift, currently it's a call to a function. This optimization should be done for all signed types.
Change History (32)
comment:1 Changed 8 years ago by
Owner: | set to daniel.is.fischer |
---|
comment:4 Changed 8 years ago by
From this input:
module Foo where foo :: Int -> Int foo x = x `div` 64
With -O2 we get as far as a call to GHC.Base.divInt#
:
Foo.foo = \ (x_abl :: GHC.Types.Int) -> case x_abl of _ { GHC.Types.I# ww_awV -> case GHC.Base.divInt# ww_awV 64 of ww1_ax3 { __DEFAULT -> GHC.Types.I# ww1_ax3 } }
but divInt#
wasn't inlined:
Considering inlining: GHC.Base.divInt# arg infos [TrivArg, ValueArg] uf arity 2 interesting continuation ArgCtxt False some_benefit True is exp: True is cheap: True guidance IF_ARGS [0 0] 139 0 discounted size = 109 ANSWER = NO
And divInt#
has this unfolding:
435aa24ba2ad252f2b1992da3e8faa90 divInt# :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# {- Arity: 2, HasNoCafRefs, Strictness: LL, Unfolding: (\ x# :: GHC.Prim.Int# y# :: GHC.Prim.Int# -> case GHC.Prim.># x# 0 of wild { GHC.Types.False -> case GHC.Prim.<# x# 0 of wild1 { GHC.Types.False -> GHC.Prim.quotInt# x# y# GHC.Types.True -> case GHC.Prim.># y# 0 of wild2 { GHC.Types.False -> GHC.Prim.quotInt# x# y# GHC.Types.True -> case GHC.Prim.quotInt# (GHC.Prim.+# x# 1) y# of wild3 { DEFAULT -> GHC.Prim.-# wild3 1 } } } GHC.Types.True -> case GHC.Prim.<# y# 0 of wild1 { GHC.Types.False -> case GHC.Prim.<# x# 0 of wild2 { GHC.Types.False -> GHC.Prim.quotInt# x# y# GHC.Types.True -> case GHC.Prim.># y# 0 of wild3 { GHC.Types.False -> GHC.Prim.quotInt# x# y# GHC.Types.True -> case GHC.Prim.quotInt# (GHC.Prim.+# x# 1) y# of wild4 { DEFAULT -> GHC.Prim.-# wild4 1 } } } GHC.Types.True -> case GHC.Prim.quotInt# (GHC.Prim.-# x# 1) y# of wild2 { DEFAULT -> GHC.Prim.-# wild2 1 } } }) -}
I'm rather surprised it has such a large size, since it is mostly primops and inline cases. I suspect we're counting too much for those cases, they ought to look cheap because they're inline (not evals). And there ought to be a big discount for that constant argument.
comment:5 Changed 8 years ago by
Milestone: | → 7.6.1 |
---|
Other than the unpleasant solution of adding rules for
x `div` 2 x `div` 4 x `div` 8 ... x `div` 9223372036854775808
I think we would need a rule in prelude/PrelRules.lhs
to do this. But I think this would get us back to the problem of needing to get an Id
for unsafeShiftR
from somewhere.
Would we be better off moving rule application to a different point in the compiler, when we are in a MonadThings
monad?
comment:6 Changed 8 years ago by
Cc: | dterei added |
---|
comment:7 Changed 8 years ago by
difficulty: | → Unknown |
---|
I don't see the problem here. We can add a rule for divInt#
in PrelRules
. The rule would replace (x `divInt#` 8)
with (x `uncheckedIShiftRA#` 3)
; except that obviously instead of 8 we'd have any power of 2. We can get the Id
for any primop easily.
Daniel might you look at this?
Simon
comment:8 Changed 8 years ago by
I think I was hoping that we'd be able to do it without a built-in rule, since the backends already turn constant divisions into shifts (and LLVM probably has a range of more sophisticated translations). If only divInt#
was inlined, it would just work.
comment:9 Changed 8 years ago by
The trouble is that, as you can see above divInt#
has a pretty big unfolding, and inlining that at every call site is perhaps overkill. Mind you, if the denominator is constant, at least some of that unfolding is dead code. But there's also a test on the numerator too; quotInt#
is the primop I think (rightly or wrongly).
I wonder whether this numerator test stuff affects the correctness or otherwise of replacing divInt
with a shift?
Anyway, adding a RULE woudl be fine I think.
comment:10 Changed 8 years ago by
Well, that's a good point. Integer right shift is equivalent to division truncated to negative infinity (i.e. div
), so the test on the numerator would not be necessary. That's a good argument for having a built in rule.
For quot
some additional checks would be necessary in the rule. I just checked what gcc does, and for x / 2
it generates the clever sequence
movl %edi, %eax shrl $31, %eax addl %edi, %eax sarl %eax
which is basically the same as x < 0 ? (x+1) >> 1 : x >> 1
. Dividing by 4 instead gives this:
leal 3(%rdi), %eax testl %edi, %edi cmovns %edi, %eax sarl $2, %eax
which is x < 0 ? (x + 3) >> 2 : x >> 2
.
We could do the same tricks with a built-in rule for quot
.
comment:11 Changed 8 years ago by
Version: | 7.2.1 → 7.4.1-rc1 |
---|
One more thing to add: the backend does have some clever rewriting of quot#
to shift, this comment is from cmm/CmmOpt.hs
:
-- shift right is not the same as quot, because it rounds -- to minus infinity, whereasq quot rounds toward zero. -- To fix this up, we add one less than the divisor to the -- dividend if it is a negative number. -- -- to avoid a test/jump, we use the following sequence: -- x1 = x >> word_size-1 (all 1s if -ve, all 0s if +ve) -- x2 = y & (divisor-1) -- result = (x+x2) >>= log2(divisor) -- this could be done a bit more simply using conditional moves, -- but we're processor independent here. -- -- we optimise the divide by 2 case slightly, generating -- x1 = x >> word_size-1 (unsigned) -- return = (x + x1) >>= log2(divisor)
comment:12 Changed 8 years ago by
Does the backend optimization of quot interact nicely with how div is encoded using quot? I want div (with a power of 2) to turn into a shift, because div (unlike quot) is equivalent to a shift.
comment:13 Changed 8 years ago by
Unfortunately not, which is why this ticket is still open. I'm just pointing out related pieces we already have in place in the compiler.
comment:14 Changed 7 years ago by
Milestone: | 7.6.1 → 7.6.2 |
---|
comment:15 Changed 7 years ago by
Cc: | carter.schonwald@… added |
---|
comment:16 Changed 5 years ago by
Cc: | tibbe added |
---|
comment:17 follow-up: 19 Changed 5 years ago by
How about implementing (and optimizing) Euclidean division?
Division and Modulus for Computer Scientists (6 pages pdf) by Daan Leijen.
"Boute argues that Euclidean division is superior to the other ones in terms of regularity and useful mathematical properties, allthough floored division, promoted by Knuth, is also a good definition. Despite its widespread use, truncated division is shown to be inferior to the other definitions.
An interesting mathematical property that is only satisfied by Euclidean division is the shift-rule. A compiler can use this to optimize divisions by a power of two into an arithmetical shift or a bitwise-and operation."
D div_{E} (2^{n}) = D asr n D mod_{E} (2^{n}) = D and (2^{n - 1})
Note: Dart's modulo operator (%) uses Euclidean division.
Link to the original paper by Raymond T. Boute: The Euclidean Definition of the Functions div and mod (1992) (18 pages pdf).
Edit: replace dead link.
comment:19 Changed 5 years ago by
Replying to thomie:
Here's an updated link to Daan Leijen's paper. Euclidean div/mod, along with other benefits, works much better than Haskell div/mod with the idea I've been playing with of implementing div
-like things and mod
-like things using divMod
-like things. In particular, CSE can do some really nifty stuff with either quotRem
or divModEuclidean
, but it has no hope with divMod
(at least as it's currently implemented) in the general case, because the latter has complicated code that gets all torn up before it gets to CSE. I don't know if there's a way to fix that. It does work out if one or both of the operands is known and there's sufficient inlining.
comment:20 Changed 5 years ago by
Milestone: | 7.10.1 → 7.12.1 |
---|
Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.
comment:22 Changed 4 years ago by
Cc: | wren@… added |
---|
comment:23 Changed 4 years ago by
Cc: | michalt added |
---|
comment:24 Changed 4 years ago by
Cc: | maoe added |
---|
comment:25 Changed 4 years ago by
Type of failure: | None/Unknown → Runtime performance bug |
---|
comment:26 Changed 4 years ago by
Milestone: | 8.0.1 |
---|
comment:27 Changed 4 years ago by
Cc: | liyang added |
---|
comment:28 Changed 3 years ago by
Cc: | akio added |
---|
Am I right that for div
and mod
this is just waiting to be implemented (in PrelRules)?
comment:30 Changed 3 years ago by
Differential Rev(s): | → Phab:D2486 |
---|---|
Owner: | changed from daniel.is.fischer to akio |
I implemented rules for div
and mod
. I decided not to do anything with Euclidean division because (1) the rules I added are valid for the current semantics (floor division) as well as for Euclidean division, and (2) I believe that changing the semantics of div
and mod
is a much bigger change and should be discussed separately.
comment:32 Changed 3 years ago by
Milestone: | → 8.2.1 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Daniel, might you have time to look at this? (Or Ian.)
Simon