Opened 8 years ago

Closed 8 years ago

#5161 closed bug (fixed)

Poor performance of division; unnecessary branching

Reported by: rtvd Owned by: igloo
Priority: normal Milestone: 7.2.1
Component: libraries/base Version: 7.0.3
Keywords: div quot rem mod performance slow branching Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

In case an Int* is divided by a constant the compiled code contains unnecessary checks whether the value being divided is minBound. This check is necessary only if a value is being divided by -1 so that an exception would be thrown.

The branch can be removed by the GHC's optimiser after a small change: the two parts of the corresponding condition have to be flipped. This provides an opportunity to short-circuit this condition to 'false' without knowing the first argument (patch is attached).

Attachments (1)

0001-Performance-improvement-for-division-got-rid-of-an-u.patch (13.0 KB) - added by rtvd 8 years ago.
patch

Download all attachments as: .zip

Change History (6)

comment:1 Changed 8 years ago by rtvd

Status: newpatch

Please can someone review and the patch and put it into repository so that it would appear in the next GHC release?

comment:2 Changed 8 years ago by simonpj

Your changes replace the guard

   | x == minBound && y == (-1) = overflowError 

by

   | y == (-1) && x == minBound = overflowError

At first I couldn't see how this would make any difference. After all, if y is 2 (say), then the overflow case is statically inaccessible, so the test should vanish. Here's why it doesn't. Initially we have something roughly like this:

quot x y = case x of
             -9223372036854775808 -> case y of 
                                        -1 -> overflowError
                                        DEFAULT -> x `quot#` y
             DEFAULT -> x `quot#` y

(I'm omitting the unboxing etc.)

Now if we have the call (p `quot` 2), and we inline quot we get:

  case p of
    -9223372036854775808 -> case 2 of 
                              -1 -> overflowError
                              DEFAULT -> p `quot#` 2
    DEFAULT -> p `quot#` 2

The inner case simplifies, leaving

  case p of
    -9223372036854775808 -> p `quot#` 2
    DEFAULT              -> p `quot#` 2

At this point I thought we'd just have a case with two identical branches, which is eliminated (assuming it doesn't affect strictness, which it doesn't in this case), leaving the desired (p `quot#` 1).

BUT in the minBound branch we know what p is, and GHC cleverly does the division at compile time, giving

  case p of
    -9223372036854775808 -> -4611686018427387904
    DEFAULT              -> p `quot#` 2

Darn! Now the branches aren't identical!

By switching the order of tests you've stopped this happening. But it'll happen instead if you have a constant numerator. The term (2 `quot` q) will turn into

case q of
  -1      -> 0
  DEFAULT -> 2 `quot#` q

Constant denominators are more common than constant numerators, so I think your patch is a good one -- Ian can you apply please? But also can you add this Trac comment into the code with Note [Ordering of tests] or something, and point to it from the test sites?

I wish I could see a good way to make GHC generate the best code regardless of test ordering. I can think of some un-satisfying solutions:

  • Delay constant folding until a later phase
  • Delay case-of-case until a later phase

but neither are great. Phase ordering is very delicate and non-modular.

comment:3 Changed 8 years ago by igloo

Owner: set to igloo

comment:4 Changed 8 years ago by igloo

Milestone: 7.2.1

comment:5 Changed 8 years ago by igloo

Resolution: fixed
Status: patchclosed
Note: See TracTickets for help on using tickets.