Opened 4 years ago

Closed 3 years ago

#11701 closed bug (fixed)

ghc generates significant slower code

Reported by: HuStmpHrrr Owned by:
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.10.3
Keywords: efficiency Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D1997
Wiki Page:

Description

I've already started a discussion here in SO: http://stackoverflow.com/questions/35941674/latest-ghc-generates-slower-code

So again, I realized that the latest version of ghc produces significantly slower code than older version. my default ghc is the latest version as of now:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.3

I have also two other old versions installed in my local machine.

test code as following

import Data.Word
import Data.List
import System.Environment

collatzNext :: Word32 -> Word32
collatzNext a = (if even a then a else 3*a+1) `div` 2

-- new code
collatzLen :: Word32 -> Int
collatzLen a0 = lenIterWhile collatzNext (/= 1) a0

lenIterWhile :: (a -> a) -> (a -> Bool) -> a -> Int
lenIterWhile next notDone start = len start 0 where
    len n m = if notDone n
                then len (next n) (m+1)
                else m
-- End of new code

main = do
    [a0] <- getArgs
    let max_a0 = (read a0)::Word32
    print $ maximum $ map (\a0 -> (collatzLen a0, a0)) [1..max_a0]

following are the three runs in my machine:

$ ~/Tools/ghc-7.6.1/bin/ghc -O2 Test.hs 
[1 of 1] Compiling Main             ( Test.hs, Test.o )
Linking Test ...

$ time ./Test 1000000
(329,837799)

real    0m1.901s
user    0m1.896s
sys     0m0.000s

$ ~/Tools/ghc/bin/ghc -O2 Test.hs 
[1 of 1] Compiling Main             ( Test.hs, Test.o )
Linking Test ...

$ time ./Test 1000000
(329,837799)

real    0m10.562s
user    0m10.528s
sys     0m0.036s

$ ~/Tools/ghc-7.4.2/bin/ghc -O2 Test.hs 
[1 of 1] Compiling Main             ( Test.hs, Test.o )
Linking Test ...

$ time ./Test 1000000
(329,837799)

real    0m1.879s
user    0m1.876s
sys     0m0.000s

Obviously we can tell latest version of ghc produces worse code than the older two versions.

Change History (8)

comment:1 Changed 4 years ago by HuStmpHrrr

according to the investigation of Zeta in SO, it's because even is not inlined.

comment:2 Changed 4 years ago by bgamari

Wow, indeed 7.10.3 chooses not to inline even. I've not looked into why but this is quite surprising. Inlining even produces a significant speedup (roughly a factor of five) as one would expect.

comment:3 Changed 4 years ago by bgamari

Ahh, the problem appears to be that GHC.Real.even and odd are not marked as INLINE but rather are merely specialized to Integer and Int, whereas you use it at Word32. We could specialize these to these other types but it seems to me like they are cheap enough operations to be worth inlining.

Last edited 4 years ago by bgamari (previous) (diff)

comment:4 Changed 4 years ago by bgamari

Differential Rev(s): Phab:D1997
Milestone: 8.0.1

I've opened Phab:D1997 to address this particular issue although I suspect we generally need to audit our use of inlining vs. specialization.

comment:5 Changed 4 years ago by bgamari

Status: newpatch

comment:6 Changed 4 years ago by simonpj

A factor of 5 is an impressively big loss from failing to inline even!

I think we should investigate why it isn't being inlined automatically. I tried a tiny case:

module Foo where
  even       :: (Integral a) => a -> Bool
  even n          =  n `rem` 2 == 0


module Bar where
  import Foo
  f :: Int -> Bool
  f x = Foo.even x

Sure enough, even is not inlined. With -dverbose-core2core -ddump-inlinings we get

Considering inlining: even
  arg infos [ValueArg, TrivArg]
  interesting continuation BoringCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [60 0] 240 0
  discounted size = 120
  ANSWER = NO

So we are only getting a tiny discount from the fact that we are giving a completely fixed dictionary to even; even though even's body is dominated by dictionary selections that would disappear if we inlined even.

So maybe we should look at our discounting scheme. See CoreUnfold.classOpSize and ufDictDiscount in particular.

But there's a bit more to it than that. Here's the unfolding for even:

     Unfolding: (\ @ a ($dIntegral :: Integral a) (eta :: a) ->
                 let {
                   $dReal :: Real a = $p1Integral @ a $dIntegral
                 } in
                 let {
                   $dNum :: Num a = $p1Real @ a $dReal
                 } in
                 ==
                   @ a
                   ($p1Ord @ a ($p2Real @ a $dReal))
                   (rem @ a $dIntegral eta (fromInteger @ a $dNum even2))
                   (fromInteger @ a $dNum even1)) -}

The original argument is $dIntegral and only two of the seven dictionary-selection operations are applied to that argument; and only they attract discounts.

Food for thought here. This function obviously should be inlined (when applied to a particular fixed dictionary) but it's not quite clear how to make that happen.

Making it INLINEABLE, as the patch does, makes it specialisable which is good. Then it becomes small, and then it gets inlined. That path works quite well. Maybe all overloaded functions, perhaps up to some size limit, should automatically be INLINEABLE.

Food for thought

Simon

comment:7 Changed 4 years ago by Ben Gamari <ben@…>

In 2841ccab/ghc:

Mark GHC.Real.even and odd as INLINEABLE

Previously they were merely specialised at Int and Integer. It seems to
me that these are cheap enough to be worth inlining. See #11701 for
motivation.

Test Plan: Validate

Reviewers: austin, hvr, simonpj

Reviewed By: simonpj

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1997

GHC Trac Issues: #11701

comment:8 Changed 3 years ago by bgamari

Resolution: fixed
Status: patchclosed
Note: See TracTickets for help on using tickets.