Opened 16 months ago

Last modified 7 months ago

#15185 patch bug

Enum instance for IntX / WordX are inefficient

Reported by: guibou Owned by:
Priority: normal Milestone: 8.10.1
Component: Core Libraries Version: 8.4.2
Keywords: Newcomer Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D4820
Wiki Page:

Description

The following factorial benchmark show high discrepancy in performance between Int and Int64 (and similarly between Word and Word64) when using product [1..n] to compute factorial:

{-# LANGUAGE TypeApplications #-}
import Criterion.Main

import Data.Int
import Data.Word
import Numeric.Natural

fact :: Integral t => t -> t
fact n = product [1..n]

fact2 :: Integral t => t -> t
fact2 n = go 1 n
  where go acc 1 = acc
        go acc n = go (acc * n) (n - 1)

main :: IO ()
main = defaultMain [
  bgroup "fact 20" [
     bench "Int"  $ whnf (fact @Int) 20
     , bench "Word"  $ whnf (fact @Word) 20
     , bench "Int64"  $ whnf (fact @Int64) 20
     , bench "Word64"  $ whnf (fact @Word64) 20
     , bench "Integer"  $ whnf (fact @Integer) 20
     , bench "Natural"  $ whnf (fact @Natural) 20
               ]
  ,
  bgroup "fact2 20" [
     bench "Int"  $ whnf (fact @Int) 20
     , bench "Word"  $ whnf (fact @Word) 20
     , bench "Int64"  $ whnf (fact @Int64) 20
     , bench "Word64"  $ whnf (fact @Word64) 20
     , bench "Integer"  $ whnf (fact @Integer) 20
     , bench "Natural"  $ whnf (fact @Natural) 20
               ]
  ]
benchmarking fact 20/Int
time                 17.33 ns   (17.22 ns .. 17.41 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 17.21 ns   (17.16 ns .. 17.29 ns)
std dev              222.4 ps   (184.5 ps .. 282.3 ps)
variance introduced by outliers: 15% (moderately inflated)

benchmarking fact 20/Word
time                 17.25 ns   (17.01 ns .. 17.63 ns)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 17.19 ns   (17.11 ns .. 17.36 ns)
std dev              358.6 ps   (199.4 ps .. 674.3 ps)
variance introduced by outliers: 32% (moderately inflated)

benchmarking fact 20/Int64
time                 166.7 ns   (165.4 ns .. 168.5 ns)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 166.8 ns   (166.0 ns .. 168.5 ns)
std dev              4.077 ns   (1.854 ns .. 6.592 ns)
variance introduced by outliers: 35% (moderately inflated)

benchmarking fact 20/Word64
time                 560.5 ns   (550.6 ns .. 575.0 ns)
                     0.995 R²   (0.989 R² .. 1.000 R²)
mean                 556.5 ns   (549.1 ns .. 575.9 ns)
std dev              36.08 ns   (9.195 ns .. 60.69 ns)
variance introduced by outliers: 78% (severely inflated)

benchmarking fact 20/Integer
time                 421.9 ns   (419.8 ns .. 425.3 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 427.7 ns   (425.5 ns .. 431.3 ns)
std dev              9.238 ns   (6.311 ns .. 16.19 ns)
variance introduced by outliers: 28% (moderately inflated)

benchmarking fact 20/Natural
time                 534.5 ns   (532.6 ns .. 536.3 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 533.7 ns   (532.6 ns .. 535.3 ns)
std dev              4.535 ns   (3.652 ns .. 6.226 ns)

benchmarking fact2 20/Int
time                 17.14 ns   (17.07 ns .. 17.23 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 17.15 ns   (17.10 ns .. 17.24 ns)
std dev              227.1 ps   (167.4 ps .. 286.9 ps)
variance introduced by outliers: 16% (moderately inflated)

benchmarking fact2 20/Word
time                 16.93 ns   (16.86 ns .. 17.02 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 16.90 ns   (16.86 ns .. 16.95 ns)
std dev              149.8 ps   (107.4 ps .. 192.5 ps)

benchmarking fact2 20/Int64
time                 165.1 ns   (164.6 ns .. 165.7 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 164.8 ns   (164.5 ns .. 165.1 ns)
std dev              1.033 ns   (778.6 ps .. 1.363 ns)

benchmarking fact2 20/Word64
time                 545.3 ns   (542.8 ns .. 547.9 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 543.6 ns   (542.4 ns .. 545.2 ns)
std dev              4.741 ns   (3.736 ns .. 5.824 ns)

benchmarking fact2 20/Integer
time                 419.1 ns   (417.2 ns .. 421.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 418.2 ns   (416.8 ns .. 420.2 ns)
std dev              5.415 ns   (4.094 ns .. 7.147 ns)
variance introduced by outliers: 12% (moderately inflated)

benchmarking fact2 20/Natural
time                 533.8 ns   (532.2 ns .. 536.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 533.2 ns   (532.1 ns .. 534.8 ns)
std dev              4.428 ns   (3.217 ns .. 5.741 ns)

In fact, Int and Word are efficient, Int64 and Word64 are respectively 10x and 20x slowers, when they should be identical on 64 bit platform. Word64 is even slower than Integer!

Replacing fact by a handwritten recursion, in fact2 gives something more coherent where all the 64 bits types have roughly the same efficiency.

I observed in the source code that Int and Int64 are not implemented as type synonym.

Using -ddump-rule-firings, I observed that a rule fold/build fired and removed the fold for the Int case, and not for the Int64, I don't understand much ;(

This is not dramatic because most people will use Int instead of Int64, but it is a surprising behavior. I also observed that smaller integral (such as Int8, Int16, Int32) behaves as badly as Int64) and this is much more important because there is no other default and more efficient type synonym.

Note that I'll be interested to provide a patch myself if mentored a bit. Especially, I'm thinking about merging the Int and Int64 implementation (with respect to the architecture default word size).

Tested on GHC 8.4.2

Change History (5)

comment:1 Changed 16 months ago by guibou

Type of failure: None/UnknownRuntime performance bug

comment:2 Changed 15 months ago by sighingnow

Differential Rev(s): Phab:D4820
Status: newpatch

comment:3 Changed 15 months ago by bgamari

Milestone: 8.6.18.8.1

These won't be addressed for GHC 8.6.

comment:4 Changed 10 months ago by bgamari

Milestone: 8.8.18.10.1

This won't happen for 8.8.

comment:5 Changed 7 months ago by AndreasK

Keywords: Newcomer added

The efficient implementation comes from libraries/base/GHC/Enum.hs where rules are defined.

Implementing this should just be a matter of copying this approach for the other Int/Word variants.

The correct place to put the rules for the Int/Word variants would be GHC.Int/GHC.Word in base.

Note: See TracTickets for help on using tickets.