Opened 4 years ago

Last modified 4 years ago

#10992 new bug

Performance regression due to lack of inlining of `foldl` and `foldl'`.

Reported by: aleator Owned by:
Priority: normal Milestone:
Component: Core Libraries Version: 7.10.2
Keywords: performane Cc: ekmett
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

For some reason Data.List.sum is much slower than the naive recursive definition for it. Does not seem happen in 7.8.

  import Data.List
  import Criterion.Main
 
  sum1 [] = 0
  sum1 (x:xs) = x + sum1 xs
 
  sum2 = foldr (+) 0
  sum3 = foldl' (+) 0
  sum4 = getSum #. foldMap Sum
 
 
  main = do
      let els = [1..100000]
      defaultMain [
       bench "sum0" $ whnf  (sum :: [Int] -> Int) els
       ,bench "sum" $ whnf  (sum :: [Int] -> Int) els
       ,bench "sum1" $ whnf (sum1:: [Int] -> Int) els
       ,bench "sum2" $ whnf (sum2:: [Int] -> Int) els
       ,bench "sum3" $ whnf (sum3:: [Int] -> Int) els
       ,bench "sum4" $ whnf (sum4:: [Int] -> Int) els
       ]
 
{-
 
benchmarking sum0
time                 128.0 ms   (121.7 ms .. 133.8 ms)
                     0.997 R²   (0.994 R² .. 1.000 R²)
mean                 132.8 ms   (130.1 ms .. 136.3 ms)
std dev              4.622 ms   (2.980 ms .. 6.577 ms)
variance introduced by outliers: 11% (moderately inflated)
 
benchmarking sum
time                 131.0 ms   (125.7 ms .. 136.7 ms)
                     0.998 R²   (0.993 R² .. 1.000 R²)
mean                 131.9 ms   (128.6 ms .. 136.7 ms)
std dev              5.971 ms   (2.903 ms .. 9.649 ms)
variance introduced by outliers: 11% (moderately inflated)
 
benchmarking sum1
time                 26.29 ms   (25.43 ms .. 27.24 ms)
                     0.995 R²   (0.991 R² .. 0.998 R²)
mean                 26.30 ms   (25.73 ms .. 26.98 ms)
std dev              1.293 ms   (953.2 μs .. 1.804 ms)
variance introduced by outliers: 15% (moderately inflated)
 
benchmarking sum2
time                 26.39 ms   (25.49 ms .. 27.26 ms)
                     0.995 R²   (0.991 R² .. 0.998 R²)
mean                 26.33 ms   (25.77 ms .. 26.95 ms)
std dev              1.295 ms   (941.5 μs .. 1.878 ms)
variance introduced by outliers: 15% (moderately inflated)
 
benchmarking sum3
time                 7.695 ms   (7.670 ms .. 7.722 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 7.703 ms   (7.691 ms .. 7.733 ms)
std dev              51.21 μs   (25.43 μs .. 99.41 μs)
 
benchmarking sum4
time                 26.29 ms   (25.40 ms .. 27.21 ms)
                     0.995 R²   (0.989 R² .. 0.998 R²)
mean                 26.36 ms   (25.88 ms .. 27.08 ms)
std dev              1.364 ms   (954.5 μs .. 1.899 ms)
variance introduced by outliers: 20% (moderately inflated)
  
  -}

Change History (4)

comment:1 Changed 4 years ago by nomeata

Summary: Performance regressionPerformance regression due to lack of inlining of `foldl` and `foldl'`.

I looked into this, and the cause is clear: In 7.10, foldl (+) 0 (which is how GHC.List.sum is defined) resp. foldl' (+) 0 are not inlined here. If they were, GHC would be able to transform both to the ideal code corresponding to an explicit recursion with accumulator.

If I use these definitions:

sum0 xs = sum xs
sum3 xs = foldl' (+) 0 xs
sum5 xs = foldl (+) 0 xs

then I get good code in all cases.

The reason is that foldl is set up so that it inlines once the third argument is present:

{-# INLINE foldl #-}
foldl k z0 xs = ...

{-# INLINE foldl' #-}
foldl' k z0 xs =

This makes sense when trying to achieve list fusion, but even with two arguments (or maybe even one), inlining would be useful for the strictness analyzer to fire.

I’m however not sure how relevant unsaturated calls to foldl are in practice.

comment:2 in reply to:  1 Changed 4 years ago by thomie

Cc: ekmett added
Component: CompilerCore Libraries
Type of failure: None/UnknownRuntime performance bug

Replying to nomeata:

If I use...:

sum0 xs = sum xs

then I get good code...

Tricky.

comment:3 Changed 4 years ago by nomeata

I should add that I’m inclined to suggest to change the definitions to

{-# INLINE foldl #-}
foldl k z0 = \xs -> ...

{-# INLINE foldl' #-}
foldl' k z0 = \xs -> ...

which should fix the problem at hand, but I’d like to have at least a second opinion on this by someone.

comment:4 Changed 4 years ago by simonpj

comment:3 sounds plausible to me.

Needs a Note to explain! (And a nofib run.)

Note: See TracTickets for help on using tickets.