Opened 2 years ago

Last modified 10 months ago

#14062 infoneeded bug

Pure syntax transformation affects performance.

Reported by: danilo2 Owned by:
Priority: high Milestone:
Component: Compiler Version: 8.0.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by bgamari)

Hi! Let's consider the following code (compiled with -O2, mtl and criterion needed):

{-# LANGUAGE FlexibleContexts #-}

module Main where

import Prelude
import Criterion.Main
import Control.Monad.State.Strict
import Control.Monad.Identity

repeatM :: Monad m => m a -> Int -> m ()
repeatM f = go where
    go 0 = pure ()
    go i = f >> go (i - 1)
{-# INLINE repeatM #-}

incState :: MonadState Int m => m ()
incState = modify' (1+) ; {-# INLINE incState #-}

test1, test2 :: Int -> Int
test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a >> get)) n ; {-# INLINE test1 #-}
test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get ; {-# INLINE test2 #-}

main :: IO ()
main = do
    defaultMain
        [ bgroup "monad transformers overhead"
            [ bench "test1"     $ nf test1 100000000
            , bench "test2"     $ nf test2 100000000
            ]
        ]

Functions test1 and test2 differ only syntactically and this difference should not affect GHC's inliner, because their implementations use fully saturated calls. The generated core for test1 and test2 is practically identical (there is an additional alias created for test1: test1 = lvl1_rhor 'cast' ...).

The problem is that test1 runs 3 times faster than test2.

As a side note - if we add more state transformers to test1, it optimizes them all away, while test2 runs slower with each new transformer applied.

Change History (6)

comment:1 Changed 2 years ago by danilo2

Description: modified (diff)

comment:2 Changed 2 years ago by simonpj

I can't reproduce this. I didn't use Criterion; I just ran the programs. Both produced the same code with ghc-8.0 and HEAD, and both ran equally fast.

Interestingly, HEAD ran them 25% faster than 8.0. The Core looks the same, so I don't know where that improvement comes from, but I'm sure it'll make you happy.

Can anyone else reproduce this?

comment:3 Changed 2 years ago by bgamari

Description: modified (diff)

I also can't reproduce this result with 8.2.1,

benchmarking monad transformers overhead/test1
time                 412.5 ms   (378.3 ms .. 443.0 ms)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 425.7 ms   (418.6 ms .. 430.2 ms)
std dev              6.749 ms   (0.0 s .. 7.751 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking monad transformers overhead/test2
time                 373.2 ms   (341.6 ms .. 532.9 ms)
                     0.970 R²   (NaN R² .. 1.000 R²)
mean                 372.6 ms   (351.4 ms .. 392.8 ms)
std dev              34.07 ms   (0.0 s .. 34.92 ms)
variance introduced by outliers: 22% (moderately inflated)

comment:4 Changed 18 months ago by simonpj

danilo2, can you still reproduce this with 8.4?

comment:5 Changed 10 months ago by George

I can't reproduce with 8.6.1 either:

benchmarking monad transformers overhead/test1
time                 94.84 ms   (91.89 ms .. 97.81 ms)
                     0.998 R²   (0.993 R² .. 1.000 R²)
mean                 97.52 ms   (95.91 ms .. 100.6 ms)
std dev              3.344 ms   (1.380 ms .. 4.705 ms)

benchmarking monad transformers overhead/test2
time                 95.69 ms   (94.49 ms .. 98.26 ms)
                     0.999 R²   (0.996 R² .. 1.000 R²)
mean                 95.81 ms   (95.24 ms .. 97.28 ms)
std dev              1.375 ms   (323.2 μs .. 2.151 ms)

comment:6 Changed 10 months ago by George

Status: newinfoneeded
Type of failure: None/UnknownRuntime performance bug
Note: See TracTickets for help on using tickets.