Opened 12 months ago

Last modified 8 months ago

#15717 new bug

Performance regression in for_ alternatives from GHC 8.2.2 to newer GHCs

Reported by: nh2 Owned by:
Priority: normal Milestone: 8.10.1
Component: Compiler Version: 8.6.1
Keywords: Simplifier Cc: nh2, simonpj
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

We are investigating various ways to make for_ and traverse_ not leak space by changing the type signature from (a -> f b) -> t a -> f () to (a -> f ()) -> t a -> f ().

While doing so, we noticed a regression from GHC 8.2.2 to 8.4.3 and 8.6.1.

The code:

https://gist.github.com/nh2/b8f9f8e60443bdb30c1cd7e0acb8c8eb/bb1cc1a4987091fcd41c07b0a6f0512f96a992ae

Run against 3 different GHC releases:

# 8.2.2
stack --resolver lts-11.22 ghc -- --make -O2 -rtsopts ./TraverseMaybePerformance.hs && /usr/bin/time ./TraverseMaybePerformance 8 +RTS -sstderr

  460,309,368 bytes allocated in the heap

# 8.4.3
stack --resolver lts-12.11 ghc -- --make -O2 -rtsopts ./TraverseMaybePerformance.hs && /usr/bin/time ./TraverseMaybePerformance 8 +RTS -sstderr

  860,301,736 bytes allocated in the heap

# 8.6.1
stack --resolver nightly-2018-10-06 ghc -- --make -O2 -rtsopts ./TraverseMaybePerformance.hs && /usr/bin/time ./TraverseMaybePerformance 8 +RTS -sstderr

  860,301,784 bytes allocated in the heap

Allocations doubled starting with 8.4.

All was run on Ubuntu 16.04 64-bit.

We haven't investigated in detail yet (also whether it's a GHC or libraries problem) since we're actually trying to do something else and this came out on the side, but it looks important enough to share already.

Change History (8)

comment:1 Changed 11 months ago by RyanGlScott

Cc: simonpj added

This regression was introduced in commit 71037b61597d8e80ba5acebc8ad2295e5266dc07 (Join-point refactoring).

comment:2 Changed 9 months ago by nh2

Milestone: 8.8.1

Is this something that could be tackled for 8.8?

Some of the real-world applications we'd really like to upgrade past 8.2 appear to get slower from this.

(Tentatively seting milestone)

comment:3 Changed 9 months ago by simonpj

Doubling sounds bad.

There are 9 tests in the repro case. I've lost track of what is what. If someone was able to distil out a simple before-and-after on one test, that would be helpful. Better still, get some insight into what is happening. Thanks!

Last edited 9 months ago by simonpj (previous) (diff)

comment:4 in reply to:  3 Changed 9 months ago by nh2

Replying to simonpj:

There are 9 tests in the repro case. I've lost track of what is what. If someone was able to distil out a simple before-and-after on one test

This part I can answer immediately: I use only ./TraverseMaybePerformance 8 in the issue description, so that's test8 form the repro.

But yes, a smaller test case will make things easier. Here it is:

#!/usr/bin/env stack
-- stack --resolver lts-11.22 script --optimize
-- The above one is fast. Slow is:
-- stack --resolver lts-12.11 script --optimize

{-# OPTIONS_GHC -Wall #-}

import           Control.Concurrent (yield)
import           Control.Monad (when)
import qualified Data.Text.Lazy as T
import qualified Data.Foldable as F

myfor_ :: (Foldable f, Applicative app) => f a -> (a -> app ()) -> app ()
myfor_ t f =
  case F.toList t of
    [] -> pure ()
    x -> go x
  where
    go [x] = f x
    go (x:xs) = f x *> go xs

printChars_myfor_ :: Int -> T.Text -> IO ()
printChars_myfor_ idx t = myfor_ (T.uncons t) $ \(c, t') -> do
  when (idx `mod` 100000 == 0) $ do
    -- Using putStrLn, I observe 2x more allocations with GHC 8.4.3 vs 8.2.2 (860 vs 460 M)
    --putStrLn $ "Character #" ++ show idx ++ ": " ++ show c
    -- Using yield,    I observe 4x more allocations with GHC 8.4.3 vs 8.2.2 (860 vs 220 M)
    yield -- the putStrLn isn't necessary, this is enough to trigger the regression
  printChars_myfor_ (idx + 1) t'

main :: IO ()
main = printChars_myfor_ 1 $ T.replicate 5000000 $ T.singleton 'x'

I even managed to make the regression 4x worse using yield.

comment:5 Changed 9 months ago by osa1

Milestone: 8.8.18.10.1

Bumping milestones of low-priority tickets.

comment:6 Changed 8 months ago by simonpj

Keywords: Simplifier added
Note: See TracTickets for help on using tickets.