Opened 3 years ago

Closed 18 months ago

#12790 closed bug (fixed)

GHC 8.0.1 uses copious amounts of RAM and time when trying to compile lambdabot-haskell-plugins

Reported by: clint Owned by:
Priority: normal Milestone: 8.4.1
Component: Compiler Version: 8.0.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D4412
Wiki Page:

Description (last modified by clint)

When building lambdabot-haskell-plugins, Lambdabot.Plugin.Haskell.Pl.Parser compiles without incident on the non-profiling run, but cripples the build machine on the profiling run. An excerpt of -v3 output is below to show the number of terms skyrocketing.

*** Common sub-expression [Lambdabot.Plugin.Haskell.Pl.Parser]:
Result size of Common sub-expression
  = {terms: 39,049, types: 30,851, coercions: 990}
!!! Common sub-expression [Lambdabot.Plugin.Haskell.Pl.Parser]: finished in 72.00 milliseconds, allocated 44.536 megabytes
*** Float inwards [Lambdabot.Plugin.Haskell.Pl.Parser]:
Result size of Float inwards
  = {terms: 39,049, types: 30,851, coercions: 990}
!!! Float inwards [Lambdabot.Plugin.Haskell.Pl.Parser]: finished in 68.00 milliseconds, allocated 60.337 megabytes
*** Simplifier [Lambdabot.Plugin.Haskell.Pl.Parser]:
Result size of Simplifier iteration=1
  = {terms: 977,045, types: 299,334, coercions: 1,058}
Result size of Simplifier iteration=2
  = {terms: 926,036, types: 263,306, coercions: 932}
Result size of Simplifier
  = {terms: 922,724, types: 262,202, coercions: 932}
!!! Simplifier [Lambdabot.Plugin.Haskell.Pl.Parser]: finished in 15480.00 milliseconds, allocated 9384.823 megabytes
*** CoreTidy [Lambdabot.Plugin.Haskell.Pl.Parser]:
Result size of Tidy Core
  = {terms: 922,623, types: 261,789, coercions: 865}
!!! CoreTidy [Lambdabot.Plugin.Haskell.Pl.Parser]: finished in 2060.00 milliseconds, allocated 613.585 megabytes
writeBinIface: 95 Names
writeBinIface: 259 dict entries
*** CorePrep [Lambdabot.Plugin.Haskell.Pl.Parser]:
Result size of CorePrep
  = {terms: 1,128,145, types: 347,063, coercions: 865}
!!! CorePrep [Lambdabot.Plugin.Haskell.Pl.Parser]: finished in 3468.00 milliseconds, allocated 1718.453 megabytes
*** Stg2Stg:
*** CodeGen [Lambdabot.Plugin.Haskell.Pl.Parser]:

Change History (20)

comment:1 Changed 3 years ago by clint

Description: modified (diff)

comment:2 Changed 3 years ago by rwbarton

I wasn't able to reproduce this with a cabal install -w ghc-8.0.1 lambdabot-haskell-plugins.

comment:3 in reply to:  2 Changed 3 years ago by nomeata

Replying to rwbarton:

I wasn't able to reproduce this with a cabal install -w ghc-8.0.1 lambdabot-haskell-plugins.

Have you tried cabal install -w ghc-8.0.1 lambdabot-haskell-plugins --enable-library-profiling?

comment:4 Changed 3 years ago by rwbarton

Oops, I misread a comment in IRC :/

It's sufficient to just try to build the module Lambdabot.Plugin.Haskell.Pl.Parser along with its dependency Lambdabot.Plugin.Haskell.Pl.Common (with -O -prof). Only package dependencies are mtl, text and parsec.

GHC 7.10.1 works okay.

comment:5 Changed 3 years ago by simonpj

I fell at the first fence:

cabal install --with-ghc=/home/simonpj/5builds/HEAD-4/inplace/bin/ghc-stage2 --enable-library-profiling
Resolving dependencies...
cabal: Could not resolve dependencies:
trying: hint-0.6.0 (dependency of mueval-0.9.3)
next goal: ghc-boot-th (dependency of template-haskell-2.11.0.0)
rejecting: ghc-boot-th-8.1/installed-8.1 (conflict: template-haskell =>
ghc-boot-th==8.0.*)
trying: ghc-boot-th-8.0.1
next goal: ghc (dependency of hint-0.6.0)
rejecting: ghc-8.1/installed-8.1 (conflict: ghc-boot-th==8.0.1, ghc =>
ghc-boot-th==8.1/installed-8.1)
Backjump limit reached (currently 2000, change with --max-backjumps or try to
run with --reorder-goals).

comment:6 Changed 3 years ago by RyanGlScott

Simon, can you run cabal update and try again? The template-haskell upper version bound for a downstream dependency (exceptions) of lambdabot-haskell-plugins was too restrictive, and I've since raised it to accommodate GHC HEAD.

comment:7 Changed 3 years ago by RyanGlScott

I've managed to reduce it down to something which just requires parsec:

module Lambdabot.Plugin.Haskell.Pl.Parser (list) where

import Data.Foldable (asum)
import Text.ParserCombinators.Parsec (Parser, sepBy, try)

data Expr
  = Var Fixity String
  | App Expr Expr

data Fixity = Pref | Inf

cons, nil :: Expr
cons = Var Inf  ":"
nil  = Var Pref "[]"

brackets :: Parser a -> Parser a
brackets = undefined

symbol :: String -> Parser String
symbol = undefined

list :: Parser Expr
list = asum (map (try . brackets) plist) where
  plist = [
    foldr (\e1 e2 -> cons `App` e1 `App` e2) nil `fmap`
      (myParser False `sepBy` symbol ","),
    do e <- myParser False
       _ <- symbol ".."
       return $ Var Pref "enumFrom" `App` e,
    do e  <- myParser False
       _  <- symbol ","
       e' <- myParser False
       _  <- symbol ".."
       return $ Var Pref "enumFromThen" `App` e `App` e',
    do e  <- myParser False
       _  <- symbol ".."
       e' <- myParser False
       return $ Var Pref "enumFromTo" `App` e `App` e',
    do e   <- myParser False
       _   <- symbol ","
       e'  <- myParser False
       _   <- symbol ".."
       e'' <- myParser False
       return $ Var Pref "enumFromThenTo" `App` e `App` e' `App` e''
    ]

myParser :: Bool -> Parser Expr
myParser = undefined

plist appears to be the culprit. It seems to have some sort of quadratic slowdown whenever new elements are added to plist. For example, commenting out the last element of plist makes it compile within a reasonable amount of time (but not instantly).

I think the Alternative instance for Parser might have something to do with it, too. Notably, if I comment out the import of asum and redefine it locally as:

asum :: [Parser a] -> Parser a
asum = undefined

Then it compiles instantly.

Last edited 3 years ago by RyanGlScott (previous) (diff)

comment:8 Changed 3 years ago by simonpj

That's very helpful. Alas, parsec-3.1.11 doesn't compile with HEAD. I get

Text\Parsec\Token.hs:524:27: error:
        Could not deduce (Stream s m t0) arising from a use of    option   
      from the context: Stream s m Char
        bound by the type signature for:
                   makeTokenParser :: Stream s m Char =>
                                      GenLanguageDef s u m -> GenTokenParser s u m
        at Text\Parsec\Token.hs:(351,1)-(352,63)
      The type variable    t0    is ambiguous

I have not investigated what the problem is. Does anyone know?

comment:9 Changed 3 years ago by RyanGlScott

Simon, I've opened #12936 to figure out that parsec issue.

Until then, you can work around it by checking out a local copy of parsec-3.1.11 with cabal get parsec-3.1.11, edit parsec-3.1.11/Text/Parsec/Token.hs, and change the definition to makeTokenParser = undefined. I don't believe its definition is used in this example anyways.

Moreover, after making that change, I can confirm that the example in https://ghc.haskell.org/trac/ghc/ticket/12790#comment:7 also exhibits the same slowdown in GHC HEAD (compiling it with -O -prof, of course).

comment:10 Changed 3 years ago by mpickering

Can the example be reduced further by inlining the relevant parts of parsec?

comment:11 Changed 3 years ago by RyanGlScott

Weirdly enough, I tried inlining the relevant parsec bits (which isn't an easy task, by the way—there's a surprising amount of things you have to bring in!). But after inlining them, I couldn't reproduce the issue anymore!

If you don't believe me, here's a "reduced" example that you can try for yourself:

-- Parsec.hs
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE RankNTypes #-}
module Parsec (Parser, sepBy, try) where

import Control.Applicative (Alternative(empty, (<|>)))
import Control.Monad (MonadPlus(..), ap)
import Data.Functor.Identity (Identity)

------------------------
-- Copied from parsec --
------------------------

type Parser = Parsec String ()
type Parsec s u = ParsecT s u Identity
newtype ParsecT s u m a
    = ParsecT {unParser :: forall b .
                 State s u
              -> (a -> State s u -> ParseError -> m b) -- consumed ok
              -> (ParseError -> m b)                   -- consumed err
              -> (a -> State s u -> ParseError -> m b) -- empty ok
              -> (ParseError -> m b)                   -- empty err
              -> m b
             }

data State s u = State {
      stateInput :: s,
      statePos   :: !SourcePos,
      stateUser  :: !u
    }
data Message = SysUnExpect !String -- @ library generated unexpect
             | UnExpect    !String -- @ unexpected something
             | Expect      !String -- @ expecting something
             | Message     !String -- @ raw message
data ParseError = ParseError !SourcePos [Message]
data SourcePos  = SourcePos SourceName !Line !Column
  deriving (Eq, Ord)

type SourceName = String
type Line       = Int
type Column     = Int

instance Functor (ParsecT s u m) where
    fmap f p = parsecMap f p

parsecMap :: (a -> b) -> ParsecT s u m a -> ParsecT s u m b
parsecMap f p
    = ParsecT $ \s cok cerr eok eerr ->
      unParser p s (cok . f) cerr (eok . f) eerr

instance Applicative (ParsecT s u m) where
    pure = return
    (<*>) = ap -- TODO: Can this be optimized?

instance Alternative (ParsecT s u m) where
    empty = mzero
    (<|>) = mplus

instance Monad (ParsecT s u m) where
    return x = parserReturn x
    p >>= f  = parserBind p f
    fail msg = parserFail msg

parserReturn :: a -> ParsecT s u m a
parserReturn x
    = ParsecT $ \s _ _ eok _ ->
      eok x s (unknownError s)

parserBind :: ParsecT s u m a -> (a -> ParsecT s u m b) -> ParsecT s u m b
{-# INLINE parserBind #-}
parserBind m k
  = ParsecT $ \s cok cerr eok eerr ->
    let
        -- consumed-okay case for m
        mcok x s err =
            let
                 -- if (k x) consumes, those go straigt up
                 pcok = cok
                 pcerr = cerr

                 -- if (k x) doesn't consume input, but is okay,
                 -- we still return in the consumed continuation
                 peok x s err' = cok x s (mergeError err err')

                 -- if (k x) doesn't consume input, but errors,
                 -- we return the error in the 'consumed-error'
                 -- continuation
                 peerr err' = cerr (mergeError err err')
            in  unParser (k x) s pcok pcerr peok peerr

        -- empty-ok case for m
        meok x s err =
            let
                -- in these cases, (k x) can return as empty
                pcok = cok
                peok x s err' = eok x s (mergeError err err')
                pcerr = cerr
                peerr err' = eerr (mergeError err err')
            in  unParser (k x) s pcok pcerr peok peerr
        -- consumed-error case for m
        mcerr = cerr

        -- empty-error case for m
        meerr = eerr

    in unParser m s mcok mcerr meok meerr

parserFail :: String -> ParsecT s u m a
parserFail msg
    = ParsecT $ \s _ _ _ eerr ->
      eerr $ newErrorMessage (Message msg) (statePos s)

instance MonadPlus (ParsecT s u m) where
    mzero = parserZero
    mplus p1 p2 = parserPlus p1 p2

parserZero :: ParsecT s u m a
parserZero
    = ParsecT $ \s _ _ _ eerr ->
      eerr $ unknownError s

parserPlus :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
{-# INLINE parserPlus #-}
parserPlus m n
    = ParsecT $ \s cok cerr eok eerr ->
      let
          meerr err =
              let
                  neok y s' err' = eok y s' (mergeError err err')
                  neerr err' = eerr $ mergeError err err'
              in unParser n s cok cerr neok neerr
      in unParser m s cok cerr eok meerr

newErrorUnknown :: SourcePos -> ParseError
newErrorUnknown pos
    = ParseError pos []

unknownError :: State s u -> ParseError
unknownError state        = newErrorUnknown (statePos state)

newErrorMessage :: Message -> SourcePos -> ParseError
newErrorMessage msg pos
    = ParseError pos [msg]

mergeError :: ParseError -> ParseError -> ParseError
mergeError e1@(ParseError pos1 msgs1) e2@(ParseError pos2 msgs2)
    -- prefer meaningful errors
    | null msgs2 && not (null msgs1) = e1
    | null msgs1 && not (null msgs2) = e2
    | otherwise
    = case pos1 `compare` pos2 of
        -- select the longest match
        EQ -> ParseError pos1 (msgs1 ++ msgs2)
        GT -> e1
        LT -> e2

try :: ParsecT s u m a -> ParsecT s u m a
try p =
    ParsecT $ \s cok _ eok eerr ->
    unParser p s cok eerr eok eerr

class (Monad m) => Stream s m t | s -> t where
    uncons :: s -> m (Maybe (t,s))

instance (Monad m) => Stream [tok] m tok where
    uncons []     = return $ Nothing
    uncons (t:ts) = return $ Just (t,ts)
    {-# INLINE uncons #-}

-- | @sepBy p sep@ parses /zero/ or more occurrences of @p@, separated
-- by @sep@. Returns a list of values returned by @p@.
--
-- >  commaSep p  = p `sepBy` (symbol ",")

sepBy :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
sepBy p sep         = sepBy1 p sep <|> return []

-- | @sepBy1 p sep@ parses /one/ or more occurrences of @p@, separated
-- by @sep@. Returns a list of values returned by @p@.

sepBy1 :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
sepBy1 p sep        = do{ x <- p
                        ; xs <- many (sep >> p)
                        ; return (x:xs)
                        }

many :: ParsecT s u m a -> ParsecT s u m [a]
many p
  = do xs <- manyAccum (:) p
       return (reverse xs)

manyAccum :: (a -> [a] -> [a])
          -> ParsecT s u m a
          -> ParsecT s u m [a]
manyAccum acc p =
    ParsecT $ \s cok cerr eok eerr ->
    let walk xs x s' err =
            unParser p s'
              (seq xs $ walk $ acc x xs)  -- consumed-ok
              cerr                        -- consumed-err
              manyErr                     -- empty-ok
              (\e -> cok (acc x xs) s' e) -- empty-err
    in unParser p s (walk []) cerr manyErr (\e -> eok [] s e)

manyErr = error "Text.ParserCombinators.Parsec.Prim.many: combinator 'many' is applied to a parser that accepts an empty string."
module Lambdabot.Plugin.Haskell.Pl.Parser (list) where

import Data.Foldable (asum)
import Parsec (Parser, sepBy, try)

data Expr
  = Var Fixity String
  | App Expr Expr

data Fixity = Pref | Inf

cons, nil :: Expr
cons = Var Inf  ":"
nil  = Var Pref "[]"

brackets :: Parser a -> Parser a
brackets = undefined

symbol :: String -> Parser String
symbol = undefined

list :: Parser Expr
list = asum (map (try . brackets) plist) where
  plist = [
    foldr (\e1 e2 -> cons `App` e1 `App` e2) nil `fmap`
      (myParser False `sepBy` symbol ","),
    do e <- myParser False
       _ <- symbol ".."
       return $ Var Pref "enumFrom" `App` e,
    do e  <- myParser False
       _  <- symbol ","
       e' <- myParser False
       _  <- symbol ".."
       return $ Var Pref "enumFromThen" `App` e `App` e',
    do e  <- myParser False
       _  <- symbol ".."
       e' <- myParser False
       return $ Var Pref "enumFromTo" `App` e `App` e',
    do e   <- myParser False
       _   <- symbol ","
       e'  <- myParser False
       _   <- symbol ".."
       e'' <- myParser False
       return $ Var Pref "enumFromThenTo" `App` e `App` e' `App` e''
    ]

myParser :: Bool -> Parser Expr
myParser = undefined
Last edited 3 years ago by RyanGlScott (previous) (diff)

comment:12 Changed 3 years ago by mpickering

If I turn off the worker wrapper transformation then compilation is much faster. The simpl-stats file shows that a lot of time is being spent inlining ww functions.

comment:13 Changed 3 years ago by RyanGlScott

Thankfully, Simon has fixed #12936, so now we can compile parsec on GHC HEAD without issue.

Another interesting tidbit: when compiling with -O -prof, it exhibits slowdown, but with -O -prof -fprof-auto, it doesn't!

comment:14 Changed 3 years ago by RyanGlScott

There's quite a difference between the generated Core for list depending on whether -fprof-auto is on or not. With -fprof-auto, we have:

-- RHS size: {terms: 2, types: 0, coercions: 0}
list :: Parser Expr
list =
  Lambdabot.Plugin.Haskell.Pl.Parser.list_go1
    Lambdabot.Plugin.Haskell.Pl.Parser.list1

-- RHS size: {terms: 3, types: 11, coercions: 10}
Lambdabot.Plugin.Haskell.Pl.Parser.list1
  :: [Text.Parsec.Prim.ParsecT
        [Char] () Data.Functor.Identity.Identity Expr]

But without -fprof-auto, we have:

-- RHS size: {terms: 1, types: 0, coercions: 7}
list :: Parser Expr
list =
  Lambdabot.Plugin.Haskell.Pl.Parser.list1
  `cast` (Sym
            (Text.Parsec.Prim.N:ParsecT[0]
               <[Char]>_R <()>_R <Data.Functor.Identity.Identity>_R <Expr>_R)
          :: ((forall b.
               Text.Parsec.Prim.State [Char] ()
               -> (Expr
                   -> Text.Parsec.Prim.State [Char] ()
                   -> Text.Parsec.Error.ParseError
                   -> Data.Functor.Identity.Identity b)
               -> (Text.Parsec.Error.ParseError
                   -> Data.Functor.Identity.Identity b)
               -> (Expr
                   -> Text.Parsec.Prim.State [Char] ()
                   -> Text.Parsec.Error.ParseError
                   -> Data.Functor.Identity.Identity b)
               -> (Text.Parsec.Error.ParseError
                   -> Data.Functor.Identity.Identity b)
               -> Data.Functor.Identity.Identity b) :: *)
             ~R#
             (Text.Parsec.Prim.ParsecT
                [Char] () Data.Functor.Identity.Identity Expr :: *))

-- RHS size: {terms: 910,139, types: 246,618, coercions: 0}
Lambdabot.Plugin.Haskell.Pl.Parser.list1
  :: forall b.
     Text.Parsec.Prim.State [Char] ()
     -> (Expr
         -> Text.Parsec.Prim.State [Char] ()
         -> Text.Parsec.Error.ParseError
         -> Data.Functor.Identity.Identity b)
     -> (Text.Parsec.Error.ParseError
         -> Data.Functor.Identity.Identity b)
     -> (Expr
         -> Text.Parsec.Prim.State [Char] ()
         -> Text.Parsec.Error.ParseError
         -> Data.Functor.Identity.Identity b)
     -> (Text.Parsec.Error.ParseError
         -> Data.Functor.Identity.Identity b)
     -> Data.Functor.Identity.Identity b

That's quite the code explosion!

comment:15 Changed 3 years ago by RyanGlScott

To elaborate on the observation that enabling -fprof-auto fixes the slowdown, it's because if you add an SCC pragma for list:

{-# SCC list #-}
list :: Parser Expr
list = ...

Then it also fixes the slowdown. (-fprof-auto was just doing this under the hood).

comment:16 Changed 18 months ago by George

Not sure if compiling the code in comment 7 with -O -prof is still a valid test case. If so, it seems to be fixed on 8.4.1 alpha 3 with parsec 3.1.13

comment:17 Changed 18 months ago by RyanGlScott

Great catch. Indeed, I cannot reproduce the issue anymore with GHC 8.2.2, GHC 8.4.1, or HEAD with the program in comment:7, and moreover, I can compile lambdabot-haskell-plugins on GHC 8.2.2 in a reasonable amount of time with profiling enabled.

Here's what you get for the Core for list on GHC HEAD:

-- RHS size: {terms: 1, types: 0, coercions: 7, joins: 0/0}
list :: Parser Expr
list
  = Parser.list1
    `cast` (Sym (Text.Parsec.Prim.N:ParsecT[0]
                     <[Char]>_R <()>_R <Data.Functor.Identity.Identity>_R <Expr>_R)
            :: (forall b.
                Text.Parsec.Prim.State [Char] ()
                -> (Expr
                    -> Text.Parsec.Prim.State [Char] ()
                    -> Text.Parsec.Error.ParseError
                    -> Data.Functor.Identity.Identity b)
                -> (Text.Parsec.Error.ParseError
                    -> Data.Functor.Identity.Identity b)
                -> (Expr
                    -> Text.Parsec.Prim.State [Char] ()
                    -> Text.Parsec.Error.ParseError
                    -> Data.Functor.Identity.Identity b)
                -> (Text.Parsec.Error.ParseError
                    -> Data.Functor.Identity.Identity b)
                -> Data.Functor.Identity.Identity b :: *)
               ~R# (Text.Parsec.Prim.ParsecT
                      [Char] () Data.Functor.Identity.Identity Expr :: *))

-- RHS size: {terms: 8, types: 32, coercions: 0, joins: 0/0}
Parser.list1
  :: forall b.
     Text.Parsec.Prim.State [Char] ()
     -> (Expr
         -> Text.Parsec.Prim.State [Char] ()
         -> Text.Parsec.Error.ParseError
         -> Data.Functor.Identity.Identity b)
     -> (Text.Parsec.Error.ParseError
         -> Data.Functor.Identity.Identity b)
     -> (Expr
         -> Text.Parsec.Prim.State [Char] ()
         -> Text.Parsec.Error.ParseError
         -> Data.Functor.Identity.Identity b)
     -> (Text.Parsec.Error.ParseError
         -> Data.Functor.Identity.Identity b)
     -> Data.Functor.Identity.Identity b
Parser.list1
  = \ (@ b_X3Bv)
      _ [Occ=Dead]
      _ [Occ=Dead]
      _ [Occ=Dead]
      _ [Occ=Dead]
      _ [Occ=Dead] ->
      case Parser.list2 of wild_00 { }

Hooray!

I'll add this as a test case, since GHC HEAD includes parsec as a boot library nowadays.

comment:18 Changed 18 months ago by RyanGlScott

Differential Rev(s): Phab:D4412
Status: newpatch

comment:19 Changed 18 months ago by Ben Gamari <ben@…>

In 125d1518/ghc:

Add regression test for #12790

Test Plan: make test TEST=T12790

Reviewers: bgamari, mpickering

Reviewed By: mpickering

Subscribers: mpickering, dfeuer, rwbarton, thomie, carter

GHC Trac Issues: #12790

Differential Revision: https://phabricator.haskell.org/D4412

comment:20 Changed 18 months ago by bgamari

Milestone: 8.4.1
Resolution: fixed
Status: patchclosed

It looks like this is probably fixed.

Note: See TracTickets for help on using tickets.