Opened 10 years ago

Closed 9 years ago

#3972 closed bug (fixed)

ghc 6.12.1 and 6.13.20090922 consume a lot more memory than 6.10.4 when compiling language-python package

Reported by: bjpop Owned by: igloo
Priority: high Milestone: 7.0.1
Component: Compiler Version: 6.12.1
Keywords: memory usage Cc: mle+hs@…, dons@…, overture2112@…, petersen@…, red5_2@…
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Compile-time performance bug Test Case: T3972
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

When compiling the language-python package ghc 6.12.1 and 6.13.20090922 consume a lot more memory than 6.10.4 does.

OS: x86_64 GNU/Linux, Ubuntu 9.10 karmic koala gcc: 4.4.1

All versions of GHC taken from the generic binary downloads provided by the GHC web page.

language-python: 0.2 (http://projects.haskell.org/language-python/language-python-0.2.tar.gz)

cabal-install version 0.8.0 using version 1.8.0.2 of the Cabal library

Steps taken to witness bug:

  1. wget http://projects.haskell.org/language-python/language-python-0.2.tar.gz
  2. tar xvf language-python-0.2.tar.gz
  3. cd language-python-0.2/
  4. cabal configure
  5. cabal build

The build processes 10 files successfully before hitting Language.Python.Common.ParserUtils. At that point memory consumption grows to at least 1.5 GB resident, and I have to kill the process to prevent my machine from thrashing.

Curiously ParserUtils is only a relatively small source file, 11975 bytes, 301 lines.

Turing off optimisation on the compilation of ParserUtils allows it to be compiled very quickly without memory use explosion. But then the build process gets to Language/Python/Version3/Parser/Parser.hs, and ghc consumes lots of memory again (well past 1.5 GB).

This memory consumption behaviour has been independently verified by one other user of language-python on ghc 6.12.1 on a different machine to mine.

However, ghc 6.10.4 can successfully build the entire language-python package in just under 1 GB of ram.

Change History (13)

comment:1 Changed 10 years ago by erikd

Cc: mle+hs@… added

comment:2 Changed 10 years ago by igloo

Milestone: 6.12.3
Priority: normalhigh

With AST.hs:

module AST (Expr(..), Token(..), spanning, getSpan) where

class Span a where
   getSpan :: a -> SrcSpan

spanning :: (Span a, Span b) => a -> b -> SrcSpan
spanning x y = combineSrcSpans (getSpan x) (getSpan y)

instance Span a => Span [a] where
   getSpan [] = error "[]"
   getSpan [x] = getSpan x
   getSpan list@(x:_) = combineSrcSpans (getSpan x) (getSpan (last list))

data SrcSpan

  = SpanMultiLine
    { span_start_row    :: !Int
    , span_start_column :: !Int
    , span_end_row      :: !Int
    , span_end_column   :: !Int
    }

combineSrcSpans :: SrcSpan -> SrcSpan -> SrcSpan
combineSrcSpans start end
 = case row1 `compare` row2 of
     LT -> SpanMultiLine row1 col1 row2 col2
     _  -> SpanMultiLine row2 col2 row1 col1
  where
  row1 = startRow start
  col1 = startCol start
  row2 = endRow end
  col2 = endCol end


startRow :: SrcSpan -> Int
startRow (SpanMultiLine { span_start_row = row }) = row

endRow :: SrcSpan -> Int
endRow (SpanMultiLine { span_end_row = row }) = row

startCol :: SrcSpan -> Int
startCol (SpanMultiLine { span_start_column = col }) = col

endCol :: SrcSpan -> Int
endCol (SpanMultiLine { span_end_column = col }) = col

data Token
   = T10 { token_SrcSpan :: SrcSpan }
   | T11 { token_SrcSpan :: SrcSpan }
   | T12 { token_SrcSpan :: SrcSpan }
   | T13 { token_SrcSpan :: SrcSpan }
   | T14 { token_SrcSpan :: SrcSpan }
   | T15 { token_SrcSpan :: SrcSpan }
   | T16 { token_SrcSpan :: SrcSpan }
   | T17 { token_SrcSpan :: SrcSpan }
   | T18 { token_SrcSpan :: SrcSpan }
   | T19 { token_SrcSpan :: SrcSpan }
   | T20 { token_SrcSpan :: SrcSpan }
   | T21 { token_SrcSpan :: SrcSpan }
   | T22 { token_SrcSpan :: SrcSpan }
   | T23 { token_SrcSpan :: SrcSpan }
   | T24 { token_SrcSpan :: SrcSpan }
   | T25 { token_SrcSpan :: SrcSpan }
   | T26 { token_SrcSpan :: SrcSpan }
   | T27 { token_SrcSpan :: SrcSpan }
   | T28 { token_SrcSpan :: SrcSpan }
   | T29 { token_SrcSpan :: SrcSpan }
   | T30 { token_SrcSpan :: SrcSpan }
   | T31 { token_SrcSpan :: SrcSpan }
   | T32 { token_SrcSpan :: SrcSpan }
   | T33 { token_SrcSpan :: SrcSpan }
   | T34 { token_SrcSpan :: SrcSpan }
   | T35 { token_SrcSpan :: SrcSpan }
   | T36 { token_SrcSpan :: SrcSpan }
   | T37 { token_SrcSpan :: SrcSpan }
   | T38 { token_SrcSpan :: SrcSpan }
   | T39 { token_SrcSpan :: SrcSpan }
   | T40 { token_SrcSpan :: SrcSpan }
   | T41 { token_SrcSpan :: SrcSpan }
   | T42 { token_SrcSpan :: SrcSpan }
   | T43 { token_SrcSpan :: SrcSpan }
   | T44 { token_SrcSpan :: SrcSpan }
   | T45 { token_SrcSpan :: SrcSpan }
   | T46 { token_SrcSpan :: SrcSpan }
   | T47 { token_SrcSpan :: SrcSpan }
   | T48 { token_SrcSpan :: SrcSpan }
   | T49 { token_SrcSpan :: SrcSpan }

instance Span Token where
  getSpan = token_SrcSpan

data Expr
   = E10 { expr_SrcSpan :: SrcSpan }
   | E11 { expr_SrcSpan :: SrcSpan }
   | E12 { expr_SrcSpan :: SrcSpan }
   | E13 { expr_SrcSpan :: SrcSpan }
   | E14 { expr_SrcSpan :: SrcSpan }
   | E15 { expr_SrcSpan :: SrcSpan }
   | E16 { expr_SrcSpan :: SrcSpan }
   | E17 { expr_SrcSpan :: SrcSpan }
   | E18 { expr_SrcSpan :: SrcSpan }
   | E19 { expr_SrcSpan :: SrcSpan }
   | E20 { expr_SrcSpan :: SrcSpan }
   | E21 { expr_SrcSpan :: SrcSpan }
   | E22 { expr_SrcSpan :: SrcSpan }
   | E23 { expr_SrcSpan :: SrcSpan }
   | E24 { expr_SrcSpan :: SrcSpan }
   | E25 { expr_SrcSpan :: SrcSpan }
   | E26 { expr_SrcSpan :: SrcSpan }
   | E27 { expr_SrcSpan :: SrcSpan }
   | E28 { expr_SrcSpan :: SrcSpan }
   | E29 { expr_SrcSpan :: SrcSpan }
   | E30 { expr_SrcSpan :: SrcSpan }
   | E31 { expr_SrcSpan :: SrcSpan }
   | E32 { expr_SrcSpan :: SrcSpan }
   | E33 { expr_SrcSpan :: SrcSpan }
   | E34 { expr_SrcSpan :: SrcSpan }
   | E35 { expr_SrcSpan :: SrcSpan }
   | E36 { expr_SrcSpan :: SrcSpan }
   | E37 { expr_SrcSpan :: SrcSpan }
   | E38 { expr_SrcSpan :: SrcSpan }
   | E39 { expr_SrcSpan :: SrcSpan }
   | E40 { expr_SrcSpan :: SrcSpan }
   | E41 { expr_SrcSpan :: SrcSpan }
   | E42 { expr_SrcSpan :: SrcSpan }
   | E43 { expr_SrcSpan :: SrcSpan }
   | E44 { expr_SrcSpan :: SrcSpan }
   | E45 { expr_SrcSpan :: SrcSpan }
   | E46 { expr_SrcSpan :: SrcSpan }
   | E47 { expr_SrcSpan :: SrcSpan }
   | E48 { expr_SrcSpan :: SrcSpan }
   | E49 { expr_SrcSpan :: SrcSpan }

instance Span Expr where
   getSpan = expr_SrcSpan

and ParserUtils.hs:

module ParserUtils where

import AST (Expr(E10), Token, spanning, getSpan)

makeTupleOrExpr :: [Expr] -> Maybe Token -> Expr
makeTupleOrExpr [e] Nothing = e
makeTupleOrExpr es@(_:_) (Just t) = E10 (spanning es t)
makeTupleOrExpr es@(_:_) Nothing  = E10 (getSpan es)

6.12.1 and stable branch give (with a 1G memory limit):

$ ulimit -v
1024000
$ ghc -O -c AST.hs
$ ghc -O -c ParserUtils.hs
ghc: out of memory (requested 1048576 bytes)

while HEAD has no problem. I don't know if this is due to luck, or if HEAD has already been fixed, though.

comment:3 Changed 10 years ago by igloo

It looks like the main difference is that in the stable branch the getSpan functions are getting inlined:

==================== Occurrence analysis ====================
$dSpan_al3 [ALWAYS OnceL* Nothing] :: AST.Span [AST.Expr]
LclId
[Arity 1]
$dSpan_al3 = AST.$fSpan[] @ AST.Expr AST.$fSpanExpr

lvl_smf [ALWAYS OnceL Nothing] :: AST.Expr
LclId
[]
lvl_smf =
  Control.Exception.Base.patError
    @ AST.Expr "ParserUtils.hs:(6,0)-(8,51)|function makeTupleOrExpr"

ParserUtils.makeTupleOrExpr :: [AST.Expr]
                               -> Data.Maybe.Maybe AST.Token
                               -> AST.Expr
LclIdX
[Arity 2]
ParserUtils.makeTupleOrExpr =
  \ (ds_dla [ALWAYS Once! Nothing] :: [AST.Expr])
    (ds_dlb :: Data.Maybe.Maybe AST.Token) ->
    case ds_dla of wild_B1 {
      [] -> lvl_smf;
      : e_ag9 [ALWAYS Once Nothing] ds_dlc [ALWAYS Once! Nothing] ->
        let {
          ds_dla [ALWAYS Once* Nothing] :: [AST.Expr]
          LclId
          []
          ds_dla = wild_B1 } in
        let {
          fail_sme [ALWAYS Once*! Nothing] :: GHC.Prim.State#
                                                GHC.Prim.RealWorld
                                              -> AST.Expr
          LclId
          [Arity 1]
          fail_sme =
            \ _ ->
              case ds_dlb of _ {
                Data.Maybe.Nothing ->
                  let {
                    lvl_smg [ALWAYS Once Nothing] :: AST.Expr
                    LclId
                    []
                    lvl_smg = AST.E10 (AST.getSpan @ [AST.Expr] $dSpan_al3 ds_dla) } in
                  lvl_smg;
                Data.Maybe.Just t_agk [ALWAYS Once Nothing] ->
                  AST.E10
                    (AST.spanning
                       @ [AST.Expr] @ AST.Token $dSpan_al3 AST.$fSpanToken ds_dla t_agk)
              } } in
        case ds_dlc of _ {
          [] ->
            case ds_dlb of _ {
              Data.Maybe.Nothing -> e_ag9;
              Data.Maybe.Just _ -> fail_sme GHC.Prim.realWorld#
            };
          : _ _ -> fail_sme GHC.Prim.realWorld#
        }
    }




==================== Simplifier mode 2 [main], iteration 1 out of 4 ====================
Total ticks: NON-ZERO!


    Result size = 2999

==================== Simplifier mode 2 [main], iteration 1 out of 4 ====================
a_spy :: [AST.Expr] -> AST.SrcSpan
LclId
[Arity 1]
a_spy =
  AST.$fSpan[]_getSpan
    @ AST.Expr
    (AST.expr_SrcSpan
     `cast` (sym (AST.NTCo:T:Span AST.Expr)
             :: (AST.Expr -> AST.SrcSpan) ~ AST.T:Span AST.Expr))

lvl_smf :: AST.Expr
LclId
[]
lvl_smf =
  Control.Exception.Base.patError
    @ AST.Expr "ParserUtils.hs:(6,0)-(8,51)|function makeTupleOrExpr"

ParserUtils.makeTupleOrExpr :: [AST.Expr]
                               -> Data.Maybe.Maybe AST.Token
                               -> AST.Expr
LclIdX
[Arity 2]
ParserUtils.makeTupleOrExpr =
  \ (ds_dla :: [AST.Expr]) (ds_dlb :: Data.Maybe.Maybe AST.Token) ->
    case ds_dla of wild_B1 {
      [] -> lvl_smf;
      : e_ag9 ds_dlc ->
        let {
          fail_sme :: GHC.Prim.State# GHC.Prim.RealWorld -> AST.Expr
          LclId
          [Arity 1]
          fail_sme =
            \ _ ->
              case ds_dlb of _ {
                Data.Maybe.Nothing ->
                  AST.E10
                    (AST.$fSpan[]_getSpan
                       @ AST.Expr
                       (AST.expr_SrcSpan
                        `cast` (sym (AST.NTCo:T:Span AST.Expr)
                                :: (AST.Expr -> AST.SrcSpan) ~ AST.T:Span AST.Expr))
                       wild_B1);
                Data.Maybe.Just t_agk ->
                  AST.E10
                    (let {
                       $j_srC :: GHC.Types.Int -> GHC.Types.Int -> AST.SrcSpan
                       LclId
                       [Arity 2]
                       $j_srC =
                         \ (ds_alw [ALWAYS Nothing OneShot] :: GHC.Types.Int)
                           (ds1_alx [ALWAYS Nothing OneShot] :: GHC.Types.Int) ->
                           case ds_alw of wild1_alB { GHC.Types.I# x#_alD ->
                           let {
                             $j_srA :: GHC.Types.Int -> GHC.Types.Int -> AST.SrcSpan
                             LclId
                             [Arity 2]
                             $j_srA =
                               \ (ds6_alJ [ALWAYS Nothing OneShot] :: GHC.Types.Int)
                                 (ds7_alK [ALWAYS Nothing OneShot] :: GHC.Types.Int) ->
                                 case ds6_alJ of wild11_alM { GHC.Types.I# y#_alO ->
                                 case GHC.Prim.<# x#_alD y#_alO of _ {
                                   GHC.Bool.False ->
                                     AST.SpanMultiLine wild11_alM ds7_alK wild1_alB ds1_alx;
                                   GHC.Bool.True ->
                                     AST.SpanMultiLine wild1_alB ds1_alx wild11_alM ds7_alK
                                 }
                                 } } in
                           case t_agk of _ {
                             AST.T10 ds1_apD ->
                               case ds1_apD of _ { AST.SpanMultiLine _ _ ds6_alJ ds7_alK ->
                               case ds6_alJ of wild11_alM { GHC.Types.I# y#_alO ->
                               case GHC.Prim.<# x#_alD y#_alO of _ {
                                 GHC.Bool.False ->
                                   AST.SpanMultiLine wild11_alM ds7_alK wild1_alB ds1_alx;
                                 GHC.Bool.True ->
                                   AST.SpanMultiLine wild1_alB ds1_alx wild11_alM ds7_alK
                               }
                               }
                               };
                             AST.T11 ds1_apG ->
                               case ds1_apG of _ { AST.SpanMultiLine _ _ ds6_alJ ds7_alK ->
                               case ds6_alJ of wild11_alM { GHC.Types.I# y#_alO ->
                               case GHC.Prim.<# x#_alD y#_alO of _ {
                                 GHC.Bool.False ->
                                   AST.SpanMultiLine wild11_alM ds7_alK wild1_alB ds1_alx;
                                 GHC.Bool.True ->
                                   AST.SpanMultiLine wild1_alB ds1_alx wild11_alM ds7_alK
                               }
                               }
                               };
[...]
                             AST.T49 ds1_arw ->
                               case ds1_arw of _ { AST.SpanMultiLine _ _ ds6_alJ ds7_alK ->
                               case ds6_alJ of wild11_alM { GHC.Types.I# y#_alO ->
                               case GHC.Prim.<# x#_alD y#_alO of _ {
                                 GHC.Bool.False ->
                                   AST.SpanMultiLine wild11_alM ds7_alK wild1_alB ds1_alx;
                                 GHC.Bool.True ->
                                   AST.SpanMultiLine wild1_alB ds1_alx wild11_alM ds7_alK
                               }
                               }
                               }
                           }
                           } } in
                     case ds_dlc of wild1_amq {
                       [] ->
                         case e_ag9 of _ {
                           AST.E10 ds1_anA ->
                             case ds1_anA of _ { AST.SpanMultiLine ds_alw ds1_alx _ _ ->
                             $j_srC ds_alw ds1_alx
                             };
                           AST.E11 ds1_anD ->
                             case ds1_anD of _ { AST.SpanMultiLine ds_alw ds1_alx _ _ ->
                             $j_srC ds_alw ds1_alx
                             };
[...]
                           AST.E49 ds1_apt ->
                             case ds1_apt of _ { AST.SpanMultiLine ds_alw ds1_alx _ _ ->
                             $j_srC ds_alw ds1_alx
                             }
                         };
                       : ipv_amu ipv1_amv ->
                         let {
                           $j_srG :: GHC.Types.Int -> GHC.Types.Int -> AST.SrcSpan
                           LclId
                           [Arity 2]
                           $j_srG =
                             \ (ds1_amz [ALWAYS Nothing OneShot] :: GHC.Types.Int)
                               (ds2_amA [ALWAYS Nothing OneShot] :: GHC.Types.Int) ->
                               case ds1_amz of wild3_amE { GHC.Types.I# x#_amG ->
                               let {
                                 $j_srE :: GHC.Types.Int -> GHC.Types.Int -> AST.SrcSpan
                                 LclId
                                 [Arity 2]
                                 $j_srE =
                                   \ (ds7_anl [ALWAYS Nothing OneShot] :: GHC.Types.Int)
                                     (ds8_anm [ALWAYS Nothing OneShot] :: GHC.Types.Int) ->
                                     case ds7_anl of wild11_ano { GHC.Types.I# y#_anq ->
                                     case GHC.Prim.<# x#_amG y#_anq of _ {
                                       GHC.Bool.False -> $j_srC wild11_ano ds8_anm;
                                       GHC.Bool.True -> $j_srC wild3_amE ds2_amA
                                     }
                                     } } in
                               case GHC.List.last_last' @ AST.Expr e_ag9 wild1_amq of _ {
                                 AST.E10 ds1_anA ->
                                   case ds1_anA of _ { AST.SpanMultiLine _ _ ds7_anl ds8_anm ->
                                   case ds7_anl of wild11_ano { GHC.Types.I# y#_anq ->
                                   case GHC.Prim.<# x#_amG y#_anq of _ {
                                     GHC.Bool.False -> $j_srC wild11_ano ds8_anm;
                                     GHC.Bool.True -> $j_srC wild3_amE ds2_amA
                                   }
                                   }
                                   };
                                 AST.E11 ds1_anD ->
                                   case ds1_anD of _ { AST.SpanMultiLine _ _ ds7_anl ds8_anm ->
                                   case ds7_anl of wild11_ano { GHC.Types.I# y#_anq ->
                                   case GHC.Prim.<# x#_amG y#_anq of _ {
                                     GHC.Bool.False -> $j_srC wild11_ano ds8_anm;
                                     GHC.Bool.True -> $j_srC wild3_amE ds2_amA
                                   }
                                   }
                                   };
[...]
                                 AST.E49 ds1_apt ->
                                   case ds1_apt of _ { AST.SpanMultiLine _ _ ds7_anl ds8_anm ->
                                   case ds7_anl of wild11_ano { GHC.Types.I# y#_anq ->
                                   case GHC.Prim.<# x#_amG y#_anq of _ {
                                     GHC.Bool.False -> $j_srC wild11_ano ds8_anm;
                                     GHC.Bool.True -> $j_srC wild3_amE ds2_amA
                                   }
                                   }
                                   }
                               }
                               } } in
                         case e_ag9 of _ {
                           AST.E10 ds1_anA ->
                             case ds1_anA of _ { AST.SpanMultiLine ds1_amz ds2_amA _ _ ->
                             $j_srG ds1_amz ds2_amA
                             };
                           AST.E11 ds1_anD ->
                             case ds1_anD of _ { AST.SpanMultiLine ds1_amz ds2_amA _ _ ->
                             $j_srG ds1_amz ds2_amA
                             };
[...]
                           AST.E49 ds1_apt ->
                             case ds1_apt of _ { AST.SpanMultiLine ds1_amz ds2_amA _ _ ->
                             $j_srG ds1_amz ds2_amA
                             }
                         }
                     })
              } } in
        case ds_dlc of _ {
          [] ->
            case ds_dlb of _ {
              Data.Maybe.Nothing -> e_ag9;
              Data.Maybe.Just _ -> fail_sme GHC.Prim.realWorld#
            };
          : _ _ -> fail_sme GHC.Prim.realWorld#
        }
    }

comment:4 Changed 9 years ago by igloo

Owner: set to simonpj

comment:5 Changed 9 years ago by dons

Cc: dons@… added

comment:6 Changed 9 years ago by jre2

Cc: overture2112@… added

comment:7 Changed 9 years ago by juhpetersen

Cc: petersen@… added

comment:8 Changed 9 years ago by bjpop

Someone told me that they succeeded in building language-python with 6.12 as follows (I quote):

"For language-python, I've put {-# OPTIONS -O0 -v #-} pragma on two Parser.y files. -O0 is obvious, and someone mentioned that since it is most likely a strictness problem, writing verbose messages will help by forcing thunks. After that it builds under 1.1 GB RES."

I'm trying to get more information about the exact affect of -v (and hence maybe space leak evidence).

comment:9 Changed 9 years ago by simonmar

This workaround seems to do the trick. To the top of Language.Python.Common.AST, add

{-# OPTIONS_GHC -fomit-interface-pragmas #-}

You might also want to add -O0 option to each Parser.y file, but without them GHC tops out at 1GB on a 64-bit machine.

The problem seems to be inlining of record selectors. I tried working around it with NOINLINE pragmas on the instance Span methods, to no avail.

All the inlining machinery has had an overhaul in 6.14, so I don't imagine we'll be investing time in a fix for this in the 6.12 branch unless it's easy (I defer to Simon PJ on that one).

comment:10 Changed 9 years ago by simonmar

Milestone: 6.12.36.14.1

Punting to 6.14.1 for a real fix; for 6.12 use one of the workarounds (probably the -fomit-interface-pragmas one is better).

comment:11 Changed 9 years ago by heatsink

Cc: red5_2@… added

comment:12 Changed 9 years ago by simonpj

Owner: changed from simonpj to igloo

I claim this bug is fixed in the HEAD. At least, compiling Ian's example produces every reasonable code.

Ian: could you add a test using the two-module example you so helpfully put togehter (AST and ParserUtils), that checks that the ParserUtils.o file is of reasonable size? Or that -dshow-passes produces intermediates of reasonable size.

And bjpop: could you try compiling your python package again? Thanks.

Simon

comment:13 Changed 9 years ago by igloo

Resolution: fixed
Status: newclosed
Test Case: http://projects.haskell.org/language-python/language-python-0.2.tar.gzT3972

Test added.

Note: See TracTickets for help on using tickets.