Opened 18 months ago

Closed 16 months ago

Last modified 14 months ago

#14928 closed bug (fixed)

TH eats 50 GB memory when creating ADT with multiple constructors

Reported by: YitzGale
Owned by:
Priority: normal
Milestone: 8.6.1
Component: Template Haskell
Version: 8.2.2
Keywords:
Cc: snoyberg
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description (last modified by RyanGlScott)

When TH creates a data type with multiple constructors, GHC consumes huge amounts of memory in what appears to be a highly superlinear manner.

A common use case: in the Yesod web framework, localized strings are represented by constructors of a Messages data type, created by a TH splice. There is one constructor for each localized string on the site, possibly hundreds. The splice also creates a class instance for the data type whose method matches against all the constructors for each language for which localizations are provided; this may or may not play a role in the memory leak. This Trac ticket corresponds to this Yesod issue:

https://github.com/yesodweb/yesod/issues/1487

Here are two reproductions, and one NON-reproduction:

  1. A blank "hello world" Yesod web site, with 500 messages defined for about 30 languages. The single page displays the messages in the user's language. Compiling this program in GHC 8.2.2 (stackage lts-10.5) on Ubuntu 16.04 eats over 50 GB of memory.

https://github.com/ygale/yesod-bug1487

  2. @snoyberg has cut down this reproduction to avoid using any libraries not included with GHC. It is in the same repo, on the snoyberg-master branch.

NON-reproduction: The code in this gist, which is similar to what is generated by the TH in the above reproductions, is compiled by GHC without batting an eyelash. This demonstrates that the bug requires TH to reproduce. https://gist.github.com/92347aa93d226e31f977a0b62b443aa7
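
To make the pattern concrete, here is a minimal sketch of a TH splice that generates a data type with many nullary constructors. It is not Yesod's message-generation code and is not claimed to trigger the memory blow-up on any GHC version; the type name `Messages`, the `Msg<n>` constructor names, and `constructorCount` are hypothetical, and the count of 500 simply mirrors the number of messages in the first reproduction.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH

-- Generates: data Messages = Msg1 | Msg2 | ... | Msg500
--              deriving (Show, Enum, Bounded)
-- All names here are illustrative, not taken from Yesod.
$(pure
    [ DataD [] (mkName "Messages") [] Nothing
        [ NormalC (mkName ("Msg" ++ show i)) [] | i <- [1 :: Int .. 500] ]
        [ DerivClause Nothing [ConT ''Show, ConT ''Enum, ConT ''Bounded] ]
    ])

-- Count the generated constructors via the derived Enum/Bounded instances.
constructorCount :: Int
constructorCount = length [minBound .. maxBound :: Messages]
```

Loading this file in GHCi and evaluating `constructorCount` is one way to confirm that the splice produced the expected type. The real reproductions additionally generate a class instance that matches on every constructor for every language.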

Change History (7)

comment:1 Changed 18 months ago by RyanGlScott

Description: modified (diff)

comment:2 Changed 18 months ago by bgamari

Milestone: 8.6.1
Type of failure: None/Unknown → Compile-time performance bug

comment:3 Changed 16 months ago by bgamari

While I don't doubt that we could (and should!) do a better job of compiling code like this, I would suggest that regular structures like this should really be encoded as a data structure, not as code, as is done here. Not only does compiling all of these large cases produce a great deal of work for the compiler, but the result is also inefficient. For instance, the Core produced by the snoyberg-master branch essentially implements a linear search over languages, doing a string comparison for every language for which localizations are available:

          case ds_doaz of {
            MsgTestLocalizationMessage001 ->
              case ==
                     @ I18N.Lang
                     Data.Text.$fEqText
                     lang_a53E
                     (Data.Text.pack (GHC.CString.unpackCString# "da"#))
              of {
                False ->
                  case ==
                         @ I18N.Lang
                         Data.Text.$fEqText
                         lang_a53E
                         (Data.Text.pack (GHC.CString.unpackCString# "da-DK"#))
                  of {
                    False -> ...
                    True ->
                      Data.Text.pack
                        (GHC.CString.unpackCString#
                           "This is test localization message number 1 in da-DK."#)
                  };
                True ->
                  Data.Text.pack
                    (GHC.CString.unpackCString#
                       "This is test localization message number 1 in da."#)
              };

Everything about this code is terrible: asymptotically it's suboptimal, your instruction caches will be thrashed into oblivion, the executable will be large (in my cut-down version of the test, Foundation.o alone is 18 MBytes, yet contains only a few hundred kBytes of messages), and on top of this, it takes forever to compile. Surely this would be better implemented as a hash-map or some other structure with sublinear lookup.
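
As a rough illustration of the data-structure approach suggested above, the following sketch stores translations in a `Data.Map` keyed by message and language, so each lookup is logarithmic in the table size instead of a chain of string comparisons. This is an assumption about what such a design could look like, not Yesod's API: `Lang` as a `String`, the `Message` type, `translations`, and `renderMessage` are all hypothetical, and the second message is invented for the example.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)

type Lang = String

-- Hypothetical message type; a real site would have hundreds of constructors.
data Message
  = MsgTestLocalizationMessage001
  | MsgTestLocalizationMessage002
  deriving (Eq, Ord, Show)

-- Translations stored as data rather than as nested case expressions.
translations :: Map.Map (Message, Lang) String
translations = Map.fromList
  [ ((MsgTestLocalizationMessage001, "da"),    "This is test localization message number 1 in da.")
  , ((MsgTestLocalizationMessage001, "da-DK"), "This is test localization message number 1 in da-DK.")
  , ((MsgTestLocalizationMessage002, "da"),    "This is test localization message number 2 in da.")
  ]

-- Try each preferred language in order and return the first translation found.
-- Each probe is O(log n) in the number of table entries.
renderMessage :: [Lang] -> Message -> Maybe String
renderMessage langs msg =
  case mapMaybe (\lang -> Map.lookup (msg, lang) translations) langs of
    (text : _) -> Just text
    []         -> Nothing
```

For example, `renderMessage ["da-DK", "da"] MsgTestLocalizationMessage001` returns the da-DK string, while the same call for `MsgTestLocalizationMessage002` falls back to the da string. Returning `Maybe` leaves the choice of default language to the caller.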

Last edited 16 months ago by bgamari

comment:4 Changed 16 months ago by bgamari

The biggest allocator while compiling this program with -O0 appears to be CodeGen:

!!! Renamer/typechecker [Foundation]: finished in 22503.88 milliseconds, allocated 2682.425 megabytes
!!! Desugar [Foundation]: finished in 396.31 milliseconds, allocated 647.120 megabytes
!!! Simplifier [Foundation]: finished in 485.76 milliseconds, allocated 675.006 megabytes
!!! CoreTidy [Foundation]: finished in 45.90 milliseconds, allocated 88.678 megabytes
!!! CorePrep [Foundation]: finished in 168.78 milliseconds, allocated 252.445 megabytes
!!! CodeGen [Foundation]: finished in 17228.95 milliseconds, allocated 24917.560 megabytes
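
To put the numbers above in proportion, here is a small sketch that parses timing lines of this shape and ranks the phases by their share of total allocation. The parser assumes exactly the line format shown in this comment; `PhaseStat`, `parsePhase`, `allocationShares`, and `timingLog` are names invented for the example.

```haskell
import Data.List (sortOn)
import Data.Maybe (mapMaybe)
import Data.Ord (Down (..))
import Text.Read (readMaybe)

-- One "!!! <phase> [<module>]: finished in ..." line.
data PhaseStat = PhaseStat
  { phaseName   :: String
  , phaseMillis :: Double
  , phaseMB     :: Double
  } deriving Show

-- Parse a single timing line; returns Nothing for lines in any other format.
parsePhase :: String -> Maybe PhaseStat
parsePhase line = case words line of
  ["!!!", name, _, "finished", "in", ms, "milliseconds,", "allocated", mb, "megabytes"] ->
    PhaseStat name <$> readMaybe ms <*> readMaybe mb
  _ -> Nothing

-- Percentage of total allocation per phase, largest first.
allocationShares :: [String] -> [(String, Double)]
allocationShares ls =
  sortOn (Down . snd) [ (phaseName p, 100 * phaseMB p / total) | p <- stats ]
  where
    stats = mapMaybe parsePhase ls
    total = sum (map phaseMB stats)

-- The six lines quoted in this comment.
timingLog :: [String]
timingLog =
  [ "!!! Renamer/typechecker [Foundation]: finished in 22503.88 milliseconds, allocated 2682.425 megabytes"
  , "!!! Desugar [Foundation]: finished in 396.31 milliseconds, allocated 647.120 megabytes"
  , "!!! Simplifier [Foundation]: finished in 485.76 milliseconds, allocated 675.006 megabytes"
  , "!!! CoreTidy [Foundation]: finished in 45.90 milliseconds, allocated 88.678 megabytes"
  , "!!! CorePrep [Foundation]: finished in 168.78 milliseconds, allocated 252.445 megabytes"
  , "!!! CodeGen [Foundation]: finished in 17228.95 milliseconds, allocated 24917.560 megabytes"
  ]
```

Evaluating `allocationShares timingLog` in GHCi puts CodeGen first, at roughly 85% of the 29263.234 megabytes allocated across these six phases.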

In particular, we spend a significant amount of time in register allocation and producing assembler.

COST CENTRE         MODULE      SRC                                                 %time %alloc

hscCompileCoreExpr' HscMain     compiler/main/HscMain.hs:(1805,1)-(1827,24)          63.6    0.1
pprNativeCode       AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)    5.5   18.5
RegAlloc-linear     AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)    4.2   12.5
regLiveness         AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)    3.5   10.1
tc_rn_src_decls     TcRnDriver  compiler/typecheck/TcRnDriver.hs:(491,4)-(555,7)      2.8    8.0
StgCmm              HscMain     compiler/main/HscMain.hs:(1463,13)-(1464,62)          2.7    7.8
genMachCode         AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)    2.4    6.6
NativeCodeGen       CodeOutput  compiler/main/CodeOutput.hs:166:18-78                 1.8    4.0
layoutStack         CmmPipeline compiler/cmm/CmmPipeline.hs:(98,13)-(100,40)          1.6    4.3
fixStgRegisters     AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:566:17-42            1.3    1.2
cmmToCmm            AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:571:17-50            1.0    2.4
sequenceBlocks      AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:699:17-49            0.8    1.7
doSRTs              CmmPipeline compiler/cmm/CmmPipeline.hs:47:46-71                  0.6    1.1
Digraph.scc         Digraph     compiler/utils/Digraph.hs:277:44-67                   0.5    2.2
cmmCfgOpts(1)       CmmPipeline compiler/cmm/CmmPipeline.hs:64:13-62                  0.5    1.6
revPostorder        CmmUtils    compiler/cmm/CmmUtils.hs:561:5-47                     0.5    1.0
deSugar             HscMain     compiler/main/HscMain.hs:544:7-44                     0.4    1.4
simplExprF1-App     Simplify    compiler/simplCore/Simplify.hs:(866,34)-(883,62)      0.3    1.3
occAnalBind.assoc   OccurAnal   compiler/simplCore/OccurAnal.hs:819:13-60             0.3    1.1

Compiling with -O1 tells a very similar story; each simplifier pass only allocates a gigabyte or two, with codegen allocating several tens of GB.

comment:5 Changed 16 months ago by bgamari

Milestone: 8.6.1 → 8.4.1
Resolution: fixed
Status: new → closed

I can reproduce the high residency when building snoyberg-master with GHC 8.2.1 (or GHC 8.4.1) and optimisation enabled, but have been unable to do so with GHC master. I believe the memory leak present in 8.2/8.4 has since been fixed. For what it's worth, I was also able to reproduce the issue using a Nix ghcHEAD snapshot from 20180118, so it was fixed relatively recently.

comment:6 Changed 16 months ago by bgamari

Milestone: 8.4.1 → 8.6.1

Whoops, wrong milestone.

comment:7 Changed 14 months ago by YitzGale

@bgamari Thanks very much for that. Confirming your results: I retested using 8.4.3 and 8.6.0.20180714 from hvr's PPA. On 8.4.3 the bug reproduced with -O1 but not with -O0. On 8.6.0.20180714 the bug did not reproduce at all, neither with -O0 nor with -O1.

And also thanks for your observations about the code generated by mkMessages. I'll report that as a separate issue.
