Opened 12 years ago

Closed 11 years ago

Last modified 9 years ago

#2002 closed bug (wontfix)

problems with very large (list) literals

Reported by: Isaac Dupree
Owned by: simonmar
Priority: high
Milestone: 6.10.2
Component: Compiler
Version: 7.0.1
Keywords:
Cc: cheecheeo@…
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Compile-time performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

I could have sworn parts of this had been reported before, but I can't find it, and further investigating makes it look more complicated.

A file with about 100000 elements in a literal list (see attached "longlist.hs") causes a stack overflow in ghc-6.8.2 after 20 seconds. For comparison, ghc-6.6.1 took a few minutes on my fast machine before I got bored and quit. ghc-6.8.2 +RTS -K300M succeeded in the span of a minute.
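For reference, a file of that shape can be produced with a few lines of Haskell. This is a sketch; the module and file names here are made up rather than taken from the attachment:

```haskell
-- Sketch: emit a module containing a 100000-element [Int] literal,
-- similar in spirit to the attached longlist.hs (names are hypothetical).
import Data.List (intercalate)

main :: IO ()
main = writeFile "LongList.hs" $ unlines
  [ "module LongList where"
  , ""
  , "longList :: [Int]"
  , "longList = [" ++ intercalate "," (map show [1 .. 100000 :: Int]) ++ "]"
  ]
```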

The only other way I know to reproduce it is by running alex (not alex -g) on some of GHC's .x files (cmm/CmmLex, parser/Lexer) and compiling them as part of the GHC build process. Alex has some bugs without -g (I'll upgrade mine and/or send darcs patches later), and my build process is quite hacky, so I don't have a good reproducible case right now, but I'll try to make one later. This caused ghc not to terminate in a reasonable amount of time, even with -O0, and to use a lot of memory. When I replaced the commas in the lists with ":" and the end with ": []" to make a non-sugared list, ghc still didn't terminate any sooner, and it used a lot more memory, so it was eventually killed by the OOM-killer :-) This is actually affecting my ghc-hacking work a little, so I would be pleased to see it fixed... tell me if there's something particular I could provide that would help track this down.

Change History (22)

comment:1 Changed 12 years ago by igloo

Milestone: 6.8.3
Priority: normal → high

It sounds like this could be a regression, so I've marked it as high priority.

comment:2 Changed 12 years ago by Isaac Dupree

Could be. Compared to 6.6, though, it's not really a regression; in fact it's an improvement (in 6.6.1 no amount of memory will allow it to finish -- I think that might have been a quadratic-behavior bug that has since been fixed). And I don't think "stack overflow" is really a regression from "fails to terminate on valid code", is it?

As for the second part of my report (large alex files), I haven't tested it with 6.6.1, but I'm sure it wouldn't terminate, since in 6.6.1 the large list literals alone would ensure that.

comment:3 Changed 12 years ago by simonmar

Type: bug → compile-time performance bug

comment:4 Changed 12 years ago by guest

I have also had experiences with large literal lists. The source files were about 300k-500k, and the list sizes were probably somewhere in the 20000-30000 range (definitely less than 100000). I used both 6.6.1 and 6.8.2; both take an incredibly long time (10 minutes is not unheard of, on a 2 GHz machine with 768 MB RAM), and sometimes fail spontaneously (usually memory allocation problems, if I recall correctly).

I should note though that a significant part of the compilation time is spent by gcc, not ghc (no idea about the exact ratio).

comment:5 Changed 12 years ago by igloo

Looking at modules like this:

module W where
w :: [Int]
w = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100]

one problem is in OccAnal.

As we go down the desugared list syntax tree we go around this loop:

occAnal env app@(App _ _) = occAnalApp env (collectArgs app)

occAnalApp env (Var fun, args)
    = case occAnalArgs env args of
          (args_uds, args') -> (fun_uds +++ f args_uds, _)

occAnalArgs _env args
    = case mapAndUnzip (occAnal arg_env) args of
           (arg_uds_s, args') -> (f arg_uds_s, _)

As written, we build up a huge closure of +++s, which gives a stack overflow when we try to evaluate it. If we rewrite it along the lines of

occAnalApp env (Var fun, args)
    = case occAnalArgs env args of
          (args_uds, args') ->
              let uds = fun_uds +++ f args_uds
              in uds `seq` (uds, _)

then we have to go all the way along the spine to do the top-level seq, so we get a stack overflow again.

It's not immediately obvious whether or not we can make the occurrence analyser pass the usage details around as an accumulating parameter, along with a little more state, to avoid the stack usage.
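To illustrate the trade-off with plain Int sums standing in for UsageDetails (the names here are illustrative, not the real OccAnal code): the right-nested version builds a thunk chain whose evaluation needs stack linear in the list length, while a strict accumulator consumes the spine in constant stack.

```haskell
-- Right-nested combine, like the chain of (+++)s: forcing the result
-- walks a thunk chain as deep as the list is long.
combineLazy :: [Int] -> Int
combineLazy = foldr (+) 0

-- Accumulating combine: the accumulator is forced at each step with
-- seq, so no thunk chain builds up and stack use stays constant.
combineAcc :: [Int] -> Int
combineAcc = go 0
  where
    go acc []       = acc
    go acc (x : xs) = let acc' = acc + x
                      in acc' `seq` go acc' xs
```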


When compiling with

ghc -cpp -fglasgow-exts -fforce-recomp -fasm -funbox-strict-fields -c W.hs +RTS -p -hyTSO -h -xt

This table shows the fewest list elements for which the stack grows to a given size:

number of list elements     Stack size (kB)     bytes per list element
300                         250                 833
600                         500                 833
1200                        1000                833
2300                        2000                870
4500                        4000                889
8900                        8000                899

(where the stack size is the peak usage of the heap graph).

So that looks linear to me, at about 100 words of stack per list element (833 bytes at 8 bytes per word on x86_64). That's an order of magnitude more than I'd have expected, but it's not completely implausible. However, even if we made it an order of magnitude smaller, I think people would still run into the limit.

comment:6 Changed 12 years ago by simonmar

IMO, we should just bump GHC's stack limit. Given that GHC may reasonably use stack that is linear in the size of the program, having a low fixed limit will arbitrarily reject some programs.

comment:7 Changed 11 years ago by simonmar

Owner: set to simonmar

comment:8 Changed 11 years ago by guest

comment:9 Changed 11 years ago by igloo

Milestone: 6.8.3 → 6.10 branch

I've merged

Mon May 19 05:53:33 PDT 2008  Simon Marlow <marlowsd@gmail.com>
  * bump GHC's maximum stack size to 64Mb (see #2002)

I'm not sure if there isn't more going on here, though, so I'm leaving the bug open but in the 6.10 milestone.

comment:10 Changed 11 years ago by Isaac Dupree

There is still an issue here, with HEAD self-compiling its parser/Lexer.{hs|x} generated by alex without -g. It has been running here for 15 minutes of 2GHz CPU time, using a constant 250MB or so of memory according to top, with no sign of finishing.

../compiler/stage1/ghc-inplace -no-user-package-conf -H128m -O0 -fasm  -istage2/utils  -istage2/basicTypes  -istage2/types  -istage2/hsSyn  -istage2/prelude  -istage2/rename  -istage2/typecheck  -istage2/deSugar  -istage2/coreSyn  -istage2/vectorise  -istage2/specialise  -istage2/simplCore  -istage2/stranal  -istage2/stgSyn  -istage2/simplStg  -istage2/codeGen  -istage2/main  -istage2/profiling  -istage2/parser  -istage2/cprAnalysis  -istage2/iface  -istage2/cmm  -istage2/nativeGen  -istage2/ghci -Wall -fno-warn-name-shadowing -fno-warn-orphans -Istage2  -package hpc -package bytestring -DGHCI -package template-haskell -DGHCI_TABLES_NEXT_TO_CODE -I../libffi/build/include -cpp -fglasgow-exts -fno-generics -Rghc-timing -I. -Iparser -Iutil -package unix -package Cabal -ignore-package lang -recomp -Rghc-timing -O0 -fasm -DDEBUG -H16M '-#include "cutils.h"' -package-name  ghc-6.9.20080602   -fgenerics  -funbox-strict-fields  -c parser/Lexer.hs -o stage2/parser/Lexer.o  -ohi stage2/parser/Lexer.hi

Same with -O as with -O0, and with -fvia-C instead of -fasm (gcc & friends never start, presumably because GHC hasn't gotten that far)... Oh wait, with -fvia-C, after 2 minutes of GHC CPU time it goes into cc1 (gcc), which is taking all my RAM. Hang on a moment, I'll report back when it's done or crashed; I need to free up memory :-)

Hmm... it appears some of the settings in my mk/build.mk (-H128M) are being overridden by compiler/Makefile (-H16M). Odd.

It's a bit tricky to reproduce, because obviously we don't expect 6.8 and previous to succeed. I first got a working stage1, then modified mk/config.mk{,.in} to delete the -g from the GHC_ALEX_OPTS line, then removed the file compiler/parser/Lexer.hs (to make sure it would be regenerated with the new settings), then did cd compiler and make stage=2. Simpler test cases surely exist, but it's not easy to see how to simplify this instance of GHC getting overwhelmed without wrecking it.

comment:11 Changed 11 years ago by Isaac Dupree

update: with my 2 GB RAM and 3 GB swap space, gcc spent a long time swapping, but got in a good 2 minutes of CPU time according to top (cc1 reaching about 80% of my RAM at any one time, and between 0 and 50 percent of my 2GHz CPU) before dying due to lack of virtual memory.

Therefore, it seems that while there is a moderate performance issue in GHC up through STG (2 minutes for a single file?), the assembly generation is the really difficult part: GCC needs an obscene amount of memory, and GHC's -fasm needs either an obscene or an infinite amount of time. But let me check back again... that might have been the optimizations and my cutting corners...

Makefile:1004: LIBRARY is libHSghc.a
../compiler/stage1/ghc-inplace -no-user-package-conf -H128m -O0 -fvia-C  -istage2/utils  -istage2/basicTypes  -istage2/types  -istage2/hsSyn  -istage2/prelude  -istage2/rename  -istage2/typecheck  -istage2/deSugar  -istage2/coreSyn  -istage2/vectorise  -istage2/specialise  -istage2/simplCore  -istage2/stranal  -istage2/stgSyn  -istage2/simplStg  -istage2/codeGen  -istage2/main  -istage2/profiling  -istage2/parser  -istage2/cprAnalysis  -istage2/iface  -istage2/cmm  -istage2/nativeGen  -istage2/ghci -Wall -fno-warn-name-shadowing -fno-warn-orphans -Istage2  -package hpc -package bytestring -DGHCI -package template-haskell -DGHCI_TABLES_NEXT_TO_CODE -I../libffi/build/include -cpp -fglasgow-exts -fno-generics -Rghc-timing -I. -Iparser -Iutil -package unix -package Cabal -ignore-package lang -recomp -Rghc-timing -O0 -fvia-C -DDEBUG -H16M '-#include "cutils.h"' -package-name  ghc-6.9.20080602   -fgenerics  -funbox-strict-fields  -c parser/Lexer.hs -o stage2/parser/Lexer.o  -ohi stage2/parser/Lexer.hi
Binary: expanding to size: 6144
Binary: expanding to size: 6144
Binary: expanding to size: 6144
Binary: expanding to size: 6144
virtual memory exhausted: Cannot allocate memory
<<ghc: 25106288876 bytes, 47369 GCs, 25428014/262348800 avg/max bytes residency (89 samples), 612M in use, 0.00 INIT (0.00 elapsed), 61.75 MUT (1451.24 elapsed), 68.52 GC (69.86 elapsed) :ghc>>
make: *** [stage2/parser/Lexer.o] Error 1

comment:12 Changed 11 years ago by Isaac Dupree

-O0 -fasm succeeded within 3 minutes and 400 MB RAM or so.

./compiler/stage1/ghc-inplace -no-user-package-conf -H128m -O0 -fasm  -istage2/utils  -istage2/basicTypes  -istage2/types  -istage2/hsSyn  -istage2/prelude  -istage2/rename  -istage2/typecheck  -istage2/deSugar  -istage2/coreSyn  -istage2/vectorise  -istage2/specialise  -istage2/simplCore  -istage2/stranal  -istage2/stgSyn  -istage2/simplStg  -istage2/codeGen  -istage2/main  -istage2/profiling  -istage2/parser  -istage2/cprAnalysis  -istage2/iface  -istage2/cmm  -istage2/nativeGen  -istage2/ghci -Wall -fno-warn-name-shadowing -fno-warn-orphans -Istage2  -package hpc -package bytestring -DGHCI -package template-haskell -DGHCI_TABLES_NEXT_TO_CODE -I../libffi/build/include -cpp -fglasgow-exts -fno-generics -Rghc-timing -I. -Iparser -Iutil -package unix -package Cabal -ignore-package lang -recomp -Rghc-timing -O0 -fasm -DDEBUG -H16M '-#include "cutils.h"' -package-name  ghc-6.9.20080602   -fgenerics  -funbox-strict-fields  -c parser/Lexer.hs -o stage2/parser/Lexer.o  -ohi stage2/parser/Lexer.hi
Binary: expanding to size: 6144
Binary: expanding to size: 6144
Binary: expanding to size: 6144
Binary: expanding to size: 6144
<<ghc: 26187464736 bytes, 49463 GCs, 24378379/237682688 avg/max bytes residency (89 samples), 562M in use, 0.00 INIT (0.08 elapsed), 63.18 MUT (77.00 elapsed), 67.19 GC (68.44 elapsed) :ghc>>

new summary:

-O0 -fasm completely works, though it's slower than ideal.

-O0 -fvia-C dies because GCC isn't prepared for how much we're throwing at it. That seems hard for us to fix: lucky we're working on the NCG. (Possibly ghc -O would simplify the generated C enough for GCC to survive, though IMHO that's unlikely in this case. I also wonder whether the GCC people know, or should know, that their compiler behaves poorly in these artificial-but-real cases.)

-O takes far too long. (I only thoroughly tested -O with -fasm, so the slow optimizations could be in Core, Cmm, anything; maybe it would succeed given an hour, I don't know; would you like me to test on my computer for up to a whole day?) I'm not entirely surprised, but it's still a somewhat serious bug from my point of view.

-Isaac

comment:13 Changed 11 years ago by Isaac Dupree

okay, I tried it today with -O [*] (compiler: 6.9.20080619, a stage1 compiled by 6.8.2) and it succeeded after two hours, using up to 738 MB RAM. So it's "only" 40 times slower optimizing than the already-slow not-optimizing case. [*] First I tried compiling all the modules of GHC that Lexer.hs depends on, with -O0, then compiling Lexer.hs with -O; then I tried again compiling the dependent modules with -O too, and the compile time (not including dependent modules) of Lexer.hs was only a few minutes longer with a few more megabytes used.

I avoided ghc --make for testing because that might have an additional risk of wasted memory that I'm not trying to test.

P.S. is there a way to get a correct build order other than string-processing the output of ghc --make? I tried ghc -M, but that gave a makefile format, and the modules weren't textually in the necessary order.

Anyway, I'm attaching a test case extracted from the GHC source, but hacked so it doesn't need to go through the configure process (since it suffers the same slowness when compiled). Use e.g.

./build_deps path_to_ghc -O
./build_lexer path_to_ghc -O

oh wait, trac won't let me attach that big a file (half a megabyte), so here it is: http://isaac.cedarswampstudios.org/2008/ghc_x_slowness.tar.gz

comment:14 Changed 11 years ago by simonmar

Milestone: 6.10 branch → 6.10.1

comment:15 Changed 11 years ago by igloo

Milestone: 6.10.1 → 6.10.2

comment:16 Changed 11 years ago by simonmar

Resolution: wontfix
Status: new → closed

Tested today with GHC HEAD. Compiling a 100k-element [Int] list takes just less than a minute with -O0, and needs up to 200M stack (I'll bump the default max stack size for GHC). The profile looks like this:

COST CENTRE                    MODULE              %time %alloc

SimplTopBinds                  SimplCore             44.9   27.5
NativeCodeGen                  CodeOutput            13.7   12.4
CoreTidy                       HscMain                9.6    8.2
CorePrep                       HscMain                8.1    5.1
CodeGen                        HscMain                7.5   12.7
pprNativeCode                  AsmCodeGen             4.4   11.3

The object file was 22M, but I don't see any obvious ways to reduce that significantly - most of that size is the symbols. There is one word per static (:) that we could eliminate by generating a special version of the (:) info table with CONSTR_STATIC_NOCAF; I'm not sure whether that's worthwhile in general.

GHC needed 2.2G on this machine (x86_64/Linux).

Bottom line: there's nothing obviously bad here. The time and memory requirements increase roughly linearly with the length of the list, although we should bump the default max stack size. The simplifier is the obvious place to look to start optimising.

I didn't try compiling with -O, but I'm not at all surprised if it takes a lot longer - don't do that!

comment:17 Changed 11 years ago by guest

re: your last two paragraphs. I'd like to be able to compile with -O for the sake of the other parts of the module. Maybe GHC could detect when there's a large literal that it shouldn't try to optimize? Or maybe the code writer could annotate it with NOINLINE or some other pragma that tells GHC not to waste time optimizing it? (Perhaps as if it were in a separate module compiled with -O0... actually putting it in a separate module would be a particular nuisance for a tool like Alex to generate.) But I guess I don't have an urgent problem here.
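As a sketch, the pragma-based workaround floated here might look like the following (the module and binding names are made up, and whether NOINLINE would actually stop the simplifier from churning on such a binding was an open question at the time):

```haskell
module BigTable (bigTable) where

-- Imagine tens of thousands of elements here, as Alex would generate.
bigTable :: [Int]
bigTable = [1, 2, 3, 4, 5]

-- Ask GHC to leave the binding alone rather than spend simplifier
-- time on it.
{-# NOINLINE bigTable #-}
```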

-Isaac Dupree

comment:18 Changed 11 years ago by igloo

Applied to HEAD and 6.10:

Fri Dec 19 03:22:11 PST 2008  Simon Marlow <marlowsd@gmail.com>
  * bump GHC's max stack size to 512M
  To accomodate compiling very long static lists (#2002)

comment:19 Changed 10 years ago by simonmar

Type of failure: Compile-time performance bug

comment:20 in reply to:  16 ; Changed 9 years ago by cheecheeo

Architecture: x86 → x86_64 (amd64)
Cc: cheecheeo@… added
Version: 6.8.2 → 7.0.1

Replying to simonmar:

Tested today with GHC HEAD. Compiling a 100k-element [Int] list takes just less than a minute ...

I tried compiling a 100k element [Int] today with ghc 7.0.1 (-O0) and was unable to get past the Renamer/typechecker phase without running out of memory.

comment:21 in reply to:  20 Changed 9 years ago by simonmar

Replying to cheecheeo:

I tried compiling a 100k element [Int] today with ghc 7.0.1 (-O0) and was unable to get past the Renamer/typechecker phase without running out of memory.

That's most likely due to #4801

comment:22 Changed 9 years ago by simonpj

Things should be a lot better with 7.0.2.

Simon
