Opened 10 years ago

Closed 10 years ago

Last modified 21 months ago

#3273 closed bug (duplicate)

memory leak due to optimisation

Reported by: sebf Owned by:
Priority: normal Milestone: 6.12 branch
Component: Compiler Version: 6.10.1
Keywords: FloatOut Cc: sebf@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Short summary:

The attached programs use lots of space when compiled with -O and run in constant space without -O. condensed.hs runs in constant space with -O too, if you remove any occurrence of 'id'. original.hs uses a lot of space without -O too, if an alternative definition of 'iterDepth' is used (see comment there).

I could reproduce this behaviour with ghc-6.8.2 under Linux as well as ghc-6.10.1 and ghc-6.11.20090331 under Mac OS X 10.5.7.

Full description:

This ticket has an attached tar file with two source files -- original.hs and condensed.hs -- that can be compiled with and without -O to reproduce the bug:

$ ghc -fforce-recomp --make original
[1 of 1] Compiling Main             ( original.hs, original.o )
Linking original ...

$ ./original 
^C <constant space usage>

$ ghc -v -dcore-lint -O -fforce-recomp --make original
Glasgow Haskell Compiler, Version 6.10.1, for Haskell 98, stage 2 booted by GHC version 6.8.3
Using package config file: /Library/Frameworks/GHC.framework/Versions/610/usr/lib/ghc-6.10.1/./package.conf
Using package config file: /Users/sebf/.ghc/i386-darwin-6.10.1/package.conf
hiding package HTTP-3001.1.3 to avoid conflict with later version HTTP-3001.1.4
hiding package haddock-2.3.0 to avoid conflict with later version haddock-2.4.1
hiding package base-3.0.3.0 to avoid conflict with later version base-4.0.0.0
hiding package old-time-1.0.0.1 to avoid conflict with later version old-time-1.0.0.2
hiding package filepath-1.1.0.1 to avoid conflict with later version filepath-1.1.0.2
hiding package process-1.0.1.0 to avoid conflict with later version process-1.0.1.1
hiding package Cabal-1.6.0.1 to avoid conflict with later version Cabal-1.6.0.2
hiding package regex-base-0.72.0.2 to avoid conflict with later version regex-base-0.93.1
hiding package regex-posix-0.72.0.3 to avoid conflict with later version regex-posix-0.93.2
hiding package regex-compat-0.71.0.1 to avoid conflict with later version regex-compat-0.92
hiding package network-2.2.0.1 to avoid conflict with later version network-2.2.1.2
hiding package QuickCheck-1.2.0.0 to avoid conflict with later version QuickCheck-2.1.0.1
hiding package time-1.1.2.2 to avoid conflict with later version time-1.1.2.3
hiding package zlib-0.4.0.4 to avoid conflict with later version zlib-0.5.0.0
hiding package HTTP-3001.1.4 to avoid conflict with later version HTTP-3001.1.5
hiding package regex-posix-0.93.2 to avoid conflict with later version regex-posix-0.94.1
hiding package utf8-string-0.3.3 to avoid conflict with later version utf8-string-0.3.4
hiding package binary-0.4.3.1 to avoid conflict with later version binary-0.4.4
hiding package zip-archive-0.1.1.1 to avoid conflict with later version zip-archive-0.1.1.3
hiding package binary-0.4.4 to avoid conflict with later version binary-0.5
hiding package haddock-2.4.1 to avoid conflict with later version haddock-2.4.2
hiding package HTTP-3001.1.5 to avoid conflict with later version HTTP-4000.0.0
hiding package hscolour-1.10.1 to avoid conflict with later version hscolour-1.11
hiding package HTTP-4000.0.0 to avoid conflict with later version HTTP-4000.0.1
hiding package haskell-src-exts-0.4.6 to avoid conflict with later version haskell-src-exts-0.4.8
hiding package HTTP-4000.0.1 to avoid conflict with later version HTTP-4000.0.4
hiding package digest-0.0.0.1 to avoid conflict with later version digest-0.0.0.2
hiding package level-monad-0.2.1 to avoid conflict with later version level-monad-0.3
hiding package HTTP-4000.0.4 to avoid conflict with later version HTTP-4000.0.7
hiding package digest-0.0.0.2 to avoid conflict with later version digest-0.0.0.4
hiding package terminfo-0.2.2.1 to avoid conflict with later version terminfo-0.3.0.2
hiding package stream-monad-0.1 to avoid conflict with later version stream-monad-0.1.1.0
hiding package tree-monad-0.1 to avoid conflict with later version tree-monad-0.1.1
hiding package parallel-tree-search-0.2.1 to avoid conflict with later version parallel-tree-search-0.4
hiding package explicit-sharing-0.2 to avoid conflict with later version explicit-sharing-0.3.1.1
hiding package tree-monad-0.1.1 to avoid conflict with later version tree-monad-0.2
hiding package tree-monad-0.2 to avoid conflict with later version tree-monad-0.2.1
hiding package parallel-tree-search-0.4 to avoid conflict with later version parallel-tree-search-0.4.1
hiding package level-monad-0.3 to avoid conflict with later version level-monad-0.3.1
hiding package explicit-sharing-0.3.1.1 to avoid conflict with later version explicit-sharing-0.3.1.2
hiding package explicit-sharing-0.3.1.2 to avoid conflict with later version explicit-sharing-0.3.1.3
hiding package explicit-sharing-0.3.1.3 to avoid conflict with later version explicit-sharing-0.4.0
hiding package explicit-sharing-0.4.0 to avoid conflict with later version explicit-sharing-0.5.0
wired-in package ghc-prim mapped to ghc-prim-0.1.0.0
wired-in package integer mapped to integer-0.1.0.0
wired-in package base mapped to base-4.0.0.0
wired-in package rts mapped to rts-1.0
wired-in package haskell98 mapped to haskell98-1.0.1.0
wired-in package syb mapped to syb-0.1.0.0
wired-in package template-haskell mapped to template-haskell-2.3.0.0
wired-in package dph-seq mapped to dph-seq-0.3
wired-in package dph-par mapped to dph-par-0.3
Hsc static flags: -static
*** Chasing dependencies:
Chasing modules from: *original.hs
Stable obj: [Main]
Stable BCO: []
Ready for upsweep
  [NONREC
      ModSummary {
         ms_hs_date = Thu Jun  4 13:23:20 CEST 2009
         ms_mod = main:Main,
         ms_imps = [Control.Monad.Cont]
         ms_srcimps = []
      }]
compile: input file original.hs
Created temporary directory: /var/folders/uZ/uZjvUnuPHNmMwDWLlrsGT++++TI/-Tmp-//ghc13029_0
*** Checking old interface for main:Main:
[1 of 1] Compiling Main             ( original.hs, original.o )
*** Parser:
*** Renamer/typechecker:
*** Desugar:
    Result size = 753
*** Core Linted result of Desugar:
*** Simplify:
    Result size = 290
*** Core Linted result of Simplifier phase gentle, iteration 1 out of 4:
    Result size = 280
*** Core Linted result of Simplifier phase gentle, iteration 2 out of 4:
    Result size = 280
*** Core Linted result of Simplify phase gentle done:
*** Specialise:
    Result size = 331
*** Core Linted result of Specialise:
*** Float out (not lambdas, not constants):
    Result size = 351
*** Core Linted result of Float out (not lambdas, not constants):
*** Float inwards:
    Result size = 351
*** Core Linted result of Float inwards:
*** Simplify:
    Result size = 587
*** Core Linted result of Simplifier phase 2 [main], iteration 1 out of 4:
    Result size = 390
*** Core Linted result of Simplifier phase 2 [main], iteration 2 out of 4:
    Result size = 304
*** Core Linted result of Simplify phase 2 [main] done:
*** Simplify:
    Result size = 287
*** Core Linted result of Simplifier phase 1 [main], iteration 1 out of 4:
    Result size = 287
*** Core Linted result of Simplify phase 1 [main] done:
*** Simplify:
    Result size = 330
*** Core Linted result of Simplifier phase 0 [main], iteration 1 out of 4:
    Result size = 310
*** Core Linted result of Simplifier phase 0 [main], iteration 2 out of 4:
    Result size = 304
*** Core Linted result of Simplifier phase 0 [main], iteration 3 out of 4:
    Result size = 304
*** Core Linted result of Simplify phase 0 [main] done:
*** Demand analysis:
    Result size = 304
*** Core Linted result of Demand analysis:
*** Worker Wrapper binds:
    Result size = 330
*** Core Linted result of Worker Wrapper binds:
*** GlomBinds:
*** Simplify:
    Result size = 345
*** Core Linted result of Simplifier phase 0 [post-worker-wrapper], iteration 1 out of 4:
    Result size = 345
*** Core Linted result of Simplify phase 0 [post-worker-wrapper] done:
*** Float out (not lambdas, constants):
    Result size = 353
*** Core Linted result of Float out (not lambdas, constants):
*** Common sub-expression:
    Result size = 352
*** Core Linted result of Common sub-expression:
*** Float inwards:
    Result size = 352
*** Core Linted result of Float inwards:
*** Simplify:
    Result size = 353
*** Core Linted result of Simplifier phase 0 [final], iteration 1 out of 4:
    Result size = 353
*** Core Linted result of Simplify phase 0 [final] done:
*** Tidy Core:
    Result size = 353
*** Core Linted result of Tidy Core:
writeBinIface: 49 Names
writeBinIface: 98 dict entries
*** CorePrep:
    Result size = 430
*** Core Linted result of CorePrep:
*** Stg2Stg:
*** CodeGen:
*** CodeOutput:
*** Assembler:
gcc -I. -c /var/folders/uZ/uZjvUnuPHNmMwDWLlrsGT++++TI/-Tmp-//ghc13029_0/ghc13029_0.s -o original.o
*** Deleting temp files:
Deleting: /var/folders/uZ/uZjvUnuPHNmMwDWLlrsGT++++TI/-Tmp-//ghc13029_0/ghc13029_0.s
Upsweep completely successful.
*** Deleting temp files:
Deleting: 
link: linkables are ...
LinkableM (Thu Jun  4 13:55:23 CEST 2009) main:Main
   [DotO original.o]
Linking original ...
*** Linker:
gcc -v -o original -DDONT_WANT_WIN32_DLL_SUPPORT original.o -L/Library/Frameworks/GHC.framework
/Versions/610/usr/lib/ghc-6.10.1/mtl-1.1.0.2 -L/Library/Frameworks/GHC.framework/Versions/610/usr
/lib/ghc-6.10.1/base-4.0.0.0 -L/Library/Frameworks/GHC.framework/Versions/610/usr/lib/ghc-6.10.1/
integer-0.1.0.0 -L/Library/Frameworks/GHC.framework/Versions/610/usr/lib/ghc-6.10.1/ghc-prim-0.1.0.0
 -L/Library/Frameworks/GHC.framework/Versions/610/usr/lib/ghc-6.10.1 -lHSmtl-1.1.0.2 -lHSbase-4.0.0.0
 -lHSinteger-0.1.0.0 -lHSghc-prim-0.1.0.0 -lHSrts -lm -lffi -lgmp -ldl -u _ghczmprim_GHCziTypes_Izh_static_info
 -u _ghczmprim_GHCziTypes_Czh_static_info -u _ghczmprim_GHCziTypes_Fzh_static_info 
-u _ghczmprim_GHCziTypes_Dzh_static_info -u _base_GHCziPtr_Ptr_static_info -u _base_GHCziWord_Wzh_static_info
 -u _base_GHCziInt_I8zh_static_info -u _base_GHCziInt_I16zh_static_info -u _base_GHCziInt_I32zh_static_info
 -u _base_GHCziInt_I64zh_static_info -u _base_GHCziWord_W8zh_static_info -u _base_GHCziWord_W16zh_static_info
 -u _base_GHCziWord_W32zh_static_info -u _base_GHCziWord_W64zh_static_info -u _base_GHCziStable_StablePtr_static_info
 -u _ghczmprim_GHCziTypes_Izh_con_info -u _ghczmprim_GHCziTypes_Czh_con_info -u _ghczmprim_GHCziTypes_Fzh_con_info
 -u _ghczmprim_GHCziTypes_Dzh_con_info -u _base_GHCziPtr_Ptr_con_info -u _base_GHCziPtr_FunPtr_con_info
 -u _base_GHCziStable_StablePtr_con_info -u _ghczmprim_GHCziBool_False_closure -u _ghczmprim_GHCziBool_True_closure
 -u _base_GHCziPack_unpackCString_closure -u _base_GHCziIOBase_stackOverflow_closure
 -u _base_GHCziIOBase_heapOverflow_closure -u _base_ControlziExceptionziBase_nonTermination_closure
 -u _base_GHCziIOBase_blockedOnDeadMVar_closure -u _base_GHCziIOBase_blockedIndefinitely_closure
 -u _base_ControlziExceptionziBase_nestedAtomically_closure -u _base_GHCziWeak_runFinalizzerBatch_closure
 -u _base_GHCziTopHandler_runIO_closure -u _base_GHCziTopHandler_runNonIO_closure
 -u _base_GHCziConc_ensureIOManagerIsRunning_closure -Wl,-search_paths_first
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5488~2/src/configure --disable-checking -enable-werror --prefix=/usr
 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/
 --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple
 --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5488)
 /usr/libexec/gcc/i686-apple-darwin9/4.0.1/collect2 -dynamic -arch i386 -macosx_version_min 10.5.7
 -weak_reference_mismatches non-weak -u _ghczmprim_GHCziTypes_Izh_static_info 
-u _ghczmprim_GHCziTypes_Czh_static_info -u _ghczmprim_GHCziTypes_Fzh_static_info
 -u _ghczmprim_GHCziTypes_Dzh_static_info -u _base_GHCziPtr_Ptr_static_info -u _base_GHCziWord_Wzh_static_info
 -u _base_GHCziInt_I8zh_static_info -u _base_GHCziInt_I16zh_static_info -u _base_GHCziInt_I32zh_static_info
 -u _base_GHCziInt_I64zh_static_info -u _base_GHCziWord_W8zh_static_info -u _base_GHCziWord_W16zh_static_info
 -u _base_GHCziWord_W32zh_static_info -u _base_GHCziWord_W64zh_static_info -u _base_GHCziStable_StablePtr_static_info
 -u _ghczmprim_GHCziTypes_Izh_con_info -u _ghczmprim_GHCziTypes_Czh_con_info -u _ghczmprim_GHCziTypes_Fzh_con_info
 -u _ghczmprim_GHCziTypes_Dzh_con_info -u _base_GHCziPtr_Ptr_con_info -u _base_GHCziPtr_FunPtr_con_info
 -u _base_GHCziStable_StablePtr_con_info -u _ghczmprim_GHCziBool_False_closure -u _ghczmprim_GHCziBool_True_closure
 -u _base_GHCziPack_unpackCString_closure -u _base_GHCziIOBase_stackOverflow_closure
 -u _base_GHCziIOBase_heapOverflow_closure -u _base_ControlziExceptionziBase_nonTermination_closure
 -u _base_GHCziIOBase_blockedOnDeadMVar_closure -u _base_GHCziIOBase_blockedIndefinitely_closure
 -u _base_ControlziExceptionziBase_nestedAtomically_closure -u _base_GHCziWeak_runFinalizzerBatch_closure
 -u _base_GHCziTopHandler_runIO_closure -u _base_GHCziTopHandler_runNonIO_closure
 -u _base_GHCziConc_ensureIOManagerIsRunning_closure -o original -lcrt1.10.5.o -L/Library/Frameworks/GHC.framework/
Versions/610/usr/lib/ghc-6.10.1/mtl-1.1.0.2 -L/Library/Frameworks/GHC.framework/Versions/610/usr/lib/ghc-6.10.1/base-4.0.0.0
 -L/Library/Frameworks/GHC.framework/Versions/610/usr/lib/ghc-6.10.1/integer-0.1.0.0 -L/Library/Frameworks/GHC.framework
/Versions/610/usr/lib/ghc-6.10.1/ghc-prim-0.1.0.0 -L/Library/Frameworks/GHC.framework/Versions/610/usr/lib/ghc-6.10.1
 -L/usr/lib/i686-apple-darwin9/4.0.1 -L/usr/lib/gcc/i686-apple-darwin9/4.0.1 -L/usr/lib/gcc/i686-apple-darwin9/4.0.1
 -L/usr/lib/gcc/i686-apple-darwin9/4.0.1/../../../i686-apple-darwin9/4.0.1 -L/usr/lib/gcc/i686-apple-darwin9/4.0.1
/../../.. original.o -lHSmtl-1.1.0.2 -lHSbase-4.0.0.0 -lHSinteger-0.1.0.0 -lHSghc-prim-0.1.0.0 -lHSrts -lm 
-lffi -lgmp -ldl -search_paths_first -lgcc_s.10.5 -lgcc -lSystem
link: done
*** Deleting temp files:
Deleting: 
*** Deleting temp dirs:
Deleting: /var/folders/uZ/uZjvUnuPHNmMwDWLlrsGT++++TI/-Tmp-//ghc13029_0

$ ./original 
^C <increasing space usage>

$ ghc -fforce-recomp --make condensed
[1 of 1] Compiling Main             ( condensed.hs, condensed.o )
Linking condensed ...

$ ./condensed 
^C <constant space>

13:50 memleak$ ghc -O -fforce-recomp --make condensed
[1 of 1] Compiling Main             ( condensed.hs, condensed.o )
Linking condensed ...

$ ./condensed 
^C <increasing space usage>

I have checked with ghc-6.8.2 on "Linux 2.6.27-11-generic 1 SMP Wed Apr 1 20:53:41 UTC 2009 x86_64 GNU/Linux" and with ghc-6.10.1 and 6.11.200903331 on "Darwin 9.7.0 Darwin Kernel Version 9.7.0: Tue Mar 31 22:52:17 PDT 2009; root:xnu-1228.12.14~1/RELEASE_I386 i386"

Here are some more runs with -fno-full-laziness and/or -fno-cse. Neither switch affects the memory requirements with -O:

$ ghc -fno-full-laziness -O -fforce-recomp --make condensed
[1 of 1] Compiling Main             ( condensed.hs, condensed.o )
Linking condensed ...

$ ./condensed 
^C <increasing memory usage>

$ ghc -fno-cse -O -fforce-recomp --make condensed
[1 of 1] Compiling Main             ( condensed.hs, condensed.o )
Linking condensed ...

$ ./condensed 
^C <increasing memory usage>

$ ghc -fno-full-laziness -fno-cse -O -fforce-recomp --make condensed
[1 of 1] Compiling Main             ( condensed.hs, condensed.o )
Linking condensed ...

$ ./condensed 
^C <increasing memory usage>

The program original.hs contains an alternative implementation of 'iterDepth' that seems to hint at the problem. I think, the argument of 'runDepthBound' is held in memory for a good reason there.

The program condensed.hs contains four superflous occurrences of 'id' that seem to play together to cause the memory leak. If all of them are present then the program uses lots of space with -O, if either one is missing it runs in constant space. The program always runs in constant space without -O. I think, the space leak is caused by holding the arguments of 'id' in memory. If that is true, however, I don't see why *all* occurrences of 'id' are necessary to cause the leak.

I think the original program and all versions of the condensed program should run in constant space with and without -O. It would be nice if the alternative version of the original program would run in constant space too but I see that there is sharing that may prevent this.

Attachments (1)

sources.tgz (1.3 KB) - added by sebf 10 years ago.
tar file with mentioned program sources

Download all attachments as: .zip

Change History (9)

Changed 10 years ago by sebf

Attachment: sources.tgz added

tar file with mentioned program sources

comment:1 Changed 10 years ago by int-e

-fno-full-laziness helps actually, but the option must come after the -O option:

# ghc_version 6.8.3
# ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.8.3
# ghc -fforce-recomp condensed.hs -o condensed && /usr/bin/time ./condensed +RTS -M20M
49995000
51.25user 0.18system 0:51.68elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+775minor)pagefaults 0swaps

# ghc -O -fforce-recomp condensed.hs -o condensed && /usr/bin/time ./condensed +RTS -M20M
Heap exhausted;
Current maximum heap size is 19996672 bytes (19 Mb);
use `+RTS -M<size>' to increase it.
Command exited with non-zero status 251
1.88user 0.01system 0:01.92elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5423minor)pagefaults 0swaps

# ghc -fno-cse -O -fforce-recomp condensed.hs -o condensed && /usr/bin/time ./condensed +RTS -M20M
Heap exhausted;
Current maximum heap size is 19996672 bytes (19 Mb);
use `+RTS -M<size>' to increase it.
Command exited with non-zero status 251
1.89user 0.00system 0:01.91elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5423minor)pagefaults 0swaps

# ghc -O -fno-full-laziness -fforce-recomp condensed.hs -o condensed && /usr/bin/time ./condensed +RTS -M20M
49995000
34.84user 0.14system 0:38.53elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+772minor)pagefaults 0swaps

# uname -a
Linux zombie 2.6.28.9 #1 PREEMPT Thu Apr 30 14:58:05 CEST 2009 i686 AMD Athlon(TM) XP 2500+ AuthenticAMD GNU/Linux

ghc-6.10.3 and HEAD of a week ago exhibit similar behaviour.

comment:2 Changed 10 years ago by int-e

# ghc -fno-cse -O should have been # ghc -O -fno-cse of course. But it makes no difference to the result -- -fno-cse does not help here.

comment:3 Changed 10 years ago by simonmar

difficulty: Unknown
Milestone: 6.12 branch
Type: bugrun-time performance bug

We already have a ticket for the full-laziness issue: #917. Do you think there's anything else going on here?

comment:4 in reply to:  3 Changed 10 years ago by sebf

Replying to simonmar:

After examining a core dump I am now pretty sure that the leak is caused by the full-laziness optimization and the original program could benefit from a mechanism to selectively disable it like mentioned by simonpj in that other ticket.

The behaviour of the condensed program is still a bit weird because it is unclear (to me at least) why removing an 'id' call affects the full laziness optimization in this example. However, section 4.9.2 of the GHC user guide says explicitly that "GHC doesn't consistently apply full-laziness" so it may be fine to apply it if all 'id's are there and not if one is missing.

If so, then there is nothing to fix.

comment:5 Changed 10 years ago by int-e

It's a rather more interesting case than #917 - only the first 10k elements of the list [1..] are evaluated, which don't account for the huge memory blowup that we observed.

I've looked at the core for condensed.hs. Basically what happens is that plus gets turned into

plus a b   = S (\c -> let unSac = unS a c
                          unSbc = unS b c
                      in  \d -> if d==0 then []
                                  else unSac (d-1) ++
                                       unSbc (d-1))

and runS gets turned into

runS a = let runa = run a in concatMap runa [10000]

With these two changes the program blows up even without optimizations. It's still quite amazing to me, but apparently there is just enough explicit sharing to hold onto the complete natSum computation in some form.

I'm curious how the id functions affect this - maybe they influence arity analysis and that in turn has an effect on the candidates for floating out?

comment:6 Changed 10 years ago by simonpj

Resolution: duplicate
Status: newclosed

OK, so the conclusion seems to be that, while this case is more interesting than #917, it is simply a manifestation of the full laziness space leak. And alas we don't know a good way to get the benefits of full laziness (sharing computation) without occasionally paying its costs (in terms of space).

I'm going to close this one as a duplicate, and cross-link from #917.

Simon

comment:7 Changed 10 years ago by simonmar

Type of failure: Runtime performance bug

comment:8 Changed 21 months ago by simonpj

Keywords: FloatOut added
Note: See TracTickets for help on using tickets.