Opened 3 years ago

Last modified 10 months ago

#13586 new bug

ghc --make seems to leak memory

Reported by: MikolajKonarski Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.0.1
Keywords: Cc: watashi
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #13379 #13564 Differential Rev(s):
Wiki Page:

Description (last modified by MikolajKonarski)

(This is probably not reproducible with a small example.) When I build this project with cabal build

https://github.com/LambdaHack/LambdaHack/commit/138123ab13edd4db6c8143720af68b6ec4a1726e

the peak memory, as observed with top, is 10G*. When I instead interrupt the compilation with ^C at the following point (compilation of this file takes a couple of minutes, so it's easy to interrupt):

[123 of 123] Compiling Game.LambdaHack.SampleImplementation.SampleMonadServer ( Game/LambdaHack/SampleImplementation/SampleMonadServer.hs, dist/build/Game/LambdaHack/SampleImplementation/SampleMonadServer.o )

and then restart and continue to the end, peak memory usage in each of the two compilation parts is 5G*. So it seems ghc --make keeps some data that is either eventually unused or could just as well be read on demand instead of kept in memory. Confirmed with 8.2.1-rc1 as well, but that is not trivial to compile due to restrictive upper bounds of many packages.

*exact numerical values are made up

Edit: this is a regression, GHC 7.10.3 uses < 3G for compilation without even interrupting

Edit2: which actually doesn't prove the --make leak is a regression. There may just be some other regression that makes the difference between interrupted and non-interrupted compilation under 7.10.3 much smaller and harder to measure.

Change History (13)

comment:1 Changed 3 years ago by jstolarek

Mikołaj, does this problem happen with earlier versions of GHC? Or: to what extent does it happen with earlier versions? I have run into the same problem on my old laptop with 2 GB of RAM. With larger projects I often had to kill the build because the system ran out of memory, but restarting the build then led to successful completion.

One symptom I also experienced was a very long linking time, something that did not happen with GHC 7.10 or 7.8. I speculate that memory was cluttered with unnecessary data left over from compilation, so that running the linker led to swapping. Again, restarting the build so that it only had to finish linking solved the problem, i.e. linking finished in reasonable time.

comment:2 Changed 3 years ago by MikolajKonarski

Description: modified (diff)

Jan, that was a great hunch: indeed, this is a regression, GHC 7.10.3 uses < 3G for the compilation without even interrupting. It's possible that the different versions of some packages I use under 7.10.3 contribute, but I'd be surprised if it weren't almost entirely the change of GHC version.

comment:3 Changed 3 years ago by MikolajKonarski

Description: modified (diff)

comment:4 Changed 3 years ago by rwbarton

GHC certainly retains information about the modules it has finished compiling in --make mode, by design: the information it wrote to the interface file, and would read back from the interface file if needed. It has "always" worked this way, though of course it's possible that the space usage of this retained data has increased, or that other data is being retained unintentionally.

comment:5 Changed 3 years ago by MikolajKonarski

comment:6 Changed 3 years ago by MikolajKonarski

comment:7 Changed 2 years ago by MikolajKonarski

I've just made a change that lowered RAM usage a lot (it no longer thrashes my computer; perhaps it uses the same total amount of RAM+swap, I didn't measure). Most of the specialization was originally occurring in a single module, and now some of it is already done in another module.

This would indicate that the compiler's RAM usage is not linear in the number of specializations occurring in a single module. Perhaps it really uses a non-linear amount of memory, or perhaps it just looks through the list of specializations from the current module from time to time, so their pages can't simply get swapped out and stay that way, but are brought back into RAM too often, causing swap thrashing.

Last edited 2 years ago by MikolajKonarski (previous) (diff)

comment:8 Changed 2 years ago by bgamari

Do you have a small-ish example which exhibits this? If so, we should profile it.

comment:9 Changed 2 years ago by MikolajKonarski

Nope, and reducing the example reduces the problem, because it needs lots of specializations, hence lots of code, to trigger. However, I guess one could construct a cheaper, artificial example with n simple functions specialized to m types and thus get n*m specializations (I only have 1--2 types for each function in my example, so I need lots of code). I wonder if we already have such an example in the GHC test suite. If so, we'd only need a variant where the specializations are split between 2 modules, and then compare the time/heap as n and m grow.
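A minimal sketch of such an artificial example (hypothetical code, not from the ticket or the GHC test suite) could look like this: each of n overloaded functions carries a SPECIALIZE pragma per concrete type, so n functions and m types yield n*m specializations in a single module. Scaling n and m with a small code generator, and optionally moving the pragmas into a second module, would let one compare GHC's compile-time heap use in the two setups.

```haskell
-- Hypothetical stress-test sketch: 2 overloaded functions, each
-- specialized to 2 concrete types, i.e. n = m = 2 and n*m = 4
-- specializations in this one module. A real test would generate
-- many more such definitions.
module Main (main) where

f1 :: Num a => a -> a
f1 x = x + 1
{-# SPECIALIZE f1 :: Int -> Int #-}
{-# SPECIALIZE f1 :: Double -> Double #-}

f2 :: Num a => a -> a
f2 x = x * 2
{-# SPECIALIZE f2 :: Int -> Int #-}
{-# SPECIALIZE f2 :: Double -> Double #-}

main :: IO ()
main = print (f1 (1 :: Int) + f2 (2 :: Int))
```

The interesting measurement here is not the program's output but GHC's own heap while compiling it (e.g. with +RTS options passed to the compiler), as n and m grow.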

comment:10 Changed 18 months ago by MikolajKonarski

Just one more data point, showing (I guess) how large .hi files loaded into memory and never freed can lock up lots of RAM during compilation: this commit

https://github.com/LambdaHack/LambdaHack/commit/9adc5ee93ab32a9a1ba949362371a6a28abf446a

lowers the maximum resident set size of GHC during compilation from 4.5G to 2.8G, as measured with /usr/bin/time -v cabal build -j1 on Ubuntu with GHC 8.4.3. As reported in other comments, interrupting the compilation and then restarting it also lowers the resident set size considerably.

Before the commit, two RAM usage peaks coincide when compiling the library section of the .cabal file: one peak from 120 large .hi files loaded into memory (I guess ~2G) and another from an excessive amount of specialisations performed when compiling a single module that provides a concrete implementation of a certain monad. The commit just moves the specialization to the executable section of the .cabal file, thus separating the peaks.

comment:11 Changed 11 months ago by watashi

Cc: watashi added

comment:12 Changed 11 months ago by ulysses4ever

> GHC certainly retains information about modules it has finished compiling in --make mode, by design--the information it wrote to the interface file

Question: would it be reasonable to add a flag controlling how much memory is allowed for this kind of buffering? Upon reaching the limit, GHC could clear the buffer and resort to fetching .hi files on demand.
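As an illustration only (this is not GHC's actual data structure, and all names here are made up), the proposal amounts to turning the in-memory interface table into a size-bounded cache: once the limit is reached, the oldest entry is evicted, and a later lookup misses and would have to re-read the .hi file on demand.

```haskell
-- Illustrative sketch of the proposed bounded buffering, not GHC
-- internals: a cache of at most `limit` "loaded interfaces" that
-- evicts its oldest entry on overflow.
import qualified Data.Map.Strict as M

data Cache k v = Cache { limit :: Int, order :: [k], table :: M.Map k v }

emptyCache :: Int -> Cache k v
emptyCache n = Cache n [] M.empty

insertIface :: Ord k => k -> v -> Cache k v -> Cache k v
insertIface k v (Cache n ks m)
  | M.size m < n = Cache n (ks ++ [k]) (M.insert k v m)
  | otherwise    = case ks of
      -- Evict the oldest interface; it would be re-read from its
      -- .hi file on a later demand.
      (old:rest) -> Cache n (rest ++ [k]) (M.insert k v (M.delete old m))
      []         -> Cache n [k] (M.insert k v m)

lookupIface :: Ord k => k -> Cache k v -> Maybe v
lookupIface k = M.lookup k . table

main :: IO ()
main = do
  let c = insertIface "C" 3 (insertIface "B" 2 (insertIface "A" 1 (emptyCache 2)))
  -- "A" was evicted once the limit of 2 was reached.
  print (lookupIface "A" c, lookupIface "C" c)
```

As bgamari's reply below notes, the hard part in GHC itself is not the cache policy but making sure no live heap references still point into an evicted interface.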

Last edited 11 months ago by ulysses4ever (previous) (diff)

comment:13 Changed 10 months ago by bgamari

Would it be reasonable? Perhaps. Would it be easy to implement? I don't believe so. You would need to somehow walk the heap looking for references to freed interfaces so that they can be GC'd (or make every reference that might point into another module weak, which would come at quite some cost).
