Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#9332 closed bug (fixed)

Memory blowing up for strict sum/strict foldl in ghci

Reported by: artella.coding Owned by:
Priority: high Milestone: 7.10.1
Component: GHCi Version: 7.8.3
Keywords: ghci, strict, memory Cc: hvr, oerjan
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by artella.coding)

(http://stackoverflow.com/questions/24838982/memory-blowing-up-for-strict-sum-strict-foldl-in-ghci)

Suppose we have

--Test.hs
import Data.List

longList::[Int]
longList = [1 .. ]

result :: Int
result = foldl' (+) 0 $ takeWhile (< 10000000) longList

main = do
  print $ result

If one deletes all the ".hi" and ".o" files and then loads into ghci via :

ghci -fobject-code Test.hs

then upon running "main" the memory blows up.

Change History (14)

comment:1 Changed 5 years ago by artella.coding

Description: modified (diff)

comment:2 Changed 5 years ago by artella.coding

Priority: normalhigh

comment:3 Changed 5 years ago by artella.coding

Milestone: 7.8.4

comment:4 Changed 5 years ago by artella.coding

Resolution: fixed
Status: newclosed

comment:5 Changed 5 years ago by nomeata

I’m very confused. You report a bug, raise the priority and then close it. Is the bug not a bug afer all?

comment:6 Changed 5 years ago by artella.coding

Hi, I am a bit confused myself.

When I marked it to high, I did not know a way around the issue. However later I found a solution (see http://stackoverflow.com/questions/24838982/memory-blowing-up-for-strict-sum-strict-foldl-in-ghci) and I closed it (because I didn't want a ticket to be marked as an open bug if it wasn't a bug).

Following the comments in http://stackoverflow.com/questions/24838982/memory-blowing-up-for-strict-sum-strict-foldl-in-ghci I found that the only way that I could run the code without it blowing up in ghci is to run it as follows :

ghci -fobject-code Test

and to have the header :

module Main (main) where

What would be your suggestion? Is it a bug? Thanks

comment:7 Changed 5 years ago by artella.coding

OK things are a bit clearer now, but I am still not certain as to whether it is a bug or not. If we have :

--Test.hs
module Main (main) where
import Data.List

longList::[Int]
longList = [1 .. ]

result :: Int
result = foldl' (+) 0 $ takeWhile (< 10000000) longList

main = do
  print $ result

Now if we do :

a)First clean all .o and .hi files. b)Now compile as "ghc Test". This creates .hi and .o files. c)Now load into ghci via "ghci Test" d)"longList" is in scope. Therefore unsurprisingly running main blows up.

Now if we do the following : a)Clean all .o and .hi files. b)Load into ghci via "ghci -fobject-code Test" c)Now "longList" is not in scope. This time running main does not blow up.

I think that Orjan was trying to explain this in http://stackoverflow.com/a/24840786/917635

comment:8 Changed 5 years ago by nomeata

Not a bug.

longList is a CAF (e.g. a global not-yet-evaluated constant). When you load it in GHCi and evaluate result, GHC doesn’t know whether you are going to use longList again, and has to keep it in memory.

But if you compile it is a program, or as a module only exporting main, GHC will make longList a local definition that, after the evaluation of Result is started, can be garbage collected. (With -O it will probably fuse it away even, but that is unrelated here.)

But thanks for reporting!

comment:9 Changed 5 years ago by oerjan

Cc: oerjan added

It's still a little fishy that it's treating a missing module declaration differently from the officially equivalent module Main (main) where.

comment:10 Changed 5 years ago by artella.coding

If I modify the code above to the following, it blows up and this time using the methods above do not help :

--Sort.hs
module Sort (result) where
import Data.List

longList::[Int]
longList = [1 .. ]

result :: Int
result = foldl' (+) 0 $ takeWhile (< 10000000) longList
--Main.hs
module Main (main)  where
import Sort (result)

main = do
  print $ result

Then running with :

ghci -fobject-code Main.hs

results in memory blowing up.

comment:11 Changed 5 years ago by simonpj

There is a difficult tension here, described in Chapter 23 of my 1987 book, namely the tension between sharing computation and saving space. I do not know of any comprehensive answer.

The problem tends to bite less when (as is commonly the case) numbers like 1000000 don't appear in your program but rather are computed from the input or the command line flags. But it's still definitely a problem (e.g. with [1..].)

A possible solution might be to revert any CAFs that are retaining a great deal of space, by turning them back into their un-computed form. (Of course, their uncomputed form might retain a great deal of space too!)

Simon

comment:12 in reply to:  11 Changed 5 years ago by artella.coding

"namely the tension between sharing computation and saving space." - but in this case there doesn't seem to be any sharing of computation (unless I am mistaken). The key things I noted were :

a)The result of takeWhile (< 10000000) longList seems to be persisting, and not longList itself

b)The result of takeWhile (< 10000000) longList is only used once in subsequent invocations of main, and therefore it is not shared between computations

My way that I arrived at my deductions were as follows :

a)Firstly we note that it is the list resulting from takeWhile (< 10000000) longList which seems to be persisting, and not longList itself. This is supported by the fact that if you replace it with takeWhile (< 10) longList the program no longer blows up. If it was longList that was persisting, then the program would blow up in both cases.

b)So given that takeWhile (< 10000000) longList persists (and not longList), the question then remains as to whether takeWhile (< 10000000) longList is shared between subsequent computations. In order to test this I put a trace command to see whether takeWhile (< 10000000) longList is invoked twice when main is run twice from within ghci. That is if we modify the program in comment 10 (https://ghc.haskell.org/trac/ghc/ticket/9332#comment:10) to the following :

--Sort.hs
module Sort (result) where
import Data.List
import Debug.Trace

longList::[Int]
longList = [1 .. ]

result :: Int
result = foldl' (+) 0 $ (trace "test") takeWhile (< 10) longList

and :

--Main.hs
module Main (main)  where
import Sort (result)

main = do
  print $ result

Then we find that upon running main twice in ghci we get :

> main
test
45
> main
45

So we note that "test" is only printed out once, which indicates that once result is calculated its value is cached, which means that the list arising from takeWhile (< 10) longList is only used once, and hence is not shared between subsequent computations. Therefore if the result of takeWhile (< 10) longList is not shared between subsequent computations, why does it persist in memory?

Thanks

Replying to simonpj:

There is a difficult tension here, described in Chapter 23 of my 1987 book, namely the tension between sharing computation and saving space. I do not know of any comprehensive answer.

The problem tends to bite less when (as is commonly the case) numbers like 1000000 don't appear in your program but rather are computed from the input or the command line flags. But it's still definitely a problem (e.g. with [1..].)

A possible solution might be to revert any CAFs that are retaining a great deal of space, by turning them back into their un-computed form. (Of course, their uncomputed form might retain a great deal of space too!)

Simon

comment:13 Changed 5 years ago by rwbarton

artella, your observations seem correct to me. I am not sure, but I think that perhaps GHCi does not garbage collect CAFs, ever. Perhaps a feature request?

comment:14 Changed 5 years ago by thoughtpolice

Milestone: 7.8.47.10.1
Note: See TracTickets for help on using tickets.