Opened 13 years ago

Last modified 3 years ago

#947 new bug

ghc -O space leak: CSE between different CAFs

Reported by: int-e@… Owned by:
Priority: normal Milestone:
Component: Compiler Version: 6.5
Keywords: CSE Cc:…, jwlato@…
Operating System: Linux Architecture: x86
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


Consider the following program for generating the 1,000,000th prime:

module Main (main) where

sieve0 (n:ns) (p:ps)
    | p*p == n  = sieve0 (filter (\n -> n`mod`p /= 0) ns) ps
    | otherwise = n:sieve0 ns (p:ps)

primes0 :: [Int]
primes0 = 3:sieve0 [5,7..] primes0

primes :: [Int]
primes = 2:3:sieve0 [5,7..] primes0

main = print $ primes !! 1000000

The intention of the separate primes0 function is to limit the number of primes that need to be held in memory. Unfortunately, ghc -O combines the common subexpressions in primes0 and primes so this effort is wasted. The resulting program needs noticably more memory than required, as can be seen by replacing the definition of primes by

primes = 2:3:sieve0 (iterate (2+) 5) primes0

which prevents the CSE from happening.

Excerpt of +RTS -s output, original version:

12,099,221,160 bytes allocated in the heap
279,720,116 bytes copied during GC
 15,834,912 bytes maximum residency (6 sample(s))

modified version that prevents CSE:

127,736,408 bytes copied during GC (scavenged)
    233,388 bytes copied during GC (not scavenged)
     30,624 bytes maximum residency (1 sample(s))

Tested with ghc 6.4.2 and a current (as of 2006-10-17) darcs ghc 6.5.

Attachments (2)

0001-Don-t-CSE-things-with-recursive-type-constructors-94.patch (1.6 KB) - added by michalt 7 years ago.
nofib (154.2 KB) - added by michalt 7 years ago.

Download all attachments as: .zip

Change History (14)

comment:1 Changed 13 years ago by simonpj

Milestone: 6.8

I don't know any way to solve this problem automatically. But I do think it should be under your control.

For now, you can say -fno-cse.

However, I recently changed GHC to do no CSE in functions marked INLINE or NOINLINE. However that does not work in this case, because the sub-expressions get floated out. I'll try to fix that when I next visit the simplifier. That would allow per-definition control of CSE, which is more what you want.

comment:2 Changed 12 years ago by simonpj

Milestone: 6.8 branch_|_

Actually, it turns out that you do get per-definition control using INLINE, because INLINE switches off all floating-out and floating-in on the enclosed expression. I'll document this behaviour. Meanwhile I won't close this bug, becuase we still have not Truly Glorious Solution, but I'll milestone it "bottom".


comment:3 Changed 7 years ago by michalt

Type of failure: None/Unknown

Unless I'm missing something, I think there is a simple way to avoid doing CSE on recursive types (in the sense of isRecursiveTyCon) -- one can simply avoid adding variables with such types to the CSE mapping (see attached patch). The problem here is that, according to nofib, this limits CSE quite a bit and hurts code size. Another possibility (although quite ugly) would be to limit the above to lists -- they are probably the main source of space leaks cause by CSE..

The patch "almost" validates -- the 5644 fails. Apparently the test is based on the assumption that the program overflows heap, which does not happen with the patch. I'm not sure what 5644 actually tests and so don't really know how to fix it.

So all in all, I'm not sure if this patch is worth it. What do you think?

Changed 7 years ago by michalt

Attachment: nofib added

comment:4 Changed 7 years ago by michalt

Status: newpatch

comment:5 Changed 7 years ago by simonpj

I doubt it's worth it. What about a non-recursive type that contains a recursive one, such as ([Int], Bool). And sometimes you might WANT sharing.

I'm disinclined to fiddle with this at the moment, esp as there is a way to control it manually.

comment:6 Changed 7 years ago by simonpj

Status: patchnew

comment:7 Changed 7 years ago by liyang

Cc:… added

comment:8 Changed 7 years ago by jwlato

Cc: jwlato@… added

comment:9 Changed 7 years ago by simonpj

John: presumably you've added yourself to the cc because this has bitten you?

I have one other very simple suggestion: do no CSE for top-level CAFs. Rationale:

  • It's not a good plan to rely on CSE for major sharing of really expensive expressions; if it really matters you'll want to do it yourself by let-binding the shared thing. CSE is really intended for picking up nickles and dimes in inner loops.
  • Top-level things are evaluated at most once anyway, so the gain from sharing just dime's worth of work at top level is not worth having.

I suspect that this simple change would deal with this ticket nicely. What do others think?


comment:10 in reply to:  9 Changed 7 years ago by int-e

Replying to simonpj:

I have one other very simple suggestion: do no CSE for top-level CAFs.

Sounds like a great idea to me. An added benefit is that it would lend more safety to constructing top-level mutable variables (like IORefs). The loss of sharing should be negligible for most purposes, but I can think of a few cases where giving names to shared parts would be awkward:

  • repeated string constants when doing text processing
  • repeated numerical values, in particular large constants containing repetitions (the worst of this can probably be avoided by just doing CSE before floating stuff out so that [0,1,0,1] ends up with the zeros and ones shared).

comment:11 Changed 7 years ago by jwlato

simonpj: we do get better results with -fno-cse, but haven't looked into it much so I can't say if this is the exact issue.

No CSE for top-level CAFs sounds good to me too.

comment:12 Changed 3 years ago by simonpj

Keywords: CSE added
Note: See TracTickets for help on using tickets.