Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#10289 closed bug (worksforme)

compiling huge HashSet hogs memory

Reported by: zudov
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 7.10.1
Keywords:
Cc:
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Compile-time performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

Compiling a huge (~2.5k elements) set with GHC-7.10.1 or GHC-head and unordered-containers-0.2.5.1 or unordered-containers-head takes far too much memory. Here is a file which I am trying to compile. I've also set up a Travis build which demonstrates the behaviour with different versions of GHC and unordered-containers. Below I refer to this build job, which uses GHC-7.10.1 and unordered-containers-0.2.5.1.

When the build uses GHC-7.8.4, none of this memory hogging occurs. Another interesting observation is that compiling a HashMap of the same size doesn't cause memory hogging, even with -O2. This caught my attention because HashSet is implemented in terms of HashMap.
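
To make the last observation concrete: a hash set can be represented as a map whose values carry no information, so one would expect both to compile similarly. The following is only a minimal sketch of that relationship. It uses `Data.Map.Strict` from containers as a stand-in so that it runs without unordered-containers, and the names `UnitSet`, `fromListSet`, `memberSet` and `sizeSet` are hypothetical; it is not the library's actual source.

```haskell
import qualified Data.Map.Strict as M
import Data.List (foldl')

-- Hypothetical stand-in: a set is a map whose values are all ().
newtype UnitSet a = UnitSet (M.Map a ())

-- Build the set by inserting every element as a key with a unit value.
fromListSet :: Ord a => [a] -> UnitSet a
fromListSet = UnitSet . foldl' (\m k -> M.insert k () m) M.empty

-- Membership is just key lookup in the underlying map.
memberSet :: Ord a => a -> UnitSet a -> Bool
memberSet k (UnitSet m) = M.member k m

-- Duplicates collapse because map keys are unique.
sizeSet :: UnitSet a -> Int
sizeSet (UnitSet m) = M.size m

main :: IO ()
main = do
  let s = fromListSet ["amp", "lt", "gt", "amp"]
  print (sizeSet s)
  print (memberSet "lt" s)
```

Any extra cost on the set side would then have to come from how the wrapper's `fromList` is inlined or rewritten at the call site, not from the data structure itself.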

I reported the issue to unordered-containers as well: link.

Change History (12)

comment:1 Changed 4 years ago by simonpj

What happens if you don't have the call to S.fromList? So your file looks like

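-- `data` is a reserved word in Haskell, so the list needs another name
-- (`entities` is just a placeholder)
entities :: [Text]
entities = [ pack "foo", pack "bar", ...etc... ]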

Another thing to try is to cut it down a lot, use -dshow-passes and compare HashSet with HashMap. With luck you'll see that the HashSet version is much larger on small examples too. Then you can use -ddump-simpl to see what the expanded code looks like. Using -ddump-inlinings shows you what is being inlined.

The fact that HashSet and HashMap differ here makes me think that this is to do with over-zealous inlining or rule-rewriting in HashSet.

(-ddump-rule-firings to see which rewrite rules are firing.)

Simon
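
One way to carry out the cut-down comparison suggested above is to generate small test modules of a chosen size for each variant (plain list, HashSet, HashMap) and compile each with the same flags. The generator below is a rough sketch that depends only on base. The names `Variant`, `genModule` and `Repro`, the `entityN` strings, and the choice of `()` as the HashMap value type are all assumptions made for illustration and are not taken from the original file.

```haskell
import Data.List (intercalate)

-- Which container the generated module builds from the literal list.
data Variant = PlainList | HashSetVariant | HashMapVariant
  deriving (Eq, Show)

-- Render the source text of a test module with n placeholder elements.
genModule :: Variant -> Int -> String
genModule variant n = unlines (header ++ [signature, definition])
  where
    header =
      [ "module Repro (entities) where"
      , ""
      , "import Data.Text (Text, pack)"
      ] ++ extraImport ++ [""]
    extraImport = case variant of
      PlainList      -> []
      HashSetVariant -> ["import qualified Data.HashSet as S"]
      HashMapVariant -> ["import qualified Data.HashMap.Strict as M"]
    signature = case variant of
      PlainList      -> "entities :: [Text]"
      HashSetVariant -> "entities :: S.HashSet Text"
      HashMapVariant -> "entities :: M.HashMap Text ()"
    definition = "entities = " ++ wrap ("[ " ++ intercalate ", " (map element [0 .. n - 1]) ++ " ]")
    wrap body = case variant of
      PlainList      -> body
      HashSetVariant -> "S.fromList " ++ body
      HashMapVariant -> "M.fromList " ++ body
    -- Map entries need a value; () keeps the comparison with the set fair.
    element i = case variant of
      HashMapVariant -> "(pack " ++ show (name i) ++ ", ())"
      _              -> "pack " ++ show (name i)
    -- Made-up element names: "entity0", "entity1", ...
    name i = "entity" ++ show (i :: Int)

main :: IO ()
main = putStr (genModule HashSetVariant 3)
```

Writing the output to `Repro.hs` for each variant and compiling with, for example, `ghc -O2 -fforce-recomp -dshow-passes Repro.hs` should show whether the term sizes diverge between the variants at small sizes already.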

comment:2 Changed 4 years ago by zudov

What happens if you don't have the call to S.fromList?

This compiles fine with -O2. However if I use -O0 -fno-ignore-interface-pragmas then memory consumption goes high again.

That seems a little strange to me. Do you think this is a valid way to reproduce the problem without depending on unordered-containers? If it is, then we can probably just experiment with the list and come back to HashSet later.

I tried to observe the behaviour of a smaller list compilation with -ddump-inlinings:

  • Compilation with -O2 -ddump-inlinings reports inlining of Data.Text.pack and GHC.Base.build on each element
  • Compilation with -O0 -ddump-inlinings reports inlining of:
    • Data.Text.pack
    • Data.Text.Internal.Fusion.unstream
    • Data.Text.Internal.Fusion.Common.map
    • Data.Text.Internal.Fusion.Common.streamList
    • Data.Text.Internal.safe
    • Data.Text.Internal.Fusion.Types.$WYield

Please let me know whether it makes sense to dig in that direction.

comment:3 Changed 4 years ago by simonpj

Well -O0 -fno-ignore-interface-pragmas is a bit of a funny combination. I can't say why memory use should be high then, but it's probably better to focus on -O0 or -O or -O2, which are what people actually use.

The difficulty here is it's hard to tell whether GHC is at fault, or the INLINE pragmas and/or RULEs in the libraries. I just don't have time to investigate at the moment. Maybe someone else does?

I made a couple of suggestions in my previous comment. If I was investigating, those are the things I'd try first.

Simon

comment:4 Changed 4 years ago by thoughtpolice

After looking back at my tests, the situation should be significantly better with GHC 7.10.2 based on my quick examination; the resident memory usage looks to be closer to 2GB on my machine. However, the total build time seems to be worse (1m52s to compile EntrySet at -O2 versus ~30 seconds on your Travis machines, but with a maximum residency of only 2GB).

So there's still more to be done here, but enabling -O2 shouldn't cripple you anymore at least.

Would you mind giving this a go with the latest ghc-7.10 branch (or the 7.10.2 RC, which will be out soon)? You can use Herbert's PPA in combination with Travis to get automated testing.

comment:5 in reply to:  4 Changed 4 years ago by zudov

I've just tried to run the build, and it still runs out of memory. I guess Travis just doesn't have enough memory.

https://travis-ci.org/zudov/html5-entity/jobs/69424559#L519

comment:6 Changed 4 years ago by bgamari

Status: new → infoneeded

zudov, I'm having trouble reproducing this. With ghc 7.10.2, unordered-containers-0.2.5.1, and text-1.2.1.3 I see the following:

$ ghc -O EntitySet.hs -fforce-recomp -ddump-inlinings +RTS -s 
[1 of 1] Compiling Text.Html.Entity.Data.EntitySet ( EntitySet.hs, EntitySet.o )
Inlining done: Data.HashSet.fromList
Inlining done: Data.HashMap.Base.empty
Inlining done: Data.Text.pack
Inlining done: GHC.Base.build
Inlining done: Data.Text.pack
Inlining done: GHC.Base.build
...                                  # goes on for a few thousand lines
Inlining done: Data.Text.pack
Inlining done: GHC.Base.build
Inlining done: GHC.Base.foldr
Inlining done: GHC.Base.id

   2,541,860,520 bytes allocated in the heap
     412,058,088 bytes copied during GC
      57,559,000 bytes maximum residency (11 sample(s))
       3,015,736 bytes maximum slop
             140 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       776 colls,     0 par    0.405s   0.406s     0.0005s    0.0141s
  Gen  1        11 colls,     0 par    0.261s   0.261s     0.0237s    0.0615s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.003s  (  0.003s elapsed)
  MUT     time    1.533s  (  1.660s elapsed)
  GC      time    0.666s  (  0.667s elapsed)
  EXIT    time    0.018s  (  0.018s elapsed)
  Total   time    2.232s  (  2.349s elapsed)

  Alloc rate    1,658,595,906 bytes per MUT second

  Productivity  70.0% of total user, 66.5% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

The inlinings you are observing sound quite reminiscent of #10528, which should be fixed with text-1.2.1.3. Could you test this?
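
When comparing runs like the one above across GHC or library versions, it can help to pull the peak residency out of the `+RTS -s` summary automatically. The following is a small sketch that uses only base; the function name `maxResidency` is hypothetical, and it assumes the summary line has the same shape as in the output above.

```haskell
import Data.Char (isDigit)
import Data.List (isInfixOf)
import Data.Maybe (listToMaybe)

-- Extract the "maximum residency" figure (in bytes) from `+RTS -s` output.
-- Returns Nothing if no such line is present.
maxResidency :: String -> Maybe Integer
maxResidency out = listToMaybe
  [ read digits
  | l <- lines out
  , "bytes maximum residency" `isInfixOf` l
    -- Keep the leading number and drop its thousands separators.
  , let digits = filter isDigit (takeWhile (/= 'b') (dropWhile (== ' ') l))
  , not (null digits)
  ]

main :: IO ()
main = print (maxResidency sample)
  where
    sample = unlines
      [ "   2,541,860,520 bytes allocated in the heap"
      , "      57,559,000 bytes maximum residency (11 sample(s))"
      ]
```

For the two sample lines taken from the output above, this yields 57559000 bytes, which can then be compared directly between a run with the older text release and one with text-1.2.1.3.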

comment:7 Changed 4 years ago by zudov

bgamari, great. I've just tested it on my local machine with the latest text and unordered-containers, and the problem is gone.

Additionally, here is a Travis build report: https://travis-ci.org/zudov/html5-entity/builds/87314690. It still runs out of memory on ghc-7.10.1, but on ghc-7.10.2 everything is good. (Compare with the previous build, which used an older text: https://travis-ci.org/zudov/html5-entity/builds/69424551)

Thanks a lot.

I think we can close this issue now.

comment:8 Changed 4 years ago by zudov

Resolution: fixed
Status: infoneeded → closed

comment:9 Changed 4 years ago by thomie

Milestone: 8.0.1

comment:10 Changed 4 years ago by thomie

Resolution: fixed
Status: closed → new

comment:11 Changed 4 years ago by thomie

Milestone: 8.0.1
Resolution: worksforme
Status: new → closed

comment:12 Changed 4 years ago by thomie

Type of failure: Runtime performance bug → Compile-time performance bug