Opened 8 years ago

Closed 8 years ago

#5102 closed bug (worksforme)

ghc struggles to compile a large case statement

Reported by: espringe
Owned by: igloo
Priority: normal
Milestone: 7.4.1
Component: Compiler
Version: 6.12.3
Keywords:
Cc: michal.terepeta@…
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Compile-time performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

I'm trying to compile a pronunciation dictionary (cmudict) into an application. I first convert the dictionary into Haskell source and then try to compile it.

After about 7 minutes of compiling, ghc's memory usage was in the gigabytes and the computer started to swap heavily.

Attached is a self-contained generated source file.

Attachments (1)

out.hs.zip (251.3 KB) - added by espringe 8 years ago.
Due to this bug tracker's tiny attachment limit, I had to snip around 100k lines out of the switch statement.

Change History (8)

Changed 8 years ago by espringe

Attachment: out.hs.zip added

comment:1 Changed 8 years ago by espringe

Compiling the heavily pruned version takes about 34 seconds and uses considerable memory, so it might be enough to show the problem. I can upload the full version somewhere else if needed.

comment:2 Changed 8 years ago by espringe

For comparison, when the dictionary is converted to C, gcc compiles the FULL version in 2.43 seconds.

(Granted, it's not an entirely fair comparison, as the C version is just an array of (char*, char*) pairs and does a binary search.)
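
To make the comparison in comment:2 concrete, here is a minimal Haskell sketch of the two shapes: the generated one-branch-per-word case expression, and a table of entries searched at run time (the analogue of the C array with binary search). The names `lookupCase`, `lookupMap`, `entries` and `dictionary` are hypothetical, the `Phoneme` type is cut down to the five constructors that appear in this ticket, and `Data.Map` stands in for the sorted C array. Whether the table form compiles faster than the case expression is not established in this ticket.

```haskell
import qualified Data.Map.Strict as Map

-- Cut-down stand-in for the phoneme type; only the constructors that
-- appear in this ticket are listed (assumption: the real file has more).
data Phoneme = AA1 | AO2 | K | N | R
  deriving (Eq, Show)

newtype Pronunciation = Pronunciation [Phoneme]
  deriving (Eq, Show)

-- Shape of the generated code discussed in the ticket:
-- one case branch per dictionary word.
lookupCase :: String -> Maybe Pronunciation
lookupCase w = case w of
  "AANCOR" -> Just $ Pronunciation [AA1, N, K, AO2, R]
  _        -> Nothing

-- Alternative shape: keep the entries as data and search them at run time.
-- Data.Map is a balanced tree, so lookup is logarithmic, similar in spirit
-- to the binary search over a sorted array in the C version.
entries :: [(String, Pronunciation)]
entries =
  [ ("AANCOR", Pronunciation [AA1, N, K, AO2, R])
  ]

dictionary :: Map.Map String Pronunciation
dictionary = Map.fromList entries

lookupMap :: String -> Maybe Pronunciation
lookupMap w = Map.lookup w dictionary

main :: IO ()
main = do
  print (lookupCase "AANCOR")
  print (lookupMap "AANCOR")
  print (lookupMap "MISSING")
```

Both lookups return the same result for a known word; the difference is only in how much work is pushed onto the compiler's handling of the case expression versus a data structure built at run time.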

comment:3 Changed 8 years ago by espringe

Summary: ghc struggles to compile a large switch statement → ghc struggles to compile a large case statement

comment:4 Changed 8 years ago by simonpj

Owner: set to igloo

Ian: could you please investigate where the time and space are going? Thanks!

comment:5 Changed 8 years ago by michalt

Cc: michal.terepeta@… added
Type of failure: None/Unknown → Compile-time performance bug

Not sure if this helps, but I can reproduce this for GHC 7.0.3:

~/bugs/ghc/5102 0 > time ghc --make -fforce-recomp out.hs
[1 of 1] Compiling Main             ( out.hs, out.o )
Linking out ...

real    0m29.649s
user    0m29.320s
sys     0m0.250s

But for HEAD (7.1.20110415) I'm getting:

~/bugs/ghc/5102 0 > time ~/dev/ghc/inplace/bin/ghc-stage2 --make -fforce-recomp out.hs
[1 of 1] Compiling Main             ( out.hs, out.o )
Linking out ...

real    0m8.883s
user    0m8.640s
sys     0m0.190s

The memory usage is similar in both cases though: around 500 MB.

comment:6 Changed 8 years ago by igloo

Milestone: 7.4.1

comment:7 Changed 8 years ago by igloo

Resolution: worksforme
Status: new → closed

I'm not sure what I'm looking for exactly. There are 33797 branches in the case expression, and the peak memory (according to +RTS -t) is 200M. It seems to be roughly linear in the number of branches.

That's about 6k bytes per branch, which is under 1000 words per branch. That doesn't look too unreasonable for a branch like

"AANCOR" -> Just $ Pronunciation [AA1, N, K, AO2, R]

I don't think there's a specific problem here, although constant factor improvements may well be possible. Please reopen if you disagree.
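
The per-branch estimate in comment:7 can be checked with a few lines of arithmetic. This is only a sketch under two assumptions that the comment does not state: that "200M" means 200 MiB, and that a word is 8 bytes, as on the x86_64 architecture named in the ticket fields. The names `peakBytes`, `wordSize`, `bytesPerBranch` and `wordsPerBranch` are made up for the illustration.

```haskell
-- Number of case branches quoted in comment:7.
branches :: Integer
branches = 33797

-- Assumption: "200M" of peak memory is read as 200 MiB.
peakBytes :: Integer
peakBytes = 200 * 1024 * 1024

-- Assumption: 8-byte machine words (x86_64).
wordSize :: Integer
wordSize = 8

-- Integer division is enough for an order-of-magnitude check.
bytesPerBranch :: Integer
bytesPerBranch = peakBytes `div` branches

wordsPerBranch :: Integer
wordsPerBranch = bytesPerBranch `div` wordSize

main :: IO ()
main = do
  putStrLn ("bytes per branch: " ++ show bytesPerBranch)  -- 6205
  putStrLn ("words per branch: " ++ show wordsPerBranch)  -- 775
```

Under these assumptions the result is roughly 6 KB and well under 1000 words per branch, which matches the figures given in the comment.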