Opened 2 years ago

Closed 2 years ago

#14161 closed bug (fixed)

Performance Problems on AST Dump

Reported by: h4ck3rm1k3
Owned by: dfeuer
Priority: low
Milestone: 8.4.1
Component: Compiler
Version: 8.3
Keywords:
Cc:
Operating System: Linux
Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s): Phab:D3894
Wiki Page:

Description

Using the latest self-compiled GHC, I ran:

ghc -O -ddump-to-file -ddump-parsed-ast

with the file https://raw.githubusercontent.com/h4ck3rm1k3/gcc-ontology/master/tests/example_python_ast_in_haskell.hs

After 2 hours I stopped it. The top output at that point:

PR  NI  VIRT    RES     SHR  S  %CPU  %MEM  TIME+      COMMAND
20  0   1.000t  0.014t  0    D  3.3   94.0  108:29.80  ghc

Change History (14)

comment:1 Changed 2 years ago by h4ck3rm1k3

/usr/local/bin/ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.3.20170819

git log:
commit 1cdceb9fa3bc3ad01b2d840caad8e735513e14ed
Author: Ben Gamari <ben@…>
Date: Sat Aug 19 07:44:13 2017 -0400

comment:2 Changed 2 years ago by dfeuer

Milestone: 8.2.2
Owner: set to dfeuer

I think I have a decent guess about what might be going on here. It looks like there's lots and lots of recursive string concatenation going on. Let me see if I can fix it.
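To illustrate the kind of cost that nested string concatenation can incur, here is a minimal sketch. The Tree type and the function names are hypothetical stand-ins, not GHC code, and the ticket does not say that the old showAstData had exactly this shape. The naive version re-copies the text of every left subtree at each level, while the ShowS version composes functions so that each character is produced once.

```haskell
-- Toy tree standing in for an AST; this is NOT a GHC type.
data Tree = Leaf Int | Node Tree Tree

-- Naive rendering: the string for the left subtree is the left operand
-- of (++), so it is copied again at every enclosing Node.
showNaive :: Tree -> String
showNaive (Leaf n)   = "(Leaf " ++ show n ++ ")"
showNaive (Node l r) = "(Node " ++ showNaive l ++ " " ++ showNaive r ++ ")"

-- ShowS rendering: pieces are composed as functions, so no intermediate
-- string is copied when subtrees are combined.
showFast :: Tree -> ShowS
showFast (Leaf n)   = showString "(Leaf " . shows n . showChar ')'
showFast (Node l r) =
  showString "(Node " . showFast l . showChar ' ' . showFast r . showChar ')'

-- A left-leaning tree of the given depth, the bad case for showNaive.
leftSpine :: Int -> Tree
leftSpine 0 = Leaf 0
leftSpine n = Node (leftSpine (n - 1)) (Leaf n)
```

Both functions produce the same text; on a deep left-leaning tree such as leftSpine 10000, showNaive does work roughly quadratic in the output size, while showFast stays linear.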

comment:3 Changed 2 years ago by dfeuer

Actually, that's only one piece of the puzzle. Another major problem is that when showAstData is called in HscMain, its (String) result is converted to an SDoc using text. In order to even think about rendering that result, Pretty calculates its length. This forces the entire dump to be built in memory as a String. I don't have enough memory to do that on my system!
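The effect described above can be sketched without any GHC internals. The Chunk type below is a hypothetical stand-in for a pretty-printer leaf that stores its width; it is an assumption for illustration, not the actual representation used by Pretty. Computing the length up front walks the whole lazy String, and because the String is still needed afterwards, all of it stays in memory.

```haskell
-- Hypothetical leaf: text plus its precomputed width (strict field).
data Chunk = Chunk !Int String

-- Mimics a constructor that measures its argument before anything else.
mkChunk :: String -> Chunk
mkChunk s = Chunk (length s) s

-- A large dump that is generated lazily, line by line.
bigDump :: Int -> String
bigDump n = concatMap (\i -> "line " ++ show i ++ "\n") [1 .. n]

-- Consuming the lazy String directly: only the demanded prefix is built.
firstLineLazy :: Int -> String
firstLineLazy n = takeWhile (/= '\n') (bigDump n)

-- Going through Chunk: matching on the constructor forces the strict
-- length field, which traverses (and retains) the entire String first.
firstLineViaChunk :: Int -> String
firstLineViaChunk n = case mkChunk (bigDump n) of
  Chunk _ s -> takeWhile (/= '\n') s
```

With a very large n, firstLineLazy returns almost immediately, whereas firstLineViaChunk must first materialise every character of the dump.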

I'm working on a patch to make showAstData produce a Doc right from the start rather than a String. Initial experiments suggest that this has a much better chance of working. One question: when should one produce a Doc and when should one produce an SDoc? I'm pretty unclear on that bit.
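As a rough sketch of what building a Doc from the start can look like, the following uses the public pretty package (Text.PrettyPrint) together with Data.Data generics. The Expr type and the showData name are hypothetical; GHC's internal Pretty module and the real showAstData differ in detail, so treat this as an illustration of the idea only. Each node contributes a small text leaf for its constructor name, and the children are combined structurally rather than by concatenating one giant String.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data (Data, gmapQ, showConstr, toConstr)
import Text.PrettyPrint (Doc, nest, parens, render, text, vcat, ($$))

-- Toy expression type standing in for a parsed AST; NOT a GHC type.
data Expr = Lit Int | Add Expr Expr
  deriving (Data)

-- Generic dump: constructor name on top, children nested below it.
-- Only the short constructor names go through 'text', never the whole dump.
showData :: Data a => a -> Doc
showData x =
  parens (text (showConstr (toConstr x)) $$ nest 1 (vcat (gmapQ showData x)))

-- Convenience wrapper for trying the sketch out in GHCi.
renderExpr :: Expr -> String
renderExpr = render . showData
```

For example, putStrLn (renderExpr (Add (Lit 1) (Lit 2))) prints a nested, parenthesised dump of the constructors Add and Lit together with their integer fields.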

comment:4 Changed 2 years ago by dfeuer

Yep, my patch fixes this problem. I just need to figure out the Doc/SDoc business and I can upload a differential. My patched version produces somewhat different formatting (I'm no pretty printing expert), but it doesn't look horrible or anything.

comment:5 Changed 2 years ago by dfeuer

Differential Rev(s): Phab:D3894

comment:6 Changed 2 years ago by h4ck3rm1k3

Noob question: I cannot see this branch on git://git.haskell.org/ghc.git, and for some reason I cannot fetch from the phab:

git fetch https://phabricator.haskell.org/diffusion/GHC/glasgow-haskell-compiler.git
fatal: unable to access 'https://phabricator.haskell.org/diffusion/GHC/glasgow-haskell-compiler.git/': The requested URL returned error: 500

comment:7 Changed 2 years ago by dfeuer

Status: new → patch

comment:8 Changed 2 years ago by dfeuer

Architecture: x86_64 (amd64) → Unknown/Multiple

comment:9 in reply to:  6 Changed 2 years ago by dfeuer

Replying to h4ck3rm1k3:

Noob question: I cannot see this branch on git://git.haskell.org/ghc.git, and for some reason I cannot fetch from the phab:

git fetch https://phabricator.haskell.org/diffusion/GHC/glasgow-haskell-compiler.git
fatal: unable to access 'https://phabricator.haskell.org/diffusion/GHC/glasgow-haskell-compiler.git/': The requested URL returned error: 500

The branch isn't on git.haskell.org because I haven't put it there. I suppose I could if you like. As for getting branches from Phabricator... um... I'm not the best at that. I think you can use arc patch or something, maybe? bgamari could say for sure.

comment:10 Changed 2 years ago by bgamari

Sadly, Phabricator Differentials aren't automatically branches. dfeuer will need to push the branch to git.haskell.org in order for you to check it out with git. Alternatively, you can use arcanist to apply the differential (e.g. arc patch D3894), or you can download the patch from Phabricator and apply it to your tree manually.

comment:11 Changed 2 years ago by David Feuer <David.Feuer@…>

In 29da01e0/ghc:

Make parsed AST dump output lazily

Previously, `showAstData` produced a `String`. That `String` would
then be converted to a `Doc` using `text` to implement
`-ddump-parsed-ast`. But rendering `text` calculates the length
of the `String` before doing anything else. Since the AST can be
very large, this was bad: the whole dump string (potentially hundreds
of millions of `Char`s) was accumulated in memory.

Now, `showAstData` produces a `Doc` directly, which seems to work
a lot better. As an extra bonus, the code is simpler and cleaner.
The formatting has changed a bit, as the previous ad hoc approach
didn't really match the pretty printer too well. If someone cares
enough to request adjustments, we can surely make them.

Reviewers: austin, bgamari, mpickering, alanz

Reviewed By: bgamari

Subscribers: mpickering, rwbarton, thomie

GHC Trac Issues: #14161

Differential Revision: https://phabricator.haskell.org/D3894

comment:12 Changed 2 years ago by dfeuer

Status: patch → merge

I suspect this is small enough to merge.

comment:13 Changed 2 years ago by bgamari

Out of curiosity, what are you using the output of this command for, h4ck3rm1k3?

comment:14 Changed 2 years ago by bgamari

Milestone: 8.2.2 → 8.4.1
Resolution: fixed
Status: merge → closed

I have spoken with h4ck3rm1k3 and he says that he doesn't need this for 8.2.2.
