Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#9400 closed bug (fixed)

poor performance when compiling modules with many Text literals at -O1

Reported by: rwbarton Owned by: xnyhps
Priority: normal Milestone: 7.10.1
Component: Compiler Version: 7.8.3
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #9370 Differential Rev(s): Phab:D199
Wiki Page:

Description (last modified by rwbarton)

(Spawned from #9370; see there for original discussion)

Unpack the xmlhtml package and edit src/Text/XmlHtml/HTML/Meta.hs and remove the OPTIONS_GHC line at the top of the file that disables optimizations and build with cabal buidl. Then GHC takes ~1.5GB and over a minute to build this single module.

Preliminary investigation indicates that Text's fromString and its constituent parts is being inlined repeatedly:

Inlining done: Data.String.fromString
Inlining done: Data.Text.$fIsStringText
Inlining done: Data.Text.pack
Inlining done: Data.Text.Internal.Fusion.unstream
Inlining done: Data.Text.Internal.Fusion.Common.map
Inlining done: Data.Text.Internal.Fusion.Common.streamList
Inlining done: Data.Text.Internal.safe
Inlining done: Data.Bits.$fBitsInt_$c.&.
Inlining done: Data.Text.Internal.Fusion.Types.$WYield
[ repeats ~4000 times ]

resulting in a very large intermediate program:

[ 1 of 10] Compiling Text.XmlHtml.HTML.Meta ( src/Text/XmlHtml/HTML/Meta.hs, dist/build/Text/XmlHtml/HTML/Meta.o )
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 26,806, types: 14,467, coercions: 0}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 31,052, types: 20,719, coercions: 0}
Result size of Simplifier
  = {terms: 31,052, types: 20,719, coercions: 0}
*** Specialise:
Result size of Specialise
  = {terms: 32,254, types: 22,696, coercions: 448}
*** Float out(FOS {Lam = Just 0, Consts = True, PAPs = False}):
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              PAPs = False})
  = {terms: 63,022, types: 59,045, coercions: 448}
*** Float inwards:
Result size of Float inwards
  = {terms: 63,022, types: 59,045, coercions: 448}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 28,537, types: 18,902, coercions: 654}
Result size of Simplifier iteration=2
  = {terms: 28,157, types: 18,257, coercions: 152}
Result size of Simplifier iteration=3
  = {terms: 28,074, types: 18,128, coercions: 140}
Result size of Simplifier iteration=4
  = {terms: 28,068, types: 18,096, coercions: 61}
Result size of Simplifier
  = {terms: 28,068, types: 18,096, coercions: 61}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 43,941, types: 38,325, coercions: 61}
Result size of Simplifier
  = {terms: 43,941, types: 38,325, coercions: 61}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 689,670, types: 461,960, coercions: 146,725}
...

[Edited: old output was not actually for `-O1` as described, but rather for the sequence described in #9370 of building with `-O0` after building another module with `-O1`.]

Change History (13)

comment:1 Changed 5 years ago by rwbarton

Replying to simonpj's comment on #9370:

I would guess so too. But WHY is 4000 copies of fromString getting inlined. I doubt GHC is doing that unaided. I bet it's an INLINE pragma or RULE in Text. And if so, it's a landmine waiting to kill new victims.

Well, there are 4000+ string literals in the module (which is using OverloadedStrings): 2000+ XML entity names and the corresponding Unicode strings that they represent. Does GHC have any way to understand that while inlining a definition in one or two or a dozen places may be good, perhaps inlining it in 4000 places may be less good?

At any rate, a module containing 4000 overloaded string literals is not inordinately pathological, so we should try to do better here. I will investigate in more detail.

comment:2 Changed 5 years ago by rwbarton

Description: modified (diff)

comment:3 Changed 5 years ago by rwbarton

Resolution: invalid
Status: newclosed

OK this is a bit funny.

Normally a Text literal "abc" gets desugared as

fromString $fIsStringText (unpackCString# "abc"#)

Now fromString $fIsStringText = pack, and pack = unstream . S.map safe . S.streamList, and there is a rule in Data.Text

{-# RULES "TEXT literal" forall a.
    unstream (S.map safe (S.streamList (GHC.unpackCString# a)))
      = unpackCString# a #-}

and Data.Text.unpackCString# has a NOINLINE pragma so we end up with the nice small code: Data.Text.unpackCString# "abc".

But, a single-character literal "a" is instead desugared as

fromString $fIsStringText (: (C# 'a') ([]))

and now there is no rule which matches this pattern. And unstream is marked INLINE [0], as Simon predicted; and it's rather large. And most XML entities represent single Unicode characters, so GHC generated around 2000 copies of unstream.

I don't know why there is an INLINE pragma on unstream. Perhaps no good reason. But anyways, there is a simple fix to the text package: add another rule to match the pattern unstream (S.map safe (S.streamList [c])). (And similarly for empty string literals.)

Edit: Reported as https://github.com/bos/text/issues/84.

Last edited 5 years ago by rwbarton (previous) (diff)

comment:4 Changed 5 years ago by simonpj

Another possibility might be to remove the special behaviour of desugaring strings from GHC. Why is it there? I think it's in case you do

case "x" of
  'x' : ys -> blah

or something like that. But perhaps a better plan would be to make unpackCString# "foo" respond to exprIsConApp_maybe with "yes, I'm a cons; my head is 'f' and my tail is unpackCString# "oo".

So before making Text more complicated, maybe we should explore making GHC less complicated!

Simon

comment:5 Changed 5 years ago by simonpj

Just to add a bit more detail:

  • The special handling for singleton string literals is in MkCore.mkStringExprFS. Omittign the lengthFS str == 1 case looks very plausible to me.
  • Notice that this same function sometimes generates unpackCStringUtf8 rather than unpackCString. Rules that recognise only the latter will stumble if the former is produced. I have no clue what to do about this -- Unicode experts may.
  • The function CoreSubst.exprIsConApp_maybe is the place where we say "does this expression look like a constructor application?". Adding an extra case to go, below the ones for dataConWorkId and DFunFunfolding should let us dynamically expand a call to unpackCString into a cons cell. And that will actually be better than now, because it should eliminate
      case "foo" of
        'f' : 'o' : xs -> ....
    

Would someone care to try?

Simon

comment:6 in reply to:  4 Changed 5 years ago by rwbarton

Replying to simonpj:

Another possibility might be to remove the special behaviour of desugaring strings from GHC. Why is it there?

I was guessing that perhaps the generated code is smaller for a one-character list than for a call to unpackCString#. But if that's true, it's not a very good reason to put that logic in the desugarer; we could rewrite unpackCString# "a" to 'a' : [] in a later optimizer pass, after RULES have fired.

comment:7 Changed 5 years ago by simonpj

Resolution: invalid
Status: closednew

Re-opening because I think we should fix GHC as comment:4 etc.

Simon

comment:8 Changed 5 years ago by simonpj

Would anyone be willing to try the fix I describe above? I can advise.

Simon

comment:9 Changed 5 years ago by xnyhps

Owner: set to xnyhps

This looked like quite a simple bug I could work on, so I decided to have a look.

  • Removing the lengthFS str == 1 case in mkStringExprFS does not appear to break anything.
  • I've managed to modify CoreSubst.exprIsConApp_maybe to split calls to unpackCString# or unpackCStringUtf8#. It'll also recognize when it's a single-character string and split it into (':', [c, nil]), to avoid ever creating calls to unpackCString*# ""#. Looking at the generated Core, I've verified that it now pushes unpackCString*# calls through case-statements.

This means that, even at -O0:

main = case "abc" of
    (x:xs) -> putStrLn xs

compiles to main = System.IO.putStrLn GHC.CString.unpackCString# "bc"#

  • I haven't timed it, but src/Text/XmlHtml/HTML/Meta.hs now compiles quickly and with only ~180MB RAM.
  • unpackCString# and unpackCStringUtf8# seem very similar, except that unpackCStringUtf8# parses UTF8. At least for mkStringExprFS it could work to only use unpackCStringUtf8#, with a slight run-time penalty (calling leChar# for each character in the string), but the only benefit would be making rewrite rules easier.

comment:10 Changed 5 years ago by xnyhps

Differential Rev(s): Phab:D199

comment:11 Changed 5 years ago by Austin Seipp <austin@…>

In fe9f7e40844802443315ef2238c4cdefda756b62/ghc:

Remove special casing of singleton strings, split all strings.

Summary:
exprIsConApp_maybe now detects string literals and correctly
splits them. This means case-statemnts on string literals can
now push the literal into the cases.

fix trac issue #9400

Test Plan: validate

Reviewers: austin, simonpj

Reviewed By: austin, simonpj

Subscribers: simonmar, ezyang, carter

Differential Revision: https://phabricator.haskell.org/D199

GHC Trac Issues: #9400

comment:12 Changed 5 years ago by thoughtpolice

Milestone: 7.10.1
Resolution: fixed
Status: newclosed

Awesome! Merged, thank you!

comment:13 Changed 5 years ago by rwbarton

By the way, bos has added the rules for Data.Text.fromString on 0- and 1-character literals in text-1.2.0.0, and I confirmed that the original issue of this ticket does not arise with that version.

Note: See TracTickets for help on using tickets.