Opened 22 months ago

Last modified 22 months ago

#14490 new bug

TTG Snags

Reported by: alanz Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.3
Keywords: Cc: bgamari, simonpj, Shayan-Najd, trommler
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #14482 Differential Rev(s):
Wiki Page: ImplementingTreesThatGrow

Description

Context

ImplementingTreesThatGrow

The intention is to get the first-pass implementation of Trees that Grow into GHC 8.4.1.

So far the following commits are on master

Unfortunately we have picked up compile-time performance regressions in compiling GHC itself, with these patches. The perf.haskell.org results are

Summary: each of the three patches caused GHC validate time to worsen by another 5% or so, cumulatively pushing compile time from 2011 to 2400 seconds.

It seems the major culprit is deriving the Data instances, probably related to the complex constraint sets required to ensure that all the extension points have Data instances too.

Some discussion around this is captured in ImplementingTreesThatGrow/Instances

Things tried so far

PLAN E (ImplementingTreesThatGrow/Instances#PLANE), in https://github.com/ghc/ghc/tree/wip/ttg5-data-one-file-2017-11-17.

This puts all the Data derivations for the hsSyn AST into a single file, hsSyn/HsInstances.hs.

This causes compile-time memory usage to spike up to around 7GB when compiling this file.

Gipedia reports a compilation time of 2468 secs for this version, which perhaps indicates that the instance derivation is not the main culprit. The module itself only takes in the order of 45secs to compile, on home.smart-cactus.org (http://lpaste.net/7779809210564345856)

Splitting the HsInstances.hs file in 2, as per https://github.com/ghc/ghc/tree/wip/ttg5-data-2017-11-17 reduces the memory requirement for each, but shows up #14482, where there is a linker failure due to incorrectly processing boot files.

Next Steps

I guess the first thing to do is to try to isolate what precisely is causing the slowdown. It may not be instance generation at all.

Also, it is probably better to revert the three patches from master, when cutting the 8.4 initial freeze.

Open to suggestions ...

Attachments (2)

differences.txt (48.8 KB) - added by alanz 22 months ago.
Main.hs (2.9 KB) - added by alanz 22 months ago.

Download all attachments as: .zip

Change History (48)

comment:1 Changed 22 months ago by Shayan-Najd

Splitting the HsInstances.hs file in 2, as per ​ https://github.com/ghc/ghc/tree/wip/ttg5-data-2017-11-17 reduces the memory requirement for each, but shows up #14482, where there is a linker failure due to incorrectly processing boot files.

It seems that #14482 is fixed. If yes, then what's the compile-time of GHC itself in this branch?

It may not be instance generation at all.

Can we somehow time the constraint solver, before and after the patches?

Can we try Plan B and time it? (That is: instead of one Data instance for the HsSyn traversals, make three.)

comment:2 Changed 22 months ago by simonpj

I think you are referring to Plan B on ImplementingTreesThatGrow/Instances.

Why not do Plan C? It generates one third of the amount of code and is really simple.

comment:3 Changed 22 months ago by alanz

I agree on Plan C.

But I am currently concerned that the major slowdown is not just from the instances.

I understand bgamari is going to revert the TTG commits, and there is now a detailed timing flag (https://phabricator.haskell.org/rGHC2da7813b771edcb3efb1b067b986d10f5de4beaf) which we will be able to use to compare the compilation time for the before and after versions.

This should at least give us an idea of where the major impacts are coming from, in terms of modules.

comment:4 Changed 22 months ago by Shayan-Najd

Yes, Plan B.

I only suggest Plan B to reduce the constraint sets by specialising them. If it doesn't affect the performance, I agree that the alternatives with less code is preferred.

comment:5 Changed 22 months ago by bgamari

I'd like to point out that I introduced -ddump-timings last week specifically to help in benchmarking liking that suggested by Shayan in comment:1.

comment:6 Changed 22 months ago by simonpj

OK. Sounds as if we are agreed on Plan C unless we discover a reason not to follow it. Of course that may or may not solve our problem. But it simplifies solving whatever remaining problem we have.

comment:7 Changed 22 months ago by alanz

And it probably makes sense to do it in the context of the HsInstances.hs file, as then it groups all the instances together.

Or should they happen in the original locations?

comment:8 Changed 22 months ago by alanz

The other concern is the 8.4 freeze.

We don't actually have any idea how long it is going to take to sort this out.

I know Ben is concerned about having to cherry-pick across TTG if it lands in the next month or two, while 8.4 is still settling down.

So, do we revert, and investigate on a long-lived branch, or carry on investigating?

Or revert and investigate offline, but hold off the 8.4 freeze?

comment:9 Changed 22 months ago by Shayan-Najd

I am not sure about the impact of grouping on the compile-time of GHC itself. But, I believe we eventually want to group Data (and others like Outputable) instances.

comment:10 Changed 22 months ago by simonpj

For 8.4, it's up to Ben our release manager, and (if he needs to consult) our devops group.

It would not be terrible to leave it out of 8.4; because 8.6 will be along in six months.

comment:11 Changed 22 months ago by alanz

I am getting

instance Data (XOverLit (GhcPass p)) where
----
compiler/hsSyn/HsLit.hs:143:10: error:
     Illegal type synonym family application in instance:
        XOverLit (GhcPass p)
     In the instance declaration for Data (XOverLit (GhcPass p))
    |
143 | instance Data (XOverLit (GhcPass p)) where
    |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^

Am I missing something?

comment:12 Changed 22 months ago by Shayan-Najd

A quick take: I am not sure if you are doing the right thing: typeclass instances for open type families? (I can now see that Plan C also has something similar to this).

comment:13 Changed 22 months ago by bgamari

Simonpj mentioned to me that Plans C and D are not feasible; they call for instances of type functions, which cannot be defined.

comment:14 Changed 22 months ago by Shayan-Najd

Then, back to comment 1:

can we try Plan B and time it? (That is: instead of one Data instance for the HsSyn traversals, make three.)

comment:15 Changed 22 months ago by Shayan-Najd

Simonpj mentioned to me that Plans C and D are not feasible; they call for instances of type functions, which cannot be defined.

Updated the wiki accordingly.

comment:16 Changed 22 months ago by alanz

Comparing compile performance for current master (with TTG) and after reverting.

  1. Check timing on home.smart-cactus.org, for current master at 25f36bd7ba6899be6c720292528c56bc35e0f089
  1. Make mk/validate.mk which overrides the mk/flavours/validate.mk one Set:

GhcStage2HcOpts = -ddump-timings -O -dcore-lint -dno-debug-output

  1. Run
       $ make maintainer-clean
       $ CPUS=6 ./validate >> master-validate.log 2>&1
    
  2. Revert the prior TTG commits

on branch wip/revert-ttg-2017-11-20

GHC/master

haddock/ghc-head

  1. Run the timing again
       $ make maintainer-clean
       $ CPUS=6 ./validate >> revert-validate.log 2>&1
    

Results to follow

comment:17 Changed 22 months ago by alanz

The raw log files from the previous measurements are

I am not sure of the master-validate, the box may have been busy doing other builds while it ran.

I will analyze in the morning. Feel free to dive in if you like, timezones are great.

comment:18 Changed 22 months ago by simonpj

What about Plan F: no Data instances at all? Easy to implement, very fast to compile.

GHC doesn't use them. (Or at least only in ways we could do some other way.) They are there only to support hypothetical clients.

Or maybe not. It'd be good to knonw who is using them.

Otherwise, disgusgintly, yes perhaps Plan B is best.

PS: sorry about the red herring of Plans C and D. Utterly bogus.

comment:19 Changed 22 months ago by alanz

comment:20 Changed 22 months ago by alanz

I made a very quick attempt yesterday at removing the Data instances completely, but they are in fact used, I introduced them for the dump AST stuff.

I want to first get an idea of what the major performance driver is, before focusing exclusively on instances.

comment:21 Changed 22 months ago by alanz

Having just leveled up on Turtle, the compile time differences for compiling stage2 (via -ddump-timings) are attached, together with the programme used to generate them.

The key result is the following

Module        :      master                  reverted                      diff
HsSyn         :    10154.82                   1550.70                   8604.12
HsImpExp      :    16266.68                   4703.24                  11563.44
DynFlags      :    70682.66                  59097.39                  11585.27
HsLit         :    21915.48                   4487.74                  17427.74
HsPat         :    26660.85                   7181.77                  19479.08
HsTypes       :   107182.37                  16066.12                  91116.25
HsBinds       :   115097.57                  19346.81                  95750.76
HsExpr        :   260980.84                  38087.04                 222893.79
HsDecls       :   320157.08                  45702.17                 274454.90

Changed 22 months ago by alanz

Attachment: differences.txt added

Changed 22 months ago by alanz

Attachment: Main.hs added

comment:22 Changed 22 months ago by alanz

And I guess I should do the same with the version where the deriving is all in one file, to isolate that too.

comment:23 Changed 22 months ago by alanz

I was also wondering if we could more easily derive Generic, and use that for any needed traversals.

comment:24 Changed 22 months ago by Ben Gamari <ben@…>

In 314bc314/ghc:

Revert "trees that grow" work

As documented in #14490, the Data instances currently blow up
compilation time by too much to stomach. Alan will continue working on
this in a branch and we will perhaps merge to 8.2 before 8.2.1 to avoid
having to perform painful cherry-picks in 8.2 minor releases.

Reverts haddock submodule.

This reverts commit 47ad6578ea460999b53eb4293c3a3b3017a56d65.
This reverts commit e3ec2e7ae94524ebd111963faf34b84d942265b4.
This reverts commit 438dd1cbba13d35f3452b4dcef3f94ce9a216905.
This reverts commit 0ff152c9e633accca48815e26e59d1af1fe44ceb.

comment:25 Changed 22 months ago by Shayan-Najd

but they are in fact used, I introduced them for the dump AST stuff.

Moving towards Plan F (no Data instances), does GHC itself use "the dump AST stuff", or is it to be used for debugging (and in GHC API)? If not used within GHC, we can temporarily disable remove them (and the related test cases) and time the build time, before and after applying the TTG patches in this Data-less branch. Does it sound reasonable?

comment:26 Changed 22 months ago by bgamari

There are a few uses other than the AST dumping. While discussing this with Simon I was able to find one in RdrHsSyn by grepping Data.Data and looking through some of the uses. However, on closer inspection it looks like I just got lucky as that is the only use I can see in compiler/.

However, I suspect haddock uses SYB.

comment:27 Changed 22 months ago by Shayan-Najd

I see. The goal is to simply see whether really Data instances are causing all performance issues.

Can we still build GHC without haddock working? If yes, then measuring compile-time of a data-less branch before and after TTG patches answers many of our questions.

comment:28 Changed 22 months ago by alanz

My current thoughts on this

As I understand it, the Data instances can only be generated for concrete types. As such the index type is a red herring, as we need the Data instances for the indexed type.

Effectively, we are tailoring the AST to be a number of concrete versions. This is the point of TTG.

So making a Data instance per concrete version may be the best way to go. Plan B.

So, we know we can do that with GHC 8.2, and do not want to complicate life in terms of cherry-picking in the interim.

So, perhaps either put this work on hold, or do it in a long-running branch until such time as GHC 8.2 is the minimal bootstrap compiler. Yay 6 months cycles.

comment:29 Changed 22 months ago by simonpj

Why does Plan B not work in 8.0.2? This works:

{-# LANGUAGE TypeFamilies, GADTs, FlexibleInstances, StandaloneDeriving, DeriveDataTypeable #-}

module Foo where

import Data.Data

data T a where
  MkT :: XT a -> a -> T a

type family XT a
type instance XT Bool = Int

deriving instance Data (T Bool)

comment:30 Changed 22 months ago by alanz

But this does not

{-# LANGUAGE TypeFamilies, GADTs, FlexibleInstances, StandaloneDeriving, DeriveDataTypeable #-}

module Foo where

import Data.Data

data T a where
  MkT :: XT a -> a -> T a

type family XT a
type instance XT Bool = Int
type instance XT Char = Int

deriving instance Data (T Bool)
deriving instance Data (T Char)

And this models the situation in GHC

comment:31 Changed 22 months ago by Shayan-Najd

So making a Data instance per concrete version may be the best way to go. Plan B.

Regardless of the GHC version we use, the question is whether Plan B really fixes the build-time problem. Can we test it?

comment:32 Changed 22 months ago by bgamari

I see. The goal is to simply see whether really Data instances are causing all performance issues.

Sure, I think this is a perfectly reasonable experiment. I was just pointing out that we may eventually still need to provide some Data instances somewhere.

comment:33 Changed 22 months ago by trommler

Cc: trommler added

comment:34 Changed 22 months ago by alanz

Initial result: Compiling HsInstances with all the GhcPs etc variants takes around 90s and 1GB on my machine.

GHC 8.2.1

comment:35 Changed 22 months ago by alanz

And here is an interesting result from validation with my branch wip/ttg6-unrevert-2017-11-22

=====> haddock.base(normal) 1 of 3 [0, 0, 0]
bytes allocated value is too low:
(If this is because you have improved GHC, please
update the test so that GHC doesn't regress again)
    Expected    haddock.base(normal) bytes allocated: 19694554424 +/-5%
    Lower bound haddock.base(normal) bytes allocated: 18709826702 
    Upper bound haddock.base(normal) bytes allocated: 20679282146 
    Actual      haddock.base(normal) bytes allocated: 17315438736 
    Deviation   haddock.base(normal) bytes allocated:       -12.1 %
*** unexpected stat test failure for haddock.base(normal)
=====> haddock.Cabal(normal) 2 of 3 [0, 0, 0]
bytes allocated value is too low:
(If this is because you have improved GHC, please
update the test so that GHC doesn't regress again)
    Expected    haddock.Cabal(normal) bytes allocated: 20104611952 +/-5%
    Lower bound haddock.Cabal(normal) bytes allocated: 19099381354 
    Upper bound haddock.Cabal(normal) bytes allocated: 21109842550 
    Actual      haddock.Cabal(normal) bytes allocated: 18233466752 
    Deviation   haddock.Cabal(normal) bytes allocated:        -9.3 %
*** unexpected stat test failure for haddock.Cabal(normal)
=====> haddock.compiler(normal) 3 of 3 [0, 0, 0]
bytes allocated value is too low:
(If this is because you have improved GHC, please
update the test so that GHC doesn't regress again)
    Expected    haddock.compiler(normal) bytes allocated: 102142130576 +/-10%
    Lower bound haddock.compiler(normal) bytes allocated:  91927917518 
    Upper bound haddock.compiler(normal) bytes allocated: 112356343634 
    Actual      haddock.compiler(normal) bytes allocated:  51327433048 
    Deviation   haddock.compiler(normal) bytes allocated:        -49.7 %
*** unexpected stat test failure for haddock.compiler(normal)

Unexpected results from:
TEST="haddock.Cabal haddock.base haddock.compiler"

SUMMARY for test run started at Thu Nov 23 07:28:46 2017 UTC
 0:00:00 spent to go through
       3 total tests, which gave rise to
       3 test cases, of which
       0 were skipped

       0 had missing libraries
       0 expected passes
       0 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       0 unexpected failures
       3 unexpected stat failures

Unexpected stat failures:
   perf/haddock/haddock.base.run      haddock.base [stat too good] (normal)
   perf/haddock/haddock.Cabal.run     haddock.Cabal [stat too good] (normal)
   perf/haddock/haddock.compiler.run  haddock.compiler [stat too good] (normal)

comment:36 Changed 22 months ago by Shayan-Najd

This sounds good!Thanks.

So in short, wip/ttg6-unrevert-2017-11-22 is a branch where (a) the three TTG patches are applied, (b) the Data instances are gathered together, and (c) Plan B is used.

And, the experiment shows less memory allocation when building GHC? What about the overall build-time performance?

comment:37 Changed 22 months ago by alanz

Gipedia takes the build time from 2063 to 2271 seconds with this approach. A 10% change.

https://perf.haskell.org/ghc/#compare/abdb5559b74af003a6d85f32695c034ff739f508/6a313b17191e77723368b380f51656faccdd8da2

comment:38 Changed 22 months ago by simonpj

I had a new idea: see Plan F in ImplementingTreesThatGrow/Instances.

comment:39 Changed 22 months ago by alanz

I am not sure if I misunderstand Plan F, but GHC insists on a Data instance for the x in

newtype GhcExt x = Ext x

instance (Data x) => Data (GhcExt x) where
  gmapM _f x = return x
  gunfold k z c = case constrIndex c of
                    1 -> k (z Ext)
                    _ -> panic "GhcExt.gunfold"
  dataTypeOf _ = mkNoRepType "HsExtension.GhcExt"

comment:40 Changed 22 months ago by simonpj

I was thinking abou this

{-# LANGUAGE TypeFamilies, GADTs, FlexibleInstances, StandaloneDeriving,  DeriveDataTypeable #-}

module Foo where

import Data.Data
import Data.Typeable

data Ext x = MkExt x

data T a where
  MkT :: XT a -> a -> T a

data GhcPass p

type family XT a
type instance XT (GhcPass p) = Ext ()

instance Typeable x => Data (Ext x) where
  gmapM f v = return v
  
deriving instance Typeable p => Data (T (GhcPass p))

instance Typeable p => Data (GhcPass p)

This compiles fine.

Last edited 22 months ago by simonpj (previous) (diff)

comment:41 Changed 22 months ago by alanz

Given the following module

{-# LANGUAGE TypeFamilies, GADTs, FlexibleInstances, StandaloneDeriving, DeriveDataTypeable #-}
{-# LANGUAGE DataKinds #-}

module Oddity where

import Data.Data

data GhcPass (p :: Pass)
data Pass = Parsed | Renamed | Typechecked
         deriving (Data)

deriving instance Typeable p => Data (GhcPass p)

type family XOverLit x

type instance XOverLit (GhcPass 'Parsed)      = Int
type instance XOverLit (GhcPass 'Renamed)     = Int
type instance XOverLit (GhcPass 'Typechecked) = Char

data Foo p = Foo (XOverLit p)
deriving instance (Typeable p) => Data (Foo (GhcPass p))

I do not understand why deriving the Data instance for Foo (GhcPass p) fails

Oddity.hs:21:1-56: error:
    • Could not deduce (Data (XOverLit (GhcPass p)))
        arising from a use of ‘k’

Is this a weakness in the deriving process? Because as I understand it, there is a concrete value for every (GhcPass p), given the way GhcPass is defined as being indexed by Pass.

comment:42 Changed 22 months ago by alanz

My current thoughts

  1. Plan F is complex, and requires special case treatment of traversals into extension points that are actually used. In the parts of TTG introduced so far, we are already making fairly substantial use of pass-specific data types in a seamless way.
  1. Plan B is simple and works, and does not change the GHC API for any existing client users of traversals over the AST. We are already looking at an earliest land of the feature in GHC 8.6 (second half of 2018?) so the required GHC 8.2 bootstrap compiler is not an issue.

comment:43 in reply to:  41 Changed 22 months ago by RyanGlScott

Replying to alanz:

Is this a weakness in the deriving process? Because as I understand it, there is a concrete value for every (GhcPass p), given the way GhcPass is defined as being indexed by Pass.

This is not how type families work. Type families only reduce when you provide arguments that match an instance in scope. XOverlit (GhcPass p) does not match any instances, and thus it doesn't reduce. If you want this to compile, you'll need something like this:

{-# LANGUAGE TypeFamilies, GADTs, FlexibleInstances, StandaloneDeriving, DeriveDataTypeable #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}
module Oddity where

import Data.Data

data GhcPass (p :: Pass)
data Pass = Parsed | Renamed | Typechecked
         deriving (Data)

deriving instance Typeable p => Data (GhcPass p)

type family XOverLit x

type instance XOverLit (GhcPass 'Parsed)      = Int
type instance XOverLit (GhcPass 'Renamed)     = Int
type instance XOverLit (GhcPass 'Typechecked) = Char

data Foo p = Foo (XOverLit p)
deriving instance (Typeable p, Data (XOverLit (GhcPass p))) => Data (Foo (GhcPass p))

comment:44 Changed 22 months ago by alanz

Thanks. That makes sense, if I think about it. In my example it just happens to be a closed set, this will not always be true.

comment:45 Changed 22 months ago by simonpj

I'm ok with Plan B. But can you explain what you mean by this?

Plan F is complex, and requires special case treatment of traversals into extension points that are actually used.

What is the special case treatment and why do we need it?

comment:46 Changed 22 months ago by alanz

What is the special case treatment and why do we need it?

My understanding is that we would still need to implement something like Plan D, which requires the IsGhcPass class.

It could well be that Plan F *is* a good option, but I just do not understand the mechanics of it.

Note: See TracTickets for help on using tickets.