Opened 3 years ago

Closed 3 years ago

#12191 closed task (fixed)

7% allocation regression in Haddock performance tests

Reported by: bgamari Owned by: niteria
Priority: normal Milestone:
Component: Compiler Version: 8.0.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #10482 Differential Rev(s):
Wiki Page:

Description

One of the commits below is responsible for a rather significant regression in the haddock.Cabal and haddock.compiler testcases,

d55a9b4fd5a3ce24b13311962bca66155b17a558
0497ee504cc9ac5d6babee9b98bf779b3fc50b98
586d55815401c54f4687d053fb033e53865e0bf1
7de776cfe7825fca6a71fe6b3854c3c86bf9ca12
5cee88d766723929f789ffcd2ef24d8b5ef62a16
1dcb32ddba605bced2e0e0ce3f52b58e8ff33f5b
921ebc9f0854d033cbafd43d3b2c5ba679c27b3c
e064f501d76c208ddab3c3be551ffe5167d7974f
8104f7c674d7ef2db0c25312f48763202dcef57f
599d912f0b85583e389661d85ed2f198e2621bb0
15fc52819c440f9e9b91ce92fcfda3c264cbe1c1
1f661281a23b6eab83a1144c43e464c0e2d2195a
35c9de7ca053eda472cb446c53bcd2007bfd8394
7afb7adf45216701e4f645676ecc0668f64b424d
c28dde37f3f274a2a1207dd4e175ea79769f5ead
15b9bf4ba4ab47e6809bf2b3b36ec16e502aea72

It would be nice to identify which of these is the culprit. The fix in d55a9b4fd5a3ce24b13311962bca66155b17a558 will need to be applied on top of each of these to allow things to build.

Change History (14)

comment:1 Changed 3 years ago by nomeata

I’m trying to make perf.haskell.org rerun these commits with d55a9b4fd5a3ce24b13311962bca66155b17a558 cherrypicked, and will report back. This should narrow it down.

comment:2 Changed 3 years ago by thomie

How about we just delete those 3 haddock performance tests?

From https://mail.haskell.org/pipermail/ghc-devs/2015-July/009434.html:

Simon Peyton Jones wrote: Could it be creeping up because Haddock is doing more than before? Ie not just because GHC is being bad?

Yes, it very much could be. Haddock is a moving target so it's not too weird that the numbers change. I want to say that the perf test should just be removed but it's still useful: if you're GHC hacking and not changing Haddock and numbers go awry then it serves its purpose.

OK, so I won't lose sleep over it!

Last edited 3 years ago by thomie (previous) (diff)

comment:3 Changed 3 years ago by nomeata

Owner: set to niteria

And we have a winner: The culprit is changeset:0497ee504cc9ac5d6babee9b98bf779b3fc50b98 as can be seen on https://perf.haskell.org/ghc/#revision/0497ee504cc9ac5d6babee9b98bf779b3fc50b98

So Simon is not only freed of any fault, but yet again had a the right intuition about what regressions are worth investigating.

Assigning this to niteria.

comment:4 Changed 3 years ago by niteria

@nomeata: Thanks for running these. I think I had to disable haddock before ./validate because it was broken and that's how this snuck in. I will take a look this week (but unfortunately not tomorrow).

comment:5 Changed 3 years ago by simonpj

Thanks so much Joachim. I'm mighty relieved!

I'll revert Bartosz's patch for now, to keep Phab happy.

comment:6 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

In 70a4589/ghc:

Revert "Make the Ord Module independent of Unique order"

This reverts commit 0497ee504cc9ac5d6babee9b98bf779b3fc50b98.

Reason: See Trac #12191.  I'm reverting pending Bartosz's
investigation of what went wrong.

comment:7 Changed 3 years ago by niteria

This appears to be ordering related. Meaning that when things are ordered differently we do different kind of work and it costs more.

When I changed Ord UnitId from my original diff to

instance Ord UnitId where
  nm1 `compare` nm2 =
    -- do the same work, but return the old ordering
    if stableUnitIdCmp nm1 nm2 == LT
      then getKey (getUnique nm1) `compare` getKey (getUnique nm2) 
           -- this was enough to stop GHC from being smart and optimize it out
      else getUnique nm1 `compare` getUnique nm2

The difference went from 7% to 0.21%. I will try to narrow this down to functions that are ordering sensitive and see if I can make them stable.

comment:8 Changed 3 years ago by simonmar

@thomie, Note that the Haddock perf test is really a GHC perf test, because most of the work that Haddock does is loading source files and typechecking them using the GHC API.

comment:9 Changed 3 years ago by niteria

It appears that checkFamInstConsistency is sensitive to Module ordering. Making it a noop also reduces allocations by 9%.

comment:10 Changed 3 years ago by niteria

So what's happening in checkFamInstConsistency is suboptimal in multiple ways:

1) The algorithm itself is quadratic. It checks every module against every other module. It appears to me that the same thing can be done in a linear way by maintaining a consistent env of family instances so far and fold adding there when possible or reporting an error.

2) checkFamInstConsistency in Haddock appears to be called for every module (Haddock is equivalent to --make, right?) which is redundant and in principle could be done at the very end once.

3) determining what pairs to compare uses Set ModulePair. After my change Ord ModulePair is linear. The size of the set is quadratic. Furthermore the Ord ModulePair doesn't cache its canonicalization.

4) given these there still appear to be better and worse orders for the ModulePairs. I don't really know why.

comment:11 Changed 3 years ago by simonpj

When typechecking a particular module M, checkFamInstConsistency does indeed need to check that the imported family instances are consistent; and that requires checking any instance for F imported from I1 with all instances of F imported from some other module I2.

(Multiple instances all imported from I1 have already been checked, when compiling I1.)

I don't understand Haddock's use-case to know why the algorithm is behaving so badly, but I bet there is some low-hanging fruit.

comment:12 Changed 3 years ago by Bartosz Nitka <niteria@…>

In 12306294/ghc:

Make checkFamInstConsistency less expensive

Doing canonicalization on every comparison turned
out to be very expensive.

Caching the canonicalization through the smart `modulePair` constructor
gives `8%` reduction in allocations on `haddock.compiler` and
`8.5%` reduction in allocations on `haddock.Cabal`.
Possibly other things as well, but it's really visible in Haddock.

Test Plan: ./validate

Reviewers: jstolarek, simonpj, austin, simonmar, bgamari

Reviewed By: simonpj, simonmar

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2350

GHC Trac Issues: #12191

comment:13 Changed 3 years ago by Bartosz Nitka <niteria@…>

In 348f2dbb/ghc:

Make the Ord Module independent of Unique order (2nd try)

The `Ord Module` instance currently uses `Unique`s for comparison.
We don't want to use the `Unique` order because it can introduce
nondeterminism.
This switches `Ord ModuleName` and `Ord UnitId` to use lexicographic
ordering making `Ord Module` deterministic transitively.

I've run `nofib` and it doesn't make a measurable difference.

See also Note [ModuleEnv determinism and performance].

This fixes #12191 - the regression, that the previous version of this
patch had.

Test Plan:
./validate
run nofib: P112

Reviewers: simonmar, bgamari, austin

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2354

GHC Trac Issues: #4012, #12191

comment:14 Changed 3 years ago by niteria

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.