Opened 3 years ago

Closed 3 years ago

#12191 closed task (fixed)

7% allocation regression in Haddock performance tests

Reported by: bgamari Owned by: niteria
Priority: normal Milestone:
Component: Compiler Version: 8.0.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #10482 Differential Rev(s):
Wiki Page:


One of the commits below is responsible for a rather significant regression in the haddock.Cabal and haddock.compiler testcases,


It would be nice to identify which of these is the culprit. The fix in d55a9b4fd5a3ce24b13311962bca66155b17a558 will need to be applied on top of each of these to allow things to build.

Change History (14)

comment:1 Changed 3 years ago by nomeata

I’m trying to make rerun these commits with d55a9b4fd5a3ce24b13311962bca66155b17a558 cherrypicked, and will report back. This should narrow it down.

comment:2 Changed 3 years ago by thomie

How about we just delete those 3 haddock performance tests?


Simon Peyton Jones wrote: Could it be creeping up because Haddock is doing more than before? Ie not just because GHC is being bad?

Yes, it very much could be. Haddock is a moving target so it's not too weird that the numbers change. I want to say that the perf test should just be removed but it's still useful: if you're GHC hacking and not changing Haddock and numbers go awry then it serves its purpose.

OK, so I won't lose sleep over it!

Last edited 3 years ago by thomie (previous) (diff)

comment:3 Changed 3 years ago by nomeata

Owner: set to niteria

And we have a winner: The culprit is changeset:0497ee504cc9ac5d6babee9b98bf779b3fc50b98 as can be seen on

So Simon is not only freed of any fault, but yet again had a the right intuition about what regressions are worth investigating.

Assigning this to niteria.

comment:4 Changed 3 years ago by niteria

@nomeata: Thanks for running these. I think I had to disable haddock before ./validate because it was broken and that's how this snuck in. I will take a look this week (but unfortunately not tomorrow).

comment:5 Changed 3 years ago by simonpj

Thanks so much Joachim. I'm mighty relieved!

I'll revert Bartosz's patch for now, to keep Phab happy.

comment:6 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

In 70a4589/ghc:

Revert "Make the Ord Module independent of Unique order"

This reverts commit 0497ee504cc9ac5d6babee9b98bf779b3fc50b98.

Reason: See Trac #12191.  I'm reverting pending Bartosz's
investigation of what went wrong.

comment:7 Changed 3 years ago by niteria

This appears to be ordering related. Meaning that when things are ordered differently we do different kind of work and it costs more.

When I changed Ord UnitId from my original diff to

instance Ord UnitId where
  nm1 `compare` nm2 =
    -- do the same work, but return the old ordering
    if stableUnitIdCmp nm1 nm2 == LT
      then getKey (getUnique nm1) `compare` getKey (getUnique nm2) 
           -- this was enough to stop GHC from being smart and optimize it out
      else getUnique nm1 `compare` getUnique nm2

The difference went from 7% to 0.21%. I will try to narrow this down to functions that are ordering sensitive and see if I can make them stable.

comment:8 Changed 3 years ago by simonmar

@thomie, Note that the Haddock perf test is really a GHC perf test, because most of the work that Haddock does is loading source files and typechecking them using the GHC API.

comment:9 Changed 3 years ago by niteria

It appears that checkFamInstConsistency is sensitive to Module ordering. Making it a noop also reduces allocations by 9%.

comment:10 Changed 3 years ago by niteria

So what's happening in checkFamInstConsistency is suboptimal in multiple ways:

1) The algorithm itself is quadratic. It checks every module against every other module. It appears to me that the same thing can be done in a linear way by maintaining a consistent env of family instances so far and fold adding there when possible or reporting an error.

2) checkFamInstConsistency in Haddock appears to be called for every module (Haddock is equivalent to --make, right?) which is redundant and in principle could be done at the very end once.

3) determining what pairs to compare uses Set ModulePair. After my change Ord ModulePair is linear. The size of the set is quadratic. Furthermore the Ord ModulePair doesn't cache its canonicalization.

4) given these there still appear to be better and worse orders for the ModulePairs. I don't really know why.

comment:11 Changed 3 years ago by simonpj

When typechecking a particular module M, checkFamInstConsistency does indeed need to check that the imported family instances are consistent; and that requires checking any instance for F imported from I1 with all instances of F imported from some other module I2.

(Multiple instances all imported from I1 have already been checked, when compiling I1.)

I don't understand Haddock's use-case to know why the algorithm is behaving so badly, but I bet there is some low-hanging fruit.

comment:12 Changed 3 years ago by Bartosz Nitka <niteria@…>

In 12306294/ghc:

Make checkFamInstConsistency less expensive

Doing canonicalization on every comparison turned
out to be very expensive.

Caching the canonicalization through the smart `modulePair` constructor
gives `8%` reduction in allocations on `haddock.compiler` and
`8.5%` reduction in allocations on `haddock.Cabal`.
Possibly other things as well, but it's really visible in Haddock.

Test Plan: ./validate

Reviewers: jstolarek, simonpj, austin, simonmar, bgamari

Reviewed By: simonpj, simonmar

Subscribers: thomie

Differential Revision:

GHC Trac Issues: #12191

comment:13 Changed 3 years ago by Bartosz Nitka <niteria@…>

In 348f2dbb/ghc:

Make the Ord Module independent of Unique order (2nd try)

The `Ord Module` instance currently uses `Unique`s for comparison.
We don't want to use the `Unique` order because it can introduce
This switches `Ord ModuleName` and `Ord UnitId` to use lexicographic
ordering making `Ord Module` deterministic transitively.

I've run `nofib` and it doesn't make a measurable difference.

See also Note [ModuleEnv determinism and performance].

This fixes #12191 - the regression, that the previous version of this
patch had.

Test Plan:
run nofib: P112

Reviewers: simonmar, bgamari, austin

Subscribers: thomie

Differential Revision:

GHC Trac Issues: #4012, #12191

comment:14 Changed 3 years ago by niteria

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.