Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#12754 closed bug (fixed)

Adding an explicit export list halves compilation time.

Reported by: mpickering Owned by: mpickering
Priority: normal Milestone: 8.2.1
Component: Compiler Version: 8.0.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): phab:D2657
Wiki Page:

Description

A file containing only a 10,000-constructor enumeration type compiles twice as fast with an explicit export list as with no export list.

# Generator script (Python 3). Uncomment the first print and comment out
# the second to produce the version without an export list.
# print("module A where")
print("module A (T ( .. ) ) where")
print("\n")
print("data T = C")
for i in range(0, 10000):
    print("    | C" + str(i))

Without an export list:

[1 of 1] Compiling A                ( A.hs, A.o )

real	0m22.872s
user	0m19.653s
sys	0m0.868s

With the export list:

[1 of 1] Compiling A                ( A.hs, A.o )

real	0m11.249s
user	0m10.428s
sys	0m0.435s

The culprit here is exports_from_avail: when no export list is provided, the things to export are gathered from the GlobalRdrEnv. Each constructor is placed in its own Avail, and these are then combined by plusAvail, which is implemented using the expensive unionLists.
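For intuition, here is a simplified model of the cost (not GHC's actual code; `unionLists` below is a stand-in with the same membership-scanning shape): combining n singleton lists pairwise traverses the growing accumulator at every step, so total work is quadratic, whereas filing each element into a map once is essentially linear.

```haskell
import qualified Data.Map.Strict as Map

-- A stand-in for GHC's unionLists: list union via membership scans.
unionLists :: Eq a => [a] -> [a] -> [a]
unionLists xs ys = xs ++ filter (`notElem` xs) ys

-- Combining n singleton lists pairwise: each step traverses the
-- growing accumulator, so the total work is O(n^2).
combineQuadratic :: [Int] -> [Int]
combineQuadratic ns = foldr unionLists [] [ [n] | n <- ns ]

-- Accumulating into a map, touching each element once, is O(n log n).
combineViaMap :: [Int] -> [Int]
combineViaMap ns = Map.keys (Map.fromList [ (n, ()) | n <- ns ])
```

Both produce the same set of names; only the asymptotics differ.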

Change History (16)

comment:1 Changed 3 years ago by mpickering

Resolution: invalid
Status: new → closed

I don't know what I was doing, but I can no longer reproduce this. Sorry for the noise; the code in exports_from_avail still looks like it could be more efficient.

comment:2 Changed 3 years ago by niteria

I get a difference here, but not as dramatic:

real    0m11.544s
user    0m11.210s
sys     0m0.243s

vs

real    0m14.845s
user    0m14.279s
sys     0m0.291s

comment:3 Changed 3 years ago by simonpj

Resolution: invalid deleted
Status: closed → new
Type of failure: None/Unknown → Compile-time performance bug

It would be great to have some insight into what is going on here. It's very unexpected to see such a large difference from such a small change.

Profiling would be very interesting. I'll re-open.

Simon

comment:4 Changed 3 years ago by mpickering

Bartosz gave me the command he was using:

rm A.{hs,o,hi}; python gen12754.py > A.hs && time ghc A.hs

and I ran the test again; here are my 4 runs:

[1 of 1] Compiling A                ( A.hs, A.o )

real	0m26.460s
user	0m24.643s
sys	0m0.711s
~/Documents/haskell/ghc:vim gen.py 
~/Documents/haskell/ghc:rm A.{hs,o,hi}; python gen.py > A.hs && time ghc A.hs
[1 of 1] Compiling A                ( A.hs, A.o )

real	0m15.743s
user	0m15.100s
sys	0m0.402s
~/Documents/haskell/ghc:vim gen.py 
~/Documents/haskell/ghc:rm A.{hs,o,hi}; python gen.py > A.hs && time ghc A.hs
[1 of 1] Compiling A                ( A.hs, A.o )

real	0m24.858s
user	0m24.283s
sys	0m0.392s
~/Documents/haskell/ghc:vim gen.py 
~/Documents/haskell/ghc:rm A.{hs,o,hi}; python gen.py > A.hs && time ghc A.hs
[1 of 1] Compiling A                ( A.hs, A.o )

real	0m12.098s
user	0m11.672s
sys	0m0.321s

So I do see a big difference.

comment:5 Changed 3 years ago by niteria

The difference in allocations is significant:

$ rm A.{hs,o,hi}; python gen12754no.py > A.hs && inplace/bin/ghc-stage2 A.hs +RTS -s; rm A.{hs,o,hi}; python gen12754.py > A.hs && inplace/bin/ghc-stage2 A.hs +RTS -s;
[1 of 1] Compiling A                ( A.hs, A.o )
  13,116,210,736 bytes allocated in the heap
   1,154,829,688 bytes copied during GC
      92,896,592 bytes maximum residency (13 sample(s))
       2,925,816 bytes maximum slop
             225 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2658 colls,     0 par    1.433s   1.433s     0.0005s    0.0334s
  Gen  1        13 colls,     0 par    0.840s   0.840s     0.0646s    0.1918s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time   10.539s  ( 11.260s elapsed)
  GC      time    2.273s  (  2.273s elapsed)
  EXIT    time    0.045s  (  0.046s elapsed)
  Total   time   12.882s  ( 13.580s elapsed)

  Alloc rate    1,244,579,585 bytes per MUT second

  Productivity  82.3% of total user, 83.3% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
[1 of 1] Compiling A                ( A.hs, A.o )
   7,981,835,440 bytes allocated in the heap
     804,198,536 bytes copied during GC
      92,485,816 bytes maximum residency (13 sample(s))
       2,114,568 bytes maximum slop
             243 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       592 colls,     0 par    0.846s   0.846s     0.0014s    0.0385s
  Gen  1        13 colls,     0 par    0.883s   0.883s     0.0679s    0.1829s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    9.633s  ( 10.229s elapsed)
  GC      time    1.729s  (  1.728s elapsed)
  EXIT    time    0.044s  (  0.052s elapsed)
  Total   time   11.433s  ( 12.010s elapsed)

  Alloc rate    828,632,869 bytes per MUT second

  Productivity  84.9% of total user, 85.6% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

comment:6 Changed 3 years ago by niteria

I built a profiled GHC and it confirms what Matthew already said: nubAvails accounts for all the extra allocations.

comment:7 Changed 3 years ago by simonpj

I stupidly missed Matthew's original insight in the Description. He's absolutely right; the current thing is stupid.

There should be nothing non-linear about this. We have a [GRE] and we want an [AvailInfo]. Very well, do:

gresToAvailInfo :: [GRE] -> [AvailInfo]
gresToAvailInfo gres
  = nameEnvElts avail_env
  where
    avail_env :: NameEnv AvailInfo   -- Keyed by the parent
    avail_env = foldr add emptyNameEnv gres

    add :: GRE -> NameEnv AvailInfo -> NameEnv AvailInfo
    add gre env = ... very like availFromGRE, but instead of returning
                      an AvailInfo, it extends the appropriate AvailInfo
                      in the env ...

Each GRE should take constant time to add. Each GRE deals with a different Name, so there is no need to check for duplicates.
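Filling in the sketch with simplified stand-in types — Data.Map.Strict in place of NameEnv, a minimal GRE record carrying a name and an optional parent, and an AvailTC pairing a parent with its children; none of these match GHC's real definitions — the one-pass construction looks something like:

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

type Name = String

-- Simplified stand-ins, not GHC's real types.
data GRE = GRE { greName :: Name, greParent :: Maybe Name }

-- A parent together with everything exported under it.
data AvailInfo = AvailTC Name [Name] deriving (Eq, Show)

-- One pass over the GREs: each element is filed under its parent's key
-- (or its own name if it has no parent), so there are no pairwise
-- list unions and each GRE is handled in one map operation.
gresToAvailInfo :: [GRE] -> [AvailInfo]
gresToAvailInfo gres =
  [ AvailTC p (reverse ns) | (p, ns) <- Map.toList env ]
  where
    env = foldl' add Map.empty gres
    add m gre =
      let key = maybe (greName gre) id (greParent gre)
      in Map.insertWith (++) key [greName gre] m
```

If a parent GRE precedes its children in the input, the parent also comes first in its children list, matching the "parent first" ordering invariant mentioned later in the ticket.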

This is needed in two places:

  • TcRnExports.exports_from_avail, the Nothing case.
  • Same function, the IEModuleContents case of exports_from_item

Hmm. The use of nubAvails is still needed, though, in case you say

module M( T( D, D3 ), T( D, D2 ) )

when we should combine the two export items to T( D, D2, D3 ). So we might still get the non-linear cost with very many manually written items, but that is at least better than before.

So an alternative, perhaps better, would be to rewrite nubAvails to use code very like the above function, so that adding all the Names in the [AvailInfo] takes linear time overall. That'd be more robust, I guess.

Finally, the other use of nubAvails is in RnNames.findImportUsage. I think extendImportMap could return a [GRE] as the second element of its result pair, instead of [AvailInfo], and then we could use gresToAvailInfo again; but this time we'd need to worry about duplicates.

comment:8 Changed 3 years ago by bgamari

simonpj, did you mean to use foldr in avail_env? As far as I know, it's better to use foldl' in cases like these, where the result can't be forced "incrementally". With foldr you will first build up a chain of add applications on the stack before ending up with the final NameEnv AvailInfo.
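To illustrate the point (a minimal sketch using Data.Map.Strict as a stand-in for NameEnv, not GHC's actual code): with foldl' the accumulator is in scope at each step and can be kept evaluated, so the map is built incrementally rather than behind a nest of n pending applications.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Building a strict map from many entries. foldl' threads the
-- accumulator left to right, so each insert happens against the map
-- built so far; foldr would instead stack one suspended application
-- per element before the first insert could run.
buildEnv :: [(String, Int)] -> Map.Map String Int
buildEnv = foldl' (\env (k, v) -> Map.insert k v env) Map.empty
```

Later entries for the same key overwrite earlier ones, which is fine here since each GRE deals with a different Name.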

comment:9 Changed 3 years ago by simonpj

Maybe foldl' would be better, yes.

comment:10 Changed 3 years ago by mpickering

Owner: set to mpickering

comment:11 Changed 3 years ago by niteria

Differential Rev(s): phab:D2657

comment:12 Changed 3 years ago by simonpj

The patch looks mostly good, thanks!

You did not address my remarks about the other uses of nubAvails:

So an alternative, perhaps better, would be to rewrite nubAvails to use code very like the above function, so that adding all the Names in the [AvailInfo] takes linear time overall. That'd be more robust, I guess.

Finally, the other use of nubAvails is in RnNames.findImportUsage. I think extendImportMap could return a [GRE] as the second element of its result pair, instead of [AvailInfo], and then we could use gresToAvailInfo again; but this time we'd need to worry about duplicates.

Indeed you added a brand-new call to nubAvails!

Couldn't we just apply the same medicine to nubAvails?

comment:13 Changed 3 years ago by mpickering

The call to nubAvails wasn't added; I pushed it into the branch that deals with an explicit export list, where we actually need to do the nubbing.

It doesn't seem like too much work to make the function deal with duplicates correctly. I'm not sure the other places are too critical, but I will keep them in mind if I am ever working in that area.

comment:14 Changed 3 years ago by Matthew Pickering <matthewtpickering@…>

In 4b72f85/ghc:

Optimise whole module exports

We directly build up the correct AvailInfos rather than generating
lots of singleton instances and combining them with expensive calls to
unionLists.

There are two other small changes.

* Pushed the nubAvails call into the explicit export list
  branch as we construct them correctly and uniquely ourselves.
* fix_faminst only needs to check the first element of the export
  list as we maintain the (yucky) invariant that the parent is the
  first thing in it.

Reviewers: simonpj, austin, bgamari

Reviewed By: simonpj, bgamari

Subscribers: simonpj, thomie, niteria

Differential Revision: https://phabricator.haskell.org/D2657

GHC Trac Issues: #12754

comment:15 Changed 3 years ago by bgamari

Milestone: 8.2.1
Resolution: fixed
Status: new → closed

mpickering, is it safe to call this resolved despite the discussion in comment:13 and comment:12?

comment:16 Changed 3 years ago by mpickering

Yes.

Note: See TracTickets for help on using tickets.