Opened 12 years ago

Closed 4 years ago

Last modified 4 years ago

#2159 closed bug (fixed)

Use a more efficient representation than [DynFlag]

Reported by: igloo Owned by:
Priority: lowest Milestone:
Component: Compiler Version: 6.8.2
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

In compiler/typecheck/TcRnMonad.lhs, if I replace the existing definitions with

traceOptIf _ _ = return ()
traceOptTcRn _ _ = return ()
dumpTcRn _ = return ()

then we see an improvement in nofib's "Compile Times" sections:

-1 s.d.		-----	-8.5%
+1 s.d.		-----	+1.8%
Average		-----	-3.5%

By using some sort of bitmap representation, we ought to see a modest compile time improvement from all the checks that the trace functions make, as well as various other checks that are made (do we need to warn about missing sigs, are overlapping instances allowed, etc).

The tricky bit is designing it so that it is hard to shoot yourself in the foot. Lexer.x already has a bitmap (genericsBit, ffiBit etc) but the mapping of feature to bit number is done by hand, which isn't ideal. Perhaps this is even important enough to deserve a language extension?

Attachments (1)

bitmap.dpatch (220.5 KB) - added by igloo 9 years ago.

Download all attachments as: .zip

Change History (25)

comment:1 Changed 11 years ago by simonmar

Architecture: UnknownUnknown/Multiple

comment:2 Changed 11 years ago by simonmar

Operating System: UnknownUnknown/Multiple

comment:3 Changed 10 years ago by igloo

Milestone: 6.10 branch6.12 branch

comment:4 Changed 10 years ago by simonmar

Type of failure: Compile-time performance bug

comment:5 Changed 9 years ago by igloo

Milestone: 6.12 branch6.12.3

comment:6 Changed 9 years ago by igloo

Milestone: 6.12.36.14.1
Priority: normallow

comment:7 Changed 9 years ago by igloo

With plain HEAD, and HEAD with bitmap.dpatch, running

time ghc -v0 --make -O Setup.hs

in a Cabal tree (which thus also builds most of Cabal) I get:

Run 1
    HEAD:           111.38 user 1.39 system 1:52.88 elapsed
    Bitmap:         109.01 user 1.38 system 1:50.41 elapsed
Run 2
    HEAD:           112.18 user 1.32 system 1:53.55 elapsed
    Bitmap:         110.58 user 1.38 system 1:51.98 elapsed
Run 3
    HEAD:           97.51 user 1.23 system 1:38.75 elapsed
    Bitmap:         110.95 user 1.26 system 1:52.24 elapsed
Run 4
    HEAD:           111.60 user 1.33 system 1:52.92 elapsed
    Bitmap:         109.93 user 1.30 system 1:51.31 elapsed
Run 5
    HEAD:           111.63 user 1.66 system 1:53.31 elapsed
    Bitmap:         88.90 user 1.19 system 1:30.41 elapsed
Run 6
    HEAD:           95.12 user 1.26 system 1:36.47 elapsed
    Bitmap:         111.27 user 1.48 system 1:52.88 elapsed
Run 7
    HEAD:           112.18 user 1.35 system 1:53.66 elapsed
    Bitmap:         109.51 user 1.27 system 1:50.85 elapsed
Run 8
    HEAD:           112.51 user 1.16 system 1:53.74 elapsed
    Bitmap:         110.33 user 1.26 system 1:51.71 elapsed
Run 9
    HEAD:           111.69 user 1.36 system 1:53.37 elapsed
    Bitmap:         109.26 user 1.40 system 1:50.65 elapsed
Run 10
    HEAD:           114.77 user 1.38 system 1:56.24 elapsed
    Bitmap:         110.08 user 1.47 system 1:51.54 elapsed

so with the patch it takes 109-110 rather than 111-112s (although occasionally a build is much quicker).

So not a huge performance increase, but it removes one of the slivers, and it would be nice to get the dopt noise out of profiles. I think some sort of extension (or generic program?) that makes bitmaps from enumerations would be generally useful.

Changed 9 years ago by igloo

Attachment: bitmap.dpatch added

comment:8 Changed 9 years ago by igloo

I don't think we should apply the patch, incidentally. It would be a maintenance nightmare.

comment:9 in reply to:  8 Changed 9 years ago by simonmar

Replying to igloo:

I don't think we should apply the patch, incidentally. It would be a maintenance nightmare.

That's what I thought too when I saw it. Have you tried just using an IntMap?

comment:10 Changed 9 years ago by igloo

Milestone: 7.0.17.0.2

comment:11 Changed 9 years ago by igloo

Milestone: 7.0.27.2.1

comment:12 Changed 8 years ago by igloo

Milestone: 7.2.17.4.1

comment:13 Changed 8 years ago by igloo

Milestone: 7.4.17.6.1
Priority: lowlowest

comment:14 Changed 7 years ago by igloo

Milestone: 7.6.17.6.2

comment:15 Changed 5 years ago by thoughtpolice

Milestone: 7.6.27.10.1

Moving to 7.10.1.

comment:16 Changed 5 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:17 Changed 5 years ago by thoughtpolice

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:18 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:19 Changed 4 years ago by thomie

Milestone: 8.0.1

comment:20 Changed 4 years ago by bgamari

Resolution: fixed
Status: newclosed

Unless I am mistaken this has already been implemented. DynFlags now represents dump flags, extension flags, and general flags as IntSets, using the Enum instances of the respective types to map each flag to a bit. I'm not entirely sure when this change was made but it was quite a while ago.

comment:21 Changed 4 years ago by rwbarton

Heh, I was just looking at this too.

I imagine a bit array representation as in "bitmap.dpatch" would be a lot faster though. Especially if you added RULES to inline the case of checking for a known flag.

comment:22 Changed 4 years ago by rwbarton

Well the bit array was 4x faster than IntSet in my unscientific test. And it would probably be a bit faster still when the option to test is known and can be inlined. But there could be little left to gain after the switch to IntSet.

A pretty easy experiment to do, if we had an easy way to benchmark compiler performance...

comment:23 Changed 4 years ago by simonpj

Great! Bring it on!

comment:24 Changed 4 years ago by rwbarton

According to a statistical profiling run that I did some time ago GHC spends about 0.3% of its total time in the module libraries/containers/Data/IntSet/Base.hs. I don't know how much of that 0.3% is from DynFlag-checking versus other users of IntSet (another one is UnVarSet, whatever that is).

Unfortunately, reliably measuring a speedup that is expected to be at most ~0.25% is itself a somewhat nontrivial project.

Note: See TracTickets for help on using tickets.