Opened 3 years ago

Last modified 9 months ago

#13226 new bug

Compiler allocation regressions from top-level string literal patch

Reported by: dfeuer Owned by:
Priority: normal Milestone: 8.10.1
Component: Compiler Version: 8.1
Keywords: strings Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


d49b2bb21691892ca6ac8f2403e31f2a5e53feb3 leads to compiler allocation regressions in T12425 and T13035 when those tests are compiled with -O2. I took a look at T13035 tonight. -v reveals more terms and types. Interestingly, it appears that virtually all the string literals arise from the derived Typeable instance.

Change History (12)

comment:1 Changed 3 years ago by dfeuer

It appears the strings are also for Typeable in T12425.

comment:2 Changed 3 years ago by dfeuer

I've spent a bit more time looking at T212425. A couple things I've noticed:

One reason we're getting more terms and types is pretty obvious. Where we used to have

lvl3_r2p3 = unpackCString# "T12425.hs"#

we now have

lvl6_r2GO = "T12425.hs"#
lvl7_r2GP = unpackCString# lvl6_r2GO

Another reason is less obvious. We've always had multiple copies of some strings, but now we're getting more repeats. I don't know why that is.

I also don't really if any of this is actually responsible for the allocation change, but it could be responsible for some of it. It looks like there are more terms, types and allocations for every stage past parsing, but float out is (unsurprisingly) prominent.

My big question is why we're allowing these literals to be duplicated in the first place. I would think, naively perhaps, that we could assign each of them a unique, top-level identity from the start, and merely mark use-sites as inline or not, rather than actually inlining. When floating out, we could merge any two bindings in the same scope that refer to the same string identity. Those top-level identities would ultimately be resolved as either "inline-only" (not appearing in any unfoldings, etc., so they can be discarded) or otherwise. Of course we'd need to deal with the fact that built-in rules and such may be able to produce new literals, but I would think that relatively straightforward.

comment:3 Changed 3 years ago by dfeuer

Sorry, I meant T12425!

comment:4 Changed 3 years ago by simonpj

My big question is why we're allowing these literals to be duplicated in the first place

Good question. Can you show a repro case, please?

Returning to your first point, the extra binding is reasonable. After all, if we had

x = f "foo#"
y = g "foo#"

we might expect CSE to share those literals, and floating them to the top does just that. Moreover the final code is the same either way.

But if f and g were identical (say unpackCString#) then we'd CSE x and y anyway.

So we could choose not to float string literals unless they escaped a lambda. That would be easy to do if we wanted to try it. (We want to take them out of lambdas to allow the lambda to be inlined without duplicating the string.)

comment:5 Changed 2 years ago by bgamari


Given that 8.2.1-rc1 is imminent, I'm bumping these off to the 8.4

comment:6 Changed 2 years ago by Rufflewind

In the meantime, can the threshold be bumped up a bit to to avoid the constant complaints from Harbormaster?

comment:7 Changed 2 years ago by bgamari

I'm not sure what you mean. Which threshold are you referring to?

comment:8 Changed 2 years ago by bgamari

Keywords: strings added

comment:9 Changed 20 months ago by bgamari


This ticket won't be resolved in 8.4; remilestoning for 8.6. Do holler if you are affected by this or would otherwise like to work on it.

comment:10 Changed 15 months ago by bgamari

This will not be addressed in GHC 8.6.

comment:11 Changed 15 months ago by bgamari


These will not be addressed in GHC 8.6.

comment:12 Changed 9 months ago by osa1


Bumping milestones of low-priority tickets.

Note: See TracTickets for help on using tickets.