Opened 2 years ago

Last modified 2 years ago

#14239 new feature request

Let -fspecialise-aggressively respect NOINLINE (or NOSPECIALISABLE?)

Reported by: MikolajKonarski
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 8.2.1
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:


First, let me explain the context. By default GHC specialises very few functions, and the programmer is expected to enumerate additional functions to specialise (e.g., with INLINABLE). This is problematic if the functions are defined in libraries rather than in the user's code. Also, in some kinds of code it leads to INLINABLE on every function (because missing even just one breaks the chain). The problems are described in other tickets.
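To make the status quo concrete, here is a minimal sketch, assuming the definitions below sit in a library module that client code imports (`sumSquares` and `normSquared` are hypothetical names). Every overloaded function in the call chain carries INLINABLE, and the defining module additionally requests one copy with SPECIALISE. The block has no `main`, so load it in GHCi or import it; specialisation itself only happens when compiling with -O.

```haskell
-- Imagine these definitions living in a library module imported by a client.

-- Overloaded worker. INLINABLE exposes its unfolding so that importing
-- modules can create copies specialised to their own types.
sumSquares :: Num a => [a] -> a
sumSquares = foldr (\x acc -> x * x + acc) 0
{-# INLINABLE sumSquares #-}

-- An overloaded caller. If this one lacked INLINABLE, a client could not
-- specialise it, and the call to sumSquares inside it would stay
-- dictionary-passing: one missing pragma breaks the chain.
normSquared :: Num a => [a] -> a
normSquared xs = sumSquares xs
{-# INLINABLE normSquared #-}

-- The defining module can also ask for a particular copy up front.
{-# SPECIALISE sumSquares :: [Int] -> Int #-}
```

For example, `sumSquares [1, 2, 3 :: Int]` evaluates to `14` whether or not GHC specialises it; the pragmas change the generated code, not the result.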

With -fspecialise-aggressively and -fexpose-all-unfoldings, the default is reversed. Everything (even functions from libraries, as long as they have unfoldings available) is specialized and the user only needs to enumerate exceptions. Unfortunately, he can't.

I think (based in particular on experiments with -Wall-missed-specialisations) that, in the presence of -fexpose-all-unfoldings, the -fspecialise-aggressively option happily specialises functions marked NOINLINE (and so with Inline:, in the .hi file). Consequently, the user has no way to be selective with respect to specialisation when using the specialise-often default.
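The reversed default can be sketched as a module that turns on both flags, assuming a hypothetical overloaded function `report` that the author would rather keep as a single unspecialised copy. According to this report (not verified here), the NOINLINE on `report` does not stop -fspecialise-aggressively from specialising it, which is exactly the missing opt-out.

```haskell
{-# OPTIONS_GHC -fspecialise-aggressively -fexpose-all-unfoldings #-}
-- Compile with -O for the flags to matter; adding -Wall-missed-specialisations
-- makes GHC warn about overloaded calls it failed to specialise.

-- An ordinary overloaded helper: under these flags GHC may specialise it at
-- every type it is used at, which is the intended new default.
scale :: Num a => a -> [a] -> [a]
scale k = map (* k)

-- A bulky, rarely used overloaded function (hypothetical) that the author
-- wants to keep as one dictionary-passing copy. NOINLINE stops inlining, but
-- per this ticket it does not stop -fspecialise-aggressively from creating
-- specialised copies, and there is no other pragma to opt out with.
report :: (Show a, Num a) => [a] -> String
report xs = "total = " ++ show (sum xs) ++ ", count = " ++ show (length xs)
{-# NOINLINE report #-}
```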

If we want to gradually disentangle INLINE and SPECIALIZE (e.g., rename INLINABLE to SPECIALISABLE), perhaps the pragma to use should be NOSPECIALISABLE. Perhaps it also makes sense to keep the current behaviour for experimenting with whole modules, but I'd rename that flag to -fforce-specialise-aggressively, because it overrides the obvious user intent.

Change History (5)

comment:1 Changed 2 years ago by mpickering

Keywords: Inlining added

comment:2 Changed 2 years ago by simonpj

Keywords: Inlining removed

I think you are suggesting that a user can write one (and only one) of

  • INLINE: please inline what I write, at every call site
  • SPECIALISABLE (currently written INLINABLE): please specialise what I write, at every call site
  • NOSPECIALISABLE: please do not specialise this function (even if it would otherwise be easy to do so)
  • NOINLINE: please do not inline or specialise this function (even if it would otherwise be easy to do so). That is, hide its implementation from the caller.
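The four options above might look like this in source, assuming hypothetical functions `square`, `sumOfSquares`, `mean` and `describe`. INLINE, INLINABLE and NOINLINE are existing GHC pragmas; NOSPECIALISABLE is only a proposal, so it appears inside a comment rather than as a pragma GHC would act on.

```haskell
-- Each function carries at most one of the four settings from comment:2.

-- INLINE: inline the right-hand side as written at every saturated call site.
square :: Num a => a -> a
square x = x * x
{-# INLINE square #-}

-- SPECIALISABLE (spelled INLINABLE today): expose the definition so that
-- callers may specialise it at their own types.
sumOfSquares :: Num a => [a] -> a
sumOfSquares = sum . map square
{-# INLINABLE sumOfSquares #-}

-- NOSPECIALISABLE is hypothetical (GHC 8.2 has no such pragma), so it is
-- shown only as a comment:
--   {-# NOSPECIALISABLE mean #-}
mean :: Fractional a => [a] -> a
mean xs = sum xs / fromIntegral (length xs)

-- NOINLINE: never inline; under comment:2 it would also forbid specialisation.
describe :: Show a => [a] -> String
describe xs = "items: " ++ show xs
{-# NOINLINE describe #-}
```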

That would not be too hard.

comment:3 Changed 2 years ago by MikolajKonarski

Yes, I would be completely satisfied by that and a convention that all GHC options without -fforce- respect that (currently only -fspecialise-aggressively doesn't, I think).

In particular I'd leave -fexpose-all-unfoldings untouched, because it is about neither inlining nor specialisation, so it doesn't need to respect the pragmas. It's about unfoldings, or rather it's close to a -fignore-the-split-into-modules-from-affecting-performance-which-puzzles-users option (and it's cheap and also useful for experimenting).

comment:4 Changed 2 years ago by simonpj

There is something odd here. In comment:2 I proposed:

  • NOSPECIALISABLE: please do not specialise this function (even if it would otherwise be easy to do so)
  • NOINLINE: please do not inline or specialise this function (even if it would otherwise be easy to do so). That is, hide its implementation from the caller.

But it seems odd to allow a function to be inlined, but not to allow it to be specialised, doesn't it? After all, inlining is really just a drastic form of specialisation: once per call site! You can think of specialisation as a way to economise on all these inlinings by sharing them among similar call sites.

So I wonder whether we should reverse the semantics thus:

  • NOINLINE: please do not inline this function (even if it would otherwise be easy to do so). But GHC is free to specialise it.
  • NOSPECIALISABLE: please do not inline or specialise this function (even if it would otherwise be easy to do so).

You could also argue for inlining and specialisation to be orthogonal, but that'd lead to strange cases where you (accidentally perhaps) say to inline but never specialise or some other odd combination. I'm inclined to stick with four mutually-exclusive settings for now.
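The two proposals for the negative pragmas can be written down as a small model; the types and function names below are illustrative only, not GHC internals. `consistent` encodes the argument of comment:4: since inlining is a drastic form of specialisation, a pragma that forbids specialisation should forbid inlining too.

```haskell
data Transformation = Inlining | Specialisation
  deriving (Show, Eq)

data NegPragma = NoInline | NoSpecialisable
  deriving (Show, Eq)

-- What each negative pragma forbids under comment:2's proposal.
forbiddenC2 :: NegPragma -> [Transformation]
forbiddenC2 NoSpecialisable = [Specialisation]
forbiddenC2 NoInline        = [Inlining, Specialisation]

-- comment:4's reversed semantics.
forbiddenC4 :: NegPragma -> [Transformation]
forbiddenC4 NoInline        = [Inlining]
forbiddenC4 NoSpecialisable = [Inlining, Specialisation]

-- A setting is consistent if forbidding specialisation implies forbidding
-- inlining (the reverse implication is not required).
consistent :: (NegPragma -> [Transformation]) -> NegPragma -> Bool
consistent semantics p = Specialisation `notElem` fs || Inlining `elem` fs
  where
    fs = semantics p
```

Under this model, comment:2's NOSPECIALISABLE is the only setting that fails the check, which is the oddity comment:4 points out.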

Anyone else care to comment?

comment:5 Changed 2 years ago by MikolajKonarski

I think there are many ways to interpret the negative flags. I can easily think of two: by trigger (passes) and by mechanism (micro-decisions). Let me explain by sketching the semantics of the NOSPECIALISABLE pragma. For simplicity I assume GHC works by making separate passes over the code, e.g., an inlining pass and a specialising pass. I think this is a good mental model for a programmer, as opposed to a tangled mess of iterated optimisation micro-decisions, even if the latter is much closer to reality.

NOSPECIALISABLE by trigger: when GHC is looking for functions to specialise, ignore the function. The inlining pass may independently decide to inline it.

NOSPECIALISABLE by mechanism: in the specialisation pass, if the function meets the criteria for specialisation *and* specialisation-by-inlining, inline it, otherwise ignore it, never create a specialised copy. The inlining pass is free to make its own decisions, in particular, criteria for inlining may be different than for specialisation-by-inlining.
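The two readings of NOSPECIALISABLE can be contrasted in a toy model of the specialisation pass, assuming a call site is summarised by two yes/no criteria (the record and constructor names are hypothetical). In both readings the separate inlining pass makes its own decisions and is not modelled here.

```haskell
-- Hypothetical facts about one call site of a function marked NOSPECIALISABLE.
data CallSite = CallSite
  { meetsSpecCriteria           :: Bool -- would normally be specialised here
  , meetsSpecByInliningCriteria :: Bool -- and inlining would be the way to do it
  }

-- What the specialisation pass does with the function at that call site.
data Outcome = LeftAlone | InlinedBySpecialiser | SpecialisedCopy
  deriving (Show, Eq)

-- By trigger: the specialisation pass simply skips the function.
byTrigger :: CallSite -> Outcome
byTrigger _ = LeftAlone

-- By mechanism: the pass may still specialise by inlining,
-- but it never creates a specialised copy.
byMechanism :: CallSite -> Outcome
byMechanism c
  | meetsSpecCriteria c && meetsSpecByInliningCriteria c = InlinedBySpecialiser
  | otherwise                                            = LeftAlone
```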

For my use case, for simplicity of the mental model, I'd prefer the by-trigger semantics, because I want NOSPECIALISABLE to just mark an exception to the -fspecialise-aggressively policy, and I understand such global default-changing options to affect passes: decisions whether a function should be transformed at all, not micro-decisions about how best to transform it (e.g., whether to specialise it by copy or by inlining). If one wants to risk forcing particular micro-decisions, there are other GHC flags for that, e.g., the inlining thresholds and other fine-tuning, but their results are hard to predict and context-sensitive, so I'd keep them separate.

Examples of the by-trigger semantics: SPECIALISABLE+NOINLINE is free to perform specialisation-by-inlining, because NOINLINE just says to ignore the function during the inlining pass (but not during the specialising pass). INLINE+NOSPECIALISABLE is not equivalent to INLINE, I guess, because a recursive function can't be inlined, but might be specialized by copy.

The by-trigger semantics probably doesn't match the reality of GHC's code or the current semantics of NOINLINE. For the by-trigger semantics there would need to be separate code that checks whether a function should be considered for inlining, and separate code that checks whether a function we are specialising should instead be inlined. The NOINLINE pragma would only be inspected in the former. This also implies NOINLINE should not inhibit exposing of unfoldings.

The by-trigger semantics in full:

  • NOINLINE: please do not consider this function when deciding which functions to inline (even if it fits the criteria). But GHC is free to specialise it, even inlining it in the process.
  • NOSPECIALISABLE: please do not specialise this function (even if it would otherwise be easy to do so).
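The full by-trigger semantics can be sketched as another toy model, not anything in GHC: each pass consults only its own negative pragma, and `openRoutes` lists the ways a function's body could still reach a call site. The route and function names are hypothetical labels for the mechanisms described above.

```haskell
-- Pragmas as discussed in this ticket; a function may carry more than one.
data Pragma = INLINE | SPECIALISABLE | NOINLINE | NOSPECIALISABLE
  deriving (Show, Eq)

-- Ways GHC could end up reusing a function's body at a call site.
data Route = InliningPass | SpecialiseByInlining | SpecialiseByCopy
  deriving (Show, Eq)

-- By trigger, each pass looks only at its own negative pragma.
inliningPassConsiders :: [Pragma] -> Bool
inliningPassConsiders ps = NOINLINE `notElem` ps

specialisingPassConsiders :: [Pragma] -> Bool
specialisingPassConsiders ps = NOSPECIALISABLE `notElem` ps

-- Routes still open for a function with the given pragmas. Whether a route is
-- actually taken (size thresholds, recursion, ...) is a separate question.
openRoutes :: [Pragma] -> [Route]
openRoutes ps =
  [InliningPass | inliningPassConsiders ps]
    ++ (if specialisingPassConsiders ps
          then [SpecialiseByInlining, SpecialiseByCopy]
          else [])
```

Here `openRoutes [SPECIALISABLE, NOINLINE]` still contains `SpecialiseByInlining`, and `openRoutes [INLINE]` differs from `openRoutes [INLINE, NOSPECIALISABLE]` by the two specialisation routes; for a recursive function, which cannot be inlined at all, the difference that matters is `SpecialiseByCopy`, as in the INLINE+NOSPECIALISABLE example.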

Edit: BTW, as a result, NOSPECIALISABLE is different from "don't apply -fspecialise-aggressively to this function, but use the normal GHC mechanism instead". But this is OK: if I use -fspecialise-aggressively for a module, I'm a power user and I don't need GHC to decide specialisation for me; I can force each case by hand.

Edit2: another (contrived) way to explain my preferred semantics of positive pragmas and of -fspecialise-aggressively is as original source code transformations, as opposed to options that affect optimisation decisions further down the pipeline (where the source code the pragmas were attached to may no longer be easy to delineate, due to splits and merges). Negative pragmas then mean "when GHC looks at the original source code and decides its first major transformations, forbid some of them; don't intrude on what happens further down the pipeline".

Edit3: I inadvertently assumed the passes are somehow independent or that inlining "pass" happens before the specialisation "pass". If we model (via original source code transformation) the semantics of the initial steps of optimization the other way around, the results are different and probably better. In particular, NOINLINE mostly preserves its current semantics and prohibits inlining both of the polymorphic function and of its monomorphic copies (though it doesn't prohibit specialisation by copy).

Last edited 2 years ago by MikolajKonarski