Opened 11 months ago

Last modified 9 months ago

#15770 new bug

Missing optimisation opportunity in code gen for always-saturated applications?

Reported by: osa1 Owned by:
Priority: normal Milestone:
Component: Compiler Version:
Keywords: CodeGen Cc: simonmar, simonpj, sgraf
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D5232
Wiki Page:

Description

As a part of the work on moving some fragile analyses from Core to final phase before code gen (#9718) I found an unused analysis in CoreToStg which basically assigns each binder this:

data StgBinderInfo
  = NoStgBinderInfo
  | SatCallsOnly        -- All occurrences are *saturated* *function* calls
                        -- This means we don't need to build an info table and
                        -- slow entry code for the thing
                        -- Thunks never get this value

The idea is , as mentioned in the comment, for SatCallsOnly binder we can avoid generating code for slow entry. However this is currently not used (see Phab:D5232), and this ticket is to decide whether to use or remove this.

I found these comments in StgCmmClosure that explains how this became unused and how it was used before:

-----------------------------------------------------------------------------
--                staticClosureRequired
-----------------------------------------------------------------------------

{-  staticClosureRequired is never called (hence commented out)

    SimonMar writes (Sept 07) It's an optimisation we used to apply at
    one time, I believe, but it got lost probably in the rewrite of
    the RTS/code generator.  I left that code there to remind me to
    look into whether it was worth doing sometime

{- Avoiding generating entries and info tables
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
At present, for every function we generate all of the following,
just in case.  But they aren't always all needed, as noted below:

[NB1: all of this applies only to *functions*.  Thunks always
have closure, info table, and entry code.]

[NB2: All are needed if the function is *exported*, just to play safe.]

* Fast-entry code  ALWAYS NEEDED

* Slow-entry code
        Needed iff (a) we have any un-saturated calls to the function
        OR         (b) the function is passed as an arg
        OR         (c) we're in the parallel world and the function has free vars
                       [Reason: in parallel world, we always enter functions
                       with free vars via the closure.]

* The function closure
        Needed iff (a) we have any un-saturated calls to the function
        OR         (b) the function is passed as an arg
        OR         (c) if the function has free vars (ie not top level)

  Why case (a) here?  Because if the arg-satis check fails,
  UpdatePAP stuffs a pointer to the function closure in the PAP.
  [Could be changed; UpdatePAP could stuff in a code ptr instead,
   but doesn't seem worth it.]

  [NB: these conditions imply that we might need the closure
  without the slow-entry code.  Here's how.

        f x y = let g w = ...x..y..w...
                in
                ...(g t)...

  Here we need a closure for g which contains x and y,
  but since the calls are all saturated we just jump to the
  fast entry point for g, with R1 pointing to the closure for g.]


* Standard info table
        Needed iff (a) we have any un-saturated calls to the function
        OR         (b) the function is passed as an arg
        OR         (c) the function has free vars (ie not top level)

        NB.  In the sequential world, (c) is only required so that the function closure has
        an info table to point to, to keep the storage manager happy.
        If (c) alone is true we could fake up an info table by choosing
        one of a standard family of info tables, whose entry code just
        bombs out.

        [NB In the parallel world (c) is needed regardless because
        we enter functions with free vars via the closure.]

        If (c) is retained, then we'll sometimes generate an info table
        (for storage mgr purposes) without slow-entry code.  Then we need
        to use an error label in the info table to substitute for the absent
        slow entry code.
-}

staticClosureRequired
        :: Name
        -> StgBinderInfo
        -> LambdaFormInfo
        -> Bool
staticClosureRequired binder bndr_info
                      (LFReEntrant top_level _ _ _ _)        -- It's a function
  = ASSERT( isTopLevel top_level )
        -- Assumption: it's a top-level, no-free-var binding
        not (satCallsOnly bndr_info)

staticClosureRequired binder other_binder_info other_lf_info = True
-}

Change History (14)

comment:1 Changed 11 months ago by simonmar

Even if there were saturated calls only, we'd still need the info table and slow entry code to support the heap/stack checks. And we'd need the info table for the closure, which might be referred to by SRTs. And the function has to be not externally visible. So it seems like only in very narrow circumstances could we save anything here.

comment:2 Changed 11 months ago by simonmar

Oh, and most of the time there isn't any slow entry code because we have built-in argument patterns that cover a good proportion of functions.

comment:3 Changed 11 months ago by osa1

Differential Rev(s): Phab:D5232

Great, because this means less work for #9718. Shall we merge Phab:D5232?

info table and slow entry code to support the heap/stack checks

Out of curiosity, why do we need entry code for heap/stack checks? Do we use slow entry code as continuation when we do GC? Or something else?

comment:4 Changed 11 months ago by simonmar

Yes, the slow entry code is used as the continuation for the heap check for a function entry point, which saves a lot of code. See stg_gc_fun in rts/HeapStackCheck.cmm.

comment:5 Changed 11 months ago by sgraf

I'm conveniently using satCallsOnly when performing late lambda lifting. It would not be terribly upsetting to see this go, but I found it convenient not having to compute the same analysis info myself.

The same arguments as for free variables apply, though: It's occurrence information which is probably fragile enough to just be generated on demand.

comment:6 Changed 11 months ago by sgraf

Cc: sgraf added

comment:7 Changed 11 months ago by simonpj

The same arguments as for free variables apply, though: It's occurrence information which is probably fragile enough to just be generated on demand.

I think so too. That is, let's not generate it at the beginning. If LLF wants it, it's probably best just to calculate it.

comment:8 Changed 11 months ago by simonpj

Commit this?

comment:9 Changed 11 months ago by osa1

You mean Phab:D5232? sgraf is using StgBinderInfo for late lambda lift so I thought we probably shouldn't remove it.

comment:10 Changed 11 months ago by sgraf

I'm convinced Phab:D5232 makes sense, so we should do this. Phab:D5224 can wait.

Will there be an analysis that recovers the functionality or do I have to write my own (occurrence analyser on STG)?

comment:11 Changed 11 months ago by simonpj

Will there be an analysis that recovers the functionality or do I have to write my own (occurrence analyser on STG)?

I think the latter. It's extremely easy of course. The only tricky bit is to decide how to record the results in the tree.

comment:12 Changed 11 months ago by sgraf

I continued discussion of the consequences of this ticket for LLF in https://ghc.haskell.org/trac/ghc/ticket/9476#comment:49.

comment:13 Changed 11 months ago by Ömer Sinan Ağacan <omeragacan@…>

In d30352a/ghc:

Remove StgBinderInfo and related computation in CoreToStg

- The StgBinderInfo type was never used in the code gen, so the type, related
  computation in CoreToStg, and some comments about it are removed. See #15770
  for more details.

- Simplified CoreToStg after removing the StgBinderInfo computation: removed
  StgBinderInfo arguments and mfix stuff.

The StgBinderInfo values were not used in the code gen, but I still run nofib
just to make sure: 0.0% change in allocations and binary sizes.

Test Plan: Validated locally

Reviewers: simonpj, simonmar, bgamari, sgraf

Reviewed By: sgraf

Subscribers: AndreasK, sgraf, rwbarton, carter

Differential Revision: https://phabricator.haskell.org/D5232

comment:14 Changed 9 months ago by simonpj

Keywords: CodeGen added
Note: See TracTickets for help on using tickets.