Opened 2 years ago

Last modified 12 months ago

#13861 new feature request

Take more advantage of STG representation invariance (follows up #9291)

Reported by: heisenbug Owned by: heisenbug
Priority: normal Milestone:
Component: Compiler Version: 8.0.2
Keywords: CodeGen Cc: maoe, nomeata, osa1
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Consider following case alternatives at the STG stage (i.e. untyped) :

case scrut of
  False -> []
  Just x -> Right x
  [] -> Nothing

The common theme of all these is that the scrutinee's memory layout and the result's memory layout coincide. So operationally no allocation needs to take place, and the whole case expression is simply a (strict) identity at the STG stage.

I propose to add a small STG analysis to:

  • for each case alternative check whether the assigned tag between scrutinee and result matches, and if so
  • check whether both have the underlying memory layout and contents.

If these conditions are met, the case alternative can be replaced with the identity. When all alternatives simplify to the identity (also considering the DEFAULT alternative), then the entire case expression reduces to a single identity DEFAULT alternative (i.e. all other alternatives in the case can be dropped).

Many of the code examples in the join points paper (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/join-points-pldi17.pdf) exhibit these optimisation opportunities.

The already implemented suggestion in #9291 comes with the restriction that it only operates in the scope of the same type (see last comment there), but STG is untyped, so why not take advantage of this fact?

Change History (15)

comment:1 Changed 2 years ago by simonpj

Things to think about

  • It's not just memory layout. To substitute one data constructor for another, they must have the same tag; and may even need to belong to a data type with the same number of data constructors. Eg
    case x of
      Nothing -> True
    
    Nothing has tag zero, and True has tag 1.
  • Runtime debugging or heap analysis may display the data constructor. Substituting a data constructor from another type would break this utterly.

I'd need a bit of convincing that the benefit (in performance) justified the cost; and that the opportunities to exploit this happen often enough to use worth doing.

comment:2 in reply to:  1 Changed 2 years ago by heisenbug

Replying to simonpj:

Things to think about

  • It's not just memory layout. To substitute one data constructor for another, they must have the same tag; and may even need to belong to a data type with the same number of data constructors. Eg

Oh, yes I consider tags in my proposal. See my first bullet point. I doubt that the number of constructors must match, though. Maybe with nomeata's help I can come up with a proof-of-concept at some hackathon, so that we can get a feeling whether this optimisation kicks in often enough to become interesting.

case x of
  Nothing -> True

Nothing has tag zero, and True has tag 1.

  • Runtime debugging or heap analysis may display the data constructor. Substituting a data constructor from another type would break this utterly.

I'd need a bit of convincing that the benefit (in performance) justified the cost; and that the opportunities to exploit this happen often enough to use worth doing.

comment:3 Changed 2 years ago by maoe

Cc: maoe added

comment:4 Changed 2 years ago by nomeata

Cc: nomeata added

This could presumably easily be added to the STG CSE pass (#9291). Or at least that is roughly the right spot.

comment:5 Changed 2 years ago by heisenbug

When implemented, one could test: \case Right a → Just a should be seq b (unsafeCoerce b).

comment:6 Changed 2 years ago by heisenbug

Brain dump of conversation with SPJ at Haskell eXchange, London.

Currently pointer tagging only effective for "small constructor families". If not small, the tag is 1 on pointers to evaluated constructors.

See isSmallFamily in compiler/codeGen/StgCmmClosure.hs.

Of course this penalizes big families.

I suggest to set tags 1..6 for non-small families' lower constructors, 7 for all other (overflowing) constructors. This would allow more precise branching for big families too (in a significant number of cases), as the former constructors are usually the more common ones (keeping fingers crossed).

Also the coercion between small and big families would be straightforward, with following ranges directly castable:

  from
small big
to small 1..7 1..6
big 1..7 1..7

Conservatively in the beginning one could only allow 1..6.

Note: (-1 :: Int) .&. 7 == 7 so that would lead to all-ones too.

It is not immediately clear how to find out whether the constructor is n a big family. We could add the family size as an additional piece of information.

A (future) wiki page should explain the new conventions. Many references to pointer tagging in the code should be updated.

Last edited 21 months ago by heisenbug (previous) (diff)

comment:7 Changed 2 years ago by bgamari

It would be nice to know what the motivation for *not* using the available tag space for big families is. Surely there was a reason, even if it was just keeping implementation complexity at bay.

comment:8 Changed 2 years ago by simonpj

If the paper doesn't say and there are no comments to explain, the last port of call would be Simon Marlow. Otherwise I'd assume it was just becuase it seemed like the easiest thing, and most data types are smaller.

comment:9 in reply to:  7 ; Changed 2 years ago by heisenbug

Replying to bgamari:

It would be nice to know what the motivation for *not* using the available tag space for big families is. Surely there was a reason, even if it was just keeping implementation complexity at bay.

Probably the motivation was "we'll think about this later". Implementation complexity seems to be fairly low, see my branch (caveat: very WIP) https://github.com/ggreif/ghc/tree/wip/tag-big-families

comment:10 in reply to:  9 Changed 2 years ago by heisenbug

Replying to heisenbug:

Probably the motivation was "we'll think about this later". Implementation complexity seems to be fairly low, see my branch (caveat: very WIP) https://github.com/ggreif/ghc/tree/wip/tag-big-families

The last checkin does bootstrap GHC and seems to do well not he test side. Code still is not as pretty as I want it, but I shall clean it up in the next days. Can somebody do a perf evaluation with it? Thanks!

I should note that this is just the tagging of big families. The coercing idea is not included in this branch.

These are failing for me. There is clearly a GHCi bug here.

TEST="Conversions MultiLayerModules T10359 T10370 T10547 T10678 T10858 T10999 T12150 T12227 T12234 T12425 T12545 T12707 T12903 T13035 T13056 T13242a T13379 T13701 T13719 T1969 T2740 T2976 T3064 T3103 T3294 T4801 T4830 T5030 T5095 T5237 T5321FD T5321Fun T5549 T5603 T5631 T5642 T5837 T6048 T783 T8766 T9020 T9233 T9675 T9872a T9872b T9872c T9872d T9961 arith011 break001 break005 break006 break026 break027 compact_share determ021 dynbrk002 ghci055 integerConstantFolding parsing001 plusMinusInteger print002 print003 print005 print006 print008 print012 print013 print014 safePkg01 space_leak_001 tcfail072"

Last edited 2 years ago by heisenbug (previous) (diff)

comment:11 Changed 21 months ago by heisenbug

Owner: set to heisenbug

comment:12 Changed 21 months ago by heisenbug

Now that #14373 is up for review, it is time to revisit this one. It turns out that coercion between small and big families is not an issue any more (provided that wip/T14373 gets merged). From big family with ptr-tag 7 we anyway only can go to small family ptr-tag 7 if the tags in the info table match, so everything is okay. My conservative 1..6 assessment above is not needed. We can safely have 1..7.

comment:14 Changed 16 months ago by simonpj

Keywords: CodeGen added

comment:15 Changed 12 months ago by osa1

Cc: osa1 added
Note: See TracTickets for help on using tickets.