Opened 6 years ago

Last modified 3 years ago

#8774 new bug

Transitivity of Auto-Specialization

Reported by: crockeea Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.6.3
Keywords: Inlining Cc: erikd
Operating System: Linux Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #5928, #8668, #8099 Differential Rev(s):
Wiki Page:

Description

From the docs:

[Y]ou often don't even need the SPECIALIZE pragma in the first place. When compiling a module M, GHC's optimiser (with -O) automatically considers each top-level overloaded function declared in M, and specialises it for the different types at which it is called in M. The optimiser also considers each imported INLINABLE overloaded function, and specialises it for the different types at which it is called in M.

...

Moreover, given a SPECIALIZE pragma for a function f, GHC will automatically create specialisations for any type-class-overloaded functions called by f, if they are in the same module as the SPECIALIZE pragma, or if they are INLINABLE; and so on, transitively.

So GHC should automatically specialize some/most/all(?) functions marked INLINABLE without a pragma, and if I use an explicit pragma, the specialization is transitive. My question is: is the auto-specialization transitive? Either way, I'd like to see the docs updated to answer this question.
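To make the question concrete, here is a minimal single-module sketch of the two-level call chain the docs describe. The names `scale` and `scaleTwice` are hypothetical and not part of the attached test case. With the explicit pragma, the docs quoted above say GHC should also specialise `scale` at `Int`; whether the same happens without the pragma is exactly what this ticket asks.

```haskell
-- Hypothetical example; `scale` and `scaleTwice` are illustration-only names.

-- Inner overloaded function, marked INLINABLE so its unfolding is
-- available for specialisation.
{-# INLINABLE scale #-}
scale :: Num a => a -> a -> a
scale k x = k * x + x

-- Outer overloaded function that calls the inner one twice.
{-# INLINABLE scaleTwice #-}
scaleTwice :: Num a => a -> a -> a
scaleTwice k = scale k . scale k

-- Explicit request for an Int copy of the outer function. Per the quoted
-- docs, this should transitively create an Int copy of `scale` as well.
{-# SPECIALIZE scaleTwice :: Int -> Int -> Int #-}
```

Load this in GHCi or add a `main` that calls `scaleTwice` at `Int`, then remove the SPECIALIZE line and compare the Core to see whether `scale` is still called with a `Num` dictionary.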

Specifically, the attached files demonstrate a bug if auto-specialization should be transitive.

Main.hs:

import Data.Vector.Unboxed as U
import Foo

main =
    let y = Bar $ Qux $ U.replicate 11221184 0 :: Foo (Qux Int)
        (Bar (Qux ans)) = iterate (plus y) y !! 100
    in putStr $ show $ foldl1' (*) ans

Foo.hs:

module Foo (Qux(..), Foo(..), plus) where
    
import Data.Vector.Unboxed as U

newtype Qux r = Qux (Vector r)
-- GHC inlines `plus` if I remove the bangs or the Baz constructor
data Foo t = Bar !t
           | Baz !t

instance (Num r, Unbox r) => Num (Qux r) where
    {-# INLINABLE (+) #-}
    (Qux x) + (Qux y) = Qux $ U.zipWith (+) x y

{-# INLINABLE plus #-}
plus :: (Num t) => (Foo t) -> (Foo t) -> (Foo t)
plus (Bar v1) (Bar v2) = Bar $ v1 + v2

GHC specializes the call to plus, but does *not* specialize (+) in the Qux Num instance. (In the attached core excerpt: main6 calls iterate main8. main8 is just plus, specialized for Int. So far so good. However, splus calls the *polymorphic* c+. If auto-specialization is transitive, I expect c+ to be specialized to Int.)

This kills performance: an explicit pragma

{-# SPECIALIZE plus :: Foo (Qux Int) -> Foo (Qux Int) -> Foo (Qux Int) #-}

results in transitive specialization as the docs indicate, so (+) is specialized and the code is 30x faster.

Is this expected behavior? Should I only expect (+) to be specialized transitively with an explicit pragma?

Note: this question is different from #5928 for two reasons:

  1. I believe that no inlining is occurring, and hence I don't think inlining is interfering with specialization
  2. I have INLINABLE pragmas on all relevant functions.

Note: this question is different from #8668 because I am asking about auto-specialization.

This question was originally posted on StackOverflow. As mentioned in the comments of that question, I am intentionally not fully applying the call to plus in Main, contrary to the suggestions in #8099. I'd love to see why I'm getting that behavior as well.

Attachments (1)

core.txt (2.8 KB) - added by crockeea 6 years ago.


Change History (18)

Changed 6 years ago by crockeea

Attachment: core.txt added

comment:1 Changed 5 years ago by thomie

crockeea: this seems to have been overlooked, maybe try asking on the ghc-devs mailing list.

comment:2 Changed 3 years ago by mpickering

Keywords: Inlining added

comment:3 Changed 3 years ago by erikd

Cc: erikd added

comment:4 Changed 3 years ago by bgamari

Cc: erikd removed

Matthew Pickering and I were recently pondering this. I wrote down some thoughts on the matter on #12463.

comment:5 Changed 3 years ago by bgamari

Cc: erikd added

comment:6 Changed 3 years ago by simonpj

Following your plea today, I've just tried this with HEAD. I get great, specialised code.

I don't have 8.0 available. Can you try and see what happens now?

comment:7 Changed 3 years ago by erikd

@simonpj For the rest of us following along, how does one check this?

comment:8 Changed 3 years ago by bgamari

@erikd, plop the Foo and Main modules given in the ticket description in appropriately named files and compile Main with ghc -ddump-simpl -dsuppress-idinfo -O. You should see no Foo a's in the simplified core; instead you should see a nicely specialized definition of plus with all Foos instantiated at Int. There should be no calls to the polymorphic Foo.plus.
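As an alternative to passing the flags on the command line, the same dump flags can be set for a single module with an `OPTIONS_GHC` file-header pragma. The sketch below uses a hypothetical stand-in function (`sumSquares`) rather than the ticket's modules, so it does not need the vector package.

```haskell
{-# OPTIONS_GHC -O -ddump-simpl -dsuppress-idinfo #-}
-- File-header pragma: applies the flags from the command line above to
-- this module only. It must appear before any declarations.

-- Hypothetical overloaded function standing in for Foo.plus.
{-# INLINABLE sumSquares #-}
sumSquares :: Num a => [a] -> a
sumSquares = foldr (\x acc -> x * x + acc) 0

-- Monomorphic use site. In the dumped Core, check whether this call is
-- passed a Num dictionary such as GHC.Num.$fNumInt or uses Int arithmetic
-- directly.
total :: Int
total = sumSquares [1, 2, 3]
```

Add a `main` that uses `total` and compile the file as usual; the simplified Core is printed during compilation.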

comment:9 Changed 3 years ago by erikd

Doing as suggested I see no Foo a, but I do see the specialized Foo (Qux Int) with both ghc 8.0 and 7.10. I don't have 7.8 installed on this machine.

comment:10 Changed 3 years ago by erikd

Even with ghc 7.6, this seems to specialize correctly:

cabal exec -- ghc -fforce-recomp -ddump-simpl -dsuppress-idinfo main.hs  2>&1 | grep Foo
...
    @ (Foo.Qux GHC.Types.Int)
    @ (Foo.Foo (Foo.Qux GHC.Types.Int))
    (Foo.$WBar @ (Foo.Qux GHC.Types.Int))
...
Last edited 3 years ago by erikd

comment:11 in reply to:  10 ; Changed 3 years ago by crockeea

I finally got 7.6 installed, and can reproduce the issue. The problem isn't that a Foo a appears, it's that without the SPECIALIZE pragma, there is a function called Main.$splus in core that takes/uses dictionaries, even though the type is monomorphic:

Main.$splus
  :: Foo.Foo (Foo.Qux GHC.Types.Int)
     -> Foo.Foo (Foo.Qux GHC.Types.Int)
     -> Foo.Foo (Foo.Qux GHC.Types.Int)
[GblId,
 Arity=2,
 Str=DmdType SS,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, WorkFree=True, Expandable=True,
         Guidance=IF_ARGS [30 30] 110 20}]
Main.$splus =
  \ (ds_dJ0 :: Foo.Foo (Foo.Qux GHC.Types.Int))
    (ds1_dJ1 :: Foo.Foo (Foo.Qux GHC.Types.Int)) ->
    case ds_dJ0 of _ {
      Foo.Bar v1_aH6 ->
        case ds1_dJ1 of _ {
          Foo.Bar v2_aH7 ->
            case (Foo.$fNumQux_$c+
                    @ GHC.Types.Int
                    GHC.Num.$fNumInt
                    Data.Vector.Unboxed.Base.$fUnboxInt
                    v1_aH6
                    v2_aH7)

Replying to erikd:

Even with ghc 7.6, this seems to specialize correctly:

cabal exec -- ghc -fforce-recomp -ddump-simpl -dsuppress-idinfo main.hs  2>&1 | grep Foo
...
    @ (Foo.Qux GHC.Types.Int)
    @ (Foo.Foo (Foo.Qux GHC.Types.Int))
    (Foo.$WBar @ (Foo.Qux GHC.Types.Int))
...

comment:12 Changed 3 years ago by crockeea

However, as SPJ pointed out, this seems to be resolved in GHC 8.0.1. Indeed, I don't need either the SPECIALIZE or the INLINABLE with GHC 8.0.1. Kudos to it.

I'll consider this resolved as soon as someone confirms (or denies) that auto-specialization is intended to be transitive.

comment:13 in reply to:  11 ; Changed 3 years ago by erikd

@crockeea Two quick questions:

1) The presence of the dictionary is inferred from case expression matching on Foo.$fNumQux_$c+ right?

2) What command line are you using to compile this? I'm still having a bit of trouble reproducing this even with ghc 7.6.3.

comment:14 in reply to:  13 Changed 3 years ago by crockeea

Replying to erikd:

@crockeea Two quick questions:

1) The presence of the dictionary is inferred from case expression matching on Foo.$fNumQux_$c+ right?

Not quite sure what you're asking, but the dictionaries I see are the arguments to Foo.$fNumQux_$c+, namely GHC.Num.$fNumInt and Data.Vector.Unboxed.Base.$fUnboxInt.
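As a rough illustration of what "dictionary arguments" means here, the following hand-written analogue contrasts the two shapes. It is only a sketch: `NumDict`, `plusViaDict` and `plusInt` are hypothetical names, lists stand in for unboxed vectors, and GHC's real dictionaries are generated Core rather than this record.

```haskell
-- Hypothetical hand-written analogue of a (one-method) Num dictionary.
data NumDict a = NumDict { dPlus :: a -> a -> a }

-- The Int "instance", playing the role of GHC.Num.$fNumInt.
intDict :: NumDict Int
intDict = NumDict { dPlus = (+) }

-- Dictionary-passing shape: the addition is fetched from the record that
-- is passed in, which is the shape of the Foo.$fNumQux_$c+ call in the
-- Core above.
plusViaDict :: NumDict a -> [a] -> [a] -> [a]
plusViaDict d = zipWith (dPlus d)

-- Specialised shape: the element type is fixed to Int, so no dictionary
-- is passed and Int addition is used directly.
plusInt :: [Int] -> [Int] -> [Int]
plusInt = zipWith (+)
```

Both functions compute the same result at `Int`; the difference is only whether the addition is passed in at run time or known statically.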

2) What command line are you using to compile this? I'm still having a bit of trouble reproducing this even with ghc 7.6.3.

I'm compiling with ghc-7.6.3 -ddump-simpl -O2 Main.hs. With just the INLINABLE pragma on Foo.plus, this takes over a minute on my computer. With the SPECIALIZE pragma (with or without the INLINABLE), it completes in 3 seconds.

comment:15 Changed 3 years ago by erikd

Without SPECIALIZE it takes about a second on my laptop (month old high end Dell with SSD). I simply can't imagine an x86_64 machine could be over 60 times slower.

Ok, so the suspicious code is:

            case (Foo.$fNumQux_$c+
                    @ GHC.Types.Int
                    GHC.Num.$fNumInt
                    Data.Vector.Unboxed.Base.$fUnboxInt
                    v1_aHk
                    v2_aHl)

which is what I get with ghc-7.6.3. I get something very similar with ghc-7.8.4.

For ghc 7.10.3 and 8.0.1 there is no instance of the string GHC.Num in the output from Main (but there is for Foo which is expected).

comment:16 Changed 3 years ago by simonpj

OK, so is the conclusion that this is a perf bug in 7.6 (and maybe 7.8) but fine now?

Simon

comment:17 Changed 3 years ago by crockeea

@erikd: The performance disparity is odd. Not sure what to tell you there.

@simonpj: Correct: performance bug in 7.6 and 7.8, fixed after that apparently. There's still the question of intended behavior: yes or no to transitivity of auto-specialization?
