Opened 2 years ago

Last modified 2 years ago

#13873 new bug

Adding a SPECIALIZE at a callsite in Main.hs is causing a regression

Reported by: jberryman Owned by:
Priority: normal Milestone: 8.2.3
Component: Compiler Version: 8.2.1-rc2
Keywords: Specialise Cc: crockeea
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Github user jmoy documents an issue with lack of specialization happening for INLINABLE functions in GHC 8.0 here: https://github.com/jmoy/testing-specialize . Testing on 8.2.1 it seems that specialization happened (although I didn't verify this in the core) as the result was 10x faster. But the odd thing to me was uncommenting the SPECIALIZE pragma at the callsite actually resulted in a significant regression: https://github.com/jmoy/testing-specialize/issues/1#issuecomment-310868360

Maybe GHC is choosing the worse manual partial specialization for some reason. I'm sorry I can't produce a better ticket for this at the moment.

Change History (9)

comment:1 Changed 2 years ago by bgamari

Milestone: 8.2.2
Type of failure: None/UnknownRuntime performance bug

Quite suspicious indeed. I'll have a look.

comment:2 Changed 2 years ago by simonpj

Keywords: Specialise added

As I understand it, you are saying that

  • adding a SPECIALISE pragma with 8.2 made the the code runs 2x slower

Whereas adding the same pragma with 8.0 made the code run 3x faster. (But still not as fast as the slowest version with 8.2.)

But it's the bulleted point that is mysterious. Ben please do investigate.

comment:3 Changed 2 years ago by mpickering

I looked at this.

Firstly, 8.2.1 is much faster than 8.0.2. So I looked at the core for both versions.

The key function is wcJH which in 8.0.2 calls $winsertWith with a dictionary argument $fPrimMonadST.

In 8.2.1, this function does get specialised (see $s$winsertWith) which accounts for the difference.

When the specialisation happens, the type of the specialised function is

$w$sinsertWith :: MutVar# (PrimState (ST RealWorld)) (HashTable_ (PrimState (ST RealWorld)) ByteString v)
                            -> v -> v -> v
                            -> ByteString
                            -> v 
                            -> State# RealWorld
                            -> (# State# RealWorld, () #)

The specialisation produced by the specialise pragma is instead,

$w$sinsertWith :: MutVar# (PrimState (ST s)) (HashTable_ (PrimState (ST s)) ByteString v)
                            -> v -> v -> v
                            -> ByteString
                            -> v 
                            -> State# s
                            -> (# State# s, () #)

as such the s parameter is not specialised to RealWorld. Changing the specialise pragma to

 {-# SPECIALIZE J.insertWith::(Hashable k,Eq k)
             => J.HashTable RealWorld k v
                -> (v->v->v) -> k -> v
                -> ST RealWorld () #-}

makes the performance the same.

Digging much deeper, deep in the definition of the specialised insertWith function there is a call to checkResize. Without the pragma, this is specialised but with the pragma, it is not specialised and passed three dictionary arguments. Fixing s in the specialise pragma to RealWorld. also causes checkResize to be specialised which is why the performance improves.

So it seems that the specialisation is not as good with the manual pragma as GHC is specialising more than the pragma in order to also remove a type argument. This means that other functions called by insertWith can also be specialised, if we do not specialise on s as well then their type will mention this type argument s which means that calls to this function are removed by dumpBindUDs at the top level.

GHC itself will not remove the type argument s as we only specialise on dictionary arguments - it does seem like we might be able to do better here. If we were to further specialise insertWith in order to also remove the type argument then it would cause checkResize to specialise as well.

comment:4 Changed 2 years ago by jberryman

Thanks so much for looking into this. I maybe wasn't clear in the body that the reason I filed the ticket was I assumed that GHC would try to choose the best specialization from those in scope (since my intuition was specialize pragmas behave sort of like type classes in the way they leak via import statements) and from whatever specializations it might perform in context.

But maybe this is expected behavior, and a manual specialization will always take precedence?

comment:5 Changed 2 years ago by simonpj

I don't have bandwidth to reproduce right now, but I think Matthew is saying we have

f :: forall s. State# s -> blah
f = /\s. \(x::State# s). ....(g @s d1 d2 d3)....

where g is an overloaded function, called three dictionary arguments. But note that f is not overloaded, and so will not be auto-specialised. Somehow, if we specialised f, we could specialise g.

I don't quite see how this happens. Can you fill in a bit more detail about what g is, and what dictionaries d1-d3 are?

comment:6 Changed 2 years ago by mpickering

That's right Simon.

How this happens is that the SPECIALISE pragma creates a rule which looks like

"SPEC insertWith"
    forall (@ k)
           (@ s)
           (@ v)
           ($dHashable :: Hashable k)
           ($dEq :: Eq k)
           ($dPrimMonad :: PrimMonad (ST s)).
      insertWith @ k @ (ST s) @ v $dHashable $dEq $dPrimMonad
= $sinsertWith @ k @ s @ v $dHashable $dEq

which fires at quite an early stage in optimisation. Then any function in $sinsertWith which mentions s won't be specialised as it is lambda bound.

If you remove the pragma you just get one specialisation rule which probably arises after a lot of inlining has happened.

"SPEC/Main insert @ ByteString @ (ST RealWorld) _"
    forall (@ v)
           ($dPrimMonad :: PrimMonad (ST RealWorld))
           ($dEq :: Eq ByteString)
           ($dHashable :: Hashable ByteString).
      insert @ ByteString
             @ (ST RealWorld)
             @ v
             $dHashable
             $dEq
             $dPrimMonad
= $scheckResize_$sinsert @ v

and if you fix s = RealWorld with the specialisation pragma then both occur.

"SPEC insertWith"
    forall (@ k)
           (@ v)
           ($dHashable :: Hashable k)
           ($dEq :: Eq k)
           ($dPrimMonad :: PrimMonad (ST RealWorld)).
      insertWith @ k @ (ST RealWorld) @ v $dHashable $dEq $dPrimMonad
      = $sinsertWith @ k @ v $dHashable $dEq
"SPEC/Main insert @ ByteString @ (ST RealWorld) _"
    forall (@ v)
           ($dPrimMonad :: PrimMonad (ST RealWorld))
           ($dEq :: Eq ByteString)
           ($dHashable :: Hashable ByteString).
      insert @ ByteString
             @ (ST RealWorld)
             @ v
             $dHashable
             $dEq
             $dPrimMonad
= $scheckResize_$sinsert @ v

Seeing as you also ask for the type of g, it is defined in another module as is marked INLINABLE.

checkResize::(Hashable k,Eq k,PrimMonad m)
           => HashTable_ (PrimState m) k v
-> m (Maybe (HashTable (PrimState m) k v))

comment:7 in reply to:  4 Changed 2 years ago by mpickering

Replying to jberryman:

Thanks so much for looking into this. I maybe wasn't clear in the body that the reason I filed the ticket was I assumed that GHC would try to choose the best specialization from those in scope (since my intuition was specialize pragmas behave sort of like type classes in the way they leak via import statements) and from whatever specializations it might perform in context.

But maybe this is expected behavior, and a manual specialization will always take precedence?

The specialiser doesn't choose which specialisation to apply, it merely looks for specialisation opportunities and then creates a new definition along with a RULE which does the replacement.

If you have conflicting specialisations then the choice about which one applies is left up to he rule selection mechanism which I am not as familiar with.

comment:8 Changed 2 years ago by crockeea

Cc: crockeea added

comment:9 Changed 2 years ago by bgamari

Milestone: 8.2.28.2.3

If you have conflicting specialisations then the choice about which one applies is left up to the rule selection mechanism which I am not as familiar with.

Unfortunately this pretty much means that the choice comes down to chance (e.g. which rule happens to fire first).

Regardless, this won't be fixed for 8.2.2. Bumping to 8.2.3.

Note: See TracTickets for help on using tickets.