Opened 6 years ago

Last modified 7 months ago

#8598 new task

IO hack in demand analyzer gets in the way of CPR

Reported by: nomeata Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.6.3
Keywords: CPRAnalysis Cc: akio
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case: T8598
Blocked By: Blocking:
Related Tickets: #1600 Differential Rev(s):
Wiki Page:

Description

After a lot of staring at code and comparing unexpected nofib results I found the following:

The IO hack in the demand analyzer (see dmdAnalAlt in StrAnal.lhs and #1592 for a good discussion) prevents CPR in any function that uses a C call. This is a small example, reduced from the scaleFloat method for doubles:

module Float(fun) where

import GHC.Float (Double(..))
import GHC.Integer (decodeDoubleInteger, encodeDoubleInteger)

fun :: Double -> Double
fun x | isFix           = x
      | otherwise       = case x of
          (D# x#) -> case decodeDoubleInteger x# of
            (# i, j #) -> D# (encodeDoubleInteger i j)
  where
  isFix = isDoubleFinite x == 0

foreign import ccall unsafe "isDoubleFinite" isDoubleFinite :: Double -> Int

Here, fun does current not get the CPR property, and the work gets type GHC.Prim.Double# -> GHC.Types.Double. Why? Because in core, there will be a

    case {__pkg_ccall main isDoubleFinite GHC.Prim.Double#
                                 -> GHC.Prim.State# GHC.Prim.RealWorld
                                 -> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.Prim.Int# #)}_dQD
           ds_dQA GHC.Prim.realWorld#
    of _ [Occ=Dead, Dmd=<L,A>]
    { (# ds_dQC [Dmd=<L,A>, OS=OneShot], ds_dQB [Dmd=<S,1*U>] #) ->
      ...

where the case body has DmdType m {dQz-><L,1*U(U)> dQA-><L,U>}, but dmdAnalAlt detects that this is evaluating a possibly exiting function and throws aways the information using alt_ty lubDmdType topDmdType.

Would it be ok to lub only the demand on the free variables, but keep the CPR information?

In nofib (if I managed to compare the right results) this does nothing for almost all benchmarks, -9.2% of allocations for mandel and +4.9% for reverse-complement (but these numbers are not obtained very cleanly, and all is happening on top of the better-ho-cardinality branch.

Change History (16)

comment:1 Changed 6 years ago by nomeata

A shower later I believe I have a quite decent solution:

Currently, the CPR lattice has these elements:

    NoCPR
   /     \
ReProd  RetSum
   \     /
    BotCPR

where BotCPR not only tells us that there is not going to be a result, but also that the function is diverging (and hence puts a hyperstrict demand on free variables). For the hypothesised „clean exit of the IO function“ we need something in between:

    NoCPR
   /     \
ReProd  RetSum
   \     /
   ExitCPR
      |
   BotCPR

ExitCPR behaves like BotCPR when with regard to lubing it with another result (i.e. whatever the other things knows about the result stays there), but has a <L,A> demand on free variables.

comment:2 Changed 6 years ago by nomeata

Mail from SPJ:

Well spotted. I'm on a train, hence email response. Maybe you can paste this into the ticket?

There are two different issues here.

First, isDoubleFinite is declared as non-side-effecting:

foreign import ccall unsafe "isDoubleFinite" isDoubleFinite :: Double -> Int

But (as you can see from the code you give) we currently desugar it into something that looks (to Core) as though it might have a side effect, or raise a (synchronous) exception. That is stupid.

How might we fix that? I can think of two ways.

  • Generate a FCallId whose type is Double -> Int rather than (as now) Double -> IO Int. There would be a few knock-on consequences to make sure they were correctly code-generated. I like this path best, because it reflects the truth.
  • Currently
    IO a = State# RealWorld# -> (# State# RealWorld#, a #)
    
    For these non-side-effecting things we could instead generate a FCallId with a type involving SafeIO instead of IO:
    SafeIO a = State# SafeWorld# -> (# State# SafeWorld#, a #)
    
    The different "world token" would express the idea that the function can't throw an exception.

I don't like this as much, but it might in any case be useful for things that do have side effects but don't throw exceptions.

My preference is for the first.

Second, as you point out, consider

   f x = do { when (x>3) exit
            ; return (True, False) }

Function f might throw an exception or exit rather than returning, but if it does return it will certainly have the CPR property. So yes, CPR-ness is quite safe.

It's not quite so obvious for divergence:

   g x = do { when (x>3) exit
            ; g x }

   h 0 y = y
   h x y = g x

Is h strict in y? You might say (reasonably) that we can ignore the possible IO exception/exit in g when figuring out that g is sure to diverge. If so, we'd say that h is strict in y. But if (x>3) then really y is not evaluated... and spotting that is exactly what the IO hack in the demand analyser is spotting. So I think it is not safe to propagate divergence information.

In short, CPR info yes, guaranteed-divergence no.

comment:3 Changed 6 years ago by nomeata

In short, CPR info yes, guaranteed-divergence no.

Which is precisely what we obtain from adding ExitCPR to the lattice, as ExitCPR `lub` BotCPR = ExitCPR.

comment:4 Changed 6 years ago by nomeata

Nofib believes that this is a good change (runtime not reliable, as I ran other stuff in parallel):

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           fft2          -0.0%     -0.1%      0.10      0.10     +0.0%
         mandel          -0.0%     -9.2%      0.10      0.10     +0.0%
       nucleic2          -0.0%     -0.3%      0.10      0.10     +0.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -9.2%     -3.4%     -4.1%     +0.0%
            Max          -0.0%     +0.0%    +18.5%    +19.3%     +0.0%
 Geometric Mean          -0.0%     -0.1%     +1.7%     +1.6%     -0.0%

I’ll push it to branch wip/T8598 for review.

comment:5 Changed 6 years ago by nomeata

Also note that the current IO hack is very erratic with regard to IO inside unsafePerformIO (including non-IO FFI calls): It only zaps strictness demands if it actually sees the IO type. If, in the example above, isDoubleFinite were not inlined, but a separate function of type Double -> IO Bool, it would behave different. Very unsatisfying.

This proposal should eliminate the need for special-casing in StrAnal completely, and make the analysis more precise.

We could elaborate the DmdResult lattice some more (and I include the Converges from the nested-cpr-branch here, to show how that goes together). Basically we want the DmdResult to keep track of:

  • whether function may or will diverge.
  • whether the function may or will exit cleanly (this is new)
  • if it returns, what is the result.

So a first approximation of that would be

data CPR = NoCRP | RetCon Int [DmdResult]
data TriState = Yes | No | Maybe -- Maybe is top
data DmdResult = DmdResult { diverges :: TriState, exits :: TriState, cpr :: CPR } -- Product lattice

which is easy to understand and handle, but it has some invalid states, i.e. if it definitely exits or diverges, we do not really want a CPR field, and also not both diverges and exists should be True at the same time.

A format that captures it more precisely would be

data DmdResult
 = Diverges
 | Exits
 | DmdResult { mayExit :: True, mayDiverge :: True, cpr :: CPR}

which neatly expresses definite convergence with DmdResult False False cpr. I like that.

So the idea is now that primitive operations have their DmdResult annotation manually (and I guess most of them do not exit). FFI calls either get a conservative default of DmdResult True True NoCPR for things in the IO monad and DmdResult False True NoCPR for the others.

Then in the demand analyser no special handling of IO is required any more. Instead, in bothDmdType, one has to handle the mayExit flag of the right argument and possibly zap strictness, just like we do already for diverging things. And even IO-infested code will, as long as it does not call anything with mayExit = True (e.g. numerical calculation, working with arrays and IORefs), will get good strictness demands inferred.

How does that sound? Or is it overkill adding this to the demand types, given that (hopefully) almost everything will have mayExit = False?

comment:6 in reply to:  4 Changed 6 years ago by nomeata

Replying to nomeata:

I’ll push it to branch wip/T8598 for review.

The branch validates. Should I push this, independent of the extended approach suggested later?

comment:7 Changed 6 years ago by Joachim Breitner <mail@…>

comment:8 Changed 6 years ago by nomeata

Test Case: T8598

My extended approach suggested above has issues. For example, what should be the DmdResult of a function parameter be? Lets consider f x y = x `seq` y `seq` ... and look at the DmdResult of x.

Theoretically, we’d want to be conservative, and hence DmdResult { mayExit = True, mayDiverge = True, cpr = NoCPR }. But that would loose far too much strictness information, in particular we’d be lazy in y, which is undesirable.

OTOH, if we assume DmdResult { mayExit = False, mayDiverge = True, cpr = NoCPR } for x it would be wrong if we pass something that can cleanly exit to f. unsafePerformIO exitSuccess would be such a thing – but we do not really care about that, right?

If we take the stance that clean exists should only really be happening when evaluating IO (and any other kind exiting is allowed to be reordered with other evaluations), then a still correct, but more precise analysis that we have right now is to use the elaborated lattice described in comment:5 and move the IO hack from case expressions to arguments: Use DmdResult { mayExit = True, mayDiverge = True, cpr = NoCPR } for arguments of IO type, and DmdResult { mayExit = False, mayDiverge = True, cpr = NoCPR } for all others.

comment:9 Changed 6 years ago by nomeata

Simplified my fix in `wip/T8598` (did not add a new element to the lattice, rather added a simple unary function to defer a DmdType when it occurs after an IO action). Code is currently validating, and will be pushed afterwards.

This branch also contains a patch “Rename topDmdType to nopDmdType”:

because topDmdType is not the top of the lattice, as it puts an implicit absent demand on free variables, but Abs is the bottom of the Usage lattice.

Why nopDmdType? Becuase it is the demand of doing nothing: Everything lazy, everything absent, no definite divergence.

This ticket stays open to track the first issue SPJ mentioned in comment:2.

comment:10 Changed 6 years ago by Joachim Breitner <mail@…>

In def26d4335641b967940c56daac552b07b4efa6e/testsuite:

Mark testcase for #8598 as not broken

comment:11 Changed 6 years ago by Joachim Breitner <mail@…>

In a31cb5b07726f5739f6eac35cbb348fcd2d6b598/ghc:

Do not forget CPR information after an IO action

but do forget about certain divergence, if required. Fixes one part of
ticket #8598.

The added function (deferAfterIO) can maybe be merged with existing
code, but given the ongoing work in the nested-cpr branch, I defer that
work.

comment:12 Changed 3 years ago by thomie

Type of failure: None/UnknownRuntime performance bug

comment:13 Changed 2 years ago by akio

I have come across an example where adding a new item to the lattice is actually useful:

{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
module Float(fun) where

import GHC.Exts
import GHC.Integer (decodeDoubleInteger, encodeDoubleInteger)

fun :: Double -> Double
fun x@(D# x#)
  | x == 0 = 0
  | otherwise = case isDoubleFinite x of
      I# y# -> case decodeDoubleInteger x# of
        (# i, j #) -> fun (D# (encodeDoubleInteger i (y# +# j)))

foreign import ccall unsafe "isDoubleFinite" isDoubleFinite :: Double -> Int

Here fun is a tail-recursive function, and I think it can be CPRed just fine. However the current GHC fails to give fun a CPR property because it removes too much information in deferAfterIO. Specifically:

  1. In the first iteration of the fixpoint calculation, the recursive to call to fun gets the CPR result of Diverges.
  2. However, the case expression that scrutinizes isDoubleFinite x causes a call to deferAfterIO, which turns the CPR result to Dunno NoCPR.
  3. This in turn means the whole function doesn't get any useful CPR property.

If we had ExitCPR (or Exits to match the current naming convention) in the lattice, then the step (2) would have returned Exits instead of Dunno NoCPR, and allowed fun to get a useful CPR property.

I don't know how often this comes up in practice. With nested CPR, this will become a much bigger problem because it essentially means no tail-recursive IO function can get a useful CPR property.

comment:14 Changed 2 years ago by akio

Cc: akio added

comment:15 Changed 2 years ago by simonpj

For pure ccalls like this one we have no business messing with IO at all. I'd love to implement the first bullet of comment:2.

What you say may also be true of tail-recursive functions that do perform IO, but there would be fewer cases where it mattered.

comment:16 Changed 7 months ago by simonpj

Keywords: CPRAnalysis added
Note: See TracTickets for help on using tickets.