Opened 8 years ago

Closed 8 years ago

#5658 closed bug (fixed)

Strict bindings are wrongly floated out of case alternatives.

Reported by: benl Owned by: benl
Priority: high Milestone: 7.4.1
Component: Compiler Version: 7.2.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime crash Test Case: simplCore/should_compile/T5658b
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

With this program:

{-# LANGUAGE MagicHash, BangPatterns #-}
module Broken where
import qualified Data.Vector            as V
import qualified Data.Vector.Unboxed    as U

broken :: V.Vector (U.Vector Double) -> Int -> Int -> IO Double
broken !arrs !ix !len
 = case ix >= len of
         True   -> error "sorry"
         False  -> return ((arrs `V.unsafeIndex` ix) `U.unsafeIndex` ix)

Note that both indexing operations are within the 'False' branch, and the test is intended to check that the indexing is indeed safe.

Sadly, the simplifier floats the inner indexing operation outside the bounds check:

broken1
broken1 =
  \ arrs_s185 ix_s18a len_s18j eta_s18o ->
    let { Vector ipv_s18e _ ipv2_s18d ~ _ <- arrs_s185 } in
    let { I# ipv3_s18f ~ _     <- ix_s18a } in
    let { __DEFAULT ~ sat_s18D <- +# ipv_s18e ipv3_s18f } in
    let { (# x_s18p #) ~ _     <- indexArray# ipv2_s18d sat_s18D } in     *** NO! ***
    let { I# ipv4_s18m ~ _     <- len_s18j } in
    case >=# ipv3_s18f ipv4_s18m of _ {
      False ->
        let {
          sat_s18G
          sat_s18G =
            let { Vector rb_s18v _ rb2_s18u ~ _ <- x_s18p `cast` ... } in
            let { __DEFAULT ~ sat_s18J <- +# rb_s18v ipv3_s18f } in
            let { __DEFAULT ~ sat_s18K
            <- indexDoubleArray# rb2_s18u sat_s18J
            } in
            D# sat_s18K } in
        (# eta_s18o, sat_s18G #);
      True -> broken2 `cast` ...
    }

If it was a lazy binding it would have been ok to float it, but it's not. This issue is probably causing other problems in GHC, maybe #5085. This is broken in the head as well as 7.2.

Change History (22)

comment:1 Changed 8 years ago by simonpj

That is terrible. Happily the solution is simple: array indexing operations in primpops.txt.pp should be marked

   with can_fail = True

Can you try that?

Simon

comment:2 Changed 8 years ago by benl

Owner: set to benl

Adding the can_fail to all the indexing primps fixes the problem. I can push my patch when validate works on OSX again, it's broken now.

comment:3 Changed 8 years ago by benl@…

commit 657773c8e59917fda05ee08065ec566aebb50a5f

Author: Ben Lippmeier <benl@ouroborus.net>
Date:   Tue Nov 29 16:38:33 2011 +1100

    Fix #5658: mark all array indexing primops as can_fail
    
    If they're not marked as can_fail, then they are floated out of case expressions that check whether the indices are in-bounds, causing immense suffering.

 compiler/prelude/primops.txt.pp |  112 +++++++++++++++++++++++++++++++++++++-
 1 files changed, 109 insertions(+), 3 deletions(-)

comment:4 Changed 8 years ago by simonmar

I think we only need to mark the indexBlahArray# operations with can_fail, the readBlahArray# and writeBlahArray# are already fine: they are already marked as having side effects, so they will never be evaluated when they shouldn't be.

comment:5 Changed 8 years ago by benl

Hrm. I marked them all as can_fail because it's true that they can fail, and now the function primOpCanFail :: Name -> Bool should return the correct information for each primop.

If we want to say "If a primop is marked as has_side_effects then GHC won't do anything different if it's also marked as can_fail", then this is a property of the compiler rather than a property of the primop.

The only place I can find the can_fail and has_side_effects flags being used is via the following function:

primOpOkForSpeculation :: PrimOp -> Bool
primOpOkForSpeculation op
  = not (primOpHasSideEffects op || primOpOutOfLine op || primOpCanFail op)

From this, it looks like the can_fail flag is redundant anyway. We could replace all occurrences of can_fail in the primops.txt.pp file with has_side_effects and still get the same result.

What do you want to do?

comment:6 Changed 8 years ago by simonmar

difficulty: Unknown

I think of has_side_effects as implying can_fail, but not vice versa.

  • A primop that is neither can_fail nor has_side_effects can be executed speculatively, any number of times
  • A primop that is marked can_fail cannot be executed speculatively, but it can be repeated (why would you want to do that? Perhaps it might enable some eta-expansion, if you can prove that the lambda is definitely applied at least once. I guess we don't currently do that).
  • A primop that is marked has_side_effects can be neither speculated nor repeated; it must be executed exactly the right number of times.

This is all moot since as you point out we don't currently take advantage of can_fail && !has_side_effects.

comment:7 Changed 8 years ago by simonpj@…

commit 56a05294f3a94a6826c8eaaef6c9946d42c71eaf

Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Mon Dec 12 11:16:49 2011 +0000

    Add comments about the meaning of can_fail and has_side_effects
    
    Taken from Trac #5658

 compiler/prelude/PrimOp.lhs     |  136 ++++++++++++++++++++++++--------------
 compiler/prelude/primops.txt.pp |    5 +-
 2 files changed, 89 insertions(+), 52 deletions(-)

comment:8 Changed 8 years ago by simonpj

Ben: I think this ticket is fixed, but it'd be good to have a test case if possible. Could you think about concocting one? Ideally without depending on vector. If it's hard, just close the ticket.

Simon

comment:9 Changed 8 years ago by igloo

Milestone: 7.4.1

comment:10 Changed 8 years ago by rl

It turns out that this fix can decrease performance because now, index*Array# operations can't be eliminated or moved into cases. For an example of mine, we generated this code before:

      $weq_loop0_s1Ty =
        \ (ww_s1Sv :: Int#) (ww1_s1Sz :: Int#) ->
          case >=# ww_s1Sv ipv1_s1K9 of _ {
            False ->
              case >=# ww1_s1Sz sc1_s1VP of _ {
                False ->
                  case indexIntArray# ipv2_s1Ka (+# ipv_s1K8 ww_s1Sv)
                  of wild_a1rf { __DEFAULT ->
                  case indexIntArray# sc2_s1VQ (+# sc_s1VO ww1_s1Sz)
                  of wild3_X1rU { __DEFAULT ->
                  case ==# wild_a1rf wild3_X1rU of _ {
                    False -> False;
                    True -> $weq_loop0_s1Ty (+# ww_s1Sv 1) (+# ww1_s1Sz 1)
                  }
                  }
                  };
                True -> False
              };
            True -> >=# ww1_s1Sz sc1_s1VP
          }; 

And now we generate this:

      $weq_loop0_s1TF =
        \ (ww_s1SC :: Int#) (ww1_s1SG :: Int#) ->
          case >=# ww_s1SC ipv1_s1Kg of _ {
            False ->
              case indexIntArray# ipv2_s1Kh (+# ipv_s1Kf ww_s1SC)
              of wild_a1rg { __DEFAULT ->
              case >=# ww1_s1SG sc1_s1VW of _ {
                False ->
                  case indexIntArray# sc2_s1VX (+# sc_s1VV ww1_s1SG)
                  of wild3_X1rU { __DEFAULT ->
                  case ==# wild_a1rg wild3_X1rU of _ {
                    False -> False;
                    True -> $weq_loop0_s1TF (+# ww_s1SC 1) (+# ww1_s1SG 1)
                  }
                  };
                True -> False
              }
              };
            True ->
              case >=# ww1_s1SG sc1_s1VW of _ {
                False ->
                  case indexIntArray# sc2_s1VX (+# sc_s1VV ww1_s1SG)
                  of _ { __DEFAULT ->
                  False
                  };
                True -> True
              }
          }; 

I'm not sure if this is the intended semantics of can_fail. If so, we might want something weaker than can_fail for things that shouldn't be executed speculatively but are ok to not executed at all.

comment:11 Changed 8 years ago by rl

Actually, the good code looked like this (sorry, got my loops mixed up):

      $weq_loop0_s22g =
        \ (ww_s21d :: Int#) (ww1_s21h :: Int#) ->
          case >=# ww_s21d ipv1_s1QZ of _ {
            False ->
              case >=# ww1_s21h sc1_s24C of _ {
                False ->
                  case ==#
                         (indexIntArray# ipv2_s1R0 (+# ipv_s1QY ww_s21d))
                         (indexIntArray# sc2_s24D (+# sc_s24B ww1_s21h))
                  of _ {
                    False -> False;
                    True -> $weq_loop0_s22g (+# ww_s21d 1) (+# ww1_s21h 1)
                  };
                True -> False
              };
            True -> >=# ww1_s21h sc1_s24C
          }; 

Not that it matters.

comment:12 Changed 8 years ago by rl

Incidentally, am I correct in assuming that the reason this has only shown up now is because of the fixes to #4978 which made primops appear cheaper (too cheap, it seems)? We shouldn't really float array accesses past bounds checks anyway because if we do, we will be doing unnecessary work if the check fails. If this is so, then would it make sense to revert the patch from #4978 that caused this and remove the can_fail annotations again? Note that #5623 is another bug caused by primops being too cheap. This change seems to have a lot of unintended knock-on effects.

comment:13 Changed 8 years ago by simonmar

To be clear, the changes in #4978 were only to GHC's perception of the code size of a primop, not its runtime cost. It was completely wrong before, charging as much for the code size of an inline primop (typically one instruction) as for a lambda (tens of instructions and an info table). If anything, we're still overcharging for inline primops. But, this is a delicate area and making any change, whether correct or not, is likely to upset heuristics elsewhere. Rather than going back to the definitely-wrong behaviour before, I suggest we find out which heuristics are now wrong and fix them.

I think you're right about wanting a weaker version of can_fail, incidentally.

comment:14 Changed 8 years ago by rl

Oh, I see, I thought that the patches also affected runtime costs. But can we agree that floating array accesses (or any primops, for that matter) out of conditionals is a bug even if those accesses aren't marked as can_fail? I really don't see how this can possibly be beneficial since the only thing it can achieve is doing unnecessary work. But if array accesses can't be floated out of conditionals then there is no reason to mark them as can_fail because they will never be floated past bounds checks anyway.

comment:15 Changed 8 years ago by simonmar

Sounds reasonable to me, I'm not sure why cases are floated out of conditionals, we'll have to wait for Simon to comment.

comment:16 Changed 8 years ago by rl

As requested, here is a small program that demonstrated this. Note the difference between 7.2 and 7.4 when compiling with -O2.

{-# LANGUAGE MagicHash, BangPatterns #-}
module T where
import GHC.Prim

foo :: ByteArray# -> ByteArray# -> Int# -> Int# -> Bool
foo xs ys m n = go 0# 0#
  where
    go i j = case i >=# m of
      False -> let !x = indexIntArray# xs i in
        case j >=# n of
          False -> case x ==# indexIntArray# ys j of
            False -> False
            True  -> go (i +# 1#) (j +# 1#)
          True -> False
      True -> case j >=# n of
        False -> let !y = indexIntArray# ys i in False
        True -> True

comment:17 Changed 8 years ago by simonpj

Priority: normalhigh

comment:18 Changed 8 years ago by simonpj@…

commit 3beb1a831b37f616b5e8092def2e51cd9825735f

Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Thu Jan 12 17:17:22 2012 +0000

    Fix Trac #5658: strict bindings not floated in
    
    Two changes here
    
    * The main change here is to enhance the FloatIn pass so that it can
      float case-bindings inwards.  In particular the case bindings for
      array indexing.
    
    * Also change the code in Simplify, to allow a case on array
      indexing (ie can_fail is true) to be discarded altogether if its
      results are unused.
    
    Lots of new comments in PrimOp about can_fail and has_side_effects
    
    Some refactoring to share the FloatBind data structure between
    FloatIn and FloatOut

 compiler/coreSyn/CorePrep.lhs   |    2 +-
 compiler/coreSyn/CoreUtils.lhs  |   55 +++++++------
 compiler/coreSyn/MkCore.lhs     |   22 +++++
 compiler/prelude/PrimOp.lhs     |  163 ++++++++++++++++++++++-----------------
 compiler/simplCore/FloatIn.lhs  |  123 +++++++++++++++++++-----------
 compiler/simplCore/FloatOut.lhs |   20 +----
 compiler/simplCore/SimplEnv.lhs |    1 +
 compiler/simplCore/Simplify.lhs |   10 ++-
 8 files changed, 237 insertions(+), 159 deletions(-)

comment:19 Changed 8 years ago by simonpj

Status: newmerge

Ian: I believe we should merge this to the branch. It validates ok, but I have not done performance tests. Roman, maybe you can test too?

Simon

comment:20 Changed 8 years ago by simonpj

Test Case: simplCore/should_compile/T5658b

Test T5658b is a simple test of Roman's program.

comment:21 Changed 8 years ago by rl

Things look much better than before, thanks! I'm still seeing two significant slowdowns in the vector benchmarks. I think I tracked one down to #5767, in particular to the missing integerToInt/smallInteger rule. I'll investigate the other one this weekend.

comment:22 Changed 8 years ago by igloo

Resolution: fixed
Status: mergeclosed
Note: See TracTickets for help on using tickets.