Opened 15 months ago

Closed 14 months ago

Last modified 11 months ago

#15300 closed bug (fixed)

Unboxed Sums Crash

Reported by: andrewthad Owned by: osa1
Priority: highest Milestone: 8.6.1
Component: Compiler Version: 8.5
Keywords: UnboxedSums Cc: bgamari, osa1
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D4962
Wiki Page:

Description (last modified by bgamari)

I've made it a little further in my experiments with unboxed tuples in the packed library. However, I've run into another issue that I strongly suspect is the result of bad behavior of unboxed tuples. To replicate this issue (with GHC 8.4.3), do the following:

git clone https://github.com/andrewthad/packed
cd packed
cabal new-build

We use cabal new-build for its side effect of creating a .ghc.environment.xyz file. Now, create a minimal example in the directory called eol.hs with the following contents:

import Packed.Bytes.Parser (Parser)
import Data.Word
import Packed.Bytes (Bytes)
import GHC.Exts (RealWorld)
import Packed.Bytes.Stream.IO (ByteStream)
import qualified Packed.Bytes as B
import qualified Data.Char
import qualified Packed.Bytes.Parser as P
import qualified Packed.Bytes.Stream.IO as Stream

main :: IO ()
main = do
  r <- runExampleParser
    ( do P.takeBytesUntilEndOfLineConsume
         P.takeBytesUntilEndOfLineConsume
         P.takeBytesUntilEndOfLineConsume
    ) (foldMap Stream.singleton (map charToWord8 "the\nemporium\rhas\narrived"))
  print r

runExampleParser :: Parser e () a -> ByteStream RealWorld -> IO (Maybe a, Maybe String)
runExampleParser parser stream = do
  P.Result mleftovers r _ <- P.parseStreamIO stream () parser
  mextra <- case mleftovers of
    Nothing -> return Nothing
    Just (P.Leftovers chunk remainingStream) -> do
      bs <- Stream.unpack remainingStream
      return (Just (map word8ToChar (B.unpack chunk ++ bs)))
  return (either (const Nothing) Just r,mextra)

word8ToChar :: Word8 -> Char
word8ToChar = Data.Char.chr . fromIntegral

charToWord8 :: Char -> Word8
charToWord8 = fromIntegral . Data.Char.ord

s2b :: String -> Bytes
s2b = B.pack . map charToWord8

c2w :: Char -> Word8
c2w = charToWord8

Finally, build this with ghc -O2 eol.hs, and then run the executable this produces to get the following:

(Nothing,Just "\rhas\narrived")
eol: internal error: stg_ap_n_ret
    (GHC version 8.4.3 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Aborted (core dumped)

Things worth noting:

  1. I think the program fails in the final GC that runs right before the program terminates. We can see that it produces a correct result of (Nothing,Just "\rhas\narrived"), but something on the heap has definitely been corrupted.
  2. This only happens with -O2 turned on.
  3. This only happens when the parser does not successfully parse its input.

Here's some more context around this. I've been working on a parser that uses unboxed sums instead of continuations. After #15038 was fixed, everything had been going well. Then, I took the parser type and added two things to it: (1) context and (2) typed errors. Context is basically like throwing StateT on top and errors are like throwing ExceptT on top. After this, everything in my test suite kept working except for a single test, which now consistently crashes my test suite. So, I originally had this:

type Bytes# = (# ByteArray#, Int#, Int# #)
type Maybe# (a :: TYPE r) = (# (# #) | a #)
type Leftovers# s = (# Bytes# , ByteStream s #)
type Result# s (r :: RuntimeRep) (a :: TYPE r) =
  (# Maybe# (Leftovers# s), Maybe# a #)
newtype ParserLevity (r :: RuntimeRep) (a :: TYPE r) = ParserLevity
  { getParserLevity :: forall s.
       Maybe# (Leftovers# s)
    -> State# s
    -> (# State# s, Result# s r a #)
  }

But I changed it to this:

type Bytes# = (# ByteArray#, Int#, Int# #)
type Maybe# (a :: TYPE r) = (# (# #) | a #)
type Either# a (b :: TYPE r) = (# a | b #)
type Leftovers# s = (# Bytes# , ByteStream s #)
type Result# e c s (r :: RuntimeRep) (a :: TYPE r) =
  (# Maybe# (Leftovers# s), Either# (Maybe e) a, c #)

newtype ParserLevity e c (r :: RuntimeRep) (a :: TYPE r) = ParserLevity
  { getParserLevity :: forall s.
       c
    -> Maybe# (Leftovers# s)
    -> State# s
    -> (# State# s, Result# e c s r a #)
  }

Specifically, the function causing trouble is (as currently defined):

{-# NOINLINE takeBytesUntilEndOfLineConsumeUnboxed #-}
takeBytesUntilEndOfLineConsumeUnboxed :: ParserLevity e c BytesRep Bytes#
takeBytesUntilEndOfLineConsumeUnboxed = ParserLevity (go (# (# #) | #)) where
  go :: Maybe# Bytes# -> c -> Maybe# (Leftovers# s) -> State# s -> (# State# s, Result# e c s BytesRep Bytes# #)
  go !_ c (# (# #) | #) s0 = (# s0, (# (# (# #) | #), (# Nothing | #), c #) #)
  go !mbytes c (# | (# bytes0@(# arr0, off0, len0 #), !stream0@(ByteStream streamFunc) #) #) s0 = case BAW.findAnyByte2 (I# off0) (I# len0) 10 13 (ByteArray arr0) of
    Nothing -> case streamFunc s0 of
      (# s1, r #) -> go (# | appendMaybeBytes mbytes bytes0 #) c r s1
    Just (I# ix, W8# theByte) -> case theByte of
      10## -> (# s0, (# (# | (# unsafeDrop# ((ix -# off0) +# 1# ) bytes0, stream0 #) #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -# off0 #) #), c #) #)
      -- second case means it was 13
      _ -> case ix <# (off0 +# len0 -# 1#) of
        1# -> case indexWord8Array# arr0 (ix +# 1# ) of
          10## -> (# s0, (# (# | (# unsafeDrop# ((ix -# off0) +# 2# ) bytes0, stream0 #) #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -# off0 #) #), c #) #)
          _ -> (# s0, (# (# | (# unsafeDrop# (ix -# off0) bytes0, stream0 #) #), (# Nothing | #), c #) #)
        _ -> case nextNonEmpty stream0 s0 of
          (# s1, m #) -> case m of
            (# (# #) | #) -> (# s1, (# (# | (# unboxBytes (B.singleton 13), Stream.empty #) #), (# Nothing | #), c #) #)
            (# | (# bytes1@(# arr1, _, _ #), stream1 #) #) -> case indexWord8Array# arr1 0# of
              10## -> (# s1, (# (# | (# unsafeDrop# 1# bytes1, stream1 #) #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -# off0 #) #), c #) #)
              _ -> (# s1, (# (# | (# unboxBytes (B.cons 13 (boxBytes bytes1)), stream1 #) #), (# Nothing | #), c #) #)

That's all I've got for now. If no one's able to make headway, I'll probably come back to this and try to make a more minimal example at some point. I won't have time to do this soon though.

Change History (25)

comment:1 Changed 15 months ago by osa1

Thanks for reporting. I managed to reproduce this with GHC HEAD.

I think the program fails in the final GC that runs right before the program terminates. We can see that it produces a correct result of (Nothing,Just "\rhas\narrived"), but something on the heap has definitely been corrupted.

This program does not do any GC at all. Try this:

$ gdb --args ./eol
Reading symbols from ./eol...done.
>>> break GarbageCollect
Breakpoint 1 at 0x8b3ca7: file rts/sm/GC.c, line 224.
>>> r

you'll see that the breakpoint is never triggered. So this is definitely a code generation bug.

comment:2 Changed 15 months ago by osa1

Version: 8.4.38.5

comment:3 Changed 15 months ago by osa1

Type: taskbug

comment:4 Changed 15 months ago by osa1

Both the library and the program passes Core, Stg and Cmm lints.

comment:5 Changed 15 months ago by andrewthad

Priority: normalhigh

Bumping this to high priority to increase make this visible in the list of important tickets for GHC 8.6.

comment:6 Changed 15 months ago by bgamari

Description: modified (diff)

comment:7 Changed 15 months ago by osa1

andrewthad, why do you think takeBytesUntilEndOfLineConsumeUnboxed is the problematic function here?

This is currently hard to debug as the reproducer is huge. A smaller reproducer would be very helpful.

comment:8 Changed 15 months ago by andrewthad

I don't have any suspicions about why that specific function is the only one that causes problems. I can put together a smaller reproducer though.

comment:9 Changed 15 months ago by andrewthad

For a smaller reproducer, I've reused https://github.com/andrewthad/corrupted-memory-example. On master, I've basically got a copy of packed with everything stripped out that isn't needed to demonstrate this particular problem. I'm going to try to golf it down a little more today.

comment:10 Changed 15 months ago by andrewthad

I just brought the size down by another 50 lines of code. It's still at 550 lines total. I'll keep coming back to this occasionally to see what else I can eliminate.

comment:11 Changed 14 months ago by bgamari

Owner: set to osa1
Priority: highhighest

comment:12 Changed 14 months ago by osa1

This is probably a StgCSE bug as disabling it fixes this bug.

comment:13 Changed 14 months ago by osa1

Looking at STG before and after CSE, the only difference is:

case val_scuG of {
  (#_|#) err_scuJ [Occ=Once] ->
      case (#_|#) [err_scuJ] of sat_scuK [Occ=Once] {
        __DEFAULT ->
            Main.$w$j ipv_scuC leftovers1_scuF sat_scuK;
      };
  ...

becomes:

case val_scuG of {
  (#_|#) err_scuJ [Occ=Once] ->
      case wild2_scuI of sat_scuK [Occ=Once] {
        __DEFAULT ->
            Main.$w$j ipv_scuC leftovers1_scuF sat_scuK;
      };
  ...

But note that wild2_scuI was dead before (dead binders are not printed by the STG printer so I'm not sure where it's bound). If the problem is reviving wild2_scuI then this is a duplicate of #14895.

comment:14 Changed 14 months ago by osa1

Confirmed that the transformation above is the problem. With this transformation the unarised expression:

Main.$w$j
    GHC.Prim.void#
    us_gcvd
    us_gcve
    us_gcvf
    us_gcvg
    us_gcvh
    1#
    us_gcvj;

becomes

Main.$w$j
    GHC.Prim.void#
    us_gcvd
    us_gcve
    us_gcvf
    us_gcvg
    us_gcvh
    us_gcvi
    us_gcvj
    us_gcvk
    us_gcvl;

We now pass more arguments to $w$j!

I had spent some time debugging this in the assembly level and found out that stg_ap_n is trying to apply a non-pointer to a constructor. If we're passing more arguments to function or a constructor than this kind of error makes sense.

comment:15 Changed 14 months ago by osa1

I hacked the Id printer to print dead binders with their types. It turns out the dead-but-used binder is this:

wild2_scuI [Occ=Dead]
    :: (# GHC.Maybe.Maybe GHC.Types.Any | Packed.Bytes.Parser.Bytes# #)

This kind of variables (dead case binders with unboxed sum/tuple types) are not supposed to be used! We even have a comment about this in UnariseStg:

unariseExpr rho (StgCase scrut bndr alt_ty alts)
  ...
  -- general case
  | otherwise
  = do scrut' <- unariseExpr rho scrut
       alts'  <- unariseAlts rho alt_ty bndr alts
       return (StgCase scrut' bndr alt_ty alts')
                       -- bndr may have a unboxed sum/tuple type but it will be
                       -- dead after unarise (checked in StgLint)

and we actually check this in StgLint:

lintStgExpr (StgCase scrut bndr alts_type alts) = do
    ...
             -- Case binders of unboxed tuple or unboxed sum type always dead
             -- after the unariser has run.
             -- See Note [Post-unarisation invariants].
             MultiValAlt _ -> not (lf_unarised lf)
    ...

I don't understand yet why this program passes StgLint.

comment:16 Changed 14 months ago by osa1

Aha! The problem is StgCse's term equality check does not work on unboxed sums. Are these two equal? (and a binder to one could be substituted for the other?)

  • (# 1# | #) :: (# Int# | Bool #)
  • (# 1# | #) :: (# Int# | Int# #)

Of course not, becuse first one unarises to (# 1#, 1#, absentSumField #) while the second one unarises to (# 1#, 1# #).

Not sure how to compare unboxed sums for equality in StgCse as we don't have enough type information at that point. However, it seems to me that CSE on unboxed sums is useless as unboxed sums are not allocated (so CSE doesn't buy us anything). Maybe we should just not do CSE on unboxed sum (and maybe even tuple) terms.

comment:17 Changed 14 months ago by simonpj

Good sleuthing!

What about doing StgCse after unarising? Then the difference would be obvious!

We do CSE in STG even though we've done it already in Core, because we can sometimes common-up things that have the same representation in STG even though they have different types in Core. So it's possible that as well as prevent bogus CSE in the case you describe, you might get extra CSE in some other case. But I'm not close enough to StgCSE to think of an example.

comment:18 Changed 14 months ago by osa1

What about doing StgCse after unarising?

I think this is a good idea, but it reveals another bug in StgCse. Basically StgCse reverts unarise by using case binders when scrutinee is a unboxed tuple, e.g.

case <unboxed_tuple> of bndr { <alts> }

Unarise eliminates all occurances of bndr in alts, but StgCse does the opposite, e.g. replaces (#,#) foo bar with bndr. This is caught by StgLint which is good.

Patch incoming.

comment:19 Changed 14 months ago by osa1

Basically StgCse reverts unarise by using case binders when scrutinee is a unboxed tuple

I should add that this is only a bug when we run StgCse after unarise. Before unarise unboxed tuple terms can be replaced by case binders.

comment:20 Changed 14 months ago by osa1

Differential Rev(s): Phab:D4962
Status: newpatch

comment:21 Changed 14 months ago by bgamari

Status: patchmerge

comment:22 Changed 14 months ago by andrewthad

When the next GHC alpha (or beta or release candidate) comes out, I'll give this a try.

comment:23 Changed 14 months ago by Ben Gamari <ben@…>

In 3c311e50/ghc:

Run StgCse after unarise, fixes #15300

Given two unboxed sum terms:

    (# 1 | #) :: (# Int | Int# #)
    (# 1 | #) :: (# Int | Int  #)

These two terms are not equal as they unarise to different unboxed
tuples. However StgCse was thinking that these are equal, and replacing
one of these with a binder to the other.

To not deal with unboxed sums in StgCse we now do it after unarise. For
StgCse to maintain post-unarise invariants we factor-out case binder
in-scopeness check to `stgCaseBndrInScope` and use it in StgCse.

Also did some refactoring in SimplStg.

Another way to fix this would be adding a special case in StgCse to not
bring unboxed sum binders in scope:

    diff --git a/compiler/simplStg/StgCse.hs
b/compiler/simplStg/StgCse.hs
    index 6c740ca4cb..93a0f8f6ad 100644
    --- a/compiler/simplStg/StgCse.hs
    +++ b/compiler/simplStg/StgCse.hs
    @@ -332,7 +332,11 @@ stgCseExpr env (StgLetNoEscape binds body)
     stgCseAlt :: CseEnv -> OutId -> InStgAlt -> OutStgAlt
     stgCseAlt env case_bndr (DataAlt dataCon, args, rhs)
         = let (env1, args') = substBndrs env args
    -          env2 = addDataCon case_bndr dataCon (map StgVarArg
args') env1
    +          env2
    +            | isUnboxedSumCon dataCon
    +            = env1
    +            | otherwise
    +            = addDataCon case_bndr dataCon (map StgVarArg args')
env1
                 -- see note [Case 2: CSEing case binders]
               rhs' = stgCseExpr env2 rhs
           in (DataAlt dataCon, args', rhs')

I think this patch seems better in that it doesn't add a special case to
StgCse.

Test Plan:
Validate.

I tried to come up with a minimal example but failed. I thought a simple
program like

    data T = T (# Int | Int #) (# Int# | Int #)

    case T (# 1 | #) (# 1 | #) of ...

should be enough to trigger this bug, but for some reason StgCse
doesn't do
anything on this program.

Reviewers: simonpj, bgamari

Reviewed By: simonpj

Subscribers: rwbarton, thomie, carter

GHC Trac Issues: #15300

Differential Revision: https://phabricator.haskell.org/D4962

comment:24 Changed 14 months ago by bgamari

Resolution: fixed
Status: mergeclosed

comment:25 Changed 11 months ago by Simon Peyton Jones <simonpj@…>

In 232b0cb3/ghc:

Add comments for StgCse and Unarise

This just improves docmentation for the fix to Trac #15300
Note: See TracTickets for help on using tickets.