Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#10565 closed bug (wontfix)

GHC 7.10.2 RC: the impossible happened on hPDB-examples-1.2.0.2

Reported by: snoyberg Owned by: bgamari
Priority: normal Milestone: 7.10.2
Component: Compiler Version: 7.10.2-rc1
Keywords: Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

When compiling with the RC for Stackage, I get:

[1 of 1] Compiling Main             ( examples/Rg.hs, dist/build/Rg/Rg-tmp/Main.o )

<no location info>:
    ghc: panic! (the 'impossible' happened)
  (GHC version 7.10.1.20150619 for x86_64-unknown-linux):
        Simplifier ticks exhausted
  When trying UnfoldingDone bindIO
  To increase the limit, use -fsimpl-tick-factor=N (default 100)
  If you need to do this, let GHC HQ know, and what factor you needed
  To see detailed counts use -ddump-simpl-stats
  Total ticks: 19573

Change History (5)

comment:1 Changed 4 years ago by sopvop

Probably duplicate of either #10491 or #10527

comment:2 Changed 4 years ago by bgamari

Owner: set to bgamari

Thankfully this doesn't appear to be related to #10527 (which is quite a hard nut to crack). Instead this is due to the fact that guessElement is marked INLINE coupled with its rather suboptimial implementation. To see why this is problematic, we have to look at how this function is desugared. The pattern match is using OverloadedStrings, which desugars this,

f :: ByteString -> String
f "foo" = 1
f "bar" = 2
f _ = 3

Roughly into this,

f :: ByteString -> String
f x | x == (fromString "foo" :: ByteString) = 1
f x | x == (fromString "bar" :: ByteString) = 2
f _ = 3

Note that we are using using ByteString's Eq instance here to check the pattern match. These guards are then lowered to nested case analyses in the desugared Core.

It is with this Core that we enter the Core-to-Core optimization pipeline. ByteString's Eq instance looks like this,

instance Eq ByteString where
    (==) = eq

eq :: ByteString -> ByteString -> Bool
eq a@(PS fp off len) b@(PS fp' off' len')
  | len /= len'              = False    -- short cut on length
  | fp == fp' && off == off' = True     -- short cut for the same string
  | otherwise                = compareBytes a b == EQ
{-# INLINE eq #-}

compareBytes :: ByteString -> ByteString -> Ordering
compareBytes = ...

Despite (==) not having an explicit INLINE pragma, GHC is still quite eager to inline in. Consequently our chain of equality comparisons in f quickly explodes into a large ball of deeply nested cases.

To make matters a bit worse, the literal of each alternative is itself a fair amount of code. e.g. the "OG1" literal is floated out into a top-level binding looking like this,

lvl10_r5ix :: BS.ByteString
[GblId, Str=DmdType]
lvl10_r5ix =
  case newMutVar# @ Finalizers @ RealWorld NoFinalizers realWorld#
  of _ [Occ=Dead] { (# ipv_a3qk, ipv1_a3ql #) ->
  let {
    s_a263 :: Addr#
    [LclId, Str=DmdType]
    s_a263 = "OG1"# } in
  case {__pkg_ccall bytestring-0.10.6.0 strlen Addr#
                                        -> State# RealWorld -> (# State# RealWorld, Word# #)}_a3rx
         s_a263 ipv_a3qk
  of _ [Occ=Dead] { (# ds3_a3rC, ds4_a3rD #) ->
  Data.ByteString.Internal.PS
    s_a263 (PlainForeignPtr ipv1_a3ql) 0# (word2Int# ds4_a3rD)
  }
  }

Resolution

Above we saw that the original code has a few issues,

  1. ByteString's sizeable (==) implementation is inlined once for each alternative, resulting in a large quantity of code for f
  2. The pattern match desugars to a linear search where some more efficient strategy would be desired

Let's first look at how we might fix (1), which causes the simplifier blow-up noted in this bug. After this we can move on to improving the asymptotic issue, (2).

Note: To some extent, the pattern matching behavior exposed by OverloadedStrings is a bit dangerous as it emulates pattern matching, something that most Haskellers regard as "cheap" (up to the cost of forcing the scrutinee), with a call to an arbitrarily complex function.

Issue (1) arises from the desugaring of the OverloadedStrings pattern match, which produces a new test for every alternative. GHC will then happily inline (==) in to each of these despite the fact that there is no benefit to doing so. Ideally we would want to make it obvious to GHC that the comparison should be shared. One way to accomplish this would be to encode the alternatives as an associated list and use lookup,

guessElement :: BS.ByteString -> String
guessElement = \e -> fromMaybe "" $ lookup e els
  where
    els :: [(ByteString, String)]
    els =
        [ ("C"   , "C"),
          ("C1'" , "C"),
          ("C2"  , "C"),
          ...
        ]

Here we ensure that there is exactly one use of (==), preventing the explosion of code.

While we are looking at optimizations we might also consider that ByteString is generally best used with large strings. In the case of small comparisons like this it might be worthwhile to avoid using ByteString's Eq altogether and instead compare Strings,

guessElement :: BS.ByteString -> String
guessElement = \e -> fromMaybe "" $ lookup (BS.unpack e) els
  where
    els :: [(String, String)]
    els = [ ... ]

However, both of these avoid solving the issue of linear search. For this we can turn to any number of finite map data structures. For instance, we know that ByteString has an Ord instance so we can employ Data.Map from containers,

guessElement :: BS.ByteString -> String
guessElement = \e -> fromMaybe "" $ M.lookup e els
  where
    els :: M.Map BS.ByteString String
    els = M.fromList [ ... ]

Of course, it also has a Hashable instance so we can also try Data.HashMap from unordered-containers. I've prepared a small benchmark (https://github.com/bgamari/ghc-T10565) comparing these options. The results (mean time per iteration, standard deviation, and the size of the tidy core in terms) from a run compiled with -O on my Core i5 laptop can be found below.

Implementation *Core size* Time per iteration (μs)
Original pattern match 5071 24.5 ± 1.0
Association list
lookup on ByteString 1818 52.8 ± 1.3
lookup on String 719 55.9 ± 1.3
Association list (inlined)
lookup on ByteString 1925 31.5 ± 1.3
lookup on String 775 39.3 ± 1.3
Ordered map
lookup on ByteString 2449 7.7 ± 0.5
lookup on String 1202 10.5 ± 0.3
Unordered map
lookup on ByteString 3731 6.8 ± 0.2
lookup on String 1669 5.0 ± 0.1

We see that while using lookup produces a substantially smaller program, it performs substantially worse than the original pattern match. Looking at the Core we see that it is not inlined at all. If we take the implementation of lookup and plop it in the module we find that things are a bit better, but still not back to where we started.

When we switch to a map however, we find that things improve significantly. The difference between unordered map and ordered map is likely just a function of how much work has gone into optimizing the respective implementations.

Improving GHC?

In an ideal world GHC would be able to look at the original code and independently determine something like we determined above. The key insight that we had here is that our ByteString type has structure beyond the Eq used by OverloadedString's pattern matching behavior. Can we teach GHC to have this same insight?

It would be possible for the compiler to construct a similar lookup data structure for pattern matches on IsString types also having an Ord instance. The trouble with this is that the operational behavior of the program is now dependent upon which instances are in scope. This seems like a very undesirable property.

One could instead add a compare' :: a -> a -> Ordering to IsString, allowing GHC to generate efficient logarithmic lookups without a dependence on which instances are available.

The above option, however, prevents us from using the best tool in our arsenault, the HashMap. Another more general possibility would be to add function to IsString to explicitly handle matching against groups of patterns. The compiler would collect the pattern matches which can be simultaneously resolved and pass them, along with the scrutinee, to a match function. This might look like this,

class IsString a where
    fromString :: String -> a
    toString :: a -> String
    match :: [(a, b)] -> a -> b

This, however, carries with it a number of issues. Perhaps most concerning is the fact that now a library author has the ability to break pattern matching semantics (consider, for instance, match alts _ = last alts). Moreover, all of these have unfortunate compatibility issues.

Finally, these options all interact very poorly with more complex pattern matches. Take for instance,

f :: String -> Int -> Result
f "a" _ = resultA
f "b" _ = resultB
f _   3 = resultC
f "c" _ = resultD
f "d" 4 = resultE
f "d" _ = resultF
f _   _ = resultG

It would be rather difficult to define matching semantics which took advantage of the match function introduced above yet treated cases like this (or even simpler cases). Consequently, in many cases the user would still need to fall back to the current approach of manually implementing their desired match behavior.

Ultimately, these are decisions that should arguably be left to the user anyways. Data structure choice will be highly specific to the structure of the alternatives and values being scrutinized. The compiler is not in a position to be able to wisely choose from the numerous options.

Last edited 4 years ago by bgamari (previous) (diff)

comment:3 Changed 4 years ago by bgamari

I should note that the original testcase (found here https://github.com/bgamari/ghc-T10565/blob/master/Rg.hs) compiles with 7.10.1 yet not with 7.10.2. It does compile with -fsimpl-tick-factor=150 which lead me to believe that it's perhaps not related to #10527, which is a much more severe blow-up. It also compiles if one uses any of the other guessElement implementations I listed above.

I do wonder what changed between 7.10.1 and 7.10.2, however.

Last edited 4 years ago by bgamari (previous) (diff)

comment:4 Changed 4 years ago by bgamari

Resolution: wontfix
Status: newclosed

As described above I'm not convinced that we can do much better on the code as-written. I'll send a patch suggesting an alternate implementation upstream.

comment:5 Changed 3 years ago by MichalGajda

It would be indeed nice, if GHC could factorize pattern matching to be logarithmic in all cases. That would greatly improve performance in case of integer range lookup, and so for much of the token comparison codes.

Of course clever people already use hashes and atoms, but ability to program at the highest possible level will always bring numerous fans to Haskell.

Note: See TracTickets for help on using tickets.