Opened 4 years ago

Last modified 14 months ago

#11284 closed bug

Lambda-lifting fails in simple Text example — at Version 5

Reported by: bgamari Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.10.3
Keywords: Cc: sgraf
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by bgamari)

Consider the example (which uses Text; I'm working on finding a more minimal example),

import Data.Char (isSpace)
import Data.List (foldl')
import GHC.Exts (build)
import qualified  Data.Text as T

longestWord :: T.Text -> Int
longestWord t = foldl' max 0 $ map T.length $ fusedWords t

fusedWords :: T.Text -> [T.Text]
fusedWords t0 = build $ \cons nil ->
  let go !t
        | T.null t  = nil
        | otherwise = let (w, rest) = T.span (not . isSpace) t
                      in cons w (go $ T.dropWhile isSpace rest)
  in go t0

-- For reference
data Text = Text
    {-# UNPACK #-} !A.Array          -- payload (Word16 elements)
    {-# UNPACK #-} !Int              -- offset (units of Word16, not Char)
    {-# UNPACK #-} !Int              -- length (units of Word16, not Char)

longestWord here produces the simplified Core`,

Rec {
Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int#
Ticket.$wgo1 =
  \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) ->
    case w_s4GJ of _ {
      [] -> ww_s4GN;
      : y_a4vC ys_a4vD ->
        case y_a4vC
        of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR ->
        let {
          a_a4jO :: GHC.Prim.Int#
          a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in
        letrec {
          -- Why must you allocate? For the love of all that is good, why?
          -- This loop is just `T.length`, the first argument being the 
          -- length accumulator and the second being an index into the
          -- ByteArray#
          $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
          $wloop_length_s4GI =
            \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) ->
              case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check
              of _ {
                False -> {
                  ...
                  -- in this body there are few cases analyses with
                  -- recursive calls of the form
                  $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1)
                  ...
                True -> ww1_s4Gz
              }; } in
        case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT ->
        case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH)
        of _ {
          False -> Ticket.$wgo1 ys_a4vD ww_s4GN;
          True -> Ticket.$wgo1 ys_a4vD ww1_s4GH
        }
        }
        }
    }
end Rec }

longestWord :: T.Text -> Int
longestWord =
  \ (w_s4GT :: T.Text) ->
    case w_s4GT
    of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY ->
    case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0
    of ww4_s4H2 { __DEFAULT ->
    GHC.Types.I# ww4_s4H2
    }
    }

Notice $wloop_length_s4GI: It should be a nice tight loop counting Unicode characters in the array dt_a4jP until it finds the end of the Text. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!

Change History (5)

comment:1 Changed 4 years ago by bgamari

Description: modified (diff)
Type of failure: None/UnknownRuntime performance bug

comment:2 Changed 4 years ago by bgamari

Description: modified (diff)

comment:3 Changed 4 years ago by bgamari

Description: modified (diff)

comment:4 Changed 4 years ago by bgamari

Description: modified (diff)

comment:5 Changed 4 years ago by bgamari

Description: modified (diff)
Note: See TracTickets for help on using tickets.