Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#9369 closed bug (fixed)

Data.List.unfoldr does not fuse and is not inlined.

Reported by: dfeuer Owned by:
Priority: normal Milestone: 7.10.1
Component: libraries/base Version: 7.8.3
Keywords: Cc: hvr, ekmett
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D198
Wiki Page:

Description

Data.List.unfoldr is not a good producer for foldr/build fusion, and it's not wrapped to enable inlining. I don't know how often people explicitly fold over an unfold, but this of course also affects map and filter. The inlining issue is also serious: inlining unfoldr can often allow the Maybe to be erased altogether. I'm not sure this fix is perfect, but it seems a lot better than the current situation:

import GHC.Exts (build)

{-# NOINLINE [1] unfoldr #-}
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr f b = go b
  where
    go b = case f b of
             Just (a,new_b) -> a : go new_b
             Nothing -> []

{-# INLINE [0] unfoldrB #-}
unfoldrB :: (b -> Maybe (a, b)) -> b -> (a -> c -> c) -> c -> c
unfoldrB f b' c n = go b'
  where
    go b = case f b of
             Just (a,new_b) -> a `c` go new_b
             Nothing        -> n

{-# RULES
"unfoldr"  [~1] forall f b . unfoldr f b = build (unfoldrB f b)
 #-}

As a simple example, consider the code

hello :: Double -> Double -> [Double]
hello x n = map (* 3) $ L.unfoldr f x
  where
    f x | x < n     = Just (x, x**1.2)
        | otherwise = Nothing

With Data.List.unfoldr and the latest bleeding-edge GHC, this produces

hello1
hello1 =
  \ ds_d1ZF -> case ds_d1ZF of _ { D# x_a21W -> D# (*## x_a21W 3.0) }

$whello
$whello =
  \ w_s266 ww_s26a ->
    map
      hello1
      (unfoldr
         (\ x_X1Hx ->
            case x_X1Hx of wild_a20E { D# x1_a20G ->
            case tagToEnum# (<## x1_a20G ww_s26a) of _ {
              False -> Nothing;
              True -> Just (wild_a20E, D# (**## x1_a20G 1.2))
            }
            })
         w_s266)

hello
hello =
  \ w_s266 w1_s267 ->
    case w1_s267 of _ { D# ww1_s26a -> $whello w_s266 ww1_s26a }

Using the above implementation (and renaming the function from hello to bye) yields

$wbye
$wbye =
  \ ww_s25Z ww1_s263 ->
    letrec {
      $wgo_s25U
      $wgo_s25U =
        \ ww2_s25S ->
          case tagToEnum# (<## ww2_s25S ww1_s263) of _ {
            False -> [];
            True -> : (D# (*## ww2_s25S 3.0)) ($wgo_s25U (**## ww2_s25S 1.2))
          }; } in
    $wgo_s25U ww_s25Z

bye
bye =
  \ w_s25V w1_s25W ->
    case w_s25V of _ { D# ww1_s25Z ->
    case w1_s25W of _ { D# ww3_s263 -> $wbye ww1_s25Z ww3_s263 }
    }

I don't think there can be any doubt which is better. Yes, some fine tuning may be needed to make the rules apply in all appropriate cases. I don't understand things like the comment on the definition of map drawing attention to eta expansion.

Attachments (1)

unfoldr.diff (2.1 KB) - added by dfeuer 5 years ago.
Patch to make unfoldr inline and make it a good producer

Download all attachments as: .zip

Change History (12)

comment:1 Changed 5 years ago by schyler

(By the way, I don't think this counts as a 'Runtime performance bug' as much as it should be categorised as a 'Compile-time performance bug' maybe..)

Good work though, +1 from me.

Last edited 5 years ago by schyler (previous) (diff)

comment:2 in reply to:  1 Changed 5 years ago by dfeuer

Replying to schyler:

(By the way, I don't think this counts as a 'Runtime performance bug' as much as it should be categorised as a 'Compile-time performance bug' maybe..)

Good work though, +1 from me.

My interpretation, correct or not, is that compile-time performance bugs are things that lead to GHC itself running slowly.

As long as I'm adding a comment, I'll note that there are functions in the libraries that are very naturally written as unfolds, but that currently are not—perhaps because the current unfold performs so poorly. The ones I've encountered so far are groupBy and various implementations of enumFromTo that have their own custom rewrite rules to get around this. I can't guarantee anything, but it would be nice to simplify those if we can do so without losing speed.

comment:3 Changed 5 years ago by ekmett

FWIW- My intuition follows dfeuer's in this.

Changed 5 years ago by dfeuer

Attachment: unfoldr.diff added

Patch to make unfoldr inline and make it a good producer

comment:4 Changed 5 years ago by dfeuer

Status: newpatch

This took long enough to work through. I decided in the end that we shouldn't use any rules whatsoever for unfoldr, but should instead just INLINE it. A detailed explanation of my reasoning is included in the patch, but it comes down to two things:

  1. Concerns about rules inlining unfoldr too early, and
  2. The importance of inlining unfoldr whether it fuses or not to optimize common uses.

comment:5 Changed 5 years ago by nomeata

Differential Rev(s): Phab:D198

I’m convinced by the reasoning :-)

I put it on phabricator (more practice with this tool for me,and we get a validate run for free).

comment:6 Changed 5 years ago by Joachim Breitner <mail@…>

In 78209d70596dcbfcb11ad1de1c961ab8479e531e/ghc:

INLINE unfoldr

Summary:
to allow GHC to maybe remove the Maybe. See the code comment for more
commentary. This fixes #9369.

Test Plan: see what happens on ghcspeed (once it is merged)

Reviewers: austin

Reviewed By: austin

Subscribers: simonmar, ezyang, carter

Differential Revision: https://phabricator.haskell.org/D198

GHC Trac Issues: #9369

comment:7 Changed 5 years ago by nomeata

Pushed, thanks.

Hmm, looks like I forgot to change the author to David. Sorry! I don’t even know how to do that with arc land. A pitty that either git knowledge or arc becomes less useful.

comment:8 Changed 5 years ago by nomeata

Resolution: fixed
Status: patchclosed

comment:9 Changed 5 years ago by nomeata

Small followup: According to http://ghcspeed-nomeata.rhcloud.com/changes/?rev=78209d70596dcbfcb11ad1de1c961ab8479e531e&exe=2&env=nomeata%27s%20buildbot no significant changes in performance can be observed.

comment:10 in reply to:  9 Changed 5 years ago by dfeuer

Replying to nomeata:

Small followup: According to http://ghcspeed-nomeata.rhcloud.com/changes/?rev=78209d70596dcbfcb11ad1de1c961ab8479e531e&exe=2&env=nomeata%27s%20buildbot no significant changes in performance can be observed.

The reason for this is very simple: not a single one of those benchmarks uses unfoldr anywhere. The only benchmarks in the nofib tree that use it are the fibon ones, which are not run by default:

[dfeuer@lemur nofib]$ find . -name "*hs" -exec grep unfoldr \{\} \; -print

split n xs = unfoldr (g n) xs
./fibon/Hackage/Crypto/Codec/Text/Raw.hs
      blocks = map (fromOctets 256) (unfoldr h $ concat $ unfoldr g s) 
   concat $ unfoldr g s where
./fibon/Hackage/Crypto/Codec/Encryption/Padding.hs
import Data.List(unfoldr)
   unfoldr f $ md5s $ Str $ map (chr . fromIntegral) xs
./fibon/Hackage/Crypto/Data/Digest/MD5.hs
   unfoldr g
   unfoldr g
./fibon/Hackage/Crypto/Data/Digest/SHA1.hs
padding x = unfoldr block $ paddingHelper x 0 (0::Int) (0::Integer)
./fibon/Hackage/Crypto/Data/Digest/SHA2.hs
import Data.List    (unfoldr)
unroll = unfoldr step
./fibon/Hackage/Gf/src/runtime/haskell/Data/Binary.hs

comment:11 Changed 5 years ago by thoughtpolice

Milestone: 7.8.47.10.1
Note: See TracTickets for help on using tickets.