Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#9355 closed bug (fixed)

scanr does not participate in stream fusion

Reported by: dfeuer Owned by:
Priority: normal Milestone: 7.10.1
Component: libraries/base Version: 7.8.3
Keywords: Cc: hvr, ekmett
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

This should be a no-brainer—it's trivial two write scanr as a foldr, making it a good consumer. Wrapping it up in build makes it a good producer too, but may require back-and-forth RULES.

Attachments (1)

scanr.diff (1.3 KB) - added by dfeuer 5 years ago.
nomeata's improvement modified for there and back again

Download all attachments as: .zip

Change History (10)

comment:1 Changed 5 years ago by dfeuer

OK, so it didn't turn out to be quite as trivial as I thought (because the argument to build isn't allowed to inspect its own result as the implementation in Data.List does), but it looks like this does the trick:

scanr :: forall a b . (a -> b -> b) -> b -> [a] -> [b]
scanr f q0 ls = build scr
  where
   scr :: forall c . (b -> c -> c) -> c -> c
   scr c n = snd $ foldr go (q0, q0 `c` n) ls
     where
       go x (r,est) = let fxr = f x r in (fxr, fxr `c` est)

Fortunately, the tuples get unboxed as they should. Some extra rules may be needed for map, et al.

Last edited 5 years ago by dfeuer (previous) (diff)

comment:2 Changed 5 years ago by nomeata

(because the argument to build isn't allowed to inspect its own result as the implementation in Data.List does),

I wouldn’t be surprised that returning (# x, x:xs #) is faster than returning x:xs and pattern matching on it. OTOH, the CPR optimization should change the code returning x:xs into (# x, xs #) and move the consing to the caller.

Can’t you do that by hand, i.e.:

scanrB :: forall a b . (a -> b -> b) -> b -> [a] -> [b]
scanrB f q0 ls = build scr
  where
   scr :: forall c . (b -> c -> c) -> c -> c
   scr c n = case foldr go (q0, n) ls of (r, esult) -> c r esult
     where
       go x (r,est) = (f x r, r `c` est)

The Core looks a bit nicer:

Scanr.scanrA :: forall a b. (a -> b -> b) -> b -> [a] -> [b]
Scanr.scanrA =
  \ (@ a) (@ b) (f :: a -> b -> b) (q0 :: b) (ls :: [a]) ->
    let {
      a :: [b]
      a = GHC.Types.: q0 (GHC.Types.[]) } in
    letrec {
      $wgo :: [a] -> (# b, [b] #)
      $wgo =
        \ (w :: [a]) ->
          case w of _ {
            [] -> (# q0, a #);
            : y ys ->
              case $wgo ys of _ { (# ww1, ww2 #) ->
              let {
                fxr :: b
                fxr = f y ww1 } in
              (# fxr, GHC.Types.: fxr ww2 #)
              }
          }; } in
    case $wgo ls of _ { (# _, ww2 #) -> ww2 }

Scanr.scanrB :: forall a b. (a -> b -> b) -> b -> [a] -> [b]
Scanr.scanrB =
  \ (@ a) (@ b) (f :: a -> b -> b) (q0 :: b) (ls :: [a]) ->
    letrec {
      $wgo :: [a] -> (# b, [b] #)
      $wgo =
        \ (w :: [a]) ->
          case w of _ {
            [] -> (# q0, GHC.Types.[] #);
            : y ys ->
              case $wgo ys of _ { (# ww1, ww2 #) ->
              (# f y ww1, GHC.Types.: ww1 ww2 #)
              }
          }; } in
    case $wgo ls of _ { (# ww1, ww2 #) -> GHC.Types.: ww1 ww2 }

But I don’t expect there to be a measurable difference (and I didn’t check):

Some extra rules may be needed for map, et al.

not sure what you mean by that?

comment:3 in reply to:  2 Changed 5 years ago by dfeuer

Replying to nomeata:

(because the argument to build isn't allowed to inspect its own result as the implementation in Data.List does),

I wouldn’t be surprised that returning (# x, x:xs #) is faster than returning x:xs and pattern matching on it. OTOH, the CPR optimization should change the code returning x:xs into (# x, xs #) and move the consing to the caller.

Can’t you do that by hand, i.e.:

scanrB :: forall a b . (a -> b -> b) -> b -> [a] -> [b]
scanrB f q0 ls = build scr
  where
   scr :: forall c . (b -> c -> c) -> c -> c
   scr c n = case foldr go (q0, n) ls of (r, esult) -> c r esult
     where
       go x (r,est) = (f x r, r `c` est)

The Core looks a bit nicer:

Scanr.scanrA :: forall a b. (a -> b -> b) -> b -> [a] -> [b]
Scanr.scanrA =
  \ (@ a) (@ b) (f :: a -> b -> b) (q0 :: b) (ls :: [a]) ->
    let {
      a :: [b]
      a = GHC.Types.: q0 (GHC.Types.[]) } in
    letrec {
      $wgo :: [a] -> (# b, [b] #)
      $wgo =
        \ (w :: [a]) ->
          case w of _ {
            [] -> (# q0, a #);
            : y ys ->
              case $wgo ys of _ { (# ww1, ww2 #) ->
              let {
                fxr :: b
                fxr = f y ww1 } in
              (# fxr, GHC.Types.: fxr ww2 #)
              }
          }; } in
    case $wgo ls of _ { (# _, ww2 #) -> ww2 }

Scanr.scanrB :: forall a b. (a -> b -> b) -> b -> [a] -> [b]
Scanr.scanrB =
  \ (@ a) (@ b) (f :: a -> b -> b) (q0 :: b) (ls :: [a]) ->
    letrec {
      $wgo :: [a] -> (# b, [b] #)
      $wgo =
        \ (w :: [a]) ->
          case w of _ {
            [] -> (# q0, GHC.Types.[] #);
            : y ys ->
              case $wgo ys of _ { (# ww1, ww2 #) ->
              (# f y ww1, GHC.Types.: ww1 ww2 #)
              }
          }; } in
    case $wgo ls of _ { (# ww1, ww2 #) -> GHC.Types.: ww1 ww2 }

But I don’t expect there to be a measurable difference (and I didn’t check):

Some extra rules may be needed for map, et al.

not sure what you mean by that?

Sorry, I've been a tad confused, as usual. The problem, I now see, is the bunch of thunks we produce. I don't think it's possible to make scanr a good producer, and I'm not even sure we can make it a significantly better one (though I'd have to profile to be sure). What we can do, in an apparently incompatible fashion, is make it a good consumer. Start with the current tail-eating implementation:

scanr _ q0 [] = [q0]
scanr f qo (x:xs) = f x q : qs
  where
    qs@(q:_) = scanr f q0 xs

Rewriting that pattern match thing more explicitly:

scanr f q0 (x:xs) = let qs = scanr f q0 xs
                    in f x (head qs) : qs

This, then, is a plain old fold:

scanr f q0 = foldr go [q0] xs
  where
    go x qs = f x (head qs) : qs

If we apply it to a build, we get

scanr f q0 (build g) = g go [q0]
  where go x qs = f x (head qs):qs
Last edited 5 years ago by dfeuer (previous) (diff)

comment:4 Changed 5 years ago by dfeuer

Since foldr is strict in its list argument, I think it's then safe to go back to the pattern match style:

scanr f q0 = foldr go [q0] xs
  where
    go x qs@(q:_) = f x q : qs

EDIT: No, this is not good. It requires the whole list to be traversed before producing anything, even if f is lazy. The match against qs must be a lazy one, as it is in the code using head.

Last edited 5 years ago by dfeuer (previous) (diff)

comment:5 in reply to:  2 Changed 5 years ago by dfeuer

Replying to nomeata:

(because the argument to build isn't allowed to inspect its own result as the implementation in Data.List does),

I wouldn’t be surprised that returning (# x, x:xs #) is faster than returning x:xs and pattern matching on it. OTOH, the CPR optimization should change the code returning x:xs into (# x, xs #) and move the consing to the caller.

Can’t you do that by hand, i.e.:

SNIP

But I don’t expect there to be a measurable difference (and I didn’t check):

I brilliantly read your code wrong and drew a bad conclusion. This looks much better than I initially thought (Look, ma, no let! ). I'm going to see how nofib likes it.

Changed 5 years ago by dfeuer

Attachment: scanr.diff added

nomeata's improvement modified for there and back again

comment:6 Changed 5 years ago by dfeuer

Status: newpatch

Unfortunately, nofib doesn't use scanr at all (and maybe nobody else does either). That said, the core looks very good in some small fusion tests.

comment:7 Changed 5 years ago by Joachim Breitner <mail@…>

In 4e1dfc3767167dddd0e151a2df8305b12aa0f49c/ghc:

Make scanr a good producer and consumer

This fixes #9355.

comment:8 Changed 5 years ago by nomeata

Resolution: fixed
Status: patchclosed

comment:9 Changed 5 years ago by thoughtpolice

Milestone: 7.8.47.10.1
Note: See TracTickets for help on using tickets.