#9355 closed bug (fixed)
scanr does not participate in stream fusion
Reported by: | dfeuer | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.10.1 |
Component: | libraries/base | Version: | 7.8.3 |
Keywords: | Cc: | hvr, ekmett | |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Runtime performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | ||
Wiki Page: |
Description
This should be a no-brainer—it's trivial two write scanr
as a foldr, making it a good consumer. Wrapping it up in build
makes it a good producer too, but may require back-and-forth RULES.
Attachments (1)
Change History (10)
comment:2 follow-ups: 3 5 Changed 5 years ago by
(because the argument to build isn't allowed to inspect its own result as the implementation in Data.List does),
I wouldn’t be surprised that returning (# x, x:xs #)
is faster than returning x:xs
and pattern matching on it. OTOH, the CPR optimization should change the code returning x:xs
into (# x, xs #) and move the consing to the caller.
Can’t you do that by hand, i.e.:
scanrB :: forall a b . (a -> b -> b) -> b -> [a] -> [b] scanrB f q0 ls = build scr where scr :: forall c . (b -> c -> c) -> c -> c scr c n = case foldr go (q0, n) ls of (r, esult) -> c r esult where go x (r,est) = (f x r, r `c` est)
The Core looks a bit nicer:
Scanr.scanrA :: forall a b. (a -> b -> b) -> b -> [a] -> [b] Scanr.scanrA = \ (@ a) (@ b) (f :: a -> b -> b) (q0 :: b) (ls :: [a]) -> let { a :: [b] a = GHC.Types.: q0 (GHC.Types.[]) } in letrec { $wgo :: [a] -> (# b, [b] #) $wgo = \ (w :: [a]) -> case w of _ { [] -> (# q0, a #); : y ys -> case $wgo ys of _ { (# ww1, ww2 #) -> let { fxr :: b fxr = f y ww1 } in (# fxr, GHC.Types.: fxr ww2 #) } }; } in case $wgo ls of _ { (# _, ww2 #) -> ww2 } Scanr.scanrB :: forall a b. (a -> b -> b) -> b -> [a] -> [b] Scanr.scanrB = \ (@ a) (@ b) (f :: a -> b -> b) (q0 :: b) (ls :: [a]) -> letrec { $wgo :: [a] -> (# b, [b] #) $wgo = \ (w :: [a]) -> case w of _ { [] -> (# q0, GHC.Types.[] #); : y ys -> case $wgo ys of _ { (# ww1, ww2 #) -> (# f y ww1, GHC.Types.: ww1 ww2 #) } }; } in case $wgo ls of _ { (# ww1, ww2 #) -> GHC.Types.: ww1 ww2 }
But I don’t expect there to be a measurable difference (and I didn’t check):
Some extra rules may be needed for map, et al.
not sure what you mean by that?
comment:3 Changed 5 years ago by
Replying to nomeata:
(because the argument to build isn't allowed to inspect its own result as the implementation in Data.List does),
I wouldn’t be surprised that returning
(# x, x:xs #)
is faster than returningx:xs
and pattern matching on it. OTOH, the CPR optimization should change the code returningx:xs
into (# x, xs #) and move the consing to the caller.Can’t you do that by hand, i.e.:
scanrB :: forall a b . (a -> b -> b) -> b -> [a] -> [b] scanrB f q0 ls = build scr where scr :: forall c . (b -> c -> c) -> c -> c scr c n = case foldr go (q0, n) ls of (r, esult) -> c r esult where go x (r,est) = (f x r, r `c` est)The Core looks a bit nicer:
Scanr.scanrA :: forall a b. (a -> b -> b) -> b -> [a] -> [b] Scanr.scanrA = \ (@ a) (@ b) (f :: a -> b -> b) (q0 :: b) (ls :: [a]) -> let { a :: [b] a = GHC.Types.: q0 (GHC.Types.[]) } in letrec { $wgo :: [a] -> (# b, [b] #) $wgo = \ (w :: [a]) -> case w of _ { [] -> (# q0, a #); : y ys -> case $wgo ys of _ { (# ww1, ww2 #) -> let { fxr :: b fxr = f y ww1 } in (# fxr, GHC.Types.: fxr ww2 #) } }; } in case $wgo ls of _ { (# _, ww2 #) -> ww2 } Scanr.scanrB :: forall a b. (a -> b -> b) -> b -> [a] -> [b] Scanr.scanrB = \ (@ a) (@ b) (f :: a -> b -> b) (q0 :: b) (ls :: [a]) -> letrec { $wgo :: [a] -> (# b, [b] #) $wgo = \ (w :: [a]) -> case w of _ { [] -> (# q0, GHC.Types.[] #); : y ys -> case $wgo ys of _ { (# ww1, ww2 #) -> (# f y ww1, GHC.Types.: ww1 ww2 #) } }; } in case $wgo ls of _ { (# ww1, ww2 #) -> GHC.Types.: ww1 ww2 }But I don’t expect there to be a measurable difference (and I didn’t check):
Some extra rules may be needed for map, et al.
not sure what you mean by that?
Sorry, I've been a tad confused, as usual. The problem, I now see, is the bunch of thunks we produce. I don't think it's possible to make scanr
a good producer, and I'm not even sure we can make it a significantly better one (though I'd have to profile to be sure). What we can do, in an apparently incompatible fashion, is make it a good consumer. Start with the current tail-eating implementation:
scanr _ q0 [] = [q0] scanr f qo (x:xs) = f x q : qs where qs@(q:_) = scanr f q0 xs
Rewriting that pattern match thing more explicitly:
scanr f q0 (x:xs) = let qs = scanr f q0 xs in f x (head qs) : qs
This, then, is a plain old fold:
scanr f q0 = foldr go [q0] xs where go x qs = f x (head qs) : qs
If we apply it to a build
, we get
scanr f q0 (build g) = g go [q0] where go x qs = f x (head qs):qs
comment:4 Changed 5 years ago by
Since foldr
is strict in its list argument, I think it's then safe to go back to the pattern match style:
scanr f q0 = foldr go [q0] xs where go x qs@(q:_) = f x q : qs
EDIT: No, this is not good. It requires the whole list to be traversed before producing anything, even if f
is lazy. The match against qs
must be a lazy one, as it is in the code using head
.
comment:5 Changed 5 years ago by
Replying to nomeata:
(because the argument to build isn't allowed to inspect its own result as the implementation in Data.List does),
I wouldn’t be surprised that returning
(# x, x:xs #)
is faster than returningx:xs
and pattern matching on it. OTOH, the CPR optimization should change the code returningx:xs
into (# x, xs #) and move the consing to the caller.Can’t you do that by hand, i.e.:
SNIP
But I don’t expect there to be a measurable difference (and I didn’t check):
I brilliantly read your code wrong and drew a bad conclusion. This looks much better than I initially thought (Look, ma, no let
! ). I'm going to see how nofib likes it.
Changed 5 years ago by
Attachment: | scanr.diff added |
---|
nomeata's improvement modified for there and back again
comment:6 Changed 5 years ago by
Status: | new → patch |
---|
Unfortunately, nofib
doesn't use scanr
at all (and maybe nobody else does either). That said, the core looks very good in some small fusion tests.
comment:8 Changed 5 years ago by
Resolution: | → fixed |
---|---|
Status: | patch → closed |
comment:9 Changed 5 years ago by
Milestone: | 7.8.4 → 7.10.1 |
---|
OK, so it didn't turn out to be quite as trivial as I thought (because the argument to
build
isn't allowed to inspect its own result as the implementation in Data.List does), but it looks like this does the trick:Fortunately, the tuples get unboxed as they should. Some extra rules may be needed for map, et al.