Opened 2 years ago

Last modified 23 months ago

#13966 new feature request

Skip-less stream fusion: a missed opportunity

Reported by: jmspiewak Owned by: mpickering
Priority: normal Milestone:
Component: Compiler Version: 8.2.1-rc3
Keywords: JoinPoints, StaticArgumentTransformation Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #14067 #14068 Differential Rev(s):
Wiki Page:

Description (last modified by jmspiewak)

A simple stream chain

chain :: Int -> Int
chain = sum . filter even . enumFromTo 1

doesn't fuse under a Skip-less stream on GHC 8.2-rc3 -O2.

Benchmarked against a Skip stream (LLVM backend):

benchmarking Skip-less
time                 248.9 ms   (243.3 ms .. 257.3 ms)
                     0.998 R²   (0.995 R² .. 0.999 R²)
mean                 250.9 ms   (248.1 ms .. 254.7 ms)
std dev              5.985 ms   (4.831 ms .. 7.311 ms)

benchmarking Skip
time                 61.26 ms   (60.39 ms .. 62.44 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 62.45 ms   (61.96 ms .. 62.91 ms)
std dev              1.387 ms   (1.190 ms .. 1.669 ms)

Relevant core (chain1 is Skip-less, chain2 has Skip):

-- RHS size: {terms: 51, types: 27, coercions: 0, joins: 1/2}
Main.$wchain1 [InlPrag=NOINLINE] :: Int# -> Int#
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,U>]
Main.$wchain1
  = \ (ww_s9ep :: Int#) ->
      letrec {
        $wloop_s9ea [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int Int
        [LclId, Arity=1, Str=<S,U>, Unf=OtherCon []]
        $wloop_s9ea
          = \ (ww1_s9e8 :: Int#) ->
              case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of {
                False ->
                  case remInt# ww1_s9e8 2# of {
                    __DEFAULT -> $wloop_s9ea (+# ww1_s9e8 1#);
                    0# ->
                      Main.Yield1
                        @ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#)) (GHC.Types.I# ww1_s9e8)
                  };
                True -> Main.Done1 @ Int @ Int
              }; } in
      joinrec {
        $wloop1_s9el [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
        [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []]
        $wloop1_s9el (ww1_s9ef :: Int#) (ww2_s9ej :: Int#)
          = case $wloop_s9ea ww2_s9ej of {
              Done1 -> ww1_s9ef;
              Yield1 s'_a497 x_a498 ->
                case x_a498 of { GHC.Types.I# y_a66i ->
                case s'_a497 of { GHC.Types.I# ww4_X9hA ->
                jump $wloop1_s9el (+# ww1_s9ef y_a66i) ww4_X9hA
                }
                }
            }; } in
      jump $wloop1_s9el 0# 1#


-- RHS size: {terms: 33, types: 9, coercions: 0, joins: 1/1}
Main.$wchain2 [InlPrag=NOINLINE] :: Int# -> Int#
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,U>]
Main.$wchain2
  = \ (ww_s9dZ :: Int#) ->
      joinrec {
        $wloop_s9dV [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
        [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []]
        $wloop_s9dV (ww1_s9dP :: Int#) (ww2_s9dT :: Int#)
          = case tagToEnum# @ Bool (># ww2_s9dT ww_s9dZ) of {
              False ->
                case remInt# ww2_s9dT 2# of {
                  __DEFAULT -> jump $wloop_s9dV ww1_s9dP (+# ww2_s9dT 1#);
                  0# -> jump $wloop_s9dV (+# ww1_s9dP ww2_s9dT) (+# ww2_s9dT 1#)
                };
              True -> ww1_s9dP
            }; } in
      jump $wloop_s9dV 0# 1#

The code was adapted from M. Snoyman's blog post "Iterators and Streams in Rust and Haskell".

Attachments (2)

Main.hs (1.9 KB) - added by jmspiewak 2 years ago.
Criterion benchmark
Main.2.hs (3.6 KB) - added by jmspiewak 2 years ago.
Added typeclass-based stream

Download all attachments as: .zip

Change History (21)

comment:1 Changed 2 years ago by jmspiewak

Description: modified (diff)
Keywords: JoinPoints added
Priority: lownormal
Summary: Missed optimization - loop fusionSkip-less stream fusion

Changed 2 years ago by jmspiewak

Attachment: Main.hs added

Criterion benchmark

comment:2 Changed 2 years ago by jmspiewak

A typeclass-based Skip-less stream (also from the blog post) does fuse.

data Step3 s = Done3 | Yield3 s (Item3 s)

class Stream3 s where
  type Item3 s
  next3 :: s -> Step3 s


data EnumFromTo3 a = EnumFromTo3 a a

instance (Ord a, Num a) => Stream3 (EnumFromTo3 a) where
  type Item3 (EnumFromTo3 a) = a
  next3 (EnumFromTo3 i high)
    | i > high  = Done3
    | otherwise = Yield3 (EnumFromTo3 (i + 1) high) i


data Filter3 a s = Filter3 (a -> Bool) s

instance (Stream3 s, Item3 s ~ a) => Stream3 (Filter3 a s) where
  type Item3 (Filter3 a s) = a
  next3 (Filter3 predicate s0) = loop s0 where
    loop s1 = case next3 s1 of
      Done3 -> Done3
      Yield3 s2 x
        | predicate x -> Yield3 (Filter3 predicate s2) x
        | otherwise   -> loop s2


sum3 :: (Num (Item3 s), Stream3 s) => s -> Item3 s
sum3 = loop 0 where
  loop total s1 = case next3 s1 of
    Done3 -> total
    Yield3 s2 x -> loop (total + x) s2


{-# NOINLINE chain3 #-}
chain3 :: Int -> Int
chain3 = sum3 . Filter3 even . EnumFromTo3 1

Adding an existential wrapper doesn't break the fusion.

data Stream4 a = forall s. (Stream3 s, Item3 s ~ a) => Stream4 s

enumFromTo4 :: (Ord a, Num a) => a -> a -> Stream4 a
enumFromTo4 start high = Stream4 (EnumFromTo3 start high)

filter4 :: (a -> Bool) -> Stream4 a -> Stream4 a
filter4 p (Stream4 s) = Stream4 (Filter3 p s)

sum4 :: Num a => Stream4 a -> a
sum4 (Stream4 s) = sum3 s

{-# NOINLINE chain4 #-}
chain4 :: Int -> Int
chain4 = sum4 . filter4 even . enumFromTo4 1
benchmarking typeclass Skip-less
time                 73.11 ms   (72.50 ms .. 73.94 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 69.80 ms   (68.86 ms .. 70.72 ms)
std dev              2.916 ms   (2.483 ms .. 3.577 ms)
variance introduced by outliers: 20% (moderately inflated)

benchmarking typeclass existential Skip-less
time                 75.44 ms   (74.91 ms .. 76.13 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 75.44 ms   (75.06 ms .. 75.80 ms)
std dev              1.118 ms   (904.6 μs .. 1.479 ms)

Both result in:

Rec {
-- RHS size: {terms: 36, types: 11, coercions: 0, joins: 1/1}
Main.main_$s$wloop1 [Occ=LoopBreaker]
  :: Int# -> Int# -> Int# -> Int#
[GblId, Arity=3, Caf=NoCafRefs, Str=<S,U><S,U><S,U>]
Main.main_$s$wloop1
  = \ (sc_s9HL :: Int#) (sc1_s9HK :: Int#) (sc2_s9HJ :: Int#) ->
      joinrec {
        $wloop2_s9y9 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
        [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []]
        $wloop2_s9y9 (ww_s9y1 :: Int#) (ww1_s9y6 :: Int#)
          = case tagToEnum# @ Bool (># ww_s9y1 ww1_s9y6) of {
              False ->
                case remInt# ww_s9y1 2# of {
                  __DEFAULT -> jump $wloop2_s9y9 (+# ww_s9y1 1#) ww1_s9y6;
                  0# ->
                    Main.main_$s$wloop1 ww1_s9y6 (+# ww_s9y1 1#) (+# sc2_s9HJ ww_s9y1)
                };
              True -> sc2_s9HJ
            }; } in
      jump $wloop2_s9y9 sc1_s9HK sc_s9HL
end Rec }

Changed 2 years ago by jmspiewak

Attachment: Main.2.hs added

Added typeclass-based stream

comment:3 Changed 2 years ago by bgamari

I am rather confused. What concretely is this ticket asking? Are you saying that ghc should reduce the skip+less program to the skip-ful core? Are you proposing stream fusion as an alternative to the foldr/build fusion which currently underlies GHC's list implementation? Something else entirely?

comment:4 Changed 2 years ago by jmspiewak

Sorry, after reading the paper on join points I thought GHC would be able to fuse the Skip-less stream. The fact that it does fuse the typeclass version convinced me it's possible.

comment:5 Changed 2 years ago by mpickering

Here's how I am understanding this post as I have also been thinking about this recently.

  1. In the Join Points PLDI paper there is an example in section 5 where join points allows find defined in terms of any to fuse. The author wants(/expects(?)) this technique to also fuse together normal functional pipelines as well as this simple examples.
  1. The introduction of type classes seems irrelevant to the first point. GHC has quite a few optimisations which it uses to eliminate type classes and you can play lots of tricks around this to cause fusion like this to happen. If you look at the code is structured, in sum where the specific instance of next3 is resolved to Filter3 Int (EnumFromTo Int) which means that all the consumers nicely line up with each other and hence fuse. The control flow of this program is very different to the first one.
  1. A program with a polymorphic return type parametrised by a type class is very similar to a program written in CPS. To see this similarity, when writing a program in CPS in order for execution to continue you must provide a function which says what to do next. When using type classes, you must instead provide the *type* of the result which in then is elaborated to a function or dictionary of functions which explain how to proceed. Thus structuring your program in this way usually allows this kind of fusion to happen.
  1. Ah-ha! but purpose of join points is to allow the compiler to optimise code written in direct style as well as code written in continuation passing style so can we do better in this case? The answer to which I don't yet know and I think is an open research problem.

comment:6 Changed 2 years ago by jmspiewak

Thank you for your detailed comment.

The paper specifically mentions fusing Skip-less filter so indeed I was expecting this example to work.

I brought up the typeclass approach because it's very inefficient on 8.0 but on 8.2 it's close to hand written recursion. It seems recursive join points can be used to fuse this stream, but only with the help of some typeclass-specific optimization.

comment:7 Changed 2 years ago by jmspiewak

I conducted more tests - it's not about the typeclass. Somehow putting the filter predicate in stream state allows fusion to happen. Is it changing the order of inlining? Prevents some other optimization which was clobbering the fusion opportunity?

filter5 :: (a -> Bool) -> Stream1 a -> Stream1 a
filter5 predicate (Stream1 s0 next) = Stream1 (Filter3 predicate s0) f where
  f (Filter3 p s1) = loop s1 where
    loop s = case next s of
      Done1 -> Done1
      Yield1 s' x
        | p x       -> Yield1 (Filter3 p s') x
        | otherwise -> loop s'

{-# NOINLINE chain5 #-}
chain5 :: Int -> Int
chain5 = sum1 . filter5 even . enumFromTo1 1

Compiles to:

-- RHS size: {terms: 36, types: 12, coercions: 0, joins: 2/2}
Main.$wchain5 [InlPrag=NOINLINE] :: Int# -> Int#
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,U>]
Main.$wchain5
  = \ (ww_s9EU :: Int#) ->
      joinrec {
        $s$wloop_s9QI [Occ=LoopBreaker] :: Int# -> Int# -> Int#
        [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []]
        $s$wloop_s9QI (sc_s9QH :: Int#) (sc1_s9QG :: Int#)
          = joinrec {
              $wloop2_s9EA [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int#
              [LclId[JoinId(1)], Arity=1, Str=<S,U>, Unf=OtherCon []]
              $wloop2_s9EA (ww1_s9Ey :: Int#)
                = case tagToEnum# @ Bool (># ww1_s9Ey ww_s9EU) of {
                    False ->
                      case remInt# ww1_s9Ey 2# of {
                        __DEFAULT -> jump $wloop2_s9EA (+# ww1_s9Ey 1#);
                        0# -> jump $s$wloop_s9QI (+# ww1_s9Ey 1#) (+# sc1_s9QG ww1_s9Ey)
                      };
                    True -> sc1_s9QG
                  }; } in
            jump $wloop2_s9EA sc_s9QH; } in
      jump $s$wloop_s9QI 1# 0#

comment:8 Changed 2 years ago by simonpj

This is a very interesting example, thank you.

I note that in HEAD it fuses just fine. I have not yet figured out exactly why, but I want to look at this code, from teh Description:

letrec {
  $wloop [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int Int
  [LclId, Arity=1, Str=<S,U>, Unf=OtherCon []]
  $wloop
    = \ (ww1_s9e8 :: Int#) ->
        case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of {
          False ->
            case remInt# ww1_s9e8 2# of {
              __DEFAULT -> $wloop (+# ww1_s9e8 1#);
              0# ->
                Main.Yield1
                  @ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#)) (GHC.Types.I# ww1_s9e8)
            };
          True -> Main.Done1 @ Int @ Int
        }; } in
joinrec {
  $wloop1 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
  [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []]
  $wloop1 (ww1_s9ef :: Int#) (ww2_s9ej :: Int#)
    = case $wloop ww2_s9ej of {
        Done1 -> ww1_s9ef;
        Yield1 s'_a497 x_a498 ->
          case x_a498 of { GHC.Types.I# y_a66i ->
          case s'_a497 of { GHC.Types.I# ww4_X9hA ->
          jump $wloop1 (+# ww1_s9ef y_a66i) ww4_X9hA
          }
          }
      }; } in
jump $wloop1 0# 1#

Once GHC gets the program into this state, it's not going to be able to optimise it. HEAD somehow avoids this dead end, but I hate things where GHC can get stuck in a dead end. I think this code ought to optimise just fine. Here's why.

Look at thar functionn $wloop. It's not a join point becuaes it's not tail-called in the body of the letrec. But suppose we transform $wloop like this:

let {
  $wloop [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int Int
  [LclId, Arity=1, Str=<S,U>, Unf=OtherCon []]
  $wloop
    = \x. joinrec $j (ww1_s9e8 :: Int#)
            = case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of {
                False ->
                  case remInt# ww1_s9e8 2# of {
                    __DEFAULT -> jump $j (+# ww1_s9e8 1#);
                    0# ->
                      Main.Yield1
                        @ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#)) (GHC.Types.I# ww1_s9e8)
                  };
                True -> Main.Done1 @ Int @ Int
          in jump $j x

Now $wloop is non-recursive, so we can inline it at its only call site, in $wloop1:

joinrec {
  $wloop1 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
  [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []]
  $wloop1 (ww1_s9ef :: Int#) (ww2_s9ej :: Int#)
    = case (...the body of $wloop...) of {
        Done1 -> ww1_s9ef;
        Yield1 s'_a497 x_a498 ->
          case x_a498 of { GHC.Types.I# y_a66i ->
          case s'_a497 of { GHC.Types.I# ww4_X9hA ->
          jump $wloop1 (+# ww1_s9ef y_a66i) ww4_X9hA
          }
          }
      }; } in
jump $wloop1 0# 1#

And now the right fusion will happen.

The crucial bit was the transformation of $wloop: we took a tail-recursive function, introduced a (recursive) join point into it, which made it non-recursive. Even if nothing further happens, the implementation of $wloop is a bit more efcient because the tail call is just a branch. But the big thing here is that we can now inline $wloop in $wloop1.

So the idea is this: a transformation to turn a tail-recursive function definition into one that is implemented with a recursive join point. If we had such a transformation, it'd get us out of the dead end.

Actually, there's a variant of the Static Argument Transformation (StaticArgumentTransformation) at work here. Consider

f x y = case y of
          A -> f x y'
          B -> e2
          C -> e3

Here the first argument to f is "static"; that is, the same in every call. So we can transform like this

f x y = joinrec $j y = case y of
                          A -> $j y'
                          B -> e2
                          C -> e3
        in $j y

Note that x isn't passed around in every iteration.

There's a GHC module SAT.hs which does the static argument transformation, but it is not join-point aware. So we should fix that.

One reason we don't currently do SAT all the time is that the results are a bit ambiguous; see Andre Santos's thesis for more, cited on StaticArgumentTransformation. BUT I think that some (maybe most) of the problems with SAT may go away if we restrict SAT to tail-recursive functions that we can turn into joinrecs.

Any volunteers?

comment:9 in reply to:  8 Changed 2 years ago by mpickering

Owner: set to mpickering

I can look at this.

comment:10 Changed 2 years ago by simonpj

Keywords: StaticArgumentTransformation added
Summary: Skip-less stream fusionSkip-less stream fusion: a missed opportunity

comment:11 Changed 2 years ago by nomeata

This ticket is confusing, and about a specific use case. I took the liberty of creating a new ticket for the task of implementing a SAT for tail-recursive functions at #14067. Once that is done, we can revisit if it actually fixed the problem at hand.

comment:12 Changed 2 years ago by nomeata

Hmm, maybe I did that a bit prematurely, without fully understanding what we should be doing.

Upon closer reading, there are two tasks:

  • Transforming a tail-recursive function into a non-recursive function with a joinrec.
  • SAT for tail-recursive functions

It seems that the first is independent of the second, and if the first one was done, one can rephrase the second as “SAT for joinrecs”. Is that roughly correct?

(I am surprised that the former is not done already from the beginning of join points in GHC. But a brief look at the distribution of labor between OccAnal and the Simplifier makes it clear to me that this change is not trivial.)

comment:13 Changed 2 years ago by nomeata

See #14068 for the first of these tasks.

comment:14 Changed 2 years ago by simonpj

Yes, two tasks. Of these, the most important is the first #14068.

The second SAT part is worthwhile, I think, but this particular ticket doesn't really show why. The point is that while SAT is sometimes good for normal definitions, I think it's probably always good (or at least not harmful) for joinrecs.

comment:15 Changed 23 months ago by mpickering

Implementing this leads to:

benchmarking Skip-less
time                 632.2 ms   (532.7 ms .. 835.8 ms)
                     0.988 R²   (0.976 R² .. 1.000 R²)
mean                 1.187 s    (1.044 s .. 1.308 s)
std dev              191.0 ms   (0.0 s .. 207.9 ms)
variance introduced by outliers: 46% (moderately inflated)

benchmarking Skip
time                 904.3 ms   (904.1 ms .. 904.8 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.230 s    (1.130 s .. 1.306 s)
std dev              114.9 ms   (0.0 s .. 130.3 ms)
variance introduced by outliers: 22% (moderately inflated)

And the core looks quite similar:

chain1                                                                          
  = \ (w_s9yo :: Int) ->                                                        
      case w_s9yo of { GHC.Types.I# ww1_s9yr ->                                 
      joinrec {                                                                 
        $wsat_worker2_s9yn [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker]           
          :: Int# -> Int# -> Int                                                
        [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>m, Unf=OtherCon []]           
        $wsat_worker2_s9yn (ww2_s9yh :: Int#) (ww3_s9yl :: Int#)                
          = join {                                                              
              lvl_s9BD [Dmd=<L,U(U)>] :: Int                                    
              [LclId[JoinId(0)], Str=m, Unf=OtherCon []]                        
              lvl_s9BD = GHC.Types.I# ww2_s9yh } in                             
            joinrec {                                                           
              $wsat_worker3_s9yc [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker]     
                :: Int# -> Int                                                      
              [LclId[JoinId(1)], Arity=1, Str=<S,U>m, Unf=OtherCon []]          
              $wsat_worker3_s9yc (ww4_s9ya :: Int#)                             
                = case ># ww4_s9ya ww1_s9yr of {                                
                    __DEFAULT ->                                                
                      case remInt# ww4_s9ya 2# of {                             
                        __DEFAULT -> jump $wsat_worker3_s9yc (+# ww4_s9ya 1#);  
                        0# ->                                                   
                          jump $wsat_worker2_s9yn (+# ww2_s9yh ww4_s9ya) (+# ww4_s9ya 1#)
                      };                                                        
                    1# -> jump lvl_s9BD                                         
                  }; } in                                                       
            jump $wsat_worker3_s9yc ww3_s9yl; } in                              
      jump $wsat_worker2_s9yn 0# 1#                                             
      }
chain2                                                                          
  = \ (w_s9xY :: Int) ->                                                        
      case w_s9xY of { GHC.Types.I# ww1_s9y1 ->                                 
      joinrec {                                                                 
        $wsat_worker2_s9xX [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker]           
          :: Int# -> Int# -> Int                                                
        [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>m, Unf=OtherCon []]           
        $wsat_worker2_s9xX (ww2_s9xR :: Int#) (ww3_s9xV :: Int#)                
          = case ># ww3_s9xV ww1_s9y1 of {                                      
              __DEFAULT ->                                                      
                case remInt# ww3_s9xV 2# of {                                   
                  __DEFAULT -> jump $wsat_worker2_s9xX ww2_s9xR (+# ww3_s9xV 1#);
                  0# ->                                                         
                    jump $wsat_worker2_s9xX (+# ww2_s9xR ww3_s9xV) (+# ww3_s9xV 1#)
                };                                                              
              1# -> GHC.Types.I# ww2_s9xR                                       
            }; } in                                                             
      jump $wsat_worker2_s9xX 0# 1#                                             
      }  

Still need to investigate the general impact more.

comment:16 Changed 23 months ago by simonpj

I'm getting lost.

Implementing this leads to

  • What exactly is "this"?
  • Does chain1 have "this" implemented? Or chain2?
  • Which do you think is most desirable, and why? (chain2 looks simpler.)
  • Are you assuming loopification #14068 is done?

comment:17 Changed 23 months ago by mpickering

Here is the exact file I am using.

{-# LANGUAGE ExistentialQuantification #-}
module Main where

import GHC.Prim

import Criterion.Main
import GHC.Prim

main :: IO ()
main = defaultMain [b1, b2] where
  b1 = bench "Skip-less" $ whnf chain1 x
  b2 = bench "Skip" $ whnf chain2 x
  x = 100000000


--------------------------------------------------------------------------------

data Step1 s a = Done1 | Yield1 s a
data Stream1 a = forall s. Stream1 s (s -> Step1 s a)

enumFromTo1 :: (Ord a, Num a) => a -> a -> Stream1 a
enumFromTo1 start high = Stream1 start f where
  f i | i > high  = Done1
      | otherwise = Yield1 (i + 1) i

filter1 :: (a -> Bool) -> Stream1 a -> Stream1 a
filter1 predicate (Stream1 s0 next) = Stream1 s0 loop where
  loop s = case next s of
    Done1 -> Done1
    Yield1 s' x
      | predicate x -> Yield1 s' x
      | otherwise   -> loop s'

sum1 :: Num a => Stream1 a -> a
sum1 (Stream1 s0 next) = loop 0 s0 where
  loop total s = case next s of
    Done1 -> total
    Yield1 s' x -> loop (total + x) s'

chain1 :: Int -> Int
chain1 = sum1 . filter1 even . enumFromTo1 1

--------------------------------------------------------------------------------

data Step2 s a = Done2 | Skip2 s | Yield2 s a
data Stream2 a = forall s. Stream2 s (s -> Step2 s a)

enumFromTo2 :: (Ord a, Num a) => a -> a -> Stream2 a
enumFromTo2 start high = Stream2 start f where
  f i | i > high  = Done2
      | otherwise = Yield2 (i + 1) i

filter2 :: (a -> Bool) -> Stream2 a -> Stream2 a
filter2 predicate (Stream2 s0 next) = Stream2 s0 loop where
  loop s = case next s of
    Done2 -> Done2
    Skip2 s' -> Skip2 s'
    Yield2 s' x
      | predicate x -> Yield2 s' x
      | otherwise   -> Skip2 s'

sum2 :: Num a => Stream2 a -> a
sum2 (Stream2 s0 next) = loop 0 s0 where
  loop total s = case next s of
    Done2 -> total
    Skip2 s' -> loop total s'
    Yield2 s' x -> loop (total + x) s'

chain2 :: Int -> Int
chain2 = sum2 . filter2 even . enumFromTo2 1

I modified the SAT pass to ignore information about static arguments, perform the SAT transformation and then check whether we created a join point. If we create a join point then we keep the transformed version, otherwise we leave the code as it was. (This is what you suggested in comment:8)

I then compiled the above program with this transformation turned on. chain2 was unaffected, the core is as before but the core for chain1 changed quite a bit.

It seems from running the benchmarks that chain1 is better but I didn't look yet why this might be the case.

I am building from a recent HEAD (11d9615e9f751d6ed084f1cb20c24ad6b408230e) so whether loopification is in there or not I don't know.

comment:18 Changed 23 months ago by mpickering

I read #14068 now which I thought the whole point of this ticket was?

What is the difference meant to be?

comment:19 Changed 23 months ago by bgamari

To summarize (excerpting from comment:8), this ticket consists of two parts,

  • #14068: Transforming recursive functions with non-tail-call uses into recursive join points
  • #14067: Enabling SAT to transform join points. This isn't necessary for the case described in this ticket, but rather an unrelated issue noticed while looking at this case.

It might be best to keep specific discussion on those tickets where possible.

Last edited 23 months ago by bgamari (previous) (diff)
Note: See TracTickets for help on using tickets.