Opened 2 years ago
Last modified 23 months ago
#13966 new feature request
Skip-less stream fusion: a missed opportunity
Reported by: | jmspiewak | Owned by: | mpickering |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Compiler | Version: | 8.2.1-rc3 |
Keywords: | JoinPoints, StaticArgumentTransformation | Cc: | |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | None/Unknown | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | #14067 #14068 | Differential Rev(s): | |
Wiki Page: |
Description (last modified by )
A simple stream chain
chain :: Int -> Int chain = sum . filter even . enumFromTo 1
doesn't fuse under a Skip-less stream on GHC 8.2-rc3 -O2.
Benchmarked against a Skip stream (LLVM backend):
benchmarking Skip-less time 248.9 ms (243.3 ms .. 257.3 ms) 0.998 R² (0.995 R² .. 0.999 R²) mean 250.9 ms (248.1 ms .. 254.7 ms) std dev 5.985 ms (4.831 ms .. 7.311 ms) benchmarking Skip time 61.26 ms (60.39 ms .. 62.44 ms) 0.998 R² (0.997 R² .. 0.999 R²) mean 62.45 ms (61.96 ms .. 62.91 ms) std dev 1.387 ms (1.190 ms .. 1.669 ms)
Relevant core (chain1 is Skip-less, chain2 has Skip):
-- RHS size: {terms: 51, types: 27, coercions: 0, joins: 1/2} Main.$wchain1 [InlPrag=NOINLINE] :: Int# -> Int# [GblId, Arity=1, Caf=NoCafRefs, Str=<S,U>] Main.$wchain1 = \ (ww_s9ep :: Int#) -> letrec { $wloop_s9ea [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int Int [LclId, Arity=1, Str=<S,U>, Unf=OtherCon []] $wloop_s9ea = \ (ww1_s9e8 :: Int#) -> case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of { False -> case remInt# ww1_s9e8 2# of { __DEFAULT -> $wloop_s9ea (+# ww1_s9e8 1#); 0# -> Main.Yield1 @ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#)) (GHC.Types.I# ww1_s9e8) }; True -> Main.Done1 @ Int @ Int }; } in joinrec { $wloop1_s9el [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int# [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []] $wloop1_s9el (ww1_s9ef :: Int#) (ww2_s9ej :: Int#) = case $wloop_s9ea ww2_s9ej of { Done1 -> ww1_s9ef; Yield1 s'_a497 x_a498 -> case x_a498 of { GHC.Types.I# y_a66i -> case s'_a497 of { GHC.Types.I# ww4_X9hA -> jump $wloop1_s9el (+# ww1_s9ef y_a66i) ww4_X9hA } } }; } in jump $wloop1_s9el 0# 1# -- RHS size: {terms: 33, types: 9, coercions: 0, joins: 1/1} Main.$wchain2 [InlPrag=NOINLINE] :: Int# -> Int# [GblId, Arity=1, Caf=NoCafRefs, Str=<S,U>] Main.$wchain2 = \ (ww_s9dZ :: Int#) -> joinrec { $wloop_s9dV [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int# [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []] $wloop_s9dV (ww1_s9dP :: Int#) (ww2_s9dT :: Int#) = case tagToEnum# @ Bool (># ww2_s9dT ww_s9dZ) of { False -> case remInt# ww2_s9dT 2# of { __DEFAULT -> jump $wloop_s9dV ww1_s9dP (+# ww2_s9dT 1#); 0# -> jump $wloop_s9dV (+# ww1_s9dP ww2_s9dT) (+# ww2_s9dT 1#) }; True -> ww1_s9dP }; } in jump $wloop_s9dV 0# 1#
The code was adapted from M. Snoyman's blog post "Iterators and Streams in Rust and Haskell".
Attachments (2)
Change History (21)
comment:1 Changed 2 years ago by
Description: | modified (diff) |
---|---|
Keywords: | JoinPoints added |
Priority: | low → normal |
Summary: | Missed optimization - loop fusion → Skip-less stream fusion |
Changed 2 years ago by
comment:2 Changed 2 years ago by
A typeclass-based Skip-less stream (also from the blog post) does fuse.
data Step3 s = Done3 | Yield3 s (Item3 s) class Stream3 s where type Item3 s next3 :: s -> Step3 s data EnumFromTo3 a = EnumFromTo3 a a instance (Ord a, Num a) => Stream3 (EnumFromTo3 a) where type Item3 (EnumFromTo3 a) = a next3 (EnumFromTo3 i high) | i > high = Done3 | otherwise = Yield3 (EnumFromTo3 (i + 1) high) i data Filter3 a s = Filter3 (a -> Bool) s instance (Stream3 s, Item3 s ~ a) => Stream3 (Filter3 a s) where type Item3 (Filter3 a s) = a next3 (Filter3 predicate s0) = loop s0 where loop s1 = case next3 s1 of Done3 -> Done3 Yield3 s2 x | predicate x -> Yield3 (Filter3 predicate s2) x | otherwise -> loop s2 sum3 :: (Num (Item3 s), Stream3 s) => s -> Item3 s sum3 = loop 0 where loop total s1 = case next3 s1 of Done3 -> total Yield3 s2 x -> loop (total + x) s2 {-# NOINLINE chain3 #-} chain3 :: Int -> Int chain3 = sum3 . Filter3 even . EnumFromTo3 1
Adding an existential wrapper doesn't break the fusion.
data Stream4 a = forall s. (Stream3 s, Item3 s ~ a) => Stream4 s enumFromTo4 :: (Ord a, Num a) => a -> a -> Stream4 a enumFromTo4 start high = Stream4 (EnumFromTo3 start high) filter4 :: (a -> Bool) -> Stream4 a -> Stream4 a filter4 p (Stream4 s) = Stream4 (Filter3 p s) sum4 :: Num a => Stream4 a -> a sum4 (Stream4 s) = sum3 s {-# NOINLINE chain4 #-} chain4 :: Int -> Int chain4 = sum4 . filter4 even . enumFromTo4 1
benchmarking typeclass Skip-less time 73.11 ms (72.50 ms .. 73.94 ms) 0.999 R² (0.998 R² .. 1.000 R²) mean 69.80 ms (68.86 ms .. 70.72 ms) std dev 2.916 ms (2.483 ms .. 3.577 ms) variance introduced by outliers: 20% (moderately inflated) benchmarking typeclass existential Skip-less time 75.44 ms (74.91 ms .. 76.13 ms) 1.000 R² (0.999 R² .. 1.000 R²) mean 75.44 ms (75.06 ms .. 75.80 ms) std dev 1.118 ms (904.6 μs .. 1.479 ms)
Both result in:
Rec { -- RHS size: {terms: 36, types: 11, coercions: 0, joins: 1/1} Main.main_$s$wloop1 [Occ=LoopBreaker] :: Int# -> Int# -> Int# -> Int# [GblId, Arity=3, Caf=NoCafRefs, Str=<S,U><S,U><S,U>] Main.main_$s$wloop1 = \ (sc_s9HL :: Int#) (sc1_s9HK :: Int#) (sc2_s9HJ :: Int#) -> joinrec { $wloop2_s9y9 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int# [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []] $wloop2_s9y9 (ww_s9y1 :: Int#) (ww1_s9y6 :: Int#) = case tagToEnum# @ Bool (># ww_s9y1 ww1_s9y6) of { False -> case remInt# ww_s9y1 2# of { __DEFAULT -> jump $wloop2_s9y9 (+# ww_s9y1 1#) ww1_s9y6; 0# -> Main.main_$s$wloop1 ww1_s9y6 (+# ww_s9y1 1#) (+# sc2_s9HJ ww_s9y1) }; True -> sc2_s9HJ }; } in jump $wloop2_s9y9 sc1_s9HK sc_s9HL end Rec }
comment:3 Changed 2 years ago by
I am rather confused. What concretely is this ticket asking? Are you saying that ghc should reduce the skip+less program to the skip-ful core? Are you proposing stream fusion as an alternative to the foldr/build fusion which currently underlies GHC's list implementation? Something else entirely?
comment:4 Changed 2 years ago by
Sorry, after reading the paper on join points I thought GHC would be able to fuse the Skip-less stream. The fact that it does fuse the typeclass version convinced me it's possible.
comment:5 Changed 2 years ago by
Here's how I am understanding this post as I have also been thinking about this recently.
- In the Join Points PLDI paper there is an example in section 5 where join points allows
find
defined in terms ofany
to fuse. The author wants(/expects(?)) this technique to also fuse together normal functional pipelines as well as this simple examples.
- The introduction of type classes seems irrelevant to the first point. GHC has quite a few optimisations which it uses to eliminate type classes and you can play lots of tricks around this to cause fusion like this to happen. If you look at the code is structured, in
sum
where the specific instance ofnext3
is resolved toFilter3 Int (EnumFromTo Int)
which means that all the consumers nicely line up with each other and hence fuse. The control flow of this program is very different to the first one.
- A program with a polymorphic return type parametrised by a type class is very similar to a program written in CPS. To see this similarity, when writing a program in CPS in order for execution to continue you must provide a function which says what to do next. When using type classes, you must instead provide the *type* of the result which in then is elaborated to a function or dictionary of functions which explain how to proceed. Thus structuring your program in this way usually allows this kind of fusion to happen.
- Ah-ha! but purpose of join points is to allow the compiler to optimise code written in direct style as well as code written in continuation passing style so can we do better in this case? The answer to which I don't yet know and I think is an open research problem.
comment:6 Changed 2 years ago by
Thank you for your detailed comment.
The paper specifically mentions fusing Skip-less filter so indeed I was expecting this example to work.
I brought up the typeclass approach because it's very inefficient on 8.0 but on 8.2 it's close to hand written recursion. It seems recursive join points can be used to fuse this stream, but only with the help of some typeclass-specific optimization.
comment:7 Changed 2 years ago by
I conducted more tests - it's not about the typeclass. Somehow putting the filter predicate in stream state allows fusion to happen. Is it changing the order of inlining? Prevents some other optimization which was clobbering the fusion opportunity?
filter5 :: (a -> Bool) -> Stream1 a -> Stream1 a filter5 predicate (Stream1 s0 next) = Stream1 (Filter3 predicate s0) f where f (Filter3 p s1) = loop s1 where loop s = case next s of Done1 -> Done1 Yield1 s' x | p x -> Yield1 (Filter3 p s') x | otherwise -> loop s' {-# NOINLINE chain5 #-} chain5 :: Int -> Int chain5 = sum1 . filter5 even . enumFromTo1 1
Compiles to:
-- RHS size: {terms: 36, types: 12, coercions: 0, joins: 2/2} Main.$wchain5 [InlPrag=NOINLINE] :: Int# -> Int# [GblId, Arity=1, Caf=NoCafRefs, Str=<S,U>] Main.$wchain5 = \ (ww_s9EU :: Int#) -> joinrec { $s$wloop_s9QI [Occ=LoopBreaker] :: Int# -> Int# -> Int# [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []] $s$wloop_s9QI (sc_s9QH :: Int#) (sc1_s9QG :: Int#) = joinrec { $wloop2_s9EA [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# [LclId[JoinId(1)], Arity=1, Str=<S,U>, Unf=OtherCon []] $wloop2_s9EA (ww1_s9Ey :: Int#) = case tagToEnum# @ Bool (># ww1_s9Ey ww_s9EU) of { False -> case remInt# ww1_s9Ey 2# of { __DEFAULT -> jump $wloop2_s9EA (+# ww1_s9Ey 1#); 0# -> jump $s$wloop_s9QI (+# ww1_s9Ey 1#) (+# sc1_s9QG ww1_s9Ey) }; True -> sc1_s9QG }; } in jump $wloop2_s9EA sc_s9QH; } in jump $s$wloop_s9QI 1# 0#
comment:8 follow-up: 9 Changed 2 years ago by
This is a very interesting example, thank you.
I note that in HEAD it fuses just fine. I have not yet figured out exactly why, but I want to look at this code, from teh Description:
letrec { $wloop [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int Int [LclId, Arity=1, Str=<S,U>, Unf=OtherCon []] $wloop = \ (ww1_s9e8 :: Int#) -> case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of { False -> case remInt# ww1_s9e8 2# of { __DEFAULT -> $wloop (+# ww1_s9e8 1#); 0# -> Main.Yield1 @ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#)) (GHC.Types.I# ww1_s9e8) }; True -> Main.Done1 @ Int @ Int }; } in joinrec { $wloop1 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int# [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []] $wloop1 (ww1_s9ef :: Int#) (ww2_s9ej :: Int#) = case $wloop ww2_s9ej of { Done1 -> ww1_s9ef; Yield1 s'_a497 x_a498 -> case x_a498 of { GHC.Types.I# y_a66i -> case s'_a497 of { GHC.Types.I# ww4_X9hA -> jump $wloop1 (+# ww1_s9ef y_a66i) ww4_X9hA } } }; } in jump $wloop1 0# 1#
Once GHC gets the program into this state, it's not going to be able to optimise it. HEAD somehow avoids this dead end, but I hate things where GHC can get stuck in a dead end. I think this code ought to optimise just fine. Here's why.
Look at thar functionn $wloop
. It's not a join point becuaes it's not tail-called in the body of the letrec
. But suppose we transform $wloop
like this:
let { $wloop [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int Int [LclId, Arity=1, Str=<S,U>, Unf=OtherCon []] $wloop = \x. joinrec $j (ww1_s9e8 :: Int#) = case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of { False -> case remInt# ww1_s9e8 2# of { __DEFAULT -> jump $j (+# ww1_s9e8 1#); 0# -> Main.Yield1 @ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#)) (GHC.Types.I# ww1_s9e8) }; True -> Main.Done1 @ Int @ Int in jump $j x
Now $wloop
is non-recursive, so we can inline it at its only call site, in $wloop1
:
joinrec { $wloop1 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int# [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []] $wloop1 (ww1_s9ef :: Int#) (ww2_s9ej :: Int#) = case (...the body of $wloop...) of { Done1 -> ww1_s9ef; Yield1 s'_a497 x_a498 -> case x_a498 of { GHC.Types.I# y_a66i -> case s'_a497 of { GHC.Types.I# ww4_X9hA -> jump $wloop1 (+# ww1_s9ef y_a66i) ww4_X9hA } } }; } in jump $wloop1 0# 1#
And now the right fusion will happen.
The crucial bit was the transformation of $wloop
: we took a tail-recursive function, introduced a (recursive) join point into it, which made it non-recursive. Even if nothing further happens, the implementation of $wloop
is a bit more efcient because the tail call is just a branch. But the big thing here is that we can now inline $wloop
in $wloop1
.
So the idea is this: a transformation to turn a tail-recursive function definition into one that is implemented with a recursive join point. If we had such a transformation, it'd get us out of the dead end.
Actually, there's a variant of the Static Argument Transformation (StaticArgumentTransformation) at work here. Consider
f x y = case y of A -> f x y' B -> e2 C -> e3
Here the first argument to f
is "static"; that is, the same in every call. So we can transform like this
f x y = joinrec $j y = case y of A -> $j y' B -> e2 C -> e3 in $j y
Note that x
isn't passed around in every iteration.
There's a GHC module SAT.hs
which does the static argument transformation, but it is not join-point aware. So we should fix that.
One reason we don't currently do SAT all the time is that the results are a bit ambiguous; see Andre Santos's thesis for more, cited on StaticArgumentTransformation. BUT I think that some (maybe most) of the problems with SAT may go away if we restrict SAT to tail-recursive functions that we can turn into joinrecs
.
Any volunteers?
comment:10 Changed 2 years ago by
Keywords: | StaticArgumentTransformation added |
---|---|
Summary: | Skip-less stream fusion → Skip-less stream fusion: a missed opportunity |
comment:11 Changed 2 years ago by
Related Tickets: | → #14067 |
---|
This ticket is confusing, and about a specific use case. I took the liberty of creating a new ticket for the task of implementing a SAT for tail-recursive functions at #14067. Once that is done, we can revisit if it actually fixed the problem at hand.
comment:12 Changed 2 years ago by
Hmm, maybe I did that a bit prematurely, without fully understanding what we should be doing.
Upon closer reading, there are two tasks:
- Transforming a tail-recursive function into a non-recursive function with a
joinrec
. - SAT for tail-recursive functions
It seems that the first is independent of the second, and if the first one was done, one can rephrase the second as “SAT for joinrec
s”. Is that roughly correct?
(I am surprised that the former is not done already from the beginning of join points in GHC. But a brief look at the distribution of labor between OccAnal and the Simplifier makes it clear to me that this change is not trivial.)
comment:13 Changed 2 years ago by
Related Tickets: | #14067 → #14067 #14068 |
---|
See #14068 for the first of these tasks.
comment:14 Changed 2 years ago by
Yes, two tasks. Of these, the most important is the first #14068.
The second SAT part is worthwhile, I think, but this particular ticket doesn't really show why. The point is that while SAT is sometimes good for normal definitions, I think it's probably always good (or at least not harmful) for joinrecs
.
comment:15 Changed 23 months ago by
Implementing this leads to:
benchmarking Skip-less time 632.2 ms (532.7 ms .. 835.8 ms) 0.988 R² (0.976 R² .. 1.000 R²) mean 1.187 s (1.044 s .. 1.308 s) std dev 191.0 ms (0.0 s .. 207.9 ms) variance introduced by outliers: 46% (moderately inflated) benchmarking Skip time 904.3 ms (904.1 ms .. 904.8 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.230 s (1.130 s .. 1.306 s) std dev 114.9 ms (0.0 s .. 130.3 ms) variance introduced by outliers: 22% (moderately inflated)
And the core looks quite similar:
chain1 = \ (w_s9yo :: Int) -> case w_s9yo of { GHC.Types.I# ww1_s9yr -> joinrec { $wsat_worker2_s9yn [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker] :: Int# -> Int# -> Int [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>m, Unf=OtherCon []] $wsat_worker2_s9yn (ww2_s9yh :: Int#) (ww3_s9yl :: Int#) = join { lvl_s9BD [Dmd=<L,U(U)>] :: Int [LclId[JoinId(0)], Str=m, Unf=OtherCon []] lvl_s9BD = GHC.Types.I# ww2_s9yh } in joinrec { $wsat_worker3_s9yc [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker] :: Int# -> Int [LclId[JoinId(1)], Arity=1, Str=<S,U>m, Unf=OtherCon []] $wsat_worker3_s9yc (ww4_s9ya :: Int#) = case ># ww4_s9ya ww1_s9yr of { __DEFAULT -> case remInt# ww4_s9ya 2# of { __DEFAULT -> jump $wsat_worker3_s9yc (+# ww4_s9ya 1#); 0# -> jump $wsat_worker2_s9yn (+# ww2_s9yh ww4_s9ya) (+# ww4_s9ya 1#) }; 1# -> jump lvl_s9BD }; } in jump $wsat_worker3_s9yc ww3_s9yl; } in jump $wsat_worker2_s9yn 0# 1# }
chain2 = \ (w_s9xY :: Int) -> case w_s9xY of { GHC.Types.I# ww1_s9y1 -> joinrec { $wsat_worker2_s9xX [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker] :: Int# -> Int# -> Int [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>m, Unf=OtherCon []] $wsat_worker2_s9xX (ww2_s9xR :: Int#) (ww3_s9xV :: Int#) = case ># ww3_s9xV ww1_s9y1 of { __DEFAULT -> case remInt# ww3_s9xV 2# of { __DEFAULT -> jump $wsat_worker2_s9xX ww2_s9xR (+# ww3_s9xV 1#); 0# -> jump $wsat_worker2_s9xX (+# ww2_s9xR ww3_s9xV) (+# ww3_s9xV 1#) }; 1# -> GHC.Types.I# ww2_s9xR }; } in jump $wsat_worker2_s9xX 0# 1# }
Still need to investigate the general impact more.
comment:16 Changed 23 months ago by
I'm getting lost.
Implementing this leads to
- What exactly is "this"?
- Does
chain1
have "this" implemented? Orchain2
?
- Which do you think is most desirable, and why? (
chain2
looks simpler.)
- Are you assuming loopification #14068 is done?
comment:17 Changed 23 months ago by
Here is the exact file I am using.
{-# LANGUAGE ExistentialQuantification #-} module Main where import GHC.Prim import Criterion.Main import GHC.Prim main :: IO () main = defaultMain [b1, b2] where b1 = bench "Skip-less" $ whnf chain1 x b2 = bench "Skip" $ whnf chain2 x x = 100000000 -------------------------------------------------------------------------------- data Step1 s a = Done1 | Yield1 s a data Stream1 a = forall s. Stream1 s (s -> Step1 s a) enumFromTo1 :: (Ord a, Num a) => a -> a -> Stream1 a enumFromTo1 start high = Stream1 start f where f i | i > high = Done1 | otherwise = Yield1 (i + 1) i filter1 :: (a -> Bool) -> Stream1 a -> Stream1 a filter1 predicate (Stream1 s0 next) = Stream1 s0 loop where loop s = case next s of Done1 -> Done1 Yield1 s' x | predicate x -> Yield1 s' x | otherwise -> loop s' sum1 :: Num a => Stream1 a -> a sum1 (Stream1 s0 next) = loop 0 s0 where loop total s = case next s of Done1 -> total Yield1 s' x -> loop (total + x) s' chain1 :: Int -> Int chain1 = sum1 . filter1 even . enumFromTo1 1 -------------------------------------------------------------------------------- data Step2 s a = Done2 | Skip2 s | Yield2 s a data Stream2 a = forall s. Stream2 s (s -> Step2 s a) enumFromTo2 :: (Ord a, Num a) => a -> a -> Stream2 a enumFromTo2 start high = Stream2 start f where f i | i > high = Done2 | otherwise = Yield2 (i + 1) i filter2 :: (a -> Bool) -> Stream2 a -> Stream2 a filter2 predicate (Stream2 s0 next) = Stream2 s0 loop where loop s = case next s of Done2 -> Done2 Skip2 s' -> Skip2 s' Yield2 s' x | predicate x -> Yield2 s' x | otherwise -> Skip2 s' sum2 :: Num a => Stream2 a -> a sum2 (Stream2 s0 next) = loop 0 s0 where loop total s = case next s of Done2 -> total Skip2 s' -> loop total s' Yield2 s' x -> loop (total + x) s' chain2 :: Int -> Int chain2 = sum2 . filter2 even . enumFromTo2 1
I modified the SAT pass to ignore information about static arguments, perform the SAT transformation and then check whether we created a join point. If we create a join point then we keep the transformed version, otherwise we leave the code as it was. (This is what you suggested in comment:8)
I then compiled the above program with this transformation turned on. chain2
was unaffected, the core is as before but the core for chain1
changed quite a bit.
It seems from running the benchmarks that chain1
is better but I didn't look yet why this might be the case.
I am building from a recent HEAD (11d9615e9f751d6ed084f1cb20c24ad6b408230e) so whether loopification is in there or not I don't know.
comment:18 Changed 23 months ago by
I read #14068 now which I thought the whole point of this ticket was?
What is the difference meant to be?
comment:19 Changed 23 months ago by
To summarize (excerpting from comment:8), this ticket consists of two parts,
- #14068: Transforming recursive functions with non-tail-call uses into recursive join points
- #14067: Enabling SAT to transform join points. This isn't necessary for the case described in this ticket, but rather an unrelated issue noticed while looking at this case.
It might be best to keep specific discussion on those tickets where possible.
Criterion benchmark