Opened 5 years ago

Closed 5 years ago

# Data.List.cycle is not a good producer

Reported by: Owned by: dfeuer normal libraries/base 7.8.3 hvr, ekmett Unknown/Multiple Unknown/Multiple Runtime performance bug

Data.List.cycle is not a good producer. I believe the following fixes it. The tests I've profiled so far suggest it does so.

{-# INLINE cycle #-}
cycle :: [a] -> [a]
cycle [] = error "Empty cycle."
cycle xs = let cyc = augment cycle' cyc in cyc
where cycle' c n = foldr c n xs


### comment:1 Changed 5 years ago by dfeuer

Resolution: → invalid new → closed

No, not ready yet. There are some more subtleties.

### comment:2 Changed 5 years ago by dfeuer

Description: modified (diff) invalid closed → new

### comment:3 follow-up:  4 Changed 5 years ago by nomeata

Remember that GHC doesn’t inline recursive things (such as cyc).

Also, I don’t think your cycle is a good consumer: To inline it, you need to know that its argument is a :. But to make it a good consumer, the argument needs to be of the shape build ....

### comment:4 in reply to:  3 Changed 5 years ago by dfeuer

Remember that GHC doesn’t inline recursive things (such as cyc).

Also, I don’t think your cycle is a good consumer: To inline it, you need to know that its argument is a :. But to make it a good consumer, the argument needs to be of the shape build ....

I suppose I should (try to) find a way to rewrite it to another form if it doesn't get eaten by foldr. I'm not understanding your concern about the consumption side. cyc isn't inlined, but the enclosing cycle is. Shouldn't that be good enough to allow the foldr in cycle' to fuse with a build or augment in xs?

### comment:5 follow-up:  6 Changed 5 years ago by nomeata

But note that you are pattern matching on xs!

### comment:6 in reply to:  5 Changed 5 years ago by dfeuer

But note that you are pattern matching on xs!

Ah, I see what you mean. That pattern match, however, is not essential. Suppose we leave it out. Then we get

cycle [] = let cyc = augment cycle' cyc in cyc
where
cycle' c n = foldr c n []


foldr/nil gives

cycle [] = let cyc = augment (\c n -> []) cyc in cyc


Then applying augment:

cycle [] = let cyc = [] in cyc


So we've turned _|_ into [], whereas a similar omission from the current definition of cycle would turn an error into a much-less-desirable non-termination. This is not a disaster, and in fact it makes sense from a monoid perspective.

### comment:7 follow-up:  8 Changed 5 years ago by nomeata

I think you meant

cycle [] = let cyc = augment cycle' cyc in cyc
where
cycle' c n = foldr c n []

cycle [] = let cyc = augment (\c n -> n) cyc in cyc

cycle [] = let cyc = cyc in cyc

cycle [] = ⊥


so you turned a helpful error message into nontermination

### comment:8 in reply to:  7 Changed 5 years ago by dfeuer

I think you meant

cycle [] = let cyc = augment cycle' cyc in cyc
where
cycle' c n = foldr c n []

cycle [] = let cyc = augment (\c n -> n) cyc in cyc


I don't think so. Substituting [] for xs, cycle' c n = foldr c n [] = [], so cycle' = \c n -> [].

### comment:9 follow-up:  10 Changed 5 years ago by nomeata

foldr c n [] = n!

### comment:10 in reply to:  9 Changed 5 years ago by dfeuer

foldr c n [] = n!

I am a fool! That said, we could make an optimization flag, and maybe include it in -O2.

### comment:11 Changed 5 years ago by nomeata

Not sure what you mean. An optimization flag that changes whether cycle [] is an error or nontermination? I don’t think this is a good idea.

Anyways, as you said: It is more important that cycle is a good producer. I don’t see how an enclosing foldr would get in touch with the augment in let cyc = augment cycle' cyc in cyc.

### comment:12 follow-ups:  13  15 Changed 5 years ago by nomeata

I think you need to move the knot-tying into the argument to build, e.g. something like

cycle xs = build $\c _ -> let cyc = foldr c cyc xs in cyc  (ignoring the issue of cycle [] for now) It seems that this would be both a good produce, and possibly even a good consumer. ### comment:13 in reply to: 12 Changed 5 years ago by dfeuer Replying to nomeata: I think you need to move the knot-tying into the argument to build, e.g. something like cycle xs = build$ \c _ ->  let cyc = foldr c cyc xs in cyc


(ignoring the issue of cycle [] for now)

It seems that this would be both a good producer, and possibly even a good consumer.

I won't be able to test anything for some hours, but that does look very promising indeed. I wonder if GHC performed some transformation that turned mine into yours somehow, but yours is definitely much clearer and prettier in any case. I think it's a good producer, and a good consumer for build. It doesn't look like a perfect consumer for augment (although it's not a bad one), but that may be unavoidable. I believe cycle is one of the more popular list functions in production code, so personally I think it's probably worth giving up the error message on a null argument to buy a little performance, even on the consumption side, but it's also true that the production side is more important in general.

### comment:14 Changed 5 years ago by isaacdupree

In compiled code, the latter definition throws <<loop>>, which is nice. Try runghc test.hs (nontermination) vs ghc test.hs && ./test (exception):

import GHC.Exts

cycle2 xs = build $\c _ -> let cyc = foldr c cyc xs in cyc main = print (cycle2 [] !! 0 :: Int)  ### comment:15 in reply to: 12 Changed 5 years ago by dfeuer Replying to nomeata: I wrote some test functions, using the type type BuildArg a = forall b . (a -> b -> b) -> b -> b  Generally things look pretty good (lots of Core to look at below, along with comments you should take with a grain of salt since I'm just a newbie). I found one case where your definition works significantly worse than the Prelude's. The bad case I've found: main = print$ foldr (+) 0 $take 30000000$ map (* 13) $cycle [1,8,4,0,(5::Int)]  For some reason, your cycle implementation allocates a lot, but the Prelude one runs in constant space. The basic test translations: cycleBuild :: BuildArg a -> [a] cycleBuild g = cycle (build g)  produces cycleBuild cycleBuild = \ @ a_a3jG g_a3hO -> letrec { cyc_a2iD cyc_a2iD = g_a3hO (:) cyc_a2iD; } in cyc_a2iD  which is obviously good, and obviously better than the result with Prelude.cycle, which is cycleBuild cycleBuild = \ @ a_a1Xy g_a1zF -> case g_a1zF (:) ([]) of wild_a2aX { [] -> cycle1; : ipv_a2b3 ipv1_a2b4 -> letrec { xs'_a2b1 xs'_a2b1 = ++ wild_a2aX xs'_a2b1; } in xs'_a2b1 }  cycleAugment :: BuildArg a -> [a] -> [a] cycleAugment g xs = cycle (augment g xs)  produces cycleAugment cycleAugment = \ @ a_a3j1 g_a3hS xs_a3hT -> letrec { cyc_a2iD cyc_a2iD = g_a3hS (:) (++ xs_a3hT cyc_a2iD); } in cyc_a2iD  which I think is likely the best we can do, and clearly better than what the Prelude gives: cycleAugment cycleAugment = \ @ a_a1WT g_a1zJ xs_a1zK -> case g_a1zJ (:) xs_a1zK of wild_a2aX { [] -> cycle1; : ipv_a2b3 ipv1_a2b4 -> letrec { xs'_a2b1 xs'_a2b1 = ++ wild_a2aX xs'_a2b1; } in xs'_a2b1 }  foldCycleBuild :: (a -> b -> b) -> b -> BuildArg a -> b foldCycleBuild c n g = foldr c n (cycle (build g))  produces foldCycleBuild foldCycleBuild = \ @ a_a3jl @ b_a3jm c_a3hP _ g_a3hR -> letrec { cyc_a2iD cyc_a2iD = g_a3hR c_a3hP cyc_a2iD; } in cyc_a2iD  which is obviously good, and incomparably better than what the Prelude gives, which I will not paste here because it is very obviously very much inferior. The rest of the tests are a little harder for me to interpret, because the nested letrecs confuse me, but I think they're probably good too. foldCycle c n xs = foldr c n (cycle xs)  produces $wfoldCycle
$wfoldCycle = \ @ a_a3lf @ b_a3lg w_s3mE w1_s3mG -> letrec { cyc_a2iD cyc_a2iD = letrec { go_a2GG go_a2GG = \ ds_a2GH -> case ds_a2GH of _ { [] -> cyc_a2iD; : y_a2GM ys_a2GN -> w_s3mE y_a2GM (go_a2GG ys_a2GN) }; } in go_a2GG w1_s3mG; } in cyc_a2iD foldCycle foldCycle = \ @ a_a3lf @ b_a3lg w_s3mE _ w2_s3mG ->$wfoldCycle w_s3mE w2_s3mG

foldCycleAugment :: (a -> b -> b) -> b -> BuildArg a -> [a] -> b
foldCycleAugment c n g xs = foldr c n (cycle (augment g xs))


produces

$wfoldCycleAugment$wfoldCycleAugment =
\ @ a_a3hx @ b_a3hy w_s3mD w1_s3mF w2_s3mG ->
letrec {
cyc_a2iD
cyc_a2iD =
w1_s3mF
w_s3mD
(letrec {
go_a2GG
go_a2GG =
\ ds_a2GH ->
case ds_a2GH of _ {
[] -> cyc_a2iD;
: y_a2GM ys_a2GN -> w_s3mD y_a2GM (go_a2GG ys_a2GN)
}; } in
go_a2GG w2_s3mG); } in
cyc_a2iD

foldCycleAugment
foldCycleAugment =
\ @ a_a3hx @ b_a3hy w_s3mD _ w2_s3mF w3_s3mG ->

### comment:17 in reply to:  16 Changed 5 years ago by dfeuer

Yes, it actually cycles the argument and then maps over the result—horrible in every way.

But this is well-known: You usually cannot “modify” cyclic data structures without breaking them.

Of course its cool that with the above definition of cycle + fusion, we can suddenly map over a cyclic structure without breaking it, but maybe that’s a tad too much magic? Especially as people probably don’t have a good feeling for when this happens and when not? OTOH, I don’t think it hurts either, and nice surprises are – well – nice.

Well, I agree that magic can be a problem, but if it falls out of something that tends to be good, that's not a major issue.

Did you investigate why foldr (+) 0 $take 30000000$ map (* 13) \$ cycle [1,8,4,0,(5::Int)] allocates more?

I'm not entirely sure, but it looks to me like it's all about unboxing. Somehow, the Prelude version is able to unbox all the Ints, whereas this one is not. If I replace the (+) with

{-# NOINLINE g #-}
g a b = a


then both versions run in constant space.

### comment:18 in reply to:  16 Changed 5 years ago by dfeuer

OK, thinking about this more clearly, I think I see some things I did not realize before, and I am less and less confident that any of this was a good idea. Aside from the magical map, and the pleasantness of the form when fused on both sides, I think this whole thing probably does more harm than good.

### comment:19 Changed 5 years ago by nomeata

Don’t be discouraged! I think this is very much worth it, and if people use cycle in list comprehensions (which they likely do), there might be real-world benefit in this. We just need to avoid regressions.

I tried to reproduce your findings. I do observe the higher allocation, but the accumulator Int# is still unboxed. The extra allocations seem to stem from the ys1 allocated here:

Rec {
go :: [GHC.Types.Int] -> GHC.Prim.Int# -> GHC.Types.Int
go =
\ (ds :: [GHC.Types.Int]) ->
case ds of _ {
[] -> Main.main_cyc;
: y ys ->
let {
ys1 :: GHC.Prim.Int# -> GHC.Types.Int
ys1 = go ys } in
\ (eta :: GHC.Prim.Int#) ->
case GHC.Prim.tagToEnum# (GHC.Prim.<=# eta 1) of _ {
GHC.Types.False ->
case y of _ { GHC.Types.I# x ->
case ys1 (GHC.Prim.-# eta 1) of _ { GHC.Types.I# y1 ->
GHC.Types.I# (GHC.Prim.+# (GHC.Prim.*# x 13) y1)
}
};
GHC.Types.True ->
case y of _ { GHC.Types.I# x -> GHC.Types.I# (GHC.Prim.*# x 13) }
}
}

Main.main_cyc :: GHC.Prim.Int# -> GHC.Types.Int
Main.main_cyc = go lvl9
end Rec }


This is an interesting case. It seems to be a shortcoming in the call arity analysis: One might think it should be able to infer that go is always called with two arguments (and hence move the \eta out of the let and case).

But it (rightfully) doesn’t do that because main_cyc is a thunk, and eta-expanding it would duplicate the case analysis done by case ds.

The regular arity analysis also refuses to improve that code, for a similar reason: It considers to move the \eta up, but it would escape the let ys1, and the arity analysis is unable to determine if that would be expensive or not.

Interesting case!

Last edited 5 years ago by nomeata (previous) (diff)

### comment:20 Changed 5 years ago by nomeata

Staring at this case a bit more, I conclude that we will never get good code from fusing a higher order foldr (like foldl or take or anything with an accumulating parameter) with a cyclic producer. It will sucessfully tie the know, but a knot of type Int# -> Int (and we can see this happening here). So it will create 5 function closures of that type that call each other in a round-robin style; the pattern match on the list elements happens only once.

And I as there is no way to pull the accumulating argument into the knot (after all, it changes while going round the circle), this can hardly be avoided.

### comment:21 Changed 5 years ago by dfeuer

Resolution: → invalid new → closed

Now that we've discussed this to death, I think it's time to close it up as a lesson learned.

### comment:22 Changed 5 years ago by thoughtpolice

Milestone: 7.8.4
Note: See TracTickets for help on using tickets.