Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#9398 closed bug (invalid)

Data.List.cycle is not a good producer

Reported by: dfeuer Owned by:
Priority: normal Milestone:
Component: libraries/base Version: 7.8.3
Keywords: Cc: hvr, ekmett
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by dfeuer)

Data.List.cycle is not a good producer. I believe the following fixes it. The tests I've profiled so far suggest it does so.

{-# INLINE cycle #-}
cycle :: [a] -> [a]
cycle [] = error "Empty cycle."
cycle xs = let cyc = augment cycle' cyc in cyc
  where cycle' c n = foldr c n xs 

Change History (22)

comment:1 Changed 5 years ago by dfeuer

Resolution: invalid
Status: newclosed

No, not ready yet. There are some more subtleties.

comment:2 Changed 5 years ago by dfeuer

Description: modified (diff)
Resolution: invalid
Status: closednew

comment:3 Changed 5 years ago by nomeata

Remember that GHC doesn’t inline recursive things (such as cyc).

Also, I don’t think your cycle is a good consumer: To inline it, you need to know that its argument is a :. But to make it a good consumer, the argument needs to be of the shape build ....

comment:4 in reply to:  3 Changed 5 years ago by dfeuer

Replying to nomeata:

Remember that GHC doesn’t inline recursive things (such as cyc).

Also, I don’t think your cycle is a good consumer: To inline it, you need to know that its argument is a :. But to make it a good consumer, the argument needs to be of the shape build ....

I suppose I should (try to) find a way to rewrite it to another form if it doesn't get eaten by foldr. I'm not understanding your concern about the consumption side. cyc isn't inlined, but the enclosing cycle is. Shouldn't that be good enough to allow the foldr in cycle' to fuse with a build or augment in xs?

comment:5 Changed 5 years ago by nomeata

But note that you are pattern matching on xs!

comment:6 in reply to:  5 Changed 5 years ago by dfeuer

Replying to nomeata:

But note that you are pattern matching on xs!

Ah, I see what you mean. That pattern match, however, is not essential. Suppose we leave it out. Then we get

cycle [] = let cyc = augment cycle' cyc in cyc
  where
    cycle' c n = foldr c n [] 

foldr/nil gives

cycle [] = let cyc = augment (\c n -> []) cyc in cyc 

Then applying augment:

cycle [] = let cyc = [] in cyc

So we've turned _|_ into [], whereas a similar omission from the current definition of cycle would turn an error into a much-less-desirable non-termination. This is not a disaster, and in fact it makes sense from a monoid perspective.

comment:7 Changed 5 years ago by nomeata

I think you meant

cycle [] = let cyc = augment cycle' cyc in cyc
  where
    cycle' c n = foldr c n []

cycle [] = let cyc = augment (\c n -> n) cyc in cyc

cycle [] = let cyc = cyc in cyc

cycle [] = 

so you turned a helpful error message into nontermination

comment:8 in reply to:  7 Changed 5 years ago by dfeuer

Replying to nomeata:

I think you meant

cycle [] = let cyc = augment cycle' cyc in cyc
  where
    cycle' c n = foldr c n []

cycle [] = let cyc = augment (\c n -> n) cyc in cyc

I don't think so. Substituting [] for xs, cycle' c n = foldr c n [] = [], so cycle' = \c n -> [].

comment:9 Changed 5 years ago by nomeata

foldr c n [] = n!

comment:10 in reply to:  9 Changed 5 years ago by dfeuer

Replying to nomeata:

foldr c n [] = n!

I am a fool! That said, we could make an optimization flag, and maybe include it in -O2.

comment:11 Changed 5 years ago by nomeata

Not sure what you mean. An optimization flag that changes whether cycle [] is an error or nontermination? I don’t think this is a good idea.

Anyways, as you said: It is more important that cycle is a good producer. I don’t see how an enclosing foldr would get in touch with the augment in let cyc = augment cycle' cyc in cyc.

comment:12 Changed 5 years ago by nomeata

I think you need to move the knot-tying into the argument to build, e.g. something like

cycle xs = build $ \c _ ->  let cyc = foldr c cyc xs in cyc

(ignoring the issue of cycle [] for now)

It seems that this would be both a good produce, and possibly even a good consumer.

comment:13 in reply to:  12 Changed 5 years ago by dfeuer

Replying to nomeata:

I think you need to move the knot-tying into the argument to build, e.g. something like

cycle xs = build $ \c _ ->  let cyc = foldr c cyc xs in cyc

(ignoring the issue of cycle [] for now)

It seems that this would be both a good producer, and possibly even a good consumer.

I won't be able to test anything for some hours, but that does look very promising indeed. I wonder if GHC performed some transformation that turned mine into yours somehow, but yours is definitely much clearer and prettier in any case. I think it's a good producer, and a good consumer for build. It doesn't look like a perfect consumer for augment (although it's not a bad one), but that may be unavoidable. I believe cycle is one of the more popular list functions in production code, so personally I think it's probably worth giving up the error message on a null argument to buy a little performance, even on the consumption side, but it's also true that the production side is more important in general.

comment:14 Changed 5 years ago by isaacdupree

In compiled code, the latter definition throws <<loop>>, which is nice. Try runghc test.hs (nontermination) vs ghc test.hs && ./test (exception):

import GHC.Exts

cycle2 xs = build $ \c _ ->  let cyc = foldr c cyc xs in cyc

main = print (cycle2 [] !! 0 :: Int)

comment:15 in reply to:  12 Changed 5 years ago by dfeuer

Replying to nomeata: I wrote some test functions, using the type

type BuildArg a = forall b . (a -> b -> b) -> b -> b

Generally things look pretty good (lots of Core to look at below, along with comments you should take with a grain of salt since I'm just a newbie). I found one case where your definition works significantly worse than the Prelude's. The bad case I've found:

main = print $ foldr (+) 0 $ take 30000000 $ map (* 13) $ cycle [1,8,4,0,(5::Int)]

For some reason, your cycle implementation allocates a lot, but the Prelude one runs in constant space.

The basic test translations:

cycleBuild :: BuildArg a -> [a]
cycleBuild g = cycle (build g)

produces

cycleBuild
cycleBuild =
  \ @ a_a3jG g_a3hO ->
    letrec {
      cyc_a2iD
      cyc_a2iD = g_a3hO (:) cyc_a2iD; } in
    cyc_a2iD

which is obviously good, and obviously better than the result with Prelude.cycle, which is

cycleBuild
cycleBuild =
  \ @ a_a1Xy g_a1zF ->
    case g_a1zF (:) ([]) of wild_a2aX {
      [] -> cycle1;
      : ipv_a2b3 ipv1_a2b4 ->
        letrec {
          xs'_a2b1
          xs'_a2b1 = ++ wild_a2aX xs'_a2b1; } in
        xs'_a2b1
    }
cycleAugment :: BuildArg a -> [a] -> [a]
cycleAugment g xs = cycle (augment g xs)

produces

cycleAugment
cycleAugment =
  \ @ a_a3j1 g_a3hS xs_a3hT ->
    letrec {
      cyc_a2iD
      cyc_a2iD = g_a3hS (:) (++ xs_a3hT cyc_a2iD); } in
    cyc_a2iD

which I think is likely the best we can do, and clearly better than what the Prelude gives:

cycleAugment
cycleAugment =
  \ @ a_a1WT g_a1zJ xs_a1zK ->
    case g_a1zJ (:) xs_a1zK of wild_a2aX {
      [] -> cycle1;
      : ipv_a2b3 ipv1_a2b4 ->
        letrec {
          xs'_a2b1
          xs'_a2b1 = ++ wild_a2aX xs'_a2b1; } in
        xs'_a2b1
    }
foldCycleBuild :: (a -> b -> b) -> b -> BuildArg a -> b
foldCycleBuild c n g = foldr c n (cycle (build g))

produces

foldCycleBuild
foldCycleBuild =
  \ @ a_a3jl @ b_a3jm c_a3hP _ g_a3hR ->
    letrec {
      cyc_a2iD
      cyc_a2iD = g_a3hR c_a3hP cyc_a2iD; } in
    cyc_a2iD

which is obviously good, and incomparably better than what the Prelude gives, which I will not paste here because it is very obviously very much inferior.

The rest of the tests are a little harder for me to interpret, because the nested letrecs confuse me, but I think they're probably good too.

foldCycle c n xs = foldr c n (cycle xs)

produces

$wfoldCycle
$wfoldCycle =
  \ @ a_a3lf @ b_a3lg w_s3mE w1_s3mG ->
    letrec {
      cyc_a2iD
      cyc_a2iD =
        letrec {
          go_a2GG
          go_a2GG =
            \ ds_a2GH ->
              case ds_a2GH of _ {
                [] -> cyc_a2iD;
                : y_a2GM ys_a2GN -> w_s3mE y_a2GM (go_a2GG ys_a2GN)
              }; } in
        go_a2GG w1_s3mG; } in
    cyc_a2iD

foldCycle
foldCycle =
  \ @ a_a3lf @ b_a3lg w_s3mE _ w2_s3mG -> $wfoldCycle w_s3mE w2_s3mG
foldCycleAugment :: (a -> b -> b) -> b -> BuildArg a -> [a] -> b
foldCycleAugment c n g xs = foldr c n (cycle (augment g xs))

produces

$wfoldCycleAugment
$wfoldCycleAugment =
  \ @ a_a3hx @ b_a3hy w_s3mD w1_s3mF w2_s3mG ->
    letrec {
      cyc_a2iD
      cyc_a2iD =
        w1_s3mF
          w_s3mD
          (letrec {
             go_a2GG
             go_a2GG =
               \ ds_a2GH ->
                 case ds_a2GH of _ {
                   [] -> cyc_a2iD;
                   : y_a2GM ys_a2GN -> w_s3mD y_a2GM (go_a2GG ys_a2GN)
                 }; } in
           go_a2GG w2_s3mG); } in
    cyc_a2iD

foldCycleAugment
foldCycleAugment =
  \ @ a_a3hx @ b_a3hy w_s3mD _ w2_s3mF w3_s3mG ->
    $wfoldCycleAugment w_s3mD w2_s3mF w3_s3mG

which I have no idea what to think of, honestly.

mapCycle f xs = map f (cycle xs)

produces

mapCycle
mapCycle =
  \ @ a_a3lA @ b_a3lB f_a3hJ xs_a3hK ->
    letrec {
      cyc_a2iD
      cyc_a2iD =
        letrec {
          go_a2GG
          go_a2GG =
            \ ds_a2GH ->
              case ds_a2GH of _ {
                [] -> cyc_a2iD;
                : y_a2GM ys_a2GN -> : (f_a3hJ y_a2GM) (go_a2GG ys_a2GN)
              }; } in
        go_a2GG xs_a3hK; } in
    cyc_a2iD

which looks sane enough—it just maps over xs and then cycles the result. I also took a look at

cycleMap f xs = cycle (map f xs)

This produced exactly the same Core, which I think is a positive sign.

The Prelude implementation, on the other hand, gives a rather terrible result:

mapCycle
mapCycle =
  \ @ a_a2aw @ b_a2ax f_a1x7 xs_a1x8 ->
    case xs_a1x8 of wild_a2aV {
      [] -> case cycle1 of wild1_00 { };
      : ipv_a2b1 ipv1_a2b2 ->
        letrec {
          xs'_a2aZ
          xs'_a2aZ = ++ wild_a2aV xs'_a2aZ; } in
        map f_a1x7 xs'_a2aZ
    }

Yes, it actually cycles the argument and then maps over the result—horrible in every way.

comment:16 Changed 5 years ago by nomeata

Yes, it actually cycles the argument and then maps over the result—horrible in every way.

But this is well-known: You usually cannot “modify” cyclic data structures without breaking them.

Of course its cool that with the above definition of cycle + fusion, we can suddenly map over a cyclic structure without breaking it, but maybe that’s a tad too much magic? Especially as people probably don’t have a good feeling for when this happens and when not? OTOH, I don’t think it hurts either, and nice surprises are – well – nice.

Did you investigate why foldr (+) 0 $ take 30000000 $ map (* 13) $ cycle [1,8,4,0,(5::Int)] allocates more?

comment:17 in reply to:  16 Changed 5 years ago by dfeuer

Replying to nomeata:

Yes, it actually cycles the argument and then maps over the result—horrible in every way.

But this is well-known: You usually cannot “modify” cyclic data structures without breaking them.

Of course its cool that with the above definition of cycle + fusion, we can suddenly map over a cyclic structure without breaking it, but maybe that’s a tad too much magic? Especially as people probably don’t have a good feeling for when this happens and when not? OTOH, I don’t think it hurts either, and nice surprises are – well – nice.

Well, I agree that magic can be a problem, but if it falls out of something that tends to be good, that's not a major issue.

Did you investigate why foldr (+) 0 $ take 30000000 $ map (* 13) $ cycle [1,8,4,0,(5::Int)] allocates more?

I'm not entirely sure, but it looks to me like it's all about unboxing. Somehow, the Prelude version is able to unbox all the Ints, whereas this one is not. If I replace the (+) with

{-# NOINLINE g #-}
g a b = a 

then both versions run in constant space.

comment:18 in reply to:  16 Changed 5 years ago by dfeuer

Replying to nomeata:

OK, thinking about this more clearly, I think I see some things I did not realize before, and I am less and less confident that any of this was a good idea. Aside from the magical map, and the pleasantness of the form when fused on both sides, I think this whole thing probably does more harm than good.

comment:19 Changed 5 years ago by nomeata

Don’t be discouraged! I think this is very much worth it, and if people use cycle in list comprehensions (which they likely do), there might be real-world benefit in this. We just need to avoid regressions.

I tried to reproduce your findings. I do observe the higher allocation, but the accumulator Int# is still unboxed. The extra allocations seem to stem from the ys1 allocated here:

Rec {
go :: [GHC.Types.Int] -> GHC.Prim.Int# -> GHC.Types.Int
go =
  \ (ds :: [GHC.Types.Int]) ->
    case ds of _ {
      [] -> Main.main_cyc;
      : y ys ->
        let {
          ys1 :: GHC.Prim.Int# -> GHC.Types.Int
          ys1 = go ys } in
        \ (eta :: GHC.Prim.Int#) ->
          case GHC.Prim.tagToEnum# (GHC.Prim.<=# eta 1) of _ {
            GHC.Types.False ->
              case y of _ { GHC.Types.I# x ->
              case ys1 (GHC.Prim.-# eta 1) of _ { GHC.Types.I# y1 ->
              GHC.Types.I# (GHC.Prim.+# (GHC.Prim.*# x 13) y1)
              }
              };
            GHC.Types.True ->
              case y of _ { GHC.Types.I# x -> GHC.Types.I# (GHC.Prim.*# x 13) }
          }
    }

Main.main_cyc :: GHC.Prim.Int# -> GHC.Types.Int
Main.main_cyc = go lvl9
end Rec }

This is an interesting case. It seems to be a shortcoming in the call arity analysis: One might think it should be able to infer that go is always called with two arguments (and hence move the \eta out of the let and case).

But it (rightfully) doesn’t do that because main_cyc is a thunk, and eta-expanding it would duplicate the case analysis done by case ds.

The regular arity analysis also refuses to improve that code, for a similar reason: It considers to move the \eta up, but it would escape the let ys1, and the arity analysis is unable to determine if that would be expensive or not.

Interesting case!

Last edited 5 years ago by nomeata (previous) (diff)

comment:20 Changed 5 years ago by nomeata

Staring at this case a bit more, I conclude that we will never get good code from fusing a higher order foldr (like foldl or take or anything with an accumulating parameter) with a cyclic producer. It will sucessfully tie the know, but a knot of type Int# -> Int (and we can see this happening here). So it will create 5 function closures of that type that call each other in a round-robin style; the pattern match on the list elements happens only once.

And I as there is no way to pull the accumulating argument into the knot (after all, it changes while going round the circle), this can hardly be avoided.

comment:21 Changed 5 years ago by dfeuer

Resolution: invalid
Status: newclosed

Now that we've discussed this to death, I think it's time to close it up as a lesson learned.

comment:22 Changed 5 years ago by thoughtpolice

Milestone: 7.8.4
Note: See TracTickets for help on using tickets.