Opened 4 years ago

Closed 4 years ago

# last uses too much space with optimizations disabled

Reported by: Owned by: rwbarton nomeata normal 7.10.2 Core Libraries 7.10.1 nomeata, core-libraries-committee@… Unknown/Multiple Unknown/Multiple Runtime performance bug ​Phab:D847

I know we don't worry too much about performance when optimizations are disabled, but last [1..n] using O(n) space when the naive definition would run in O(1) space is a bit too much I think.

Can we split out the new definition to a separate function and use a rule to rewrite last to that function when optimizations are enabled? I'm not expert enough on the use of rules and inlining phases etc. to write the patch myself, but I think it should be simple.

This showed up when investigating #10247

### comment:1 Changed 4 years ago by simonpj

Description: modified (diff)

### comment:2 Changed 4 years ago by simonpj

What is the "new definition"? I'm pretty hazy about what exactly you are proposing here. Could you write it out explicitly and then the Core Libraries Committee can decide.

### comment:3 Changed 4 years ago by nomeata

Owner: changed from ekmett to nomeata

When foldl started to be a good consumer, we implemented last using foldl, which is cool, but not helpful without optimization.

I’ll take care of this.

### comment:4 Changed 4 years ago by nomeata

Differential Rev(s): → Phab:D844 new → patch

Put this up at Phab:D844 and will push this once it validated.

### comment:5 follow-up:  10 Changed 4 years ago by nomeata

@rwbarton, just to make sure I don’t get you wrong: Are you talking about compiling a "normal" program using last, using a normally build compiler, or are you talking about compiling the base libraries without optimization, e.g. during GHC development?

### comment:6 Changed 4 years ago by nomeata

I had a closer look, and the problem is the following:

last = foldl (\_ x -> x) (error "..."))


to GHC.List in #9339, we assumed (and I thought we checked, but maybe that not true) that GHC would optimize the core will look something like this:

last :: forall a_ask. [a_ask] -> a_ask
[GblId,
Arity=1,
Str=DmdType,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 30 60}
Tmpl=\ (@ a_aTE) -> foldl @ a_aTE @ a_aTE (GHC.List.last2 @ a_aTE) (GHC.List.last1 @ a_aTE)]
last = .. efficient implementation derived from foldl ... ..


This way, when using last in code compiled without -O, the efficient variant would be called, while with -O the unfolding would be used and optimized on the spot, possibly fusing, and producing good code if Call Arity kicks in.

Unfortunately, this is what we see:

last :: forall a_ask. [a_ask] -> a_ask
[GblId,
Arity=1,
Str=DmdType,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 30 60}]
last =
\ (@ a_aTE) ->
foldl
@ a_aTE @ a_aTE (GHC.List.last2 @ a_aTE) (GHC.List.last1 @ a_aTE)


Now the question is: Why is the code not optimized in GHC.List? My guess is that we should have written

last xs = foldl (\_ x -> x) (error "...")) xs


And indeed, this way, we get:

Rec {
GHC.List.last2 [Occ=LoopBreaker]
:: forall a_aTF. [a_aTF] -> a_aTF -> a_aTF
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <S,1*U><L,1*U>]
GHC.List.last2 =
\ (@ a_aTF) (ds_a1Es :: [a_aTF]) (eta_B1 :: a_aTF) ->
case ds_a1Es of _ [Occ=Dead] {
[] -> eta_B1;
: y_a1Ex ys_a1Ey -> GHC.List.last2 @ a_aTF ys_a1Ey y_a1Ex
}
end Rec }

[GblId,
Arity=1,
Str=DmdType <S,1*U>,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 30 0}]
last =
\ (@ a_aTF) (xs_ast :: [a_aTF]) ->
GHC.List.last2 @ a_aTF xs_ast (GHC.List.last1 @ a_aTF)


which his nice, but now we lost the original definition.

Next try, also adding {-# INLINEABLE last #-}, which yields.

Rec {
poly_go_r2F3 :: forall a_aTF. [a_aTF] -> a_aTF -> a_aTF
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <S,1*U><L,1*U>]
poly_go_r2F3 =
\ (@ a_aTF) (ds_a1Es :: [a_aTF]) (eta_B1 :: a_aTF) ->
case ds_a1Es of _ [Occ=Dead] {
[] -> eta_B1;
: y_a1Ex ys_a1Ey -> poly_go_r2F3 @ a_aTF ys_a1Ey y_a1Ex
}
end Rec }

[GblId,
Arity=1,
Str=DmdType <S,1*U>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 160 0
Tmpl= \ (@ a_aTF) (xs_ast [Occ=Once] :: [a_aTF]) ->
foldr
@ a_aTF
@ (a_aTF -> a_aTF)
(\ (ds_d1DA [Occ=Once] :: a_aTF)
(ds1_d1DB [Occ=Once!, OS=OneShot] :: a_aTF -> a_aTF)
ds1_d1DB ds_d1DA)
(id @ a_aTF)
xs_ast
(errorEmptyList
@ a_aTF
(build
@ Char (\ (@ b_a1Fd) -> unpackFoldrCString# @ b_a1Fd "last"#)))}]
last =
\ (@ a_aTF) (xs_ast :: [a_aTF]) ->
poly_go_r2F3 @ a_aTF xs_ast (lvl8_r2F2 @ a_aTF)


That looks better. The unfolding is a bit more complicated, as foldl is being lined, but that is presumably ok.

I’ll create a pull request with this in a small while, for review and discussion.

### comment:7 Changed 4 years ago by nomeata

INLINEABLE does not work; probably because the unfolding looks too large. But INLINE seems to work. So with this patch

-last = foldl (\_ x -> x) (errorEmptyList "last")
+last xs = foldl (\_ x -> x) lastError xs
+{-# INLINE last #-}
+lastError = errorEmptyList "last"


we have

• that GHC.List.last is used without -O, with the desired memory behaviour,
• with -O, last gets inlined and fusion can happen.

(lastError pulled out to not copy that wherever last is used).

### comment:8 Changed 4 years ago by nomeata

Differential Rev(s): Phab:D844 → Phab:D846

Second try: Phab:D846

### comment:9 Changed 4 years ago by nomeata

Differential Rev(s): Phab:D846 → Phab:D847

I mean Phab:D847

### comment:10 in reply to:  5 Changed 4 years ago by rwbarton

@rwbarton, just to make sure I don’t get you wrong: Are you talking about compiling a "normal" program using last, using a normally build compiler, or are you talking about compiling the base libraries without optimization, e.g. during GHC development?

I mean using a compiler built with "perf", e.g. the 7.10.1 release, and compiling the user program without optimization.

### comment:11 Changed 4 years ago by simonpj

Re comment:6 I too was surprised that last was not eta-expanded. But see this note in SimplUtils:

Note [Do not eta-expand PAPs]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We used to have old_arity = manifestArity rhs, which meant that we
would eta-expand even PAPs.  But this gives no particular advantage,
and can lead to a massive blow-up in code size, exhibited by Trac #9020.
Suppose we have a PAP
foo :: IO ()
foo = returnIO ()
Then we can eta-expand do
foo = (\eta. (returnIO () |> sym g) eta) |> g
where
g :: IO () ~ State# RealWorld -> (# State# RealWorld, () #)

But there is really no point in doing this, and it generates masses of
coercions and whatnot that eventually disappear again. For T9020, GHC
allocated 6.6G beore, and 0.8G afterwards; and residency dropped from
1.8G to 45M.

But note that this won't eta-expand, say
f = \g -> map g
Does it matter not eta-expanding such functions?  I'm not sure.  Perhaps
strictness analysis will have less to bite on?


The worse code for the un-eta-expanded code for last is perhaps an example of the last paragraph!

And yet the comment is reasonably convincing. I'm not sure what to do here.

Joachim, one thing that might be worth trying is to eta-expand as far as poss without bumping into a coercion. That would expand last but not foo. Want to try that? We may fix last but there are sure to be other cases, and the more robust the optimisations the better.

### comment:12 Changed 4 years ago by nomeata

And yet the comment is reasonably convincing. I'm not sure what to do here.

I think we are doing the right thing here. As far as I know, we also take the number of arguments given by the user as an indication when things should inline or not; if we eta-expand PAPs, we take this possibility of control away from the user.

In this particular case: What if I had written last this way intentional, so that the rule for foldl would _not_ fire here, but only after last was inlined at a position where it is passed an argument?

(Or am I talking non-sense if we are talking about PAPs, in contrast to functions that might be eta-expanded more.)

### comment:13 Changed 4 years ago by simonpj

Well all eta-expansion risks what you say. If the user wants hat kind of control s/he can use INLINE/INLINEABLE. The eta expansion doesn't happen for the template that is inlined; only in the RHS that is compiled. Which is good. So I do think we should eta-expand here. The only reason not to is described in the comment. I think.

### comment:14 Changed 4 years ago by nomeata

Ah, right, the template is stored before this happens; good.

What is our motivation to eta-expand PAPs? In this case, it is to allow for RULES to fire. Is it not likely that in order for RULES to fire, we might have to eta-expand beyond a coercion? So it does not gain a lot in robustness.

I’m inclined to wait for more than just this example to give this a shot, after all, implementing the heuristics (“if it is a PAP, and there is a chance to eta-expand, but only up to the next coercion...”) does not really make the code simpler...

### comment:15 Changed 4 years ago by simonpj

Maybe. But eta expansion is in general a tremendously good thing. I'd love just to try it.

### comment:16 Changed 4 years ago by nomeata

Maybe. But eta expansion is in general a tremendously good thing. I'd love just to try it.

Looks like I forgot to cross-reference the ticket where I started working on this: #10319

### comment:17 Changed 4 years ago by Joachim Breitner <mail@…>

Make sure GHC.List.last is memory-efficient

by eta-expanding its definition so that GHC optmizes the foldl here.
Also make sure that other uses of last go via foldl as well, to allow
list fusion (tested in T9339). Fixes #10260.

### comment:18 Changed 4 years ago by nomeata

Resolution: → fixed patch → closed

Fixing this, as last works as intended now. Further discussion about a more general solution can happen at #10319.

### comment:19 Changed 4 years ago by rwbarton

Status: closed → merge

This should be merged to 7.10 right?

### comment:20 Changed 4 years ago by nomeata

Probably. I’m always unsure about the policy here.

### comment:21 Changed 4 years ago by thoughtpolice

Milestone: → 7.10.2 merge → closed

Merged to ghc-7.10.

Note: See TracTickets for help on using tickets.