Opened 7 years ago

# GHC doesn't optimize (strict) composition with id

Reported by: Owned by: shachaf simonpj normal Compiler 7.6.1 lelf, ekmett@…, jwlato@…, hackage.haskell.org@…, pho@…, dfeuer, wren@… Unknown/Multiple Unknown/Multiple Runtime performance bug

### Description

Newtype constructors and selectors have no runtime overhead, but some uses of them do. For example, given newtype Identity a = Identity { runIdentity :: a }, Identity turns into id, but Identity . f turns into id . f, which is distinct from f, because it gets eta-expanded to \x -> f x.

It would be nice to be able to compose a newtype constructor with a function without any overhead. The obvious thing to try is strict composition:

(#) :: (b -> c) -> (a -> b) -> a -> c
(#) f g = f seq g seq \x -> f (g x)


In theory this should get rid of the eta-expansion. In practice, the generated Core looks like this:

foo :: (a -> b) -> [a] -> [b]
foo f = map (id # f)
-- becomes
foo = \f e -> map (case f of g { __DEFAULT -> \x -> g x }) e


Different variations of (#), and turning -fpedantic-bottoms on, don't seem to affect this. A simpler version, foo f = map (f seq \x -> f x), generates the same sort of Core.

In one library we resorted to defining a bunch of functions of the form identityDot :: (a -> b) -> a -> Identity b; identityDot = unsafeCoerce. It would be better to be able to rely on GHC to do the optimization directly, if we use strict composition anyway.

### comment:1 Changed 7 years ago by simonpj

difficulty: → Unknown new → infoneeded

Can you give a concrete example? With this module

module T7542 where

newtype Id a = MkId a

f1 = map reverse

f2 = map (MkId . reverse)


compiled with ghc-7.6 -O -ddump-stg I get

==================== STG syntax: ====================

T7542.f1 :: forall a_afy. [[a_afy]] -> [[a_afy]]
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
\r [eta_B1] GHC.Base.map GHC.List.reverse eta_B1;
SRT(T7542.f1): []

T7542.f2 :: forall a_afr. [[a_afr]] -> [T7542.Id [a_afr]]
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
\r [eta_B1] GHC.Base.map GHC.List.reverse eta_B1;
SRT(T7542.f2): []


which looks fine to me.

### comment:2 Changed 7 years ago by shachaf

Status: infoneeded → new

Here's an example of the sort of context this comes up in:

module T7542 where

import Unsafe.Coerce

newtype Id a = MkId { unId :: a }

-- Think of mapped as mapM, but restricted to Id (we could make it work
-- with any Functor, rather than just []). over takes the Id wrappers back
-- off. The goal is to make it easy to compose mapped with other functions of
-- the same form. The wrapper should be "free" because it's just newtype noise.

mapped1 :: (a -> Id b) -> [a] -> Id [b]
mapped1 f = MkId . map (unId . f)

over1 :: ((a -> Id b) -> s -> Id t) -> (a -> b) -> s -> t
over1 l f = unId . l (MkId . f)

map1 :: (a -> b) -> [a] -> [b]
map1 f xs = over1 mapped1 f xs
-- Core: map1 = \f xs -> map (\x -> f x) xs

-- over1 mapped1 = unId . MkId . map (unId . MkId . f)
--               ~ map
-- However, if f = ⊥, unId . MkId . f /= f!
-- Therefore over1 mapped1 must turn into \f -> map (\x -> f x)
-- We can't expect GHC to compile it to map because it has different strictness.

-- Let's define strict versions of (MkId .) and (unId .):
mkIdDot2 :: (a -> b) -> a -> Id b
mkIdDot2 f = f seq \x -> MkId (f x)

unIdDot2 :: (a -> Id b) -> a -> b
unIdDot2 f = f seq \x -> unId (f x)

mapped2 :: (a -> Id b) -> [a] -> Id [b]
mapped2 f = mkIdDot2 (map (unIdDot2 f))

over2 :: ((a -> Id b) -> s -> Id t) -> (a -> b) -> s -> t
over2 l f = unIdDot2 (l (mkIdDot2 f))

map2 :: (a -> b) -> [a] -> [b]
map2 f xs = over2 mapped2 f xs
-- map2 should have the same semantics as map. But the Core isn't the same:
-- Without -fpedantic-bottoms: map2 = \f xs -> map (\e -> f e) xs
-- With -fpedantic-bottoms:
-- map2 = \f xs -> map (case f of g { __DEFAULT -> \x -> g x }) xs
-- Ideally, (case f of g { __DEFAULT -> \x -> g x }) would simply be f.

-- Let's try manually telling GHC that our newtype compositions are coercions:
-- (Ideally, this is what mkIdDot2 and unIdDot2 would compile into.)
mkIdDot3 :: (a -> b) -> a -> Id b
mkIdDot3 = unsafeCoerce

unIdDot3 :: (a -> Id b) -> a -> b
unIdDot3 = unsafeCoerce
-- (Note: Due to #7398, we couldn't define a strict composition operator and
-- rely on RULES to turn (MkId dot) into unsafeCoerce -- the MkId itself
-- gets turned into a coercion before any RULES have a chance to fire.)

mapped3 :: (a -> Id b) -> [a] -> Id [b]
mapped3 f = mkIdDot3 (map (unIdDot3 f))

over3 :: ((a -> Id b) -> s -> Id t) -> (a -> b) -> s -> t
over3 l f = unIdDot3 (l (mkIdDot3 f))

map3 :: (a -> b) -> [a] -> [b]
map3 f xs = over3 mapped3 f xs
-- Core: map3 = map


### comment:7 Changed 7 years ago by simonpj@…

Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Tue Jan 22 22:46:33 2013 +0000

Allow CaseElim if the case binder is the next thing to be eval'd

This makes CaseElim happen a bit more often.
See Note [Case binder next] in Simplify.
This came up when investigating Trac #7542.

compiler/simplCore/Simplify.lhs |   32 ++++++++++++++++++++++++++------
1 files changed, 26 insertions(+), 6 deletions(-)


### comment:8 Changed 7 years ago by simonpj

Also this:

commit 7a1480c7c590d4d2fa7a105a4eebf299e921e056
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Tue Jan 22 22:43:22 2013 +0000

Allow eta-reduction of eval'd functions if of arity 1

See Note [Eta reduction of an eval'd function] in CoreUtils.
This doesn't fix Trac #7542, but that was the ticket that
pointed out this infelicity.

>---------------------------------------------------------------

compiler/coreSyn/CoreUtils.lhs |   24 ++++++++++++++++++++++--
1 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/compiler/coreSyn/CoreUtils.lhs b/compiler/coreSyn/CoreUtils.lhs index 7017f70..9b527e7 100644
--- a/compiler/coreSyn/CoreUtils.lhs
+++ b/compiler/coreSyn/CoreUtils.lhs
@@ -1712,8 +1712,14 @@ tryEtaReduce bndrs body

---------------
fun_arity fun             -- See Note [Arity care]
-       | isLocalId fun && isStrongLoopBreaker (idOccInfo fun) = 0
-       | otherwise = idArity fun
+       | isLocalId fun
+       , isStrongLoopBreaker (idOccInfo fun) = 0
+       | arity > 0                           = arity
+       | isEvaldUnfolding (idUnfolding fun)  = 1
+            -- See Note [Eta reduction of an eval'd function]
+       | otherwise                           = 0
+       where
+         arity = idArity fun

---------------
ok_lam v = isTyVar v || isEvVar v
@@ -1737,6 +1743,20 @@ tryEtaReduce bndrs body
ok_arg _ _ _ = Nothing
\end{code}

+Note [Eta reduction of an eval'd function]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In Haskell is is not true that    f = \x. f x
+because f might be bottom, and 'seq' can distinguish them.
+
+But it *is* true that   f = f seq \x. f x
+and we'd like to simplify the latter to the former.  This amounts to
+the rule that
+  * when there is just *one* value argument,
+  * f is not bottom
+we can eta-reduce    \x. f x  ===>  f
+
+This turned up in Trac #7542.


### comment:9 Changed 7 years ago by simonpj

The previous two commits improve the situation a bit; both were triggered by looking at the code from this example, thanks.

I'm far from sure that we get perfect code now. But I'm also convinced that this is the wrong way to solve this problem: check out NewtypeWrappers.

Meanwhile do have a go with HEAD and see if you get better code now.

Simon

### comment:10 Changed 7 years ago by igloo

Milestone: → 7.8.1 set to simonpj

Simon, should this ticket be closed now?

### comment:11 Changed 6 years ago by nomeata

The simple example from the ticket:

(#) :: (b -> c) -> (a -> b) -> a -> c
(#) f g = f seq g seq \x -> f (g x)

foo :: (a -> b) -> [a] -> [b]
foo f = map (id # f)


produces

T7542.foo =
\ (@ a) (@ b) (f :: a -> b) (eta :: [a]) ->
GHC.Base.map @ a @ b (\ (eta1 :: a) -> f eta1) eta


without -fpedantic-bottoms and

T7542.foo = GHC.Base.map


with, which looks fine. The same happens to the more elaborate example in comment:2 (middle section), -fpedantic-bottoms preserves the semantics and with that we get map2 = map.

So it seems that situation has improved and we can close the ticket (which should not discourage anyone who finds a situation that needs fixing to open it again).

### comment:12 Changed 6 years ago by ekmett

@nomeata:

As I understand the situation, the fact that this optimization isn't happening without -fpedantic-bottoms means I'm basically stuck using the existing hacks in lens.

The code in question is set up to INLINE in library into user code, and they very likely aren't compiling with that flag! (Unless I'm misunderstanding how and where that gets tracked and applid.)

We have 3 cases to consider, the existing unsafeCoerce hack, the naive eta expansion we're getting without -fpedantic-bottoms, and the version without the eta expansion we can produce with -fpedantic-bottoms. It appears that the first and the third scenarios are now the same, but the difference in performance between the existing unsafeCoerce hack and the eta expanded code can be asymptotic, if this is closed as is, the fix does me no good, and I'll be left using the existing coercions to ensure a good user experience. =/

### comment:13 Changed 6 years ago by nomeata

@ekmett: Ok, thanks for the input. So clearly it should _not_ be closed then.

So what you’d like to see is to have GHC behave here as if -fpedantic-bottoms is on always.

Obviously, one could make that flag the default, but there must be reasons why it is not the default.

What I find surprising: The core generated for

mkIdDot2 :: (a -> b) -> a -> Id b
mkIdDot2 f = f seq \x -> MkId (f x)


is

mkIdDot2 = (λ f. f) |> (some cast involving Id))


which is the same core that we get with mkIdDot2 = coerce (hey, no unsafe!), but the latter produces the desired map2 = map... hard to see (without knowing the simplifier by heart) where this goes wrong.

### comment:14 Changed 6 years ago by nomeata

Ah, interesting. The problem is not mkIdDot2 (that has Arity 1 allright), the problem is unIdDot2, which suddenly as artiy 2!

T7542.unIdDot2
:: forall a_az8 b_az9. (a_az8 -> T7542.Id b_az9) -> a_az8 -> b_az9
[LclIdX,
Arity=2,
Str=DmdType,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
ConLike=True, WorkFree=True, Expandable=True,
Guidance=IF_ARGS [20 0] 50 0}]
T7542.unIdDot2 =
\ (@ a_aKU)
(@ b_aKV)
(f_aze :: a_aKU -> T7542.Id b_aKV)
(eta_B1 :: a_aKU) ->
case f_aze of f_Xzl { __DEFAULT ->
T7542.unId @ b_aKV (f_Xzl eta_B1)
}


(This is already after InitialPhase [Gentle]). One would expect MkId and unId to behave the same...

### comment:15 Changed 6 years ago by nomeata

Tracing this back I found the Note [Dealing with bottom] in CoreArity, where the arity of case branches is used to calculate the arity of a case. Quote:

We ignore this "problem" (unless -fpedantic-bottoms is on), because being scrupulous would lose an important transformation for many programs. (See Trac #5587 for an example.)

So the next step would be answering: Can we distinguish between cases where this dodgy optimization is useful and required, and cases where it hurts?

### comment:16 Changed 6 years ago by thoughtpolice

Milestone: 7.8.3 → 7.10.1

Moving to 7.10.1

### comment:18 Changed 5 years ago by thoughtpolice

Milestone: 7.10.1 → 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

### comment:19 Changed 4 years ago by thoughtpolice

Milestone: 7.12.1 → 8.0.1

Milestone renamed

### comment:21 Changed 4 years ago by thomie

Milestone: 8.0.1

### comment:23 Changed 2 years ago by spacekitteh

Does this still occur in GHC 8.2?

### comment:24 Changed 13 months ago by chessai

I'm using ghc 8.4.3, and compiling your example exactly with -O2 produces no difference whether i compile with '-fpedantic-bottoms' or '-fno-pedantic-bottoms' produce the same core.

module Foo (foo) where

(#) :: (b -> c) -> (a -> b) -> a -> c
(#) f g = f seq g seq \x -> f (g x)

foo :: (a -> b) -> [a] -> [b]
foo f = map (id # f)


the core shows foo = map.

This is in contrast to @nomeata's comment from 5 years ago, where compilation with -fpedantic-bottoms is necessary.

### comment:25 Changed 13 months ago by chessai

It also produces the same core with -O1.

With -O0, i see the problem, but i would expect that.

### comment:26 Changed 13 months ago by dfeuer

Examples using map are important, but they also need to be taken with a grain of salt. List fusion will tear those things apart and put them together again and that can eliminate some problems we want to look at.

### comment:27 Changed 13 months ago by chessai

Ah, you're right.

look at this:

module Foo (foo) where

data List a = Nil | Cons a (List a)

mapList :: (a -> b) -> List a -> List b
mapList f Nil = Nil
mapList f (Cons a l) = Cons (f a) (mapList f l)

(#) :: (b -> c) -> (a -> b) -> a -> c
(#) f g = f seq g seq \x -> f (g x)

foo :: (a -> b) -> List a -> List b
foo f = mapList (id # f) xs


The core i get looks like:

mapList
mapList
= \ @ a_avr @ b_avs f_atJ ds_d10t ->
case ds_d10t of {
Nil -> Nil;
Cons a1_atL l_atM -> Cons (f_atJ a1_atL) (mapList f_atJ l_atM)

foo = \ @ a_avy @ b_avz f_atQ xs_atR -> mapList f_atQ xs_atR


### comment:28 Changed 13 months ago by dfeuer

That does look good. I don't know how we can determine whether the problem is sufficiently fixed. We have an interaction between a weird language feature (detectably lifted functions) and a somewhat illegitimate compiler transformation that tries to avoid paying too much for those. Everything feels a bit brittle and ill-specified.

Note: See TracTickets for help on using tickets.