Opened 3 years ago

Closed 2 years ago

#12781 closed bug (fixed)

Significantly higher allocation with INLINE vs NOINLINE

Reported by: MikolajKonarski Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.0.1
Keywords: JoinPoints Cc: maurerl@…, pdownen@…, ariola@…
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #12747 #12603 #5775 Differential Rev(s):
Wiki Page:

Description

The following program allocates much more memory ("bytes allocated in the heap") in its current form than with INLINE replaced by NOINLINE, as in the comment. That may be normal, but virtually every feature of this example is needed to trigger the behaviour and so I have no clear idea which language feature is responsible and should be avoided and, consequently, how to optimize my much more complex codebase (in which INLINES of once used functions tend to help and sometimes significantly so). As a comment to this ticket I will post another example, where the difference in allocation is pumped to a factor of 2000, as a proof that the issue is serious.

{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST.Strict
import qualified Data.IntMap.Strict as IM

-- ghc --make -O1 InlineBloat.hs; ./InlineBloat +RTS -s

data P = P Int

instance Enum P where
  fromEnum (P x) = x
  toEnum n = undefined

main = do
  let {-# NOINLINE z #-}
      z = 44
      dis :: Int -> ()
      {-# INLINE dis #-}  -- change here to NOINLINE and observe lower alloc
      dis pI =
        let p0 = let (_, x) = pI `quotRem` z in P x
            p1 = let (_, x) = pI `quotRem` z in P x
            m = IM.lookup (fromEnum p0) IM.empty
            b = IM.member (fromEnum p1) IM.empty
        in m == Just 'c' `seq` b `seq` ()
      {-# NOINLINE l #-}
      l = [0..10000000]
      mapVT :: forall s. () -> ST s ()
      {-# INLINE mapVT #-}
      mapVT _ = mapM_ (\x -> return $! dis x) l
  return $! runST (mapVT ())

Change History (8)

comment:1 Changed 3 years ago by MikolajKonarski

Here is the example where the code with INLINE allocates 2000 times more heap memory that with NOINLINE:

{-# LANGUAGE BangPatterns, RankNTypes #-}
import Control.Monad.ST.Strict
import qualified Data.IntMap.Strict as IM
import Data.List
import Data.Maybe

-- ghc --make -O1 InlineBloat.hs; ./InlineBloat +RTS -s

data P = P !Int

instance Enum P where
  fromEnum (P x) = x
  toEnum n = undefined

main = do
  let {-# NOINLINE z #-}
      z = 44
      dis :: Int -> ()
      {-# INLINE dis #-}  -- change here to NOINLINE
--
-- with INLINE:
--           384,409,080 bytes allocated in the heap
-- with NOINLINE:
--               169,080 bytes allocated in the heap
      dis pI =
        let p0 = let (_, x) = pI `quotRem` z in P x
            p1 = let (y, _) = pI `quotRem` z in P y
            !_ = isJust $ IM.lookup (fromEnum p0) IM.empty
            !_ = isJust $ IM.lookup (fromEnum p1) IM.empty
        in ()
      {-# NOINLINE l #-}
      l = [0..1600]
      mapVT :: forall s. Int -> ST s ()
      {-# INLINE mapVT #-}
      mapVT _ = mapM_ (\x -> return $! dis x) l
      !runRes = foldl' (\() n -> runST (mapVT n)) () [1..10000]
  return ()

comment:3 Changed 3 years ago by MikolajKonarski

comment:4 Changed 3 years ago by MikolajKonarski

comment:5 Changed 3 years ago by simonpj

Cc: maurerl@… pdownen@… ariola@… added

Interesting. I know exactly what is going on.

With NOINLINE dis we get

dis = \p. let $w$j ww = blah
          in case p of
             -1#     -> $w$j (...)
             DEFAULT -> $w$j (..)

go xs z = case xs of
            []     -> ...
            (y:ys) -> case dis y of () -> go ys z

Note that $w$j is a join point, so incurs no allocation cost.

There is just one call to dis in the body of go. So without the NOINLINE dis, the function dis inlines in the body of go. But then we get something like

go xs z = case xs of
            [] -> ...
            (y:ys) -> let $w$j ww = blah
                       in case p of
                          -1#     -> case ($w$j (...)) of
                                        () -> go ys z
                          DEFAULT -> case ($w$j (...)) of
                                        () -> go yz z

Yikes! $w$j is no longer a join point, so allocation goes up.

Six months ago I would have felt sad. But today I feel happy. In our paper Compiling without continuations, we explain how to ensure that join points are never destroyed.

Moreover, Luke Maurer is well advanced on a full implementation in GHC. Luke, how is it going? When will it land in GHC? We need this!

This ticket would make a lovely example to add to the paper, because it's a true "from the wild" example.

comment:6 Changed 3 years ago by simonpj

Keywords: JoinPoints added

comment:7 Changed 3 years ago by lukemaurer

Confirmed—with join points, INLINE vs. NOINLINE makes no difference. And indeed the key is that we move the case into $w$j.

comment:8 Changed 2 years ago by MikolajKonarski

I've tried both examples with 8.2.1-rc1 and indeed INLINE vs. NOINLINE makes no difference. My original big program is 10% faster than with 8.0.1 (I guess most of that is due to GHC, but some may also be due to newer containers, etc.). Congratulations! Closing.

comment:9 Changed 2 years ago by MikolajKonarski

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.