Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#9203 closed bug (fixed)

Perf regression in 7.8.2 relative to 7.6.3, possibly related to HashMap

Reported by: simonmar Owned by:
Priority: high Milestone: 7.8.3
Component: Compiler Version: 7.8.2
Keywords: Cc: tibbe
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

$ git clone http://github.com/facebook/Haxl.git
Cloning into 'Haxl'...
remote: Counting objects: 77, done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 77 (delta 17), reused 69 (delta 9)
Unpacking objects: 100% (77/77), done.
$ cd Haxl 
$ ghc-7.6.3 -O2 tests/Bench.hs -main-is Bench -o tests/Bench-76
[1 of 6] Compiling Haxl.Core.StateStore ( Haxl/Core/StateStore.hs, Haxl/Core/StateStore.o )
[2 of 6] Compiling Haxl.Core.Show1  ( Haxl/Core/Show1.hs, Haxl/Core/Show1.o )
[3 of 6] Compiling Haxl.Core.Util   ( Haxl/Core/Util.hs, Haxl/Core/Util.o )
[4 of 6] Compiling Haxl.Core.Types  ( Haxl/Core/Types.hs, Haxl/Core/Types.o )
[5 of 6] Compiling Haxl.Core.DataCache ( Haxl/Core/DataCache.hs, Haxl/Core/DataCache.o )
[6 of 6] Compiling Bench            ( tests/Bench.hs, tests/Bench.o )
Linking tests/Bench-76 ...
$ ./tests/Bench-76 500000                                      
Just (Right 0)
insert: 0.87s
500000
lookup: 0.24s
$ ./tests/Bench-76 500000
Just (Right 0)
insert: 0.87s
500000
lookup: 0.26s
$ ghc-7.8.2 -O2 tests/Bench.hs -main-is Bench -o tests/Bench-78
[1 of 5] Compiling Haxl.Core.StateStore ( Haxl/Core/StateStore.hs, Haxl/Core/StateStore.o )
[2 of 5] Compiling Haxl.Core.Show1  ( Haxl/Core/Show1.hs, Haxl/Core/Show1.o )
[3 of 5] Compiling Haxl.Core.Types  ( Haxl/Core/Types.hs, Haxl/Core/Types.o )
[4 of 5] Compiling Haxl.Core.DataCache ( Haxl/Core/DataCache.hs, Haxl/Core/DataCache.o )
[5 of 5] Compiling Bench            ( tests/Bench.hs, tests/Bench.o )
Linking tests/Bench-78 ...
$ ./tests/Bench-78 500000                                      
Just (Right 0)
insert: 1.09s
500000
lookup: 0.44s
$ ./tests/Bench-78 500000
Just (Right 0)
insert: 1.08s
500000
lookup: 0.44s

Change History (13)

comment:1 Changed 5 years ago by simonmar

Cc: tibbe added

I've narrowed this down further. It seems to be something to do with HashMap.

With the following source file:

{-# LANGUAGE RankNTypes, GADTs, BangPatterns, DeriveDataTypeable,
    StandaloneDeriving #-}
{-# OPTIONS_GHC -fno-warn-unused-do-bind -fno-warn-type-defaults #-}

module Bench where

import Prelude hiding (mapM)

import Control.Concurrent
import Data.Hashable
import Data.Time.Clock
import Data.Traversable
import Data.Typeable
import System.Environment
import Text.Printf
import qualified Data.HashMap.Strict as HashMap
import Data.HashMap.Strict (HashMap)
import Unsafe.Coerce

data TestReq a where
  ReqInt    :: {-# UNPACK #-} !Int -> TestReq Int
  deriving Typeable

deriving instance Eq (TestReq a)
deriving instance Show (TestReq a)

instance Hashable (TestReq a) where
  hashWithSalt salt (ReqInt i) = hashWithSalt salt (0::Int, i)

main = do
  [n] <- fmap (fmap read) getArgs
  t0 <- getCurrentTime
  let
     f 0  !cache = cache
     f !n !cache = f (n-1) (dcinsert (ReqInt n) 0 cache)
  --
  let !cache = f n dcempty
  let m = dclookup (ReqInt (n `div` 2)) cache
  print m
  t1 <- getCurrentTime
  printf "insert: %.2fs\n" (realToFrac (t1 `diffUTCTime` t0) :: Double)

  t0 <- getCurrentTime
  let
     f 0  !m = m
     f !n !m = case dclookup (ReqInt n) cache of
                 Nothing -> f (n-1) m
                 Just _  -> f (n-1) (m+1)
  print (f n 0)
  t1 <- getCurrentTime
  printf "lookup: %.2fs\n" (realToFrac (t1 `diffUTCTime` t0) :: Double)


newtype DataCache = DataCache (HashMap TypeRep SubCache)

-- | The implementation is a two-level map: the outer level maps the
-- types of requests to 'SubCache', which maps actual requests to their
-- results.  So each 'SubCache' contains requests of the same type.
-- This works well because we only have to store the dictionaries for
-- 'Hashable' and 'Eq' once per request type.
data SubCache =
  forall req a . (Hashable (req a), Eq (req a)) =>
       SubCache ! (HashMap (req a) a)
       -- NB. the inner HashMap is strict, to avoid building up
       -- a chain of thunks during repeated insertions.

-- | A new, empty 'DataCache'.
dcempty :: DataCache
dcempty = DataCache HashMap.empty

-- | Inserts a request-result pair into the 'DataCache'.
dcinsert
  :: (Hashable (r a), Typeable (r a), Eq (r a))
  => r a
  -- ^ Request
  -> a
  -- ^ Result
  -> DataCache
  -> DataCache

dcinsert req result (DataCache m) =
      DataCache $
        HashMap.insertWith fn (typeOf req)
                              (SubCache (HashMap.singleton req result)) m
  where
    fn (SubCache new) (SubCache old) =
        SubCache (unsafeCoerce new `HashMap.union` old)

-- | Looks up the cached result of a request.
dclookup
  :: Typeable (r a)
  => r a
  -- ^ Request
  -> DataCache
  -> Maybe a

dclookup req (DataCache m) =
      case HashMap.lookup (typeOf req) m of
        Nothing -> Nothing
        Just (SubCache sc) ->
           unsafeCoerce (HashMap.lookup (unsafeCoerce req) sc)

GHC 7.6.3:

Just 0
insert: 0.73s
500000
lookup: 0.23s

GHC 7.8.2:

Just 0
insert: 1.01s
500000
lookup: 0.53s

insert is a bit slower, but lookup is more than twice as slow with 7.8.2.

Looking at the Core, at lookup in particular, the code in 7.8.2 looks reasonable. But in both cases we end up calling Data.HashMap.Base.lookup for the inner lookup, and I'm guessing that is where the inefficiency lies.

@tibbe, want to take a look?

comment:2 Changed 5 years ago by simonmar

Summary: Perf regression in 7.8.2 relative to 7.6.3, possibly related to TypeablePerf regression in 7.8.2 relative to 7.6.3, possibly related to HashMap

comment:3 Changed 5 years ago by simonmar

Hang on, I'm just doing some more investigation in TypeRep, I think it might be in there after all.

comment:4 Changed 5 years ago by simonmar

Priority: normalhigh

Fix: https://phabricator.haskell.org/D19

It would be nice to get this into 7.8.3.

comment:5 Changed 5 years ago by simonpj

Yes, great fix, let's get it in as soon as poss. I've added Phab comment thought.

Simon

comment:6 Changed 5 years ago by simonpj

Status: newmerge

I'm going to change the status to 'merge' even though it's not in HEAD, so that Austin doesn't miss it.

Simon

comment:7 Changed 5 years ago by Simon Marlow <marlowsd@…>

In 5ffc68bb75d34414987b5d1e5aa4f9061a7a7383/ghc:

Fix recomputation of TypeRep in the instance for Typeable (s a) (#9203)

Summary:
Every time we called typeRep on a type application, we were
recomputing the TypeRep.  This showed up in a benchmark I had: #9203.

Test Plan: Benchmark from #9203.

Reviewers: simonpj, austin

Subscribers: simonmar, relrod

Differential Revision: https://phabricator.haskell.org/D19

comment:8 Changed 5 years ago by simonmar

Simon: I don't think you added a comment on Phabricator (you probably need to click the big button at the bottom of the page).

I've been digging a bit more into this: it turns out we had fixed this once before, see #3245, and there was even a big Note in the code that was *removed* by Pedro in 3d53407f9c6e87bcff30d51f0a048e92b2745a32.

There's also a perf test (testsuite/tests/perf/should_run/T3245.hs), but that didn't trigger because (a) there are no bounds defined for it in all.T, and (b) that test doesn't trigger the bug any more anyway. I'm going to add a new perf test, and some comments to Internal.hs.

comment:9 Changed 5 years ago by simonpj

Ha ha! My attempt to use Phab fails at the first fence. I'm jolly glad I only wrote one short comment, not a whole lot!!

Anyway you read my mind: I was suggesting a clear Note to explain the issue, but I see you are already on it.

Simon

comment:10 Changed 5 years ago by Iavor S. Diatchki <diatchki@…>

In 836981c7dec5c794ca94408468535cc018dc2e82/ghc:

Redo instance to be more efficient (see #8778, #9203)

comment:11 Changed 5 years ago by simonmar

Simon - you didn't lose your comment, if you go back to Phabricator it will still be there ready to submit. Phabricator auto-saves comments (unlike Trac).

comment:12 Changed 5 years ago by thoughtpolice

Resolution: fixed
Status: mergeclosed

comment:13 Changed 5 years ago by jstolarek

Perhaps it's also worth to resurrect the deleted Note?

Note: See TracTickets for help on using tickets.