Opened 4 years ago

Closed 4 years ago

#11365 closed bug (duplicate)

Worse performance with -O

Reported by: facundo.dominguez
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version:
Keywords: optimization performance concurrency
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets: #1168
Differential Rev(s):
Wiki Page:

Description

The running time of the following program gets worse when it is compiled with -O, and worse still when it is compiled with ghc-7.10.2.

-- /opt/ghc-7.8.3/bin/ghc --make -threaded -fforce-recomp test.hs
-- time ./test: 3 seconds
--
-- /opt/ghc-7.8.3/bin/ghc --make -threaded -O -fforce-recomp test.hs
-- time ./test: 11 seconds
--
-- /opt/ghc-7.10.2/bin/ghc --make -threaded -fforce-recomp test.hs
-- time ./test: 5 seconds
--
-- /opt/ghc-7.10.2/bin/ghc --make -threaded -O -fforce-recomp test.hs
-- time ./test: 13 seconds
--

import Control.Concurrent
import Control.Monad
import Data.List

main :: IO ()
main = do
  let x = foldl' (+) 0 [1 .. 100000000]
  mv <- newEmptyMVar
  replicateM_ 4 $ forkIO $ putMVar mv $! x
  nums <- replicateM 4 $ takeMVar mv
  print (nums :: [Integer])

The following variant, which doesn't share x, improves with -O for ghc-7.10.2, but ghc-7.8.3 still produces a faster program.

-- /opt/ghc-7.8.3/bin/ghc --make -threaded -fforce-recomp test.hs
-- time ./test: 10 seconds
--
-- /opt/ghc-7.8.3/bin/ghc --make -threaded -O -fforce-recomp test.hs
-- time ./test: 11 seconds
--
-- /opt/ghc-7.10.2/bin/ghc --make -threaded -fforce-recomp test.hs
-- time ./test: 18 seconds
--
-- /opt/ghc-7.10.2/bin/ghc --make -threaded -O -fforce-recomp test.hs
-- time ./test: 15 seconds
--

import Control.Concurrent
import Control.Monad
import Data.IORef
import Data.List

main :: IO ()
main = do
  mv <- newEmptyMVar
  ref <- newIORef 0
  replicateM_ 4 $ forkIO $ do
    i <- readIORef ref
    putMVar mv $! foldl' (+) i [1 .. 100000000]
  nums <- replicateM 4 $ takeMVar mv
  print (nums :: [Integer])

Some related discussion here.

Change History (4)

comment:1 Changed 4 years ago by rwbarton

The "state hack" seems to be responsible here. Without -O, the argument to replicateM_ is shared and therefore the expensive computation occurs only once. With -O, replicateM_ is inlined and then due to the "state hack" GHC thinks it is okay to duplicate the expensive computation.
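To make the difference concrete, here is a minimal sketch that models the effect, not GHC's actual transformation. The names expensive, sharedRuns and duplicatedRuns are hypothetical, the counter exists only to show how often the work runs, and the fold is kept small so the example finishes instantly.

```haskell
import Control.Monad (replicateM)
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import Data.List (foldl')

-- Hypothetical stand-in for the expensive fold.
-- It bumps a counter every time it is actually run.
expensive :: IORef Int -> IO Integer
expensive runs = do
  modifyIORef' runs (+ 1)
  return $! foldl' (+) 0 [1 .. 1000]

-- The work happens once, outside the repeated action.
-- This corresponds to the sharing seen without -O.
sharedRuns :: IO (Int, [Integer])
sharedRuns = do
  runs <- newIORef 0
  x <- expensive runs
  nums <- replicateM 4 (return x)
  n <- readIORef runs
  return (n, nums)

-- The work sits inside the repeated action, so it runs on every repetition.
-- This models the duplication attributed to the state hack.
duplicatedRuns :: IO (Int, [Integer])
duplicatedRuns = do
  runs <- newIORef 0
  nums <- replicateM 4 (expensive runs)
  n <- readIORef runs
  return (n, nums)

main :: IO ()
main = do
  sharedRuns >>= print      -- (1,[500500,500500,500500,500500])
  duplicatedRuns >>= print  -- (4,[500500,500500,500500,500500])
```

Both versions return the same numbers; only the count of expensive runs differs, which is why the problem shows up as a slowdown and not as a wrong result.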

Try building with -fno-state-hack. I made other small adjustments to the program in testing; you may also need -fno-full-laziness.
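As a sketch of how those flags could be applied per module instead of on the command line, the first program can carry them in an OPTIONS_GHC pragma. The helper sumInThreads is a hypothetical refactoring that was not in the report; it is introduced here only so the upper bound can be varied.

```haskell
{-# OPTIONS_GHC -fno-state-hack -fno-full-laziness #-}
-- Per-module equivalent of passing both flags to ghc on the command line.

import Control.Concurrent
import Control.Monad
import Data.List

-- Hypothetical helper: the body of the reported program,
-- with the upper bound of the fold turned into a parameter.
sumInThreads :: Integer -> IO [Integer]
sumInThreads n = do
  let x = foldl' (+) 0 [1 .. n]
  mv <- newEmptyMVar
  -- Four threads each force the same x and hand the result back.
  replicateM_ 4 $ forkIO $ putMVar mv $! x
  replicateM 4 $ takeMVar mv

main :: IO ()
main = do
  nums <- sumInThreads 100000000
  print nums
```

Whether this restores the timings seen without -O was not measured here; the comment above only suggests trying it.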

comment:2 Changed 4 years ago by simonpj

Ah yes! Just search for "replicateM" and you'll see a raft of tickets about this one problem!

If someone would like to dig further, I'd be happy to help.

Simon

comment:3 Changed 4 years ago by nomeata

#1168 has a list of related tickets, and #9388 has ideas and preliminary work on how to limit the scope of the hack.

comment:4 Changed 4 years ago by thomie

Resolution: duplicate
Status: new → closed

> Ah yes! Just search for "replicateM" and you'll see a raft of tickets about this one problem!

There is a link back to this ticket from #1168, so this example can still be found and used as a test if necessary.
