Opened 4 years ago

Last modified 4 years ago

#11677 new bug

Dramatic de-optimization with "-O", "-O1", "-O2" options

Reported by: malphunction Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.10.3
Keywords: optimization deoptimization Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by malphunction)

Look for this simple program:

import Control.Monad
import Data.Maybe

-- import qualified Data.HashMap.Strict as M
-- import qualified Data.Map.Lazy as M
import qualified Data.Map.Strict as M

-- import Control.DeepSeq
-- import Control.Exception


main :: IO ()
main = do
    putStrLn "Start"

    n <- read <$> getLine
    q <- read <$> getLine

    dict' <- M.fromList <$> replicateM n ((\(k:v:_) -> (k,v)) <$> words <$> getLine)

    -- dict <- evaluate $ force dict'
    let dict = dict'

    count <- length <$> catMaybes <$> replicateM q (flip M.lookup dict <$> getLine)
    print count

When compiled without "-O2" it runs about 0.07 sec on my computer. But when compiled with "-O2" it runs about 77 sec (1100 times slowly!).

Look: compile and run without "-O2":

% rm -rf ./mime_type mime_type.{o,hi} && ghc mime_type.hs -o mime_type && cat my_data.txt | time ./mime_type
[1 of 1] Compiling Main             ( mime_type.hs, mime_type.o )
Linking mime_type ...
Start
4738
./mime_type  0,06s user 0,01s system 97% cpu 0,069 total

And with "-O2":

% rm -rf ./mime_type mime_type.{o,hi} && ghc -O2 mime_type.hs -o mime_type && cat my_data.txt | time ./mime_type
[1 of 1] Compiling Main             ( mime_type.hs, mime_type.o )
Linking mime_type ...
Start
4738
./mime_type  76,73s user 0,10s system 99% cpu 1:17,12 total

But when force dict variable (dict <- evaluate $ force dict'), it runs fast in both cases (with and without "-O2").

Also this bug is reproductable with "-O", "-O1" options.

Also this bug is reproductable with .Strict and .Lazy versions; and with Data.HashMap, .Strict and .Lazy

Also this bug is reproductable with GHC 7.10.2 and GHC 8.0.1-rc2 (The Glorious Glasgow Haskell Compilation System, version 8.0.0.20160204).

Data file my_data.txt is attached. It has simple structure:

  • N, number of key-value pairs
  • K, number of keys for searching
  • N key-value pairs
  • K kes

You can generate it with this Ruby script:

#!/usr/bin/env ruby
lst = ('a'..'z').to_a

N = 10000
K = 10000

File.open('my_data.txt', 'w') do |f|
    f.puts(N)
    f.puts(K)
    N.times do
        f.puts("#{lst.sample(3).join} #{lst.sample(5).join}")
        # f.puts("(\"#{lst.sample(3).join}\",\"#{lst.sample(5).join}\")")
    end
    K.times do
        f.puts("#{lst.sample(3).join}")
    end
end

Attachments (1)

my_data.txt (136.7 KB) - added by malphunction 4 years ago.
Test data

Download all attachments as: .zip

Change History (7)

Changed 4 years ago by malphunction

Attachment: my_data.txt added

Test data

comment:1 Changed 4 years ago by malphunction

Description: modified (diff)

comment:2 Changed 4 years ago by malphunction

Description: modified (diff)

comment:3 Changed 4 years ago by malphunction

Update: if disable -fenable-rewrite-rules (i.e. -O2 -fno-enable-rewrite-rules), then compiled program runs fast!

Last edited 4 years ago by malphunction (previous) (diff)

comment:4 Changed 4 years ago by bgamari

Thanks for the excellent description and testcase!

The problem here is that GHC is inlining the definition of dict with -O1. This means that your transformed program looks like,

dict' <- replicateM n ((\(k:v:_) -> (k,v)) <$> words <$> getLine)
count <- length <$> catMaybes <$> replicateM q (flip M.lookup (M.fromList dict) <$> getLine)

Meaning that the Map is being reconstructed with every line that is read.

You can easily discourage GHC from performing this inlining by placing a bang on the dict' binding,

!dict' <- M.fromList <$> replicateM n ((\(k:v:_) -> (k,v)) <$> words <$> getLine)

You were accomplishing this same end with your evaluate $ deepseq, but more "forcefully".

Indeed it is a bit unfortunate that GHC decides that this inlining is beneficial, but I'm not entirely sure how it could know otherwise.

comment:5 Changed 4 years ago by bgamari

For the record, I concluded this by comparing the output of -ddump-simpl -dsuppress-all from the testcases compiled with -O1 with and without the evaluate $ deepseq. The delta in the Core produced in these two cases is extremely small and immediately reveals the problem (cutting out a few spurious differences),

  • .dump-simpl

    old new  
    296301            case x of _ { I# ww1 ->
    297302            case $wa @ (String, String) ww1 (lvl3 `cast` ...) ipv4
    298303            of _ { (# ipv6, ipv7 #) ->
     304            case seq#
     305                   @ (Map String String)
     306                   @ RealWorld
     307                   (let {
     308                      dict' :: Map String String
     309                      dict' = $sfromList @ [Char] ipv7 } in
     310                    case $fNFDataMap_$crnf
     311                           @ [Char] @ [Char] ($sforce3 `cast` ...) ($sforce3 `cast` ...) dict'
     312                    of _ { () ->
     313                    dict'
     314                    })
     315                   ipv6
     316            of _ { (# ipv8, ipv9 #) ->
    299317            case readEither6 @ Int (run @ Int lvl2 ipv5) of _ {
    300318              [] ->
    301319                case error
     
    315333                           ((\ (eta :: State# RealWorld) ->
    316334                               case wantReadableHandle_1
    317335                                      @ String hGetLine4 stdin (hGetLine2 `cast` ...) eta
    318                                of _ { (# ipv8, ipv9 #) ->
    319                                (# ipv8, $slookup1 @ [Char] ipv9 ($sfromList @ [Char] ipv7) #)
     336                               of _ { (# ipv10, ipv11 #) ->
     337                               (# ipv10, $slookup1 @ [Char] ipv11 ipv9 #)
    320338
Last edited 4 years ago by bgamari (previous) (diff)

comment:6 Changed 4 years ago by simonpj

This is another in the long chain of tickets involving the "state hack" and replicateM. See #1168 for a list and #9388 for ideas.

Sadly -fno-state-hack doesn't make any difference. Reason: the fmap for IO is begin called, and looks like

fmapIO = \f a s. case a s of
                   (r, s') -> (f r, s')

So if we have replicateM (fmapIO (g (expensive x)) getLine), we'll inline fmapIO to

replicateM (let f = g (expensive x)
            in \s. case getLine s of
                     (r, s') -> (f r, s'))

Now if that \s which come from fmapIO is treated as one-shot, we'll inline g (expensive x) inside; disaster.

The -fno-state-hack doesn't make any difference because Joachim arranged to persist one-shot-ness in interface files, so what matters is the setting in GHC.Base where fmapIO was defined.

Anyway that's the reason, and we have many examples of it. The right solution is sketched in #9388 but it needs someone to pick up the cudgels.

Note: See TracTickets for help on using tickets.