Opened 9 years ago

Closed 7 years ago

#4138 closed bug (fixed)

Performance regression in overloading

Reported by: simonmar Owned by: pcapriotti
Priority: high Milestone: 7.6.1
Component: Compiler Version: 6.13
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case: simplCore/shouldCompile/T4138
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

The following program goes 25% slower with HEAD compared to 6.12.3:

module Main (main) where

import DeepSeq

main :: IO ()
main = do
  rnf [ mk x | x <- [ 1 .. 1024 ] ] `seq` return ()
  where
    mk :: Float -> [(Float,Float)]
    mk x = [ (x+i,x+i+1) | i <- [ 1 .. 2048] ]

using the attached DeepSeq module, or indeed the standard Control.DeepSeq.

Simon and I diagnosed the problem to be the following dictionary for NFData (Float,Float) (this is HEAD):

Main.main6 :: DeepSeq.NFData (GHC.Types.Float, GHC.Types.Float)
Main.main6 =
  DeepSeq.$fNFData(,)
    @ GHC.Types.Float
    @ GHC.Types.Float
    DeepSeq.$fNFDataFloat
    DeepSeq.$fNFDataFloat

GHC has not inlined the dictionary function or the arguments here, even though this class is in fact just a single-method dictionary. With 6.12 we got:

Main.main6 =
  \ (ds_dBc :: (GHC.Types.Float, GHC.Types.Float)) ->
    case ds_dBc of _ { (x_awr, y_aws) ->
    case x_awr of _ { GHC.Types.F# _ ->
    case y_aws of _ { GHC.Types.F# _ -> GHC.Unit.() }
    }
    }

i.e. everything fully inlined and a nice efficient definition.

This is currently affecting parallel programs where we typically use rnf quite a lot.

Attachments (1)

DeepSeq.hs (1.3 KB) - added by simonmar 9 years ago.

Download all attachments as: .zip

Change History (19)

Changed 9 years ago by simonmar

Attachment: DeepSeq.hs added

comment:1 Changed 9 years ago by simonpj

Owner: changed from simonpj to igloo

WIth HEAD (and hence STABLE I guess), the performance (looking at allocation) is the same as 6.12, so I declare this fixed.

Ian: might you turn this into a performance test? Thanks

Simon

comment:2 Changed 9 years ago by simonmar

IIRC the difference here doesn't show up in allocations, just in runtime.

comment:3 Changed 9 years ago by simonpj

Ahem, ok. Well, Ian can you see if it's still a problem and add a test anyway?

comment:4 Changed 9 years ago by simonpj

Owner: changed from igloo to simonpj

comment:5 Changed 9 years ago by simonpj

Milestone: 7.0.17.0.2

comment:6 Changed 9 years ago by simonmar

Milestone: 7.0.27.2.1

We think the performance is almost back to where it was with 6.12, but not quite, and we'll take another look for 7.2.1.

comment:7 Changed 8 years ago by simonpj

Owner: changed from simonpj to igloo

Ian: could you see if this is ok now, and if not how not? Thanks

Simon

comment:8 Changed 8 years ago by igloo

difficulty: Unknown
Milestone: 7.4.17.4.2

comment:9 Changed 8 years ago by igloo

Milestone: 7.4.27.6.1

comment:10 Changed 7 years ago by igloo

Owner: changed from igloo to simonpj
Test Case: T4138

We're currently getting a strange middle result:

  \ (ds_dvm :: (GHC.Types.Float, GHC.Types.Float)) ->
    case ds_dvm of _ { (x_aux, y_auy) ->
    case x_aux of _ { GHC.Types.F# ipv_svw ->
    DeepSeq.$fNFDataFloat_$crnf y_auy
    }
    }

I'd have expected $fNFDataFloat_$crnf to have been inlined.

I've added a test (T4138).

Simon, I wonder if it would make sense to look at this at the same time as #6104?

comment:11 Changed 7 years ago by simonpj

Test Case: T4138simplCore/shouldCompile/T4138

comment:12 Changed 7 years ago by simonpj

Why do you think $fNFDataFloat_$crnf should be inlined? It is not called in an interesting context; it is not applied to an argument we know something about. Inlining would duplicate a case expression to elimiate a single jump. Is this really the source of the regression, if any?

Simon

comment:13 Changed 7 years ago by igloo

I was only looking at the core generated, not looking at run-time, so if that is the expected core then we can just close this ticket.

comment:14 Changed 7 years ago by simonpj

Well 17 months ago Simon M wrote "We think the performance is almost back to where it was with 6.12, but not quite, and we'll take another look for 7.2.1". Could you possibly check the performance as of today? It would be good to know whether we have a regression or not; and if so, whether it's due to this extra jump (which I rather doubt). Thanks

comment:15 Changed 7 years ago by igloo

Owner: changed from simonpj to pcapriotti

Paolo, can you take a look at whether there is a performance issue here please?

comment:16 Changed 7 years ago by pcapriotti

Here are some timings:

ghc-6.12.3: 287 ms
ghc-7.0.4:  316 ms
ghc-7.2.1:  269 ms
ghc-7.4.2:  265 ms
ghc-HEAD:   267 ms

It appears that the regression has been fixed sometime before 7.2.

Before we close the ticket, should we remove the test case which checks whether rnf is inlined?

comment:17 Changed 7 years ago by igloo

Just change it to check for 1 F# rather than 2

comment:18 Changed 7 years ago by pcapriotti

Resolution: fixed
Status: newclosed

Pushed:

commit 8f3d6a513ebba2c0a1fb15f835af78ad9e0023ff
Author: Paolo Capriotti <p.capriotti@gmail.com>
Date:   Thu Jul 5 20:04:37 2012 +0100

    Fix test case for #4138.
    
    As per the comments on that ticket, *not* inlining rnf is correct, so
    make this test pass.

Closing the ticket now.

Note: See TracTickets for help on using tickets.