Opened 9 years ago

Closed 8 years ago

#4414 closed bug (fixed)

scc001 fails: one SCC entered 0 times; another not mentioned when optimisation is on

Reported by: igloo Owned by: simonmar
Priority: normal Milestone: 7.4.1
Component: Profiling Version: 6.12.3
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

With optimisation off, g is entered 0 times, and with optimisation on f disappears from the profile:

--- ./scc001.stdout.normalised  2010-10-18 16:36:20.000000000 +0100
+++ ./scc001.run.stdout.normalised      2010-10-18 16:36:20.000000000 +0100
@@ -2,7 +2,10 @@
 True
 3
 'a'
+No single-entry for g found in profile
+   g                     Main                                                 246           0   0.0    0.2     0.0    0.2
 Compiling with -O
 True
 3
 'a'
+No single-entry for f found in profile
*** unexpected failure for scc001(normal)

Change History (14)

comment:1 Changed 9 years ago by simonpj

I looked at (a), the missing entry to g with -O0. The Core code looks right, and STG:

      let {
        sat_sht :: GHC.Types.IO ()
        [LclId]
        sat_sht =
          let {
            x_sha :: GHC.Types.Int
            [LclId, Unf=OtherCon []]
            x_sha = GHC.Types.I# 3 } in
          let {
            sat_shu :: GHC.Types.Int
            [LclId]
            sat_shu = __scc {g main:Main} x_sha } in
          let {
            sat_shv :: GHC.Types.Int -> GHC.Types.IO ()
            [LclId]
            sat_shv = System.IO.print @ GHC.Types.Int GHC.Show.$fShowInt } in
          GHC.Base.$ @ GHC.Types.Int @ (GHC.Types.IO ()) sat_shv sat_shu } in

Note the _scc for "g".

The sat_shu closure gets compiled into a "standard form ap closure", and the code looks a bit suspicious. I think it'll turn out to be an interaction between these standard-form closures and profiling, but I need to look at it with Simon.

comment:2 Changed 9 years ago by simonpj

For the -O case I've located the bug. We were optimising

  case _scc_ "foo" (C a b) of
            C a b -> e

to e, which loses the entry count on "foo". I'll disable that, which will make profiled programs less optimised, but should get the right counts!

Simon

comment:3 Changed 9 years ago by simonpj

Owner: set to simonmar

This is a fix for the -O case:

Tue Oct 26 01:07:54 PDT 2010  simonpj@microsoft.com
  * Don't look through SCC in exprIsConApp_maybe
  
  By not taking account of SCC notes we were inadvertently
  discarding some, which led to mis-counted entries with -O
  
  Should fix (part of) Trac #4414

    M ./compiler/coreSyn/CoreUnfold.lhs -4 +5

Profiled programs compiled with -O will run a bit slower, but the entry count should be right.

Probably worth merging.

I'll change ownership to Simon M for the -O0 bug.

Simon

comment:4 Changed 9 years ago by simonmar

Getting the entry counts right adds more constraints to what we can do with SCCs, and I think might end up costing quite a lot. e.g. in the example above we have

 let x = scc "g" y

and if it weren't for the entry count we could eliminate it altogether (the bug is that we are compiling this closure as a standard form when it shouldn't be, but I bet there are other similar bugs in this area).

Arguably HPC is a better way to get this information. I tentatively propose that we dispense with entry counts in the profiler.

comment:5 Changed 9 years ago by igloo

I'm not sure I follow. Having the counts in the .prof file is convenient. Why can't the profiler do what hpc does to get it?

comment:6 in reply to:  5 Changed 9 years ago by simonmar

Replying to igloo:

I'm not sure I follow. Having the counts in the .prof file is convenient. Why can't the profiler do what hpc does to get it?

We could make cost centres behave like HPC ticks and thus get reliable entry counts, but the downside is that it adds overhead. Adding overhead is fine, except that it adds overhead in a non-uniform way, and thus distorts the profile. If you inline a function and it gets almost completely optimised away, you really don't want to have the SCCs stick around just for the purposes of counting entries. Ideally profiling should stay out of the way of the optimiser as far as possible. HPC has slightly different goals, because it is not primarily a profiler.

So we have a few options:

  • get the entry counts right
  • don't do entry counts at all
  • make a best effort to get entry counts, but don't compromise optimisation

comment:7 Changed 9 years ago by igloo

Would it be possible to have accurate counts if also compiled with -fhpc, and no counts otherwise?

comment:8 Changed 9 years ago by simonmar

We could certainly make -fhpc work with profiling (maybe it already does). We could also have a mode for -fhpc which only puts ticks on top-level functions rather than every subexpression, to reduce the overhead.

However, I suppose the downside for using HPC for entry counts is that the HPC data is flat: no call stacks like we get with profiling.

comment:9 Changed 9 years ago by igloo

Perhaps another option is for SCCs to be parameterised by whether they keep a count of entries or not. Then an optimisation could eliminate an __scc NoCounts ... but not an __scc WithCounts ....

comment:10 Changed 9 years ago by igloo

Milestone: 7.0.27.2.1

comment:11 Changed 9 years ago by simonmar

See also #5137

comment:12 Changed 9 years ago by augustss

The entry counts are invaluable, please don't remove them. Performance bugs are often about something getting evaluated more times than you expect, and without the entry counts there's no way to determine that. I'd settle for a flag, though. Exact entry count and slower code, or inaccurate entry counts and fast code. Sometimes you need one, sometimes the other.

I'd also like to urge you to fix this, because profiling is the only way to speed up large program; staring at core only works for kernels.

comment:13 Changed 8 years ago by simonmar

Milestone: 7.2.17.4.1

comment:14 Changed 8 years ago by simonmar

difficulty: Unknown
Resolution: fixed
Status: newclosed

scc001 now succeeds, and there is a flag to turn off entry counts if you don't want the overhead: -fno-prof-count-entries.

Note: See TracTickets for help on using tickets.