Opened 21 months ago

Last modified 11 months ago

#14626 new bug

No need to enter a scrutinised value

Reported by: heisenbug Owned by: heisenbug
Priority: normal Milestone:
Component: Compiler Version: 8.2.2
Keywords: performance, CodeGen Cc: simonpj, osa1, simonmar
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: #14677 Blocking:
Related Tickets: #13861 #14677 Differential Rev(s):
Wiki Page:

Description

While analysing the output of #13861 I stumbled over an unnecessary pessimisation in handling of scrutinised values. With words of Simon (from https://phabricator.haskell.org/D4267 with minor edits added):

Interesting. Yes, please make a ticket! (And transfer the info below into it.)

I think the issue is this. Given (the STG-ish code)

data Colour = Red | Green | Blue
f x = case x of y
           Red -> Green
           DEFAULT -> y

(here y is the case binder) we can just return x rather than entering it in DEFAULT branch, because y will be fully evaluated and its pointer will be correctly tagged.

You absolutely can't check for an OccName of "wild"!! That is neither necessary nor sufficient :-).

Instead, check isEvaldUnfolding (idUnfolding y). See Note [Preserve evaluatedness] in CoreTidy.hs. And be sure to augment that note if you make the change.

I would expect perf benefits to be small on average, but it's simple to implement.

Change History (47)

comment:1 Changed 21 months ago by heisenbug

Owner: set to heisenbug

@spj Yes the perf changes will be small, but I hope to get more by the time #13861 is ready. Anyway this will result in better (less branchy) code, so I think it is worth it.

Thanks for the isEvaldUnfolding hint! I knew there must be a correct way to do it. I'll test, and come back with a phabricator.

comment:2 Changed 21 months ago by heisenbug

Cc: simonpj added

@simonpj: unfortunately the isEvaldUnfolding (idUnfolding y) criterion does not hold for case scrutinees. Any idea why? I pprTrace-d them, and while some traces show up, none is wild_*. What's going on here?

Last edited 21 months ago by heisenbug (previous) (diff)

comment:3 Changed 21 months ago by heisenbug

Hmm, I made a run with traces about which Ids the condition holds for: https://circleci.com/api/v1.1/project/github/ghc/ghc/951/output/108/0?file=true

you have to grep getCallMethod to find them. If I treat those values as tagged, and simply return them, the ghc-stage2 crashes. :-(

So something is definitely fishy here. For completeness, this is what I changed: https://github.com/ghc/ghc/compare/92f6a671a6a8...8f627c9f25b6

comment:4 Changed 21 months ago by simonpj

Puzzled by comment:2 I tried it myself. Sure enough, the evaluted-ness flags, so carefully attached by CoreTidy were being lost in CorePrep.

This turned out to date back at least 7 years.

  • See Note [dataToTag magic] in CorePrep. It relies evaluated-ness flags.
  • But those flags were being killed off in cpCloneBndr, for reasons described in Note [Dead code in CorePrep].
  • This was just a bug: it means that the dataToTag magic doesn't work properly. Sure enough we get a redundant case (after CorePrep) with this program
    data T = MkT !Bool
    
    f v = case v of
             MkT y -> dataToTag# y
    
    After CorePrep we end up with
    f v = case v of
             MkT y -> case y of z -> dataToTag# z
    
    which is silly.

I'll fix all this and add suitable comments

comment:5 Changed 21 months ago by simonpj

So something is definitely fishy here.

I think that there's something different about top-level binders. Consider

data T = MkT !Bool
top_x = True

f True  (MkT local_y) = local_y
f False _             = top_x

Here I think that local_y gets bound to a correctly-tagged pointer, fetched out of the MkT constructor.

But, in contrast, I think that top_x is bound to the label for the top-level closure for top_x, which is 8-byte aligned. So the label isn't tagged; instead, the code generator has to tag it. Is that happening in your "return it instead of eantering it" path?

Other than that I have no idea why it crashes. Things you might try:

  • Switch off the new optimisation when building stage2 and the libraries. Use it only for the test suite: these are small programs and easier to debug.
  • Maybe add an assertion in the Cmm: the claim is that on the "return it" path, the thing beign returned is a correctly-tagged pointer. So, the assertion can follow the pointer and check that the thing pointed to is a value, with the right tag, etc. There must be some code in the RTS (or somewhere) that checks that.

comment:6 in reply to:  5 Changed 21 months ago by heisenbug

Replying to simonpj:

So something is definitely fishy here.

I think that there's something different about top-level binders. Consider

data T = MkT !Bool
top_x = True

f True  (MkT local_y) = local_y
f False _             = top_x

Here I think that local_y gets bound to a correctly-tagged pointer, fetched out of the MkT constructor.

But, in contrast, I think that top_x is bound to the label for the top-level closure for top_x, which is 8-byte aligned. So the label isn't tagged; instead, the code generator has to tag it. Is that happening in your "return it instead of eantering it" path?

No, I am not returning top-level bindings, I would be okay with entering those. What I want is to avoid entering local bindings from the StgCase:

f = \inp -> case inp of wildBind { True -> ...;  False -> wildBind }

wildBind is known to be evaled and properly tagged. I want to ReturnIt.

Other than that I have no idea why it crashes. Things you might try:

  • Switch off the new optimisation when building stage2 and the libraries. Use it only for the test suite: these are small programs and easier to debug.
  • Maybe add an assertion in the Cmm: the claim is that on the "return it" path, the thing beign returned is a correctly-tagged pointer. So, the assertion can follow the pointer and check that the thing pointed to is a value, with the right tag, etc. There must be some code in the RTS (or somewhere) that checks that.

comment:7 Changed 21 months ago by Simon Peyton Jones <simonpj@…>

In bd438b2/ghc:

Get evaluated-ness right in the back end

See Trac #14626, comment:4.  We want to maintain evaluted-ness
info on Ids into the code generateor for two reasons
(see Note [Preserve evaluated-ness in CorePrep] in CorePrep)

- DataToTag magic
- Potentially using it in the codegen (this is Gabor's
  current work)

But it was all being done very inconsistently, and actually
outright wrong -- the DataToTag magic hasn't been working for
years.

This patch tidies it all up, with Notes to match.

comment:8 Changed 21 months ago by simonpj

OK I've fixed the stuff in comment:4

comment:9 Changed 21 months ago by heisenbug

@simonpj, thanks this is a good start. When restricting myself to wild* vars, I already get a nice performance diff on dynamic instruction counts: https://perf.haskell.org/ghc/#compare/7a25659efc4d22086a9e75dc90e3701c1706c625/9ad6982e4074c1fdeff967cafa51e436023d69bb

Then I tried to add ds* variables too, and (some of) those made the stage-2 crash. It is checked in on the branch if you want to have a try... https://github.com/ghc/ghc/commit/b30d61f64772d744e06e2acbab21895cb20d9bf7

Any hints appreciated!

comment:10 Changed 21 months ago by simonpj

The names of the variables should not make any difference! That's bizarre.

Guessing is usually fruitless; you need data.

Did you add that assertionn I suggested?

As I say, a stage2 compiler is a huge program. I urge you to first build the libraries and compiler without the change; then switch the change on and run the testsuite. Any bugs must be in those little programs. Then nofib.

comment:11 in reply to:  10 Changed 21 months ago by heisenbug

Replying to simonpj:

The names of the variables should not make any difference! That's bizarre.

Of course it is not the names, per se! Rather, one of the names starting with ds is misbehaving in a specific way, that trips over stage2.

I built a quick compiler with the given revision, that is stage2 compiled with -O0, and it behaves well. Testsuite passes without any strangeness.

Then I bisected the sources in compiler/* and isolated these two (mutually dependent) sources, which when built with -O1 (and all others with -O0) break stage 2. Here is what I did (for the record):

nice make V=1 GhcStage2HcOpts="-O0 -g" -j8
rm compiler/stage2/build/StgCmmBind.*
nice make V=1 GhcStage2HcOpts="-O1 -g" -j8 ./compiler/stage2/build/StgCmmBind.hi
nice make V=1 GhcStage2HcOpts="-O0 -g" -j8

I'll have a look which differences occur here between -O0 and -O1. Both interesting sources are >500 LOC, so it won't be trivial :-)

Guessing is usually fruitless; you need data.

Did you add that assertionn I suggested?

As I say, a stage2 compiler is a huge program. I urge you to first build the libraries and compiler without the change; then switch the change on and run the testsuite. Any bugs must be in those little programs. Then nofib.

comment:12 Changed 21 months ago by simonpj

Once more, I urge you not to try debugging a stage2 compiler, unless you have a great deal of time on your hands. Instead, compile the libraries and stage2 compiler without the optimisation; and then try the testsuite and nofib. These are small programs and much easier to debug.

comment:13 in reply to:  12 Changed 20 months ago by heisenbug

Replying to simonpj:

Once more, I urge you not to try debugging a stage2 compiler, unless you have a great deal of time on your hands. Instead, compile the libraries and stage2 compiler without the optimisation; and then try the testsuite and nofib. These are small programs and much easier to debug.

Yup, I entertained this idea of an assertion for some time, but it looked like too much of a hassle compared to some quick debugging session. Turns out you are right. So I now plant in taggedness assertions (pushed to wip/T14626 for your reviewing pleasure) for suspicious constellations, and they fire indeed!

  * frame #0: 0x000000010a0fcbc8 libHSrts_thr-ghc8.5.20180103.dylib`checkTagged
    frame #1: 0x000000010073e2c1 libHSghc-8.5-ghc8.5.20180103.dylib`ghc_Name_nzuocc_info [inlined] _cpOv + 17 at Name.hs:111
    frame #2: 0x000000010073e2b0 libHSghc-8.5-ghc8.5.20180103.dylib`ghc_Name_nzuocc_info + 72
    frame #3: 0x000000010a104fb0 libHSrts_thr-ghc8.5.20180103.dylib`stg_upd_frame_info_dsp + 16

Name.hs:111 is a strict record field BTW. Does this ring a bell? Why is it OtherCon _ <- idUnfolding id but not tagged? Is it possibly implicitly unpacked?

comment:14 Changed 20 months ago by simonpj

Name.hs:111 is a strict record field BTW. Does this ring a bell? Why is it OtherCon _ <- idUnfolding id but not tagged? Is it possibly implicitly unpacked?

Can you explain more? I can't make sense of this paragraph. What is "it" that might be implicitly unpacked? What does it mean to be "implicitly unpacked" ?

One good thing would be to distill a tiny example, and it sounds as if you have enough insight to do that now. E.g. perhaps you are saying that

data T = MkT ![Int]
f (MkT xs) = xs

returns a badly-tagged pointer? If so, just compile that tiny program and see. etc.

comment:15 Changed 20 months ago by alexbiehl

Indeed GHC seems to unnecessarily enter MkTs argument!

    ...
c1ab: // global
   R1 = P64[R1 + 7] & (-8); -- load xs and untag(!) it  
   Sp = Sp + 8;
   call (I64[R1])(R1) args: 8, res: 0, upd: 8; -- and enter it!
Last edited 20 months ago by alexbiehl (previous) (diff)

comment:16 in reply to:  15 Changed 20 months ago by heisenbug

Replying to alexbiehl:

Indeed GHC seems to unnecessarily enter MkTs argument!

    ...
c1ab: // global
   R1 = P64[R1 + 7] & (-8); -- load xs and untag(!) it  
   Sp = Sp + 8;
   call (I64[R1])(R1) args: 8, res: 0, upd: 8; -- and enter it!

Yes, my branch is supposed to fix (soon) many (if not all) of these cases. Simon fixed the lost tracking of evaled-ness in core-prep already, now I am building on that patch. Which GHC do you use to check? My branch currently won't bootstrap, so you can only do your experiments with stage1 (or roll your own modifications).

comment:17 Changed 20 months ago by simonpj

Yes, the whole point heisenbug's patch is to return xs without entering it. But we we are returning a badly-tagged pointer. If in the above code we return R1, that will be bad. Really we should un-tag it only if/when we enter it. For example if we had

data MkS = MkS ![Int]
f (MkT xs) = MkS xs

then when we build MkS xs we need a correctly tagged xs, not a de-tagged one.

comment:18 in reply to:  14 ; Changed 20 months ago by heisenbug

Replying to simonpj:

Name.hs:111 is a strict record field BTW. Does this ring a bell? Why is it OtherCon _ <- idUnfolding id but not tagged? Is it possibly implicitly unpacked?

Can you explain more? I can't make sense of this paragraph. What is "it" that might be implicitly unpacked? What does it mean to be "implicitly unpacked" ?

No, this is a red herring. Just me, desperately looking for hints. The n_occ is not unpacked.

But I have a theory now. It may explain why I never run into the problem with toy testcases.

I set a few breakpoints:

(gdb) info breakpoints
Num     Type           Disp Enb Address            What
8       breakpoint     keep n   0x00007ffff52e39e0 in ghc_Name_nzuocC_info
                                                   at
compiler/basicTypes/Name.hs:111
        breakpoint already hit 1 time
11      breakpoint     keep y   0x00007ffff52e3a18 in ghc_Name_nzuocC_info
                                                   at
compiler/basicTypes/Name.hs:111
        breakpoint already hit 46 times
12      breakpoint     keep y   0x00007ffff2332a58 <checkTagged>
        breakpoint already hit 1 time

bp11 is set as breakpoint *ghc_Name_nzuocC_info+56

bp11 is the continuation of bp8 which is jumped at when the Name is in WHNF. Then the n_occ field is being extracted.

It turns out that bp11 needs to be hit 46 times in order to find it untagged!

Then I looked into why it is untagged at all. Here is a transcript from gdb 0x420006f4a9 is the (tagged) Name:

(gdb) x/x 0x420006f4a9-1
0x420006f4a8:   0xf52ece08
(gdb) x/x 0x420006f4a9+7
0x420006f4b0:   0x0006f451
(gdb) x/x 0x420006f4a9+15
0x420006f4b8:   0x000662a0
(gdb) x/x 0x420006f4a9+19
0x420006f4bc:   0x00000042
### the `n_occ` field is 0x42000662a0

(gdb) x/x 0x420006f4a9+23
0x420006f4c0:   0xf74a17e8
(gdb) x/x 0x420006f4a9+31
0x420006f4c8:   0x00001818

Then I follow the closure pointer of the OccName:

(gdb) x/2x 0x42000662a0
0x42000662a0:   0xf2335878      0x00007fff
(gdb) x/x 0x00007ffff2335878
0x7ffff2335878 <stg_BLACKHOLE_info>:    0x08438b48

Ooops! How can a strict field point to a BLACKHOLE?

So this is my findings. Are strict fields able to hold blackholes? And if so, why do they carry an isEvaldUnfolding when pattern matched against?

Any hints appreciated! Do I have a (black)hole in my reasoning?

One good thing would be to distill a tiny example, and it sounds as if you have enough insight to do that now. E.g. perhaps you are saying that

data T = MkT ![Int]
f (MkT xs) = xs

returns a badly-tagged pointer? If so, just compile that tiny program and see. etc.

Last edited 20 months ago by heisenbug (previous) (diff)

comment:19 in reply to:  18 ; Changed 20 months ago by heisenbug

Replying to heisenbug:

[snippety]

So this is my findings. Are strict fields able to hold blackholes? And if so, why do they carry an isEvaldUnfolding when pattern matched against?

Any hints appreciated! Do I have a (black)hole in my reasoning?

Following up my own question...

I had luck and could set a few watchpoints in lldb, with the fourth one capturing the history of the OccName's closure. I'll leave it here for the record...

Watchpoint 4 hit:
new value: 4313987056
Process 2009 stopped
* thread #1: tid = 0xa7e8d1, 0x0000000101223ddd libHSghc-8.5-ghc8.5.20180103.dylib`sxWx_info [inlined] _cA5r + 12 at BinIface.hs:149, queue = 'com.apple.main-thread', stop reason = watchpoint 4
    frame #0: 0x0000000101223ddd libHSghc-8.5-ghc8.5.20180103.dylib`sxWx_info [inlined] _cA5r + 12 at BinIface.hs:149
   146 	
   147 	    -- Initialise the user-data field of bh
   148 	    bh <- do
-> 149 	        bh <- return $ setUserData bh $ newReadState (error "getSymtabName")
   150 	                                                     (getDictFastString dict)
   151 	        symtab_p <- Binary.get bh     -- Get the symtab ptr
   152 	        data_p <- tellBin bh          -- Remember where we are now

This is presumably when the OccName got allocated on the heap.

The next change happened here:

Watchpoint 4 hit:
old value: 4313987056
new value: 4463800616
Process 2009 stopped
* thread #1: tid = 0xa7e8d1, 0x000000010a104fc2 libHSrts_thr-ghc8.5.20180103.dylib`stg_upd_frame_info + 18, queue = 'com.apple.main-thread', stop reason = watchpoint 4
    frame #0: 0x000000010a104fc2 libHSrts_thr-ghc8.5.20180103.dylib`stg_upd_frame_info + 18
libHSrts_thr-ghc8.5.20180103.dylib`stg_upd_frame_info:
->  0x10a104fc2 <+18>: movq   %rax, %rcx
    0x10a104fc5 <+21>: andq   $-0x100000, %rcx
    0x10a104fcc <+28>: movq   %rax, %rdx
    0x10a104fcf <+31>: andl   $0xff000, %edx

Let's disassemble the change:

(lldb) disassemble -s 4313987056
libHSghc-8.5-ghc8.5.20180103.dylib`sxXT_info:
    0x1012237f0 <+0>:  leaq   -0x18(%rbp), %rax
    0x1012237f4 <+4>:  cmpq   %r15, %rax
    0x1012237f7 <+7>:  jb     0x10122388c               ; <+156> [inlined] _cA3D
    0x1012237fd <+13>: movq   0x1d6b6fc(%rip), %rax     ; (void *)0x000000010a104fb0: stg_upd_frame_info
    0x101223804 <+20>: movq   %rax, -0x10(%rbp)
    0x101223808 <+24>: movq   %rbx, -0x8(%rbp)

This gets overwritten with a BLACKHOLE:

(lldb) disassemble -s 4463800616
libHSrts_thr-ghc8.5.20180103.dylib`stg_BLACKHOLE_info:
    0x10a103128 <+0>:  movq   0x8(%rbx), %rax
    0x10a10312c <+4>:  testb  $0x7, %al
    0x10a10312e <+6>:  jne    0x10a103230               ; <+264>
    0x10a103134 <+12>: movq   (%rax), %rcx
    0x10a103137 <+15>: cmpq   0x190c2(%rip), %rcx       ; (void *)0x000000010a1030c8: stg_IND_info
    0x10a10313e <+22>: je     0x10a103128               ; <+0>
    0x10a103140 <+24>: cmpq   0x19121(%rip), %rcx       ; (void *)0x000000010a103390: stg_TSO_info

Now, maybe the Binary instance breaks the invariant that the n_occ field is strict?

Last edited 20 months ago by heisenbug (previous) (diff)

comment:20 in reply to:  19 ; Changed 20 months ago by heisenbug

Replying to heisenbug:

Replying to heisenbug:

[snippety]

Now, maybe the Binary instance breaks the invariant that the n_occ field is strict?

Actually it seems to be

-- | Assumes that the 'Name' is a non-binding one. See
-- 'IfaceSyn.putIfaceTopBndr' and 'IfaceSyn.getIfaceTopBndr' for serializing
-- binding 'Name's. See 'UserData' for the rationale for this distinction.
instance Binary Name where
   put_ bh name =
      case getUserData bh of
        UserData{ ud_put_nonbinding_name = put_name } -> put_name bh name

   get bh =
      case getUserData bh of
        UserData { ud_get_name = get_name } -> get_name bh

which breaks the contract. Let's see what getUserData does...

comment:21 in reply to:  20 Changed 20 months ago by heisenbug

Replying to heisenbug:

Replying to heisenbug:

Replying to heisenbug:

[snippety]

which breaks the contract. Let's see what getUserData does...

getSymtabName should somehow check the getTag# of the presumably strict fields and if any of them is 0, seq it?

comment:22 Changed 20 months ago by simonpj

I'm totally lost. I believe that the n_occ field you are referring to is in the Name data type

data Name = Name {
                n_sort :: NameSort,     -- What sort of name it is
                n_occ  :: !OccName,     -- Its occurrence name
                n_uniq :: {-# UNPACK #-} !Unique,
                n_loc  :: !SrcSpan      -- Definition site
            }

I think you believe you have found some code that is allocating a Name without evaluating the n_occ field first. But which code? I see no Name allocation above?

Also I am still perplexed about why you are trying to debug GHC itself. Why not run the testsuite with the stage1 compiler; and the nofib; and see if/when your assertions trip? Starting with the largest Haskell program in the world seems... ambitious.

Or are you saying that you did all that, and not a single assertion tripped?

comment:23 Changed 20 months ago by simonpj

getSymtabName should somehow check the getTag# of the presumably strict fields and if any of them is 0, seq it?

I don't think so. Here's its code:

getSymtabName _ncu _dict symtab bh = do
    i :: Word32 <- get bh
    case i .&. 0xC0000000 of
      0x00000000 -> return $! symtab ! fromIntegral i

      0x80000000 ->
        let
          tag = chr (fromIntegral ((i .&. 0x3FC00000) `shiftR` 22))
          ix  = fromIntegral i .&. 0x003FFFFF
          u   = mkUnique tag ix
        in
          return $! case lookupKnownKeyName u of
                      Nothing -> pprPanic "getSymtabName:unknown known-key unique"
                                          (ppr i $$ ppr (unpkUnique u))
                      Just n  -> n

      _ -> pprPanic "getSymtabName:unknown name tag" (ppr i)

The only way it can return a Name is either as a result of lookupKownKeyName or as ar result of indexing symtab. Both should return properly formed names. Maybe somehow one of them isn't? But which one?

comment:24 in reply to:  23 Changed 20 months ago by heisenbug

Replying to simonpj:

getSymtabName should somehow check the getTag# of the presumably strict fields and if any of them is 0, seq it?

I don't think so. Here's its code:

getSymtabName _ncu _dict symtab bh = do
    i :: Word32 <- get bh
    case i .&. 0xC0000000 of
      0x00000000 -> return $! symtab ! fromIntegral i

      0x80000000 ->
        let
          tag = chr (fromIntegral ((i .&. 0x3FC00000) `shiftR` 22))
          ix  = fromIntegral i .&. 0x003FFFFF
          u   = mkUnique tag ix
        in
          return $! case lookupKnownKeyName u of
                      Nothing -> pprPanic "getSymtabName:unknown known-key unique"
                                          (ppr i $$ ppr (unpkUnique u))
                      Just n  -> n

      _ -> pprPanic "getSymtabName:unknown name tag" (ppr i)

The only way it can return a Name is either as a result of lookupKownKeyName or as ar result of indexing symtab. Both should return properly formed names. Maybe somehow one of them isn't? But which one?

I bet it comes from the call to getDictionary in BinIface.hs. EDIT: or symtab <- getSymbolTable bh ncu just a few lines below...

EDIT2: Maybe I am losing this bet. https://circleci.com/gh/ghc/ghc/1100 still fails. Will debug more tomorrow.

Last edited 20 months ago by heisenbug (previous) (diff)

comment:25 in reply to:  22 Changed 20 months ago by heisenbug

Replying to simonpj:

I'm totally lost. I believe that the n_occ field you are referring to is in the Name data type

data Name = Name {
                n_sort :: NameSort,     -- What sort of name it is
                n_occ  :: !OccName,     -- Its occurrence name
                n_uniq :: {-# UNPACK #-} !Unique,
                n_loc  :: !SrcSpan      -- Definition site
            }

I think you believe you have found some code that is allocating a Name without evaluating the n_occ field first. But which code? I see no Name allocation above?

Also I am still perplexed about why you are trying to debug GHC itself. Why not run the testsuite with the stage1 compiler; and the nofib; and see if/when your assertions trip? Starting with the largest Haskell program in the world seems... ambitious.

Well, hubris :-)

Seriously, it only happens with GHC!

Or are you saying that you did all that, and not a single assertion tripped?

I tried a lot of things. All seemed to work.

comment:26 Changed 20 months ago by simonpj

I think you believe that there is a heap-allocated Name with an un-evaluated n_occ field. There is only one place in the code generator namely StgCmmCon.buildDynCon'. You could perhaps add (runtime) assertions there to see if any strict fields had un-tagged pointers.

comment:27 in reply to:  26 ; Changed 20 months ago by heisenbug

Replying to simonpj:

I think you believe that there is a heap-allocated Name with an un-evaluated n_occ field.

Yes. n_occ is a strict field and it appears to be non-WHNF when reading it. I was suspecting that this is because of deserialisation doing some dirty unsafeCoerce tricks when loading dictionaries, which contain the Name values, but there are other culprits too. I had to remove two further bangs on datatype fields on order to get a quick build with make GhcStage2HcOpts="-O1 -g" through. (https://github.com/ghc/ghc/commit/57a57f2f8cef2ea67588edd1f09f73981e86c889) So all evidence point towards a GHC defect in allocation of heap objects with strict fields.

There is only one place in the code generator namely StgCmmCon.buildDynCon'. You could perhaps add (runtime) assertions there to see if any strict fields had un-tagged pointers.

I'll look into the assertion which you suggest. Btw., this might be a recent regression or something ancient. I'll find out, and file another ticket if this gets confirmed.

Last edited 20 months ago by heisenbug (previous) (diff)

comment:28 in reply to:  27 Changed 20 months ago by heisenbug

There is only one place in the code generator namely StgCmmCon.buildDynCon'. You could perhaps add (runtime) assertions there to see if any strict fields had un-tagged pointers.

After adding a runtime assertion here, I found one smoking gun. Banged IORefs (a.k.a newtyped STRefs) should be tagged, but end up untagged e.g. in Handle__. I am investigating why this happens.

comment:29 Changed 20 months ago by alexbiehl

An STRef is a wrapper around MutVar# which is has UnliftedRep (represented through a pointer, never bottom) thus it doesn't need a tag to note its evaluatedness. GHC probably only tags value with LiftedRep. I think you should exclude UnliftedReped values from your assertions.

Edit: c.f. https://github.com/ghc/ghc/blob/fe6848f544c2a14086bcef388c46f4070c22d287/compiler/prelude/TysPrim.hs#L874

Last edited 20 months ago by alexbiehl (previous) (diff)

comment:30 Changed 20 months ago by simonpj

We have

data STRef s a = STRef (MutVar# s a)
newtype IORef a = IORef (STRef RealWorld a)
data Handle__ = H !(IORef Char)

Indeed the pointer to the MutVarf is untagged; but heisenbug is claiming that the pointer to the IORef, stored in the Handle__ is untagged. And that pointer should certainly be tagged. Smoking gun, I say, if I have correctly understood the claim.

comment:31 Changed 20 months ago by alexbiehl

Right, though GHC.IO.Handle.Types, which contains Handle__, has

{-# OPTIONS_GHC -funbox-strict-fields #-}

in its header.

Given this, our example data H = H !(IORef Char) turns into

$WH
$WH
  = \ dt_aV9 ->
      case dt_aV9 `cast` <Co:2> of { STRef dt_aVb -> H dt_aVb }

So we are indeed storing a MutVar# here.

Edit: Ah this isn't true. A -O slipped into my tests. So my statement above isn't true.

Last edited 20 months ago by alexbiehl (previous) (diff)

comment:32 in reply to:  31 Changed 20 months ago by heisenbug

Replying to alexbiehl:

Edit: Ah this isn't true. A -O slipped into my tests. So my statement above isn't true.

So strict field unboxing (without the explicit pragma) happens in -O1 (and above) only?

(That would explain why I often get crashes when mixing -O0 and -O1)

comment:33 Changed 20 months ago by heisenbug

I have a nice one:

libHSghc-8.5-ghc8.5.20180103.dylib`sSTG_info + 98 [inlined] _c147g + 1
   503 	        -- force evaluation all this stuff to avoid space leaks
   504 	        {-# SCC "seqString" #-} evaluate $ seqString (showSDoc dflags $ vcat $ map ppr imports)
   505 	
    0x102a5b6ca <+98>:  leal   -0x179cf8(%rip), %eax     ; ghc_Outputable_SDC_con_info
    0x102a5b6d0 <+104>: movq   %rax, -0x18(%r12)
    0x102a5b6d5 <+109>: leaq   0x95471c(%rip), %rax      ; ghc_Outputable_defaultUserStyle1_closure
    0x102a5b6dc <+116>: movq   %rax, -0x10(%r12)
    0x102a5b6e1 <+121>: movq   %rbx, -0x8(%r12)
    0x102a5b6e6 <+126>: movq   0x8(%rbp), %rax
    0x102a5b6ea <+130>: movq   %rax, (%r12)
    0x102a5b6ee <+134>: movq   -0x10(%r12), %rax
    0x102a5b6f3 <+139>: testb  $0x7, %al
    0x102a5b6f5 <+141>: jne    0x102a5b70b               ; <+163> [inlined] _c147A
libHSghc-8.5-ghc8.5.20180103.dylib`sSTG_info + 143 [inlined] _c147B
   503 	        -- force evaluation all this stuff to avoid space leaks
   504 	        {-# SCC "seqString" #-} evaluate $ seqString (showSDoc dflags $ vcat $ map ppr imports)
   505 	
    0x102a5b6f7 <+143>: subq   $0x8, %rsp
    0x102a5b6fb <+147>: movq   -0x10(%r12), %rdi
    0x102a5b700 <+152>: xorl   %eax, %eax
    0x102a5b702 <+154>: callq  0x102eb8604               ; symbol stub for: checkTagged

Here the first (banged) field of SDC gets initialised to a global closure. Clearly it is non-tagged and not evaluated. It gets caught by my assertion a bit later. I think this is a clear bug. The closure should be entered and evaluated to a WHNF (tagged) constructor before saving it into the SDC constructor.

I did a quick compilation with make -j5 GhcStage2HcOpts="-O1 -g" stage=2 in this case.

I still don't understand how GHC manages to create a standalone closure (nullary thunk, statically allocated, PC-relative) for defaultUserStyle dflags.

showSDoc dflags sdoc = renderWithStyle dflags sdoc (defaultUserStyle dflags)

It looks unary to me!

comment:34 in reply to:  33 Changed 20 months ago by heisenbug

Replying to heisenbug:

It looks unary to me!

Interesting, there is a saturated static constructor defined in Outputable

==================== Cmm produced by codegen ====================
2018-01-13 13:02:23.081236 UTC

[section ""data" . Outputable.defaultUserStyle1_closure" {
     Outputable.defaultUserStyle1_closure:
         const Outputable.PprUser_con_info;
         const Outputable.neverQualify_closure+1;
         const Outputable.AllTheWay_closure+1;
         const Outputable.Uncoloured_closure+1;
         const 3;
 }]


==================== Cmm produced by codegen ====================
2018-01-13 13:02:23.081545 UTC

[section ""data" . Outputable.defaultUserStyle_closure" {
     Outputable.defaultUserStyle_closure:
         const Outputable.defaultUserStyle_info;
         const 0;
 },
 Outputable.defaultUserStyle_entry() //  [R2]
         { info_tbl: [(chd7,
                       label: Outputable.defaultUserStyle_info
                       rep:HeapRep static { Fun {arity: 1 fun_type: ArgSpec 5} })]
           stack_info: arg_space: 8 updfr_space: Just 8
         }
     {offset ...

Side question: which optimisation creates this guy?

All uses of the Outputable.defaultUserStyle1_closure in Outputable are tagged with 1, e.g.:

           R1 = Outputable.defaultUserStyle1_closure+1;

But this is not the case in the module AsmCodeGen:

           P64[Hp - 16] = Outputable.defaultUserStyle1_closure;

This explains why there is no tag sometimes, but the tag being present most of the time.

comment:35 Changed 20 months ago by simonpj

Here the first (banged) field of SDC gets initialised to a global closure. Clearly it is non-tagged and not evaluated.

That is very wrong.

Can you show the -ddump-simpl -ddump-stg of the module that allocates an SDC closure with a non-tagged, non-evaluated argument? Thanks.

comment:36 Changed 20 months ago by simonpj

(That would explain why I often get crashes when mixing -O0 and -O1)

The strict-field unboxing choice should be made once and for all at the module declaring the data constructor. If client modules made a different choice there'd be chaos. If you think that is happening can you demonstrate?

comment:37 Changed 20 months ago by alexbiehl

I think I found a reproducer for this:

Trac14626_1.hs

module Trac14626_1 where

data Style = UserStyle Int
           | PprDebug

data SDC = SDC !Style !Int

defaultUserStyle :: Bool -> Style
defaultUserStyle True = UserStyle 123
defaultUserStyle False = PprDebug

Trac14626_2.hs

module Trac14626_2 where

import Trac14626_1

f :: Int -> SDC
f x = SDC (defaultUserStyle (x > 1)) x

Compiling with ghc Trac14626_1 Trac14626_2 -ddump-simpl -O results in a similar scenario than the one described by Heisenbug:

-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
defaultUserStyle2
defaultUserStyle2 = I# 123#

-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
defaultUserStyle1
defaultUserStyle1 = UserStyle defaultUserStyle2

-- RHS size: {terms: 7, types: 2, coercions: 0, joins: 0/0}
defaultUserStyle
defaultUserStyle
  = \ ds_dZ7 ->
      case ds_dZ7 of {
        False -> PprDebug;
        True -> defaultUserStyle1
      }

Our UserStyle 123 constant has been lifted to top-level, just like in Heisenbugs example.

Now looking at the Core of f

f
f = \ x_a1dk ->
      case x_a1dk of { I# x1_a2gV ->
      case ># x1_a2gV 1# of {
        __DEFAULT -> SDC PprDebug x1_a2gV;
        1# -> SDC defaultUserStyle1 x1_a2gV
      }
      }

(Note how f doesn't scrutinise defaultUserStyle1)

Looking at the CMM for f we can see

           ... 
           if (%MO_S_Le_W64(_s2hT::I64, 1)) goto c2ip; else goto c2is;
       c2ip:
           I64[Hp - 16] = SDC_con_info;
           P64[Hp - 8] = PprDebug_closure+2;
           I64[Hp] = _s2hT::I64;
           R1 = Hp - 15;
           Sp = Sp + 8;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
       c2is:
           I64[Hp - 16] = SDC_con_info;
           P64[Hp - 8] = defaultUserStyle1_closure; -- defaultUserStyle1 isn't tagged!
           I64[Hp] = _s2hT::I64;
           R1 = Hp - 15;
           Sp = Sp + 8;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;

The strange thing: Putting the definitions into one module Core/Stg look the same but the CMM correctly tags the closure.

comment:38 Changed 20 months ago by alexbiehl

I think I know what is happening:

  • When generating code for f the CodeGenerator wants to know the LambdaFormInfo (the closure type) of defaultUserStyle1.
  • Since defaultUserStyle1 is defined in another module we end up calling mkLFImported in StgCmmClosure which ultimatively gives an LFUnknown which always gets a DynTag 0 from lfDynTag.

I think we lack a bit of information here to give defaultUserStyle1 the correct LFCon lambda form. Maybe top-level binders should know its LambdaForm and include them in their interfaces.

comment:39 Changed 20 months ago by simonpj

Alex, you've nailed it. Thank you! I'll think about what to do. I'm astonished it hasn't led to more serious problems already.

comment:40 in reply to:  39 Changed 20 months ago by heisenbug

Replying to simonpj:

Alex, you've nailed it. Thank you! I'll think about what to do. I'm astonished it hasn't led to more serious problems already.

I'll second that! Great work Alex! Will there be another ticket I can block this one on?

Simon, do you think we should insert taggedness-checks into runtime when the compiler (resp. a binary) is built with some debug flag? My current solution is rather weak and won't work for constructors that have constraints (e.g. class dictionaries) in them. It may be a good way to detect similar hiccups in the future. I'll happily invest some effort to make my current checks water-proof, though.

comment:41 in reply to:  36 Changed 20 months ago by heisenbug

Replying to simonpj:

(That would explain why I often get crashes when mixing -O0 and -O1)

The strict-field unboxing choice should be made once and for all at the module declaring the data constructor. If client modules made a different choice there'd be chaos. If you think that is happening can you demonstrate?

I'll watch out for it. I've crashed my GHC in many very different ways in the last weeks, so it is impossible to remember. My taggedness-check on strict constructor fields will need to deal with this anyway, so we'll possibly get hard data soon.

comment:42 Changed 20 months ago by simonpj

Will there be another ticket I can block this one on?

I opened #14677. And I offer a patch there. Give it a try!

comment:43 Changed 20 months ago by heisenbug

Blocked By: 14677 added

comment:44 Changed 20 months ago by heisenbug

With the fix in #14677 I mostly get

  HC [stage 2] utils/ghctags/dist-install/build/Main.dyn_o
  HC [stage 2] utils/check-api-annotations/dist-install/build/Main.dyn_o
  HC [stage 2] utils/check-ppr/dist-install/build/Main.dyn_o
epollControl: does not exist (No such file or directory)
epollControl: does not exist (No such file or directory)
epollControl: does not exist (No such file or directory)

But at least no assertion failure related to tagging any more. (x86-64 darwin still has a snag).

Update: Having recompiled as make GhcStage2HcOpts="-g" GhcLibHcOpts="-O0 -g", I no more get the epollControl problems. Testsuite looks more-or-less sane, but I see changes with strictness signatures. Digging deeper now.

Last edited 20 months ago by heisenbug (previous) (diff)

comment:45 Changed 16 months ago by simonpj

Keywords: CodeGen added

comment:46 Changed 12 months ago by osa1

Cc: osa1 added

comment:47 Changed 11 months ago by simonmar

Cc: simonmar added
Note: See TracTickets for help on using tickets.