Opened 20 months ago

Last modified 9 months ago

#14677 new bug

Code generator does not correctly tag a pointer

Reported by: simonpj Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.2.2
Keywords: CodeGen Cc: heisenbug, alexbiehl, osa1, simonmar
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: #15155 Blocking: #14626
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Consider

data T a = MkT ![a]

The pointer stored in a MkT constructor should always be correctly tagged, never tagged with un-evaluated 00. C.f. Commentary/Rts/HaskellExecution/PointerTagging

But this invariant is broken. Example taken from #14626, comment:37-39.

Trac14626_1.hs

module Trac14626_1 where

data Style = UserStyle Int
           | PprDebug

data SDC = SDC !Style !Int

defaultUserStyle :: Bool -> Style
defaultUserStyle True = UserStyle 123
defaultUserStyle False = PprDebug

Trac14626_2.hs

module Trac14626_2 where

import Trac14626_1

f :: Int -> SDC
f x = SDC (defaultUserStyle (x > 1)) x

Compiling with ghc Trac14626_1 Trac14626_2 -ddump-simpl -O results in a similar scenario than the one described by Heisenbug:

-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
defaultUserStyle2
defaultUserStyle2 = I# 123#

-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
defaultUserStyle1
defaultUserStyle1 = UserStyle defaultUserStyle2

-- RHS size: {terms: 7, types: 2, coercions: 0, joins: 0/0}
defaultUserStyle
defaultUserStyle
  = \ ds_dZ7 ->
      case ds_dZ7 of {
        False -> PprDebug;
        True -> defaultUserStyle1
      }

Our UserStyle 123 constant has been lifted to top-level, just like in Heisenbugs example.

Now looking at the Core of f

f
f = \ x_a1dk ->
      case x_a1dk of { I# x1_a2gV ->
      case ># x1_a2gV 1# of {
        __DEFAULT -> SDC PprDebug x1_a2gV;
        1# -> SDC defaultUserStyle1 x1_a2gV
      }
      }

(Note how f doesn't scrutinise defaultUserStyle1)

Looking at the CMM for f we can see

           ... 
           if (%MO_S_Le_W64(_s2hT::I64, 1)) goto c2ip; else goto c2is;
       c2ip:
           I64[Hp - 16] = SDC_con_info;
           P64[Hp - 8] = PprDebug_closure+2;
           I64[Hp] = _s2hT::I64;
           R1 = Hp - 15;
           Sp = Sp + 8;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
       c2is:
           I64[Hp - 16] = SDC_con_info;
           P64[Hp - 8] = defaultUserStyle1_closure; -- defaultUserStyle1 isn't tagged!
           I64[Hp] = _s2hT::I64;
           R1 = Hp - 15;
           Sp = Sp + 8;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;

When generating code for f the code generator wants to know the LambdaFormInfo (the closure type) of defaultUserStyle1.

Since defaultUserStyle1 is defined in another module we end up calling mkLFImported in StgCmmClosure which ultimatively gives an LFUnknown which always gets a DynTag 0 from lfDynTag.

I think we lack a bit of information here to give defaultUserStyle1 the correct LFCon lambda form. Maybe top-level binders should know its LambdaForm and include them in their interfaces.

Attachments (1)

lf-imported-patch (3.3 KB) - added by simonpj 20 months ago.
Untested patch

Download all attachments as: .zip

Change History (35)

Changed 20 months ago by simonpj

Attachment: lf-imported-patch added

Untested patch

comment:1 Changed 20 months ago by simonpj

Try this (untested) patch. It works for the particular example.

In haste...

comment:2 Changed 20 months ago by heisenbug

Thanks, I am building now...

comment:3 Changed 20 months ago by heisenbug

Blocking: 14626 added

comment:4 in reply to:  1 Changed 20 months ago by heisenbug

Replying to simonpj:

Try this (untested) patch. It works for the particular example.

In haste...

Looks like this is a definitive improvement. At least I get a non-crasching stage2 compiler. Can you bring this to master?

On circleci only the x86-64 darwin buildbot shows the "NOT TAGGED!" panic now. I'll investigate the residual problem(s).

Last edited 20 months ago by heisenbug (previous) (diff)

comment:5 Changed 20 months ago by heisenbug

Cc: heisenbug added

comment:6 Changed 20 months ago by alexbiehl

Gabor, I am on OSX. Can you tell me where your code lives and what your build settings are? I will try to reproduce.

comment:7 in reply to:  6 Changed 20 months ago by heisenbug

Replying to alexbiehl:

Gabor, I am on OSX. Can you tell me where your code lives and what your build settings are? I will try to reproduce.

I am in the context of branch wip/T14626 in the central repository. There I build with basically unchanged mk/build.mk (i.e. -O2). You can see the logs here: https://circleci.com/gh/ghc/ghc/tree/wip%2FT14626. I am pretty sure this patch from Simon alone works on all platforms. But I have no resources to validate it myself.

Last edited 20 months ago by heisenbug (previous) (diff)

comment:8 in reply to:  1 Changed 20 months ago by heisenbug

Replying to simonpj:

Try this (untested) patch. It works for the particular example.

In haste...

Simon, the branch has problems, as it turns out:

  HC [stage 2] utils/ghctags/dist-install/build/Main.dyn_o
  HC [stage 2] utils/check-api-annotations/dist-install/build/Main.dyn_o
  HC [stage 2] utils/check-ppr/dist-install/build/Main.dyn_o
epollControl: does not exist (No such file or directory)
epollControl: does not exist (No such file or directory)
epollControl: does not exist (No such file or directory)

I have sent the patch to wip/T14677 and hopefully circleci will reproduce it.

Quoting #14626 : Having recompiled as make clean; make GhcStage2HcOpts="-g" GhcLibHcOpts="-O0 -g", I no more get the epollControl problems.

Update: I looked at the circleci logs. Darwin buildbot fails very similarly with your patch:

ghc-stage2: reworkFd: event not found
ghc-stage2: reworkFd: event not found
ghc-stage2: reworkFd: event not found
ghc-stage2: reworkFd: event not found
ghc-stage2: ghc-stage2: reworkFd: event not foundioManagerDie: write

Interestingly, this does not seem to be fatal on OSX. So the build continues to link haddock and run it. I see the same traces from haddock too. So the base library is definitely miscompiled when with -O2. Upon running the tests, however, the problem becomes serious and the test stops.

Last edited 20 months ago by heisenbug (previous) (diff)

comment:9 in reply to:  6 Changed 20 months ago by heisenbug

Replying to alexbiehl:

Gabor, I am on OSX. Can you tell me where your code lives and what your build settings are? I will try to reproduce.

Looks like it is a CAF in OS X:

Watchpoint 1 hit:
old value: 4420539208
new value: 4420538632
Process 92235 stopped
* thread #1: tid = 0xc753fd, 0x00000001077b525e libHSrts_thr-ghc8.5.20180103.dylib`newCAF(reg=<unavailable>, caf=0x00000001037286b8) + 142 at Storage.c:429, queue = 'com.apple.main-thread', stop reason = watchpoint 1
    frame #0: 0x00000001077b525e libHSrts_thr-ghc8.5.20180103.dylib`newCAF(reg=<unavailable>, caf=0x00000001037286b8) + 142 at Storage.c:429
   426 	    bh = lockCAF(reg, caf);
   427 	    if (!bh) return NULL;
   428 	
-> 429 	    if(keepCAFs)
   430 	    {
   431 	        // Note [dyn_caf_list]
   432 	        // If we are in GHCi _and_ we are using dynamic libraries,
(lldb) 
Process 92235 resuming
Process 92235 stopped
* thread #1: tid = 0xc753fd, 0x00000001077babc8 libHSrts_thr-ghc8.5.20180103.dylib`checkTagged, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001077babc8 libHSrts_thr-ghc8.5.20180103.dylib`checkTagged
libHSrts_thr-ghc8.5.20180103.dylib`checkTagged:
->  0x1077babc8 <+0>: testb  $0x7, %bl
    0x1077babcb <+3>: jne    0x1077babe3               ; <+27>
    0x1077babcd <+5>: subq   $0x8, %rsp
    0x1077babd1 <+9>: leaq   0x165a4(%rip), %rdi       ; "NOT TAGGED! "
(lldb) p/x $rbx
(unsigned long) $8 = 0x00000001037286b8
(lldb) watchpoint list 
Number of supported hardware watchpoints: 4
Current watchpoints:
Watchpoint 1: addr = 0x1037286b8 size = 8 state = enabled type = w
    old value: 4420539208
    new value: 4420538632
(lldb) dis -s 4420538632
libHSrts_thr-ghc8.5.20180103.dylib`stg_IND_STATIC_info:
    0x1077c1108 <+0>:  movq   0x8(%rbx), %rbx
    0x1077c110c <+4>:  andq   $-0x8, %rbx
    0x1077c1110 <+8>:  jmpq   *(%rbx)
    0x1077c1112 <+10>: adcb   %al, (%rax)
    0x1077c1114 <+12>: addb   %al, (%rax)
    0x1077c1116 <+14>: nop    

libHSrts_thr-ghc8.5.20180103.dylib`stg_BLACKHOLE_info_dsp:
    0x1077c1118 <+0>:  addl   %eax, (%rax)
    0x1077c111a <+2>:  addb   %al, (%rax)
    0x1077c111c <+4>:  addb   %al, (%rax)
    0x1077c111e <+6>:  addb   %al, (%rax)
    0x1077c1120 <+8>:  addb   %al, %es:(%rax)
    0x1077c1123 <+11>: addb   %al, (%rax)
    0x1077c1125 <+13>: addb   %al, (%rax)
(lldb) dis -s 4420538632+16
libHSrts_thr-ghc8.5.20180103.dylib`stg_BLACKHOLE_info_dsp:
    0x1077c1118 <+0>:  addl   %eax, (%rax)
    0x1077c111a <+2>:  addb   %al, (%rax)
    0x1077c111c <+4>:  addb   %al, (%rax)
    0x1077c111e <+6>:  addb   %al, (%rax)
    0x1077c1120 <+8>:  addb   %al, %es:(%rax)
    0x1077c1123 <+11>: addb   %al, (%rax)
    0x1077c1125 <+13>: addb   %al, (%rax)
    0x1077c1127 <+15>: addb   %cl, -0x75(%rax)

libHSrts_thr-ghc8.5.20180103.dylib`stg_BLACKHOLE_info:
    0x1077c112a <+2>:  orb    %bpl, -0x37af0f9(%r8)
    0x1077c1131 <+9>:  addb   %al, (%rax)
    0x1077c1133 <+11>: addb   %cl, -0x75(%rax)
(lldb) p/x $rbx
(unsigned long) $10 = 0x00000001037286b8

So it is probably another bug that is easier to trigger on OS X.

Update:

When compiling make GhcStage2HcOpts="-g" GhcLibHcOpts="-O0 -g" on MacOS X, I get

  HC [stage 1] compiler/stage2/build/Finder.o
WARNING: file compiler/stgSyn/CoreToStg.hs, line 291
  multiple_found_rgXp False True
ghc-stage1: panic! (the 'impossible' happened)
  (GHC version 8.5.20180103 for x86_64-apple-darwin):
	ASSERT failed!
  multiple_found_rgXp
  Call stack:
      CallStack (from HasCallStack):
        callStackDoc, called at compiler/utils/Outputable.hs:1208:22 in ghc:Outputable
        assertPprPanic, called at compiler/stgSyn/CoreToStg.hs:257:59 in ghc:CoreToStg

So the assertion tripped is

    ASSERT2(consistentCafInfo id bind, ppr id )
Last edited 20 months ago by heisenbug (previous) (diff)

comment:10 Changed 20 months ago by heisenbug

Cc: alexbiehl added

alexbiehl: I added you to the CC list just in case you want to see what is happening.

comment:11 Changed 20 months ago by simonpj

Any progress, heisenbug?

comment:12 in reply to:  11 Changed 20 months ago by heisenbug

Replying to simonpj:

Any progress, heisenbug?

EDIT: Below analysis is inaccurate. Please skip this comment.

Sure! With my latest git push wip/T14677 I have added a testcase (at GHC root directory for the interim), which shows that invoking a function with a newtype Event = Event Int argument now (with the patch applied) incorrectly assumes that Event constructor is passed unpacked.

If you build the ghc-stage1 on this branch and compile using -O2 you'll get failures. With -O0 the test passes, (as does master and GHC-8.2.2). So something is still wrong.

In the buggy case the tagged pointer is checked at bit position 0 so the tagged I# constructor is being detected, but interpreted as an Event = [evtRead].

I don't understand enough about argument unpacking to resolve this. Maybe you can have a look?

The -g problems above are a red herring, and probably another bug. Please disregard for now.

I did not try defining Event locally yet, maybe that changes the situation.

Last edited 16 months ago by heisenbug (previous) (diff)

comment:13 Changed 20 months ago by simonpj

Could you possibly give the example that goes wrong, in source code, tidy-core, and the wrong Cmm we get? That would be easier to discuss. I can't make sense of your question without a bit more context. Thanks!

comment:14 Changed 20 months ago by alexbiehl

Gabor,

I have reduced your example down to

module Main where

import GHC.Event
import Prelude hiding (mod)

f e2 = do
  print e2
  print (e2 == evtRead)

main = f evtWrite

Expected output

[evtWrite]
False

With Gabors branch we get

[evtRead]
False

I compared CMM from a stock ghc-8.4 alpha vs. a GHC built from your branch. Really the only significant difference in CMM is

(GHC from your branch)

          ... 
          R2 = PicBaseReg + (GHC.Event.Internal.evtWrite_closure+1);
          call Main.f1_info(R2) args: 8, res: 0, upd: 8;
          ... 

(Stock ghc-8.4 alpha)

          ...
          R2 = PicBaseReg + GHC.Event.Internal.evtWrite_closure;
          call Main.f1_info(R2) args: 8, res: 0, upd: 8;
          ...

comment:15 Changed 20 months ago by alexbiehl

Ha, got it!

evtWrite is defined as

-- | The file descriptor is ready to accept a write.
evtWrite :: Event
evtWrite = Event 2

This compiles to this CMM:

==================== Optimised Cmm ====================
2018-01-26 08:16:02.207998 UTC

section ""data" . evtWrite1_r3aU_closure" {
    evtWrite1_r3aU_closure:
        const GHC.Types.I#_con_info;
        const 2;
}


==================== Optimised Cmm ====================
2018-01-26 08:16:02.209627 UTC

section ""data" . T14677_1.evtWrite_closure" {
    T14677_1.evtWrite_closure:
        const stg_IND_STATIC_info;
        const evtWrite1_r3aU_closure+1;
        const 0;
        const 0;
}

Now as seen in my previous comment we refer to evtWrite_closure like this:

          ... 
          R2 = PicBaseReg + (GHC.Event.Internal.evtWrite_closure+1);
          call Main.f1_info(R2) args: 8, res: 0, upd: 8;
          ...

Now, take a look at the CMM for evtWrite_closure again. evtWrite_closure is basically an IND_STATIC closure pointing to the actual closure data! So by tagging the pointer we don't enter and never get the desired value! Our tests were working on the address of evtWrite1_r3aU_closure+1. What follows is a test which confirms the theory:

T14677_1.hs

module T14677_1 where

import Data.Bits
import Data.List

newtype Event = Event { toInt :: Int }
	deriving (Eq)

evtNothing :: Event
evtNothing = Event 0
{-# INLINE evtNothing #-}

-- | Data is available to be read.
evtRead :: Event
evtRead = Event 1
{-# INLINE evtRead #-}

-- | The file descriptor is ready to accept a write.
evtWrite :: Event
evtWrite = Event 2
{-# INLINE evtWrite #-}

-- | Another thread closed the file descriptor.
evtClose :: Event
evtClose = Event 4
{-# INLINE evtClose #-}

eventIs :: Event -> Event -> Bool
eventIs (Event a) (Event b) = a .&. b /= 0

-- | @since 4.3.1.0
instance Show Event where
    show e = '[' : (intercalate "," . filter (not . null) $
                    [evtRead `so` "evtRead",
                     evtWrite `so` "evtWrite",
                     evtClose `so` "evtClose"]) ++ "]"
        where ev `so` disp | e `eventIs` ev = disp
                           | otherwise      = ""

T14677_2

module Main where

import T14677_1

f e2 = do
  print (toInt e2)
  print e2
  print (e2 == evtRead)

main = f evtWrite

This prints for my machine:

$ inplace/bin/ghc-stage1 T14677_2.hs T14677_1.hs -O2
$ ./T14677_2
4307143777
[evtRead]
False
Last edited 20 months ago by alexbiehl (previous) (diff)

comment:16 in reply to:  15 ; Changed 20 months ago by heisenbug

Replying to alexbiehl:

Ha, got it!

evtWrite is defined as

[snip]

Yeah, a simple main = print evtWrite will show the issue. I just had a long dog walk, thanks for reducing this! Yesterday I checked the diffs in the generated Cmm and the only change was that the evtWrite_closure+1 gets passed (instead of evtWrite_closure). So your analysis sounds reasonable. The callee thinks it got the dereferenced Int, but actually it is a tagged pointer. This is somehow related to the fact that the newtype is defined in another module.

comment:17 in reply to:  16 Changed 20 months ago by heisenbug

Replying to heisenbug:

Replying to alexbiehl:

Alex, if you manage to put together a tentative patch, feel free to check it in to the branch and push. That will trigger the CI machinery.

Ha, got it!

evtWrite is defined as

[snip]

Yeah, a simple main = print evtWrite will show the issue. I just had a long dog walk, thanks for reducing this! Yesterday I checked the diffs in the generated Cmm and the only change was that the evtWrite_closure+1 gets passed (instead of evtWrite_closure). So your analysis sounds reasonable. The callee thinks it got the dereferenced Int, but actually it is a tagged pointer. This is somehow related to the fact that the newtype is defined in another module.

comment:18 Changed 20 months ago by alexbiehl

I think a patch might be more involved: This is the core for evtWrite

-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
evtWrite1_r335 :: Int
[GblId, Caf=NoCafRefs, Str=m]
evtWrite1_r335 = GHC.Types.I# 2#

-- RHS size: {terms: 1, types: 0, coercions: 2, joins: 0/0}
evtWrite [InlPrag=INLINE (sat-args=0)] :: Event
[GblId,
 Str=m,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=0,unsat_ok=False,boring_ok=False)
         Tmpl= (GHC.Types.I# 2#)
               `cast` (Sym T14677_1.N:Event[0] :: (Int :: *) ~R# (Event :: *))}]
evtWrite
  = evtWrite1_r335
    `cast` (Sym T14677_1.N:Event[0] :: (Int :: *) ~R# (Event :: *))

So it's already an "indirection" in Core! Maybe we shouldn't lift the integer out of evtWrite. Expressions of the form

f = e `cast` co

where

  • f is top level
  • e is a satured constructor application

could be left intact. CoreToStg could spot this case and make it a proper StgConApp which will then be codegened into a proper constant.

Not sure how much of an edge case this would be.

Last edited 20 months ago by alexbiehl (previous) (diff)

comment:19 in reply to:  18 Changed 20 months ago by heisenbug

Replying to alexbiehl:

What happens when stg_IND_STATIC_info is entered? There seems to be no machine code, will that crash?

comment:20 Changed 20 months ago by alexbiehl

Gabor,

see https://github.com/ghc/ghc/blob/master/rts/StgMiscClosures.cmm#L268. Take the indirectee, untag and enter. (Btw. why do we untag the value and enter, wouldn't we want to just return R1 if it is tagged?)

edit: (entering the indirectee makes sense to reduce chains of IND_STATICs to the static value)

Last edited 20 months ago by alexbiehl (previous) (diff)

comment:21 in reply to:  20 Changed 20 months ago by heisenbug

Replying to alexbiehl:

Gabor,

see https://github.com/ghc/ghc/blob/master/rts/StgMiscClosures.cmm#L268. Take the indirectee, untag and enter. (Btw. why do we untag the value and enter, wouldn't we want to just return R1 if it is tagged?)

So StgInd_indirectee grabs the value evtWrite1_r3aU_closure+1 (why the two zeroes behind that???).

If we (could?) know that evtWrite_closure just redirects to that, why not perform the redirection at Cmm-gen time? The unfolding Tmpl surely gives a hint how to do it!

EDIT: Generate something like

  R2 = PicBaseReg + (GHC.Event.Internal.evtWrite_closure);
  R2 = StgInd_indirectee(R2);
  call Main.f1_info(R2) args: 8, res: 0, upd: 8;

edit: (entering the indirectee makes sense to reduce chains of IND_STATICs to the static value)

Last edited 20 months ago by heisenbug (previous) (diff)

comment:22 Changed 20 months ago by heisenbug

I have something along the lines of my last comment, that seems to work :-)

It doesn't :-(

I am right now looking into cgTopRhsClosure in StgCmmBind.hs to figure out why the indirection is required at all, and why we tag the pointer to that indirection at all.

Last edited 20 months ago by heisenbug (previous) (diff)

comment:23 Changed 20 months ago by heisenbug

Okay, I got it working. I now simply exclude casted constructors (by looking at the unfolding template), as they seem to be implemented by IND_STATIC. (They should not, rather they could be simply alias labels?) Thus those won't get tagged closure pointers, and will be entered as before. There are 66 cases where this pessimisation is triggered in GHC, so I guess this is a low price to pay. All of those are related to the Int -> Event casting.

Here is my fix: https://github.com/ghc/ghc/commit/eef0c057551ef860c1ace2e1c7509bcdc3c8eb91

Simon, any idea how to do this better?

comment:24 Changed 17 months ago by heisenbug

Blocked By: 15155 added

comment:25 Changed 16 months ago by simonpj

Keywords: CodeGen added

comment:26 Changed 12 months ago by osa1

Cc: osa1 added

comment:27 Changed 12 months ago by simonmar

Cc: simonmar added

comment:28 Changed 9 months ago by simonpj

I'm getting a bit lost.

  1. This ticket, #14677, started with "GHC does not correctly tag the pointer of a strict data constructor".
  1. But in ticket:15155#comment:11 we decided that we do NOT have the invariant that a strict constructor always has a correctly-tagged pointer in it.

An example of the difficulty is in ticket:15696#comment:36

  1. Simon Marlow suggested one way to restore the invariant in ticket:15155#comment:17. But we have not implemented this.

So the premise of this ticket is wrong: until we do (3) there is no invariant that is broken by the current IND_STATIC story.

So the MR for this ticket, which uses LLVM aliases, might reduce code size a tiny bit, but will not (of itself) change the situation for tagged pointers in strict constructors.

Does everyone agree?

I'm hazy about the status of #14626, which is mentioned as a loose end in ticket:15696#comment:60

Last edited 9 months ago by simonpj (previous) (diff)

comment:29 in reply to:  28 Changed 9 months ago by heisenbug

We will probably not be able to restore the invariant for all cases. Here is a curious case

module A where
newtype A = A Int
num :: Int
num = 42
{-# NOINLINE a #-}
a = A num
module Main where
import A
data T = MkT A
t = MkT a

In the absence of your patch MkT gets passed an untagged pointer, because the unfolding of a is wrongly interpreted. GHC should be able to know that a has the I# constructor.

Your patch improves on the situation. MkT now gets passed a tagged pointer. BUT: when the pointee a is an IND_STATIC closure (in the above case it is), we get incorrect behaviour, as the closure has the wrong layout and wants to be entered. #15155 recovers correctness.

I don't really care about the bang invariant. What I care is that GHC is not coerced into deliberately throwing away perfectly good hints (leading to suboptimal tagging) because of the IND_STATIC hack.

comment:30 Changed 9 months ago by simonpj

Great. So I think we now agree that the IND_STATIC patch is about (a) code size and (b) performance; and that both are very small (albeit worthwhile) improvements. It is not about correctness. OK so far?

When you say "your patch", what do you mean?

When you say "incorrect behaviour" do you meant that we get a seg-fault? Or just that we enter a closure that could be properly tagged? To be concrete, suppose we have

f x = case t of
        MkT (A a) -> case a of
                      I# y -> (I# (y +# 1) + x

main = print (map f [1..1000])

In the code for the case a in f, we'll have a conditional that tests the tag-bits on a. If they are 00 we'll enter a; and this path must exist in case we put a thunk in MkT; indeed MkT is not even strict in this case.

However, we'd prefer not to take that code path, so it'd be better to put a correctly-tagged pointer into the MkT. To do this we need your patch; and it must get exploited in two different places in the code generator.

  • When allocating a static closure, like in your example
    t = MkT a
    
  • When allocating a dynamic closure, like this:
    data S = MkS Int A
    f x = ...(MkS x a)...
    
    Here that MkS x a should get a correctly-tagged second field.

It'd be good to ensure that you test both situations.

Now it is much clearer. If you could reflect the clarity in the Note, that would be good.

comment:31 in reply to:  30 Changed 9 months ago by heisenbug

Replying to simonpj:

Great. So I think we now agree that the IND_STATIC patch is about (a) code size and (b) performance; and that both are very small (albeit worthwhile) improvements. It is not about correctness. OK so far?

Agree, we may have different definitions of "correct". No crashes. Yep.

When you say "your patch", what do you mean?

This one: https://ghc.haskell.org/trac/ghc/attachment/ticket/14677/lf-imported-patch

Applying this causes crashes. #15155 helps with that. #16039 helps more.

comment:32 in reply to:  30 Changed 9 months ago by heisenbug

When you say "incorrect behaviour" do you meant that we get a seg-fault? Or just that we enter a closure that could be properly tagged?

I mean the latter. Compiling the following code

module T15155 (a, B(..), b) where

newtype A = A Int
newtype B = B A

{-# NOINLINE a #-}
a = A 42
b = B a

to STG, I see:

T15155.a1 :: GHC.Types.Int
[GblId, Caf=NoCafRefs, Str=m, Unf=OtherCon []] =
    CCS_DONT_CARE GHC.Types.I#! [42#];

T15155.a [InlPrag=NOINLINE] :: T15155.A
[GblId, Str=m, Unf=OtherCon []] =
    [] \u [] T15155.a1;

T15155.b :: T15155.B
[GblId, Str=m, Unf=OtherCon []] =
    [] \u [] T15155.a;

a and b look reasonably similar from their structure. The CMM picture is quite different:

[section ""data" . T15155.a1_closure" {
     T15155.a1_closure:
         const GHC.Types.I#_con_info;
         const 42;
 }]

[section ""data" . T15155.a_closure" {
     T15155.a_closure:
         const stg_IND_STATIC_info;
         const T15155.a1_closure+1;
         const 0;
         const 0;
 }]

[section ""data" . T15155.b_closure" {
     T15155.b_closure:
         const stg_IND_STATIC_info;
         const T15155.a_closure;
         const 0;
         const 0;
 }]

There is no tag on the indirectee of T15155.b_closure, even if they possess equivalent unfoldings. Of course there is some obscure rule somewhere that strips the tag, so that no crash happens. For me "correct" would mean that T15155.a_closure+1 is allowed, and being the consequence of

T15155.a [InlPrag=NOINLINE] :: T15155.A
[GblId, Str=m, >>>>Unf=OtherCon []<<<<] =
    [] \u [] T15155.a1;

Maybe the absence of Caf=NoCafRefs is the reason why the tag is missing.

OTOH "not crashing" is merely law-abiding in my eyes.

Last edited 9 months ago by heisenbug (previous) (diff)

comment:33 Changed 9 months ago by simonpj

If you didn't have that {-# NOINLINE a #-}, then the IND_STATIC would point directly to a1_closure, and would be correctly tagged. And in general, absent {-# NOINLINE #-} directives, I believe that all that cases your patch optimises would be optimised by the simplifier to get the code you want.

So that seems to narrow the scope of the optimisation considerably:

  • You save code-size on many (or perhaps all?) IND_STATIC closures.
  • You save entering any such IND_STATICs that are marked NOINLINE.

So yes, it's an improvement, but I wonder whether it's worth the extra code and maintenance burden? I'm thinking of someone looking at this code in 5 yrs time.

Do you think it is?

Thanks for doing this -- I'm not criticising, just trying to understand with precision what is and is not happening. Clarity is good!

comment:34 Changed 9 months ago by heisenbug

I am going to dump here the differences between 8.6.3-created libs and HEAD's libs, w.r.t. exported nullary symbols:

$ nm ghc-8.6.3/lib/ghc-8.6.3/base-4.12.0.0/libHSbase-4.12.0.0.a | grep evtRead
                 U base_GHCziEventziInternal_evtRead_closure
                 U base_GHCziEventziInternal_evtRead_closure
0000000000000000 D base_GHCziEventziInternal_evtRead_closure
                 U base_GHCziEventziInternal_evtRead_closure
                 U base_GHCziEventziInternal_evtRead_closure
                 U base_GHCziEventziInternal_evtRead_closure

(lib/ghc-8.4.4/base-4.11.1.0/libHSbase-4.11.1.0.a is identical to above.)

Below happens after removing

  • libraries/base/GHC/Event/Internal.hs

    diff --git a/libraries/base/GHC/Event/Internal.hs b/libraries/base/GHC/Event/Internal.hs
    index 5778c6f..bfea391 100644
    a b evtNothing = Event 0 
    4949-- | Data is available to be read.
    5050evtRead :: Event
    5151evtRead = Event 1
    52 {-# INLINE evtRead #-}
$ nm _build/stage1/lib/x86_64-linux-ghc-8.7.20190106/base-4.12.0.0/libHSbase-4.12.0.0.a  | grep evtRe
                 U base_GHCziEventziInternal_evtRead1_closure
                 U base_GHCziEventziInternal_evtRead1_closure
0000000000000290 D base_GHCziEventziInternal_evtRead1_closure
00000000000002a0 D base_GHCziEventziInternal_evtRead_closure
                 U base_GHCziEventziInternal_evtRead1_closure
                 U base_GHCziEventziInternal_evtRead1_closure
                 U base_GHCziEventziInternal_evtRead1_closure

The unfolding now refers to a (haskell and assembly-level) exported symbol:

  evtRead :: Event
  {- Strictness: m,
     Unfolding: InlineRule (0, True, True)
                evtRead1 `cast` (Sym (N:Event[0])) -}

  evtRead1 :: GHC.Types.Int
  {- HasNoCafRefs, Strictness: m, Unfolding: (GHC.Types.I# 1#) -}

Not sure that this is right.

OTOH, v8.6.3 exports only one (caveat: with INLINE pragma) :

  evtRead :: Event
  {- Strictness: m, Inline: (sat-args=0),
     Unfolding: InlineRule (0, False, False)
                (GHC.Types.I# 1#) `cast` (Sym (N:Event[0])) -}

(from .stack/programs/x86_64-linux/ghc-8.6.3/bin/ghc --show-iface .stack/programs/x86_64-linux/ghc-8.6.3/lib/ghc-8.6.3/base-4.12.0.0/GHC/Event/Internal.hi)

N.B. 8.4.4 generates equal:

  evtRead :: Event
  {- Strictness: m, Inline: (sat-args=0),
     Unfolding: InlineRule (0, False, False)
                (I# 1#) `cast` (Sym (N:Event[0])) -}

Now a direct comparison (HEAD, evtRead is INLINE, evtWrite not)

  evtRead :: Event
  {- Strictness: m, Inline: (sat-args=0),
     Unfolding: InlineRule (0, False, True)
                (GHC.Types.I# 1#) `cast` (Sym (N:Event[0])) -}

  evtWrite :: Event
  {- Strictness: m,
     Unfolding: InlineRule (0, True, True)
                evtWrite1 `cast` (Sym (N:Event[0])) -}

Just wanted to add that http://ghc.haskell.org/trac/ghc/changeset/64c54fff2d6534e1229359a8d357ec1dc6c21b73/ghc makes local symbols to disappear from the symbol table, while here we explicitly export some, so that inlining won't create unresolved references.

Last edited 9 months ago by heisenbug (previous) (diff)
Note: See TracTickets for help on using tickets.