Opened 17 months ago

Last modified 9 months ago

#15113 patch bug

Do not make CAFs from literal strings

Reported by: simonpj Owned by:
Priority: normal Milestone: 8.10.1
Component: Compiler Version: 8.2.2
Keywords: CAFs, CodeGen Cc: osa1, MikolajKonarski
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #16014 Differential Rev(s): Phab:D4717
Wiki Page:

Description (last modified by simonpj)

Currently (as I discovered in #15038), we get the following code for GHC.Exception.Base.patError:

lvl2_r3y3 :: [Char]
[GblId]
lvl2_r3y3 = unpackCString# lvl1_r3y2

-- RHS size: {terms: 7, types: 6, coercions: 2, joins: 0/0}
patError :: forall a. Addr# -> a
[GblId, Arity=1, Str=<B,U>x, Unf=OtherCon []]
patError
  = \ (@ a_a2kh) (s_a1Pi :: Addr#) ->
      raise#
        @ SomeException
        @ 'LiftedRep
        @ a_a2kh
        (Control.Exception.Base.$fExceptionPatternMatchFail_$ctoException
           ((untangle s_a1Pi lvl2_r3y3)
            `cast` (Sym (Control.Exception.Base.N:PatternMatchFail[0])
                    :: (String :: *) ~R# (PatternMatchFail :: *))))

That stupid lvl2_r3y3 :: String is a CAF, and hence patError has CAF-refs, and hence so does any function that calls patError, and any function that calls them.

That's bad! Lots more CAF entries in SRTs, lots more work traversing those SRTs in the garbage collector. And for what? To share the work of unpacking a C string! This is nuts.

What to do?

  1. Somehow refrain from floating unpackCSTring# lit to top level, even if you could otherwise do so. But that seems very ad-hoc, and it make the function bigger and less inlinable.
  1. Treat a top level definition
    x :: [Char]
    x = unpackCString# y
    
    as NOT a CAF, and make it single-entry so that the thunk is not updated. Then every use of x will unpack the string afresh, which is probably a good idea anyhow.

I like this more. It would be implemented somewhere in the code generator.

Change History (12)

comment:1 Changed 17 months ago by osa1

Cc: osa1 added

comment:2 Changed 17 months ago by simonpj

See ticket:15038#comment:33 for why this ticket (useful though it may be) will not actually benefit patError and friends.

Last edited 16 months ago by bgamari (previous) (diff)

comment:3 Changed 17 months ago by simonpj

Keywords: CAFs added

comment:4 Changed 15 months ago by bgamari

Differential Rev(s): Phab:D4717
Milestone: 8.6.18.8.1
Status: newpatch

Phab:D4717 marks top-level strings as single-entry. I haven't yet had a chance to characterise the effect of this.

comment:5 Changed 15 months ago by simonpj

I like the idea of trying this out.

It might actually make programs allocate more (because we start using call-by-name), but it's usually very short-lived allocation.

To judge the effect, you might want to add a way to show the total SRT table sizes (statically); and the number of SRT links traversed by the garbage collector (dynamically). Reducing these two is the main point, so we should measure them. For the latter, perhaps add to the -ticky stats?

Variants to explore

  • Broaden scope: do this for non-top-level bindings as well as top-level ones. (The Phab patch has bug on this point; it does it only for the non-top-level ones!)
  • Broaden scope: do this for any function marked CONLIKE. Currently those functions (in libraries/) are
    ./ghc-prim/GHC/CString.hs:74:{-# NOINLINE CONLIKE unpackCString# #-}
    ./ghc-prim/GHC/CString.hs:124:{-# NOINLINE CONLIKE unpackCStringUtf8# #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:74:#define CONLIKE
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:157:{-# INLINE CONLIKE size #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:161:{-# INLINE CONLIKE runF #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:166:{-# INLINE CONLIKE emptyF #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:171:{-# INLINE CONLIKE pairF #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:185:{-# INLINE CONLIKE contramapF #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:190:{-# INLINE CONLIKE toB #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:195:{-# INLINE CONLIKE liftFixedToBounded #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:199:{-# INLINE CONLIKE storableToF #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:204:{-# INLINE CONLIKE liftIOF #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:218:{-# INLINE CONLIKE sizeBound #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:225:{-# INLINE CONLIKE runB #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:238:{-# INLINE CONLIKE contramapB #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:243:{-# INLINE CONLIKE emptyB #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:248:{-# INLINE CONLIKE pairB #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:264:{-# INLINE CONLIKE eitherB #-}
    ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:277:{-# INLINE CONLIKE condB #-}
    
    I have no idea if this idea would be a good one for those functions in bytestring; maybe not.

comment:6 Changed 12 months ago by osa1

I started looking into this, but I realized that nofib can't be run with GHC 8.6 (and probably with HEAD) so I'll need to fix nofib first.

What I've done so far is I implemented two counters:

  • A runtime counter for SRT traversals done by the GC
  • A compiler counter to count how many SRTs are generated (I'm also recording sum of sizes of the SRTs although this is probably not too useful)

The runtime counter is printed with +RTS -t:

<<ghc: 98529880 bytes, 63 GCs, 3481032/7949096 avg/max bytes residency (6 samples), 20M in use, 0.000 INIT (0.000 elapsed), 0.038 MUT (0.198 elapsed), 0.059 GC (0.059 elapsed), 226752 SRT scavs :ghc>>

The "226752 SRT scavs" part is new, and it means that we scavenged 226752 SRTs.

The compiler counter prints once per binding group and it looks like this:

SRTs: 4 SRT(s), total size: 17

This means that 4 SRTs are generated, and total size of all those SRTs are 17 words.

The patch is here: https://github.com/osa1/ghc/commit/c46fe24a02591edd3ce7b6aa70246493826d218d

I'll fix nofib first and then try to collect these numbers from nofib output.

comment:7 Changed 12 months ago by osa1

I managed to get nofib running. Here's an example output for runtime SRT traversals:

NoFib Results

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem  SRTScavs
--------------------------------------------------------------------------------
...
        circsim           0.0%      0.0%      0.0%      0.0%      0.0%      0.0%
       clausify           0.0%      0.0%     0.013     0.013      0.0%      0.0%
  comp_lab_zift           0.0%      0.0%     0.068     0.068      0.0%      0.0%
       compress           0.0%      0.0%     0.054     0.054      0.0%      0.0%
      compress2           0.0%      0.0%     0.047     0.047      0.0%      0.0%
    constraints           0.0%      0.0%      0.0%      0.0%      0.0%      0.0%
...


Scavenged SRTs

-------------------------------------------------------------------------------
        Program            nofib-log        nofib-log
-------------------------------------------------------------------------------
             CS                  627             0.0%
            CSD                  622             0.0%
             FS                  629             0.0%
              S             16363319             0.0%
             VS                 2529             0.0%
            VSD                  310             0.0%
            VSM                  628             0.0%
           anna                 7951             0.0%
           ansi                  310             0.0%
           atom                 2102             0.0%
         awards                  310             0.0%
...

(all 0.0% becuase I used same file twice to run nofib-analyse)

I have another program to parse compile-time numbers and generate this:

PGM                  | SRTs  | SRT size
fluid                | 5     | 25
wave4main            | 25    | 112
integer              | 17    | 74
...

The code is here: https://github.com/osa1/nofib/tree/print_srt_stuff

I also have a GHC branch that implements a new flag -favoid-cafs=[0|1|2|3] where 0 means compile as before, 1 means avoid making CAFs for top-level unpackCString#s, 2 means in addition to 1 avoid making CAFs for non-top-level unpackCString#s, 3 means in addition to 2 avoid making CAFs for CONLIKEs too. Code is here: https://github.com/osa1/ghc/tree/print_srt_stuff

However for some reason the flag doesn't actually change number of SRTs or SRT sizes. I don't know why yet, I'll debug further. I think it may be because it's too late to change CAFFY-ness of values in CoreToStg (remember that CAFFY-enss of values are computed in CorePrep, which is before CoreToStg). I'll debug further.

comment:8 Changed 12 months ago by MikolajKonarski

Cc: MikolajKonarski added

comment:9 Changed 10 months ago by bgamari

Milestone: 8.8.18.10.1

Bumping to 8.10.

comment:10 Changed 10 months ago by AndreasK

The string caf code also contributes significantly to code size. See #16014 for details.

comment:11 Changed 9 months ago by simonpj

Description: modified (diff)

Looking at #16014, I like alternative (2) from the Description better and better. If we spot

x = unpackCString# "blah"#

in the code generator, we could allocate a top-level closure with

  • Info-pointer: rtsUnpackString_info
  • One word of payload, a pointer to the literal string "blah"#.

Now we can hand-write the single blob of code (plus info table) rtsUnpackString_info to unpack the string. Easy! And the overhead per string is only two words (for the closure) rather than all the stuff described in #16014.

comment:12 Changed 9 months ago by simonpj

Keywords: CodeGen added

Omer: would you like to try this when you are done with CafInfo?

Note: See TracTickets for help on using tickets.