Opened 5 years ago

Closed 4 years ago

Last modified 4 years ago

#9848 closed bug (fixed)

List.all does not fuse

Reported by: klapaucius Owned by: ekmett
Priority: normal Milestone: 8.0.1
Component: Core Libraries Version: 7.9
Keywords: Cc: hvr, ekmett, dfeuer, core-libraries-committee@…, akio
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case: libraries/base/tests/T9848
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

primes = 2:3:filter isPrime [5,7..] :: [Int]
isPrime x = all (/= 0) . map (x `rem`) . takeWhile ((<= x) . (^2)) $ primes

main = print . length . takeWhile (<= 2^24) $ primes
  12,133,812,164 bytes allocated in the heap
      53,433,372 bytes copied during GC
      14,235,488 bytes maximum residency (7 sample(s))
       1,110,916 bytes maximum slop
              30 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     56357 colls,     0 par    0.094s   0.125s     0.0000s    0.0001s
  Gen  1         7 colls,     0 par    0.031s   0.034s     0.0049s    0.0154s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    8.094s  (  8.069s elapsed)
  GC      time    0.125s  (  0.159s elapsed)
  EXIT    time    0.000s  (  0.003s elapsed)
  Total   time    8.219s  (  8.231s elapsed)

  %GC     time       1.5%  (1.9% elapsed)

  Alloc rate    1,499,158,259 bytes per MUT second

  Productivity  98.5% of total user, 98.3% of total elapsed

Rec {
$sgo1_r2RE :: GHC.Prim.Int# -> [Int] -> Data.Monoid.All
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <L,U><L,U>]
$sgo1_r2RE =
  \ (sc_s2PS :: GHC.Prim.Int#) (sc1_s2PT :: [Int]) ->
    case sc_s2PS of _ [Occ=Dead] {
      __DEFAULT -> go_r2RF sc1_s2PT;
      0 ->
        GHC.Types.False
        `cast` (Sym Data.Monoid.NTCo:All[0] :: Bool ~R# Data.Monoid.All)
    }

go_r2RF :: [Int] -> Data.Monoid.All
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType <S,1*U>]
go_r2RF =
  \ (ds_a1YK :: [Int]) ->
    case ds_a1YK of _ [Occ=Dead] {
      [] ->
        GHC.Types.True
        `cast` (Sym Data.Monoid.NTCo:All[0] :: Bool ~R# Data.Monoid.All);
      : y_a1YP ys_a1YQ ->
        case y_a1YP of _ [Occ=Dead] { GHC.Types.I# x_a1Tk ->
        case x_a1Tk of _ [Occ=Dead] {
          __DEFAULT -> go_r2RF ys_a1YQ;
          0 ->
            GHC.Types.False
            `cast` (Sym Data.Monoid.NTCo:All[0] :: Bool ~R# Data.Monoid.All)
        }
        }
    }
end Rec }

lvl4_r2RG :: Int -> Data.Monoid.All
[GblId, Arity=1, Str=DmdType]
lvl4_r2RG =
  \ (x_aqY [OS=ProbOneShot] :: Int) ->
    case x_aqY of _ [Occ=Dead] { GHC.Types.I# y_a1Uc ->
    case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# 4 y_a1Uc)
    of _ [Occ=Dead] {
      False ->
        GHC.Types.True
        `cast` (Sym Data.Monoid.NTCo:All[0] :: Bool ~R# Data.Monoid.All);
      True ->
        $sgo1_r2RE
          (GHC.Prim.remInt# y_a1Uc 2)
          (letrec {
             go1_a1S5 [Occ=LoopBreaker] :: [Int] -> [Int]
             [LclId, Arity=1, Str=DmdType <S,1*U>]
             go1_a1S5 =
               \ (ds_a1S6 :: [Int]) ->
                 case ds_a1S6 of _ [Occ=Dead] {
                   [] -> GHC.Types.[] @ Int;
                   : y1_X1T4 ys_X1T6 ->
                     case y1_X1T4 of _ [Occ=Dead] { GHC.Types.I# x1_X1VM ->
                     case GHC.Prim.tagToEnum#
                            @ Bool (GHC.Prim.<=# (GHC.Prim.*# x1_X1VM x1_X1VM) y_a1Uc)
                     of _ [Occ=Dead] {
                       False -> GHC.Types.[] @ Int;
                       True ->
                         GHC.Types.:
                           @ Int
                           (case x1_X1VM of wild5_a1TE {
                              __DEFAULT ->
                                case GHC.Prim.remInt# y_a1Uc wild5_a1TE
                                of wild6_a1TJ { __DEFAULT ->
                                GHC.Types.I# wild6_a1TJ
                                };
                              (-1) -> GHC.Real.$fIntegralInt1;
                              0 -> GHC.Real.divZeroError @ Int
                            })
                           (go1_a1S5 ys_X1T6)
                     }
                     }
                 }; } in
           go1_a1S5 Main.main3)
    }
    }


foldr, however, fuse just fine:

primes = 2:3:filter isPrime [5,7..] :: [Int]
isPrime x = foldr (&&) True . map (/= 0) . map (x `rem`) . takeWhile ((<= x) . (^2)) $ primes

main = print . length . takeWhile (<= 2^24) $ primes
     365,770,752 bytes allocated in the heap
      48,197,488 bytes copied during GC
      13,031,232 bytes maximum residency (7 sample(s))
       1,570,524 bytes maximum slop
              28 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       694 colls,     0 par    0.016s   0.029s     0.0000s    0.0005s
  Gen  1         7 colls,     0 par    0.031s   0.032s     0.0046s    0.0146s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    3.438s  (  3.439s elapsed)
  GC      time    0.047s  (  0.062s elapsed)
  EXIT    time    0.000s  (  0.003s elapsed)
  Total   time    3.484s  (  3.504s elapsed)

  %GC     time       1.3%  (1.8% elapsed)

  Alloc rate    106,406,036 bytes per MUT second

  Productivity  98.7% of total user, 98.1% of total elapsed
lvl4_r2qr :: Int -> Bool
[GblId, Arity=1, Str=DmdType]
lvl4_r2qr =
  \ (x_aqW [OS=ProbOneShot] :: Int) ->
    case x_aqW of _ [Occ=Dead] { GHC.Types.I# y_a1tq ->
    case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# 4 y_a1tq)
    of _ [Occ=Dead] {
      False -> GHC.Types.True;
      True ->
        case GHC.Prim.remInt# y_a1tq 2 of _ [Occ=Dead] {
          __DEFAULT ->
            letrec {
              go_a1ud [Occ=LoopBreaker] :: [Int] -> Bool
              [LclId, Arity=1, Str=DmdType <S,1*U>]
              go_a1ud =
                \ (ds_a1ue :: [Int]) ->
                  case ds_a1ue of _ [Occ=Dead] {
                    [] -> GHC.Types.True;
                    : y1_X1vf ys_X1vh ->
                      case y1_X1vf of _ [Occ=Dead] { GHC.Types.I# x1_X1x9 ->
                      case GHC.Prim.tagToEnum#
                             @ Bool (GHC.Prim.<=# (GHC.Prim.*# x1_X1x9 x1_X1x9) y_a1tq)
                      of _ [Occ=Dead] {
                        False -> GHC.Types.True;
                        True ->
                          case x1_X1x9 of wild6_X1x3 {
                            __DEFAULT ->
                              case GHC.Prim.remInt# y_a1tq wild6_X1x3 of _ [Occ=Dead] {
                                __DEFAULT -> go_a1ud ys_X1vh;
                                0 -> GHC.Types.False
                              };
                            (-1) -> GHC.Types.False;
                            0 -> case GHC.Real.divZeroError of wild7_00 { }
                          }
                      }
                      }
                  }; } in
            go_a1ud Main.main3;
          0 -> GHC.Types.False
        }
    }
    }

And List.all from ghc 7.8 base library does fuse, so this is regression.

Windows 8.1 x64, ghc --info:

 [("Project name","The Glorious Glasgow Haskell Compilation System")
 ,("GCC extra via C opts"," -fwrapv")
 ,("C compiler command","$topdir/../mingw/bin/gcc.exe")
 ,("C compiler flags"," -U__i686 -march=i686 -fno-stack-protector")
 ,("C compiler link flags","")
 ,("Haskell CPP command","$topdir/../mingw/bin/gcc.exe")
 ,("Haskell CPP flags","-E -undef -traditional ")
 ,("ld command","$topdir/../mingw/bin/ld.exe")
 ,("ld flags","")
 ,("ld supports compact unwind","YES")
 ,("ld supports build-id","NO")
 ,("ld supports filelist","NO")
 ,("ld is GNU ld","YES")
 ,("ar command","$topdir/../mingw/bin/ar.exe")
 ,("ar flags","q")
 ,("ar supports at file","YES")
 ,("touch command","$topdir/touchy.exe")
 ,("dllwrap command","$topdir/../mingw/bin/dllwrap.exe")
 ,("windres command","$topdir/../mingw/bin/windres.exe")
 ,("libtool command","")
 ,("perl command","$topdir/../perl/perl.exe")
 ,("target os","OSMinGW32")
 ,("target arch","ArchX86")
 ,("target word size","4")
 ,("target has GNU nonexec stack","False")
 ,("target has .ident directive","True")
 ,("target has subsections via symbols","False")
 ,("Unregisterised","NO")
 ,("LLVM llc command","llc")
 ,("LLVM opt command","opt")
 ,("Project version","7.9.20141129")
 ,("Project Git commit id","447f592697fef04d1e19a2045ec707cfcd1eb59f")
 ,("Booter version","7.8.3")
 ,("Stage","2")
 ,("Build platform","i386-unknown-mingw32")
 ,("Host platform","i386-unknown-mingw32")
 ,("Target platform","i386-unknown-mingw32")
 ,("Have interpreter","YES")
 ,("Object splitting supported","YES")
 ,("Have native code generator","YES")
 ,("Support SMP","YES")
 ,("Tables next to code","YES")
 ,("RTS ways","l debug thr thr_debug thr_l thr_p ")
 ,("Support dynamic-too","NO")
 ,("Support parallel --make","YES")
 ,("Support reexported-modules","YES")
 ,("Support thinning and renaming package flags","YES")
 ,("Uses package keys","YES")
 ,("Dynamic by default","NO")
 ,("GHC Dynamic","NO")
 ,("Leading underscore","YES")
 ,("Debug on","False")
 ,("LibDir","D:\\msys32\\usr\\local\\lib")
 ,("Global Package DB","D:\\msys32\\usr\\local\\lib\\package.conf.d")
 ]

Change History (9)

comment:1 Changed 5 years ago by thomie

Architecture: x86Unknown/Multiple
Cc: core-libraries-committee@… added
Component: libraries/baseCore Libraries
Operating System: WindowsUnknown/Multiple
Owner: set to ekmett

comment:2 Changed 4 years ago by akio

Cc: akio added

comment:3 Changed 4 years ago by akio

It looks like the problem is in how foldMap for lists is defined.

Now all is defined in terms of foldMap, and foldMap is defined in terms of foldr. However, the unfolding for foldMap for lists contains a recursive function, rather than a reference to foldr. This means any list function defined in terms of foldMap has no chance of fusion.

Adding an INLINE pragma to the default definition for foldMap seems to fix the issue.

comment:4 Changed 4 years ago by akio

I'm preparing a patch, but what is a good way for testing this? Should I add a performance test?

comment:5 Changed 4 years ago by nomeata

Should I add a performance test?

Yes. Usually with list fusion you can create a test that allocates almost nothing if fusion kicks in, but allocates a lot of it does not.

Alternatively, you can copy the code from http://hackage.haskell.org/package/list-fusion-probe-0.1.0.3/docs/src/Data-List-Fusion-Probe.html into the test case and wrap the argument to all with fuseThis.

comment:6 Changed 4 years ago by akio

Status: newpatch

Thank you, I created Phab:D1126.

comment:7 Changed 4 years ago by Ben Gamari <ben@…>

In 22bbc1cf/ghc:

Make sure that `all`, `any`, `and`, and `or` fuse (#9848)

Test Plan: validate

Reviewers: hvr, austin, bgamari, simonpj

Reviewed By: simonpj

Subscribers: simonpj, thomie

Differential Revision: https://phabricator.haskell.org/D1126

GHC Trac Issues: #9848

comment:8 Changed 4 years ago by thomie

Milestone: 7.12.1
Resolution: fixed
Status: patchclosed
Test Case: libraries/base/tests/T9848

Nice improvements in nofib also:

nofib/allocs/circsim 	1332233568	- 4.02%	1278641568	bytes
nofib/allocs/multiplier 248700640	- 8.70%	227052640	bytes

comment:9 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

Note: See TracTickets for help on using tickets.