Opened 5 years ago

Closed 20 months ago

#9885 closed bug (fixed)

ghc-pkg parser eats too much memory

Reported by: gnezdo
Owned by:
Priority: low
Milestone: 8.4.1
Component: ghc-pkg
Version: 7.8.3
Keywords:
Cc:
Operating System: Linux
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

Parsing of spec files in ghc-pkg scales very poorly. The following script demonstrates how memory consumption grows with the number of tokens in ld-options (16K tokens push peak resident memory past 5 GB in the run shown below).

#!/bin/bash
# Demonstrates memory consumption behavior of ghc-pkg as a function of
# the number of ld-options arguments.

for i in {10..14}; do
  size=$((1 << $i))
  echo $size
  rm -fr a.packages
  /usr/bin/ghc-pkg init a.packages
  cat > a.spec <<EOF
name: project
id: project
license: AllRightsReserved
version: 1.0
EOF
  echo -n ld-options: >> a.spec
  for i in $(seq 1 $size); do echo -n "x "; done >> a.spec
  /usr/bin/time /usr/bin/ghc-pkg --global-package-db a.packages register --force a.spec
done
exit 0

# This was collected with ghc-7.6.3. 7.8.3 fares as badly.
% bash a.sh
1024
Reading package info from "a.spec" ... done.
0.03user 0.00system 0:00.03elapsed 97%CPU (0avgtext+0avgdata 17848maxresident)k
0inputs+32outputs (0major+4973minor)pagefaults 0swaps
2048
Reading package info from "a.spec" ... done.
0.09user 0.01system 0:00.10elapsed 99%CPU (0avgtext+0avgdata 60872maxresident)k
0inputs+56outputs (0major+15723minor)pagefaults 0swaps
4096
Reading package info from "a.spec" ... done.
0.41user 0.07system 0:00.49elapsed 99%CPU (0avgtext+0avgdata 294340maxresident)k
0inputs+104outputs (0major+74090minor)pagefaults 0swaps
8192
Reading package info from "a.spec" ... done.
1.72user 0.30system 0:02.04elapsed 99%CPU (0avgtext+0avgdata 1168836maxresident)k
0inputs+192outputs (0major+292716minor)pagefaults 0swaps
16384
Reading package info from "a.spec" ... done.
9.11user 1.38system 0:10.51elapsed 99%CPU (0avgtext+0avgdata 5396932maxresident)k
0inputs+376outputs (0major+1349736minor)pagefaults 0swaps
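
To make the growth rate in these numbers easier to see, here is a small sketch that computes how much the peak resident size grows each time the token count doubles. The figures are copied from the maxresident values in the log above; the function name `growth_table` is made up for this illustration and is not part of ghc-pkg.

```shell
#!/bin/bash
# Sketch only: growth_table is a hypothetical helper name.
# Input columns: number of ld-options tokens, maxresident (KB) as
# reported by /usr/bin/time in the log above.
growth_table() {
  awk '
    # From the second row on, print the ratio to the previous row.
    NR > 1 { printf "%d -> %d tokens: RSS x%.2f\n", prev_tokens, $1, $2 / prev_rss }
    { prev_tokens = $1; prev_rss = $2 }
  ' <<'EOF'
1024 17848
2048 60872
4096 294340
8192 1168836
16384 5396932
EOF
}

growth_table
```

Each doubling multiplies peak memory by somewhere between roughly 3.4 and 4.8. A factor of 4 per doubling would be exactly quadratic, so these measurements look consistent with roughly quadratic growth in the number of tokens, though five data points are not enough to say more than that.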

Change History (6)

comment:1 Changed 5 years ago by gnezdo

I did some profiling. To save somebody a bit of digging, reproducing the build/profile/plot script here (would love to hear how to do this more optimally):

#!/bin/bash
# Demonstrates memory consumption behavior of ghc-pkg as a function of
# the number of ld-options arguments.

set -eu

(cd ~/ghc-copy/Cabal/Cabal && cabal install --enable-library-profiling --enable-executable-profiling --force-reinstalls --ghc-option=-rtsopts --ghc-option=-prof --ghc-option=-fprof-auto)
(cd ~/ghc-copy/ghc/libraries/bin-package-db && cabal install --enable-library-profiling --enable-executable-profiling --force-reinstalls --ghc-option=-rtsopts --ghc-option=-prof --ghc-option=-fprof-auto)
(cd ~/ghc-copy/ghc/utils/ghc-pkg && cabal install --enable-library-profiling --enable-executable-profiling --force-reinstalls --ghc-option=-rtsopts --ghc-option=-prof --ghc-option=-fprof-auto --ghc-option=-DBOOTSTRAPPING)

ghcpkg=~/.cabal/bin/ghc-pkg
for i in {13..13}; do
  size=$((1 << $i))
  echo $size
  rm -fr a.packages
  $ghcpkg init a.packages
  cat > a.spec <<EOF
name: project
id: project
license: AllRightsReserved
version: 1.0
EOF
  echo -n ld-options: >> a.spec
  for i in $(seq 1 $size); do echo -n "x "; done >> a.spec
  rm -f ghc-pkg.{hp,ps}
  /usr/bin/time $ghcpkg -v2 --global-package-db a.packages register --force a.spec +RTS -hc -L100 || true
  hp2ps -b -c ghc-pkg.hp
  evince ghc-pkg.ps
done

comment:2 Changed 5 years ago by gnezdo

The most expensive place seems to be this (quoted from .hp file, lightly munged). The parsing code is courtesy of Cabal/Distribution/ParseUtils.hs.

(953)>>=.\.\.(...)
	>>=.\.\
	>>=.\
	>>=
	many1
	many
	sepBy1
	sepBy
	parseSepList
	parseOptCommaList
	listField
	installedFieldDescrs
	accumFields.setField
	accumFields
	parseFieldsFlat.\
	parseFieldsFlat
	parseInstalledPackageInfo
	parsePackageInfo
	registerPackage
	runit
	main	794023536


comment:4 Changed 5 years ago by gnezdo

Priority: normal → low

My workaround was to split the long comma-separated list of ld-options into multiple ld-options with one item per line.
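
For anyone wanting to try the same workaround, here is a minimal sketch of how such a spec could be generated. It assumes that "multiple ld-options" means a repeated `ld-options:` field, one per item, and that ghc-pkg accumulates repeated fields; neither has been verified here. The helper name `write_split_spec` is hypothetical, and the header fields are the ones from the reproduction script.

```shell
#!/bin/bash
# Sketch only: write_split_spec is a hypothetical helper, not a ghc-pkg tool.
# Usage: write_split_spec OUTPUT_FILE OPTION...
write_split_spec() {
  local out=$1
  shift
  # Same minimal header as the reproduction script in the description.
  cat > "$out" <<EOF
name: project
id: project
license: AllRightsReserved
version: 1.0
EOF
  # Assumption: one ld-options field per item, instead of one long list.
  local opt
  for opt in "$@"; do
    printf 'ld-options: %s\n' "$opt" >> "$out"
  done
}

# Example: three items, each on its own ld-options line.
write_split_spec a.spec x x x
cat a.spec
```

The resulting file can be passed to `ghc-pkg register` in place of the single-line version to compare memory use.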

comment:5 Changed 20 months ago by Ben Gamari <ben@…>

In 2671ccc/ghc:

Update Cabal submodule

- Cabal-2.2 uses SPDX license identifiers, so I had to update
  `cabal-version: 2.1` packages `license: BSD3` to `license: BSD-3-Clause`
- `ghc-cabal` used old ReadP parsec, now it uses `parsec` too
- InstalledPackageInfo pretty-printing has changed a little,
  fields with default values aren't printed. This can be changed in
  `Cabal` still, but I haven't found problems with omitting them.

Note: `BSD-3-Clause` is parsed as "name = BSD, version = 3" by old
parser (because 3-Clause looks like version 3 with tag Clause).
If you see *"BSD-3" is not a valid license*, then something is using
old parser still.

Fixes #9885.

comment:6 Changed 20 months ago by bgamari

Milestone: 8.4.1
Resolution: fixed
Status: new → closed