Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#11116 closed bug (invalid)

GC reports memory in use way below the actual

Reported by: facundo.dominguez
Owned by:
Priority: normal
Milestone:
Component: Runtime System
Version: 7.10.2
Keywords:
Cc: simonmar
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

The following program encodes and decodes a long list of words. The memory in use reported by the GC is several gigabytes below what the OS reports. Results are shown below. Tested with ghc-7.10.2 and binary-0.7.6.1.

import Control.Exception (evaluate)
import Control.Monad (void)
import Data.Binary (encode, decode)
import qualified Data.ByteString.Lazy as BSL
import Data.List (isPrefixOf, foldl')
import Data.Word (Word32)
import GHC.Stats
import System.Mem (performGC)

type T = (Word32,[Word32])

main :: IO ()
main = do
  let sz = 1024 * 1024 * 15
      xs = [ (i,[i]) :: T | i <- [0 .. sz] ]
      bs = encode xs

  void $ evaluate $ sum' $ map (\(x, vs) -> x + sum' vs) xs
  putStrLn "After building the value to encode:"
  printMem

  putStrLn $ "Size of the encoded value: " ++
    show (BSL.length bs `div` (1024 * 1024)) ++ " MB"
  putStrLn ""

  putStrLn "After encoding the value:"
  printMem

  let xs' = decode bs :: [T]
  void $ evaluate $ sum' $ map (\(x, vs) -> x + sum' vs) xs'
  putStrLn "After decoding the value:"
  printMem

  -- retain the original list so it is not GC'ed
  void $ evaluate $ last xs
  -- retain the decoded list so it is not GC'ed
  void $ evaluate $ last xs'

printMem :: IO ()
printMem = do
  performGC
  readFile "/proc/self/status" >>=
    putStr . unlines . filter (\x -> any (`isPrefixOf` x) ["VmHWM", "VmRSS"])
           . lines
  stats <- getGCStats
  putStrLn $ "In use according to GC stats: " ++
    show (currentBytesUsed stats `div` (1024 * 1024)) ++ " MB"
  putStrLn $ "HWM according the GC stats: " ++
    show (maxBytesUsed stats `div` (1024 * 1024)) ++ " MB"
  putStrLn ""

sum' :: Num a => [a] -> a
sum' = foldl' (+) 0

Here are the results:

# ghc --make -O -fno-cse -fforce-recomp -rtsopts test.hs
# time ./test +RTS -T
After building the value to encode:
VmHWM:	 2782700 kB
VmRSS:	 2782700 kB
In use according to GC stats: 1320 MB
HWM according the GC stats: 1320 MB

Size of the encoded value: 240 MB

After encoding the value:
VmHWM:	 3064976 kB
VmRSS:	 3064976 kB
In use according to GC stats: 1560 MB
HWM according the GC stats: 1560 MB

After decoding the value:
VmHWM:	 7426784 kB
VmRSS:	 7426784 kB
In use according to GC stats: 2880 MB
HWM according the GC stats: 2880 MB


real	0m24.348s
user	0m22.316s
sys	0m1.992s

At the end of the program the OS reports about 7 GB of resident memory, while the GC reports less than 3 GB in use.

Running the program with +RTS -M3G keeps VmHWM bounded at the expense of doubling the execution time.

Change History (4)

comment:1 Changed 4 years ago by facundo.dominguez

Architecture: Unknown/Multiple → x86_64 (amd64)
Cc: simonmar added
Component: Compiler → Runtime System
Operating System: Unknown/Multiple → Linux
Type of failure: None/Unknown → Runtime performance bug

comment:2 Changed 4 years ago by rwbarton

Resolution: invalid
Status: new → closed

currentBytesUsed and maxBytesUsed are, as documented, "Current number of live bytes" on the heap and "Maximum number of live bytes seen so far" respectively. They are just calculated as the sum of the sizes of all live objects on the heap. Due to the way GHC's copying garbage collector works, the actual space used by the heap will typically be double this size. Then of course there will be additional space used by the runtime system or other C libraries (though that is not significant in this example).

peakMegabytesAllocated counts everything allocated through the RTS (including any blocks used for heap) and will be closer to the figure you are looking for.
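As an illustration of the distinction drawn above, the following is a minimal sketch that prints the live-byte figure next to peakMegabytesAllocated. It assumes the GHC.Stats API as shipped with GHC 7.10 (getGCStats and the GCStats record); later GHC releases replaced this interface with getRTSStats, so the code would need adapting there. The names toMB and printRtsMem are hypothetical helpers introduced only for this example.

```haskell
import Data.Int (Int64)
import GHC.Stats (GCStats (..), getGCStats, getGCStatsEnabled)

-- Convert a byte count to whole mebibytes (hypothetical helper).
toMB :: Int64 -> Int64
toMB bytes = bytes `div` (1024 * 1024)

-- Print the size of live data next to the peak amount of memory
-- the RTS has allocated. Assumes the GHC 7.10 GHC.Stats API.
printRtsMem :: IO ()
printRtsMem = do
  enabled <- getGCStatsEnabled
  if not enabled
    then putStrLn "GC statistics are disabled; run with +RTS -T"
    else do
      stats <- getGCStats
      -- Sum of the sizes of live objects at the last GC.
      putStrLn $ "Live data (currentBytesUsed): " ++
        show (toMB (currentBytesUsed stats)) ++ " MB"
      -- Everything allocated through the RTS, including heap blocks.
      putStrLn $ "Peak allocated by the RTS (peakMegabytesAllocated): " ++
        show (peakMegabytesAllocated stats) ++ " MB"

main :: IO ()
main = printRtsMem
```

Calling printRtsMem at the same points where the test program calls printMem should give a figure closer to VmRSS than currentBytesUsed does, though it may still fall short of it.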

comment:3 Changed 4 years ago by simonmar

Also note that the RTS only tracks memory that is allocated on the Haskell heap; it doesn't track memory allocated by C libraries, malloc, or mmap. So there are several reasons why the memory figure reported by peakMegabytesAllocated might be less than the RSS figure from the OS.

comment:4 Changed 4 years ago by facundo.dominguez

I was a bit surprised that the application uses only 40% of its space for live data, but then I know very little about how the GC is supposed to work. Thanks for taking a look.
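For reference, the 40% figure can be reproduced from the "After decoding the value" numbers in the description. This is only a rough check: it assumes that the kB unit in /proc/self/status means 1024 bytes, and it applies the "typically double" rule of thumb from comment:2 as an estimate rather than an exact model.

```latex
\begin{align*}
\text{VmRSS after decoding} &= 7426784\ \text{kB} / 1024 \approx 7253\ \text{MB} \\
\text{live data (currentBytesUsed)} &= 2880\ \text{MB} \\
\text{live fraction} &= 2880 / 7253 \approx 0.40 \\
\text{copying-GC estimate} &= 2 \times 2880\ \text{MB} = 5760\ \text{MB}
\end{align*}
```

The doubling estimate accounts for most of the resident size. The ticket does not break down the remaining roughly 1.5 GB.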
