Opened 3 years ago

Last modified 3 years ago

#12566 new bug

Memory leak

Reported by: igloo Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.0.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

With the attached write.hs (which generates a file called "data") and read.hs (which consumes that file), if I run

ghc -Wall -Werror -O write.hs -o write
ghc -Wall -Werror -O -prof -auto-all read.hs -o read1
ghc -Wall -Werror -O -prof -auto-all read.hs -o read2 -DLEAK
./write
./read1 +RTS -h
./read2 +RTS -h

then read2's heap profile shows that it is retaining a lot of extra data. Perhaps I am missing something, but I can't see why this needs to be retained. I would expect the two heap profiles to look the same.

Sources and heap profiles (using GHC 8.0.1) attached. I've copied the sources below for convenience:

write.hs:

module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as L

main :: IO ()
main = L.writeFile "data" $ L.concat $ map mkByteString [1..100000]

mkByteString :: Int -> L.ByteString
mkByteString i = L.concat (L.pack ("#" ++ show i ++ "\n")
               : replicate 100 (L.pack "Something else\n"))

read.hs:

{-# LANGUAGE BangPatterns, CPP #-}

module Main (main) where

import Data.List
import Data.Set (Set)
import qualified Data.Set as Set

import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

main :: IO ()
main = do bs <- L.readFile "data"
          let stats = getMaybes bs
              s = mkSet stats
          print $ Set.size s

mkSet :: [Maybe S.ByteString] -> Set S.ByteString
mkSet ms = foldl' f Set.empty ms
    where f s (Just l) = Set.insert l s
          f s _ = s

getMaybes :: L.ByteString -> [Maybe S.ByteString]
getMaybes bs = if L.null bs then []
               else case getMaybe bs of
                    (stat, bs') ->
                        stat : getMaybes bs'

getMaybe :: L.ByteString -> (Maybe S.ByteString, L.ByteString)
getMaybe bs = case L.uncons bs of
              Just ('#', bs') ->
                  case L.break ('\n' ==) bs' of
                  (l, bs'') ->
                      let !l' = copy l
                      in (Just l', bs'')
              _ ->
                  case L.break ('\n' ==) bs of
                  (_x, bs') ->
#ifdef LEAK
                      copy _x `seq`
#endif
                      (Nothing, L.tail bs')

copy :: L.ByteString -> S.ByteString
copy bs = S.copy $ L.toStrict bs

Attachments (4)

write.hs (315 bytes) - added by igloo 3 years ago.
read.hs (1.2 KB) - added by igloo 3 years ago.
read1.png (19.4 KB) - added by igloo 3 years ago.
read2.png (16.5 KB) - added by igloo 3 years ago.

Download all attachments as: .zip

Change History (5)

Changed 3 years ago by igloo

Attachment: write.hs added

Changed 3 years ago by igloo

Attachment: read.hs added

Changed 3 years ago by igloo

Attachment: read1.png added

Changed 3 years ago by igloo

Attachment: read2.png added

comment:1 Changed 3 years ago by simonpj

Type of failure: None/UnknownRuntime performance bug

It's unclear whether GHC is at fault or the bytestring library. Needs investigation. Any volunteers?

Note: See TracTickets for help on using tickets.