Opened 5 years ago

Last modified 4 years ago

#9809 new bug

Overwhelming the TimerManager

Reported by: fryguybob Owned by: simonmar
Priority: normal Milestone:
Component: Runtime System Version: 7.10.1
Keywords: Cc: simonmar, AndreasVoellmy, davean
Operating System: Linux Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

I was talking on IRC with davean about an issue that potentially could have been related to STM (I don't think it is at all after investigating) and I reduced the issue to the following program:

-- Main.hs
module Main where

import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM
import System.Environment

main :: IO ()
main = do
    as <- getArgs
    case as of
        ["-f"]  -> replicateM_ 100000 . void . forkIO . void $ registerDelay 10
        _       -> replicateM_ 100000 .                 void $ registerDelay 10

    threadDelay 1000

This ends up registering a lot of events with the TimerManager. With the "-f" flag it does so from many threads and when run that way it appears to eventually overwhelm the TimerManager and causing over 350 MB total memory in use.

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.9.20141115
$ ghc -O2 -threaded -debug Main.hs -o Main-head
...
$ ./Main-head -f +RTS -s
   3,566,966,936 bytes allocated in the heap
   4,200,021,784 bytes copied during GC
     118,273,720 bytes maximum residency (96 sample(s))
      12,649,480 bytes maximum slop
             354 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6350 colls,     0 par    2.430s   2.434s     0.0004s    0.0073s
  Gen  1        96 colls,     0 par    7.438s   7.441s     0.0775s    0.2526s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.002s  (  0.002s elapsed)
  MUT     time    0.461s  (  0.460s elapsed)
  GC      time    9.869s  (  9.875s elapsed)
  EXIT    time    0.003s  (  0.003s elapsed)
  Total   time   10.336s  ( 10.340s elapsed)

  Alloc rate    7,741,472,461 bytes per MUT second

  Productivity   4.5% of total user, 4.5% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

Running without forking many threads and the total memory in use stays low (3 MB).

     154,305,648 bytes allocated in the heap
      16,922,272 bytes copied during GC
       1,378,608 bytes maximum residency (3 sample(s))
          28,520 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       298 colls,     0 par    0.056s   0.056s     0.0002s    0.0015s
  Gen  1         3 colls,     0 par    0.005s   0.005s     0.0017s    0.0047s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    0.148s  (  0.148s elapsed)
  GC      time    0.061s  (  0.061s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.211s  (  0.210s elapsed)

  Alloc rate    1,042,557,595 bytes per MUT second

  Productivity  70.8% of total user, 71.2% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

Using 7.6.3, things don't get out of hand, also with 3 MB total memory use.

$ ./Main-7.6.3 -f +RTS -s
     213,519,392 bytes allocated in the heap
     116,111,712 bytes copied during GC
         505,080 bytes maximum residency (11 sample(s))
         113,032 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       403 colls,     0 par    0.33s    0.33s     0.0008s    0.0028s
  Gen  1        11 colls,     0 par    0.01s    0.01s     0.0011s    0.0019s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.99s  (  1.50s elapsed)
  GC      time    0.34s  (  0.34s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    2.33s  (  1.84s elapsed)

  Alloc rate    107,426,859 bytes per MUT second

  Productivity  85.5% of total user, 108.1% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

I looked for causes and eliminated any STM interactions causing problems (STM shows up in the Unique values and in creating a TVar for the registered delay) but did discover that the emTimeouts queue gets very large at some point when executing with "-f". If I print the size of expired on this line:

https://github.com/ghc/ghc/blob/021b7978d14799bae779907faf7490cfd21b3f46/libraries/base/GHC/Event/TimerManager.hs#L226

I end up seeing somewhere from 4 to 20 events for a while then eventually it jumps up to 80000 or so. Perhaps davean can chime in about the particular use case that he has that I reduced to this simple program for a more real world use.

Change History (5)

comment:1 Changed 5 years ago by AndreasVoellmy

Cc: AndreasVoellmy added

comment:2 Changed 5 years ago by davean

Cc: davean added

comment:3 Changed 5 years ago by davean

My specific use case involved TCP connections which were waiting on data from a TChan. If the data did not arrive soon enough, it needed to stop waiting on the TChan and do some other work. After which it would either come back and listen again or not.

Previously I was managing in the range of 40k per instance for months at a time without issue (excursions to 100k had occurred). Currently (with 7.8.3 and admittedly slightly different server code) I'm not safely able to handle more then around 6k per instance.

comment:4 Changed 4 years ago by davean

Confirming this still occurs on 7.10.1.

$ ./Main -f +RTS -s
   3,260,792,712 bytes allocated in the heap
   3,797,542,040 bytes copied during GC
     103,056,944 bytes maximum residency (88 sample(s))
      11,062,480 bytes maximum slop
             309 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5589 colls,     0 par    0.164s   0.160s     0.0000s    0.0004s
  Gen  1        88 colls,     0 par    1.844s   1.909s     0.0217s    0.0656s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.220s  (  0.266s elapsed)
  GC      time    2.008s  (  2.069s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    2.228s  (  2.335s elapsed)

  Alloc rate    14,821,785,054 bytes per MUT second

  Productivity   9.9% of total user, 9.4% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
./Main +RTS -s
     182,472,160 bytes allocated in the heap
      35,704,856 bytes copied during GC
       3,270,336 bytes maximum residency (7 sample(s))
          89,368 bytes maximum slop
               8 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       349 colls,     0 par    0.020s   0.016s     0.0000s    0.0002s
  Gen  1         7 colls,     0 par    0.000s   0.008s     0.0011s    0.0020s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.112s  (  0.152s elapsed)
  GC      time    0.020s  (  0.024s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.132s  (  0.177s elapsed)

  Alloc rate    1,629,215,714 bytes per MUT second

  Productivity  84.8% of total user, 63.4% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

comment:5 Changed 4 years ago by davean

Version: 7.97.10.1
Note: See TracTickets for help on using tickets.