Opened 6 years ago

Closed 6 years ago

#8124 closed bug (fixed)

Possible leaks when using foreign export.

Reported by: augustss Owned by: simonmar
Priority: high Milestone: 7.8.1
Component: Runtime System Version: 7.8.1-rc2
Keywords: Cc: ndmitchell@…, simonmar
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

When a Haskell function exported with foreign export is called from outside Haskell a new Task may be allocated for it. There will be one Task allocated for each OS thread calling into Haskell (this happens in rts_lock()). This Task is, by design, never deallocated. This works fine with a small number of threads, but if the external caller generates a lot of threads that die the Haskell RTS will just get more and more Task objects that are never freed. The Task also contains a condition variable and a mutex, which incurs a HANDLE leak on Windows.

I don't see an elegant fix to this, but the attached code remembers the OS thread for "incall" Task, and every so often (every 100 Tasks created) if will scan the Tasks looking for ones where the OS thread is dead, and remove these. The fix only works on Windows, I'm afraid.

Attachments (1)

task-reap.patch (7.2 KB) - added by augustss 6 years ago.

Download all attachments as: .zip

Change History (24)

Changed 6 years ago by augustss

Attachment: task-reap.patch added

comment:1 Changed 6 years ago by simonmar

Lennart: is calling something explicitly not a possibility for you?

comment:2 Changed 6 years ago by ezyang

Agreed, it seems that the proper thing to do is ask the external caller to free a task when it is done.

comment:3 Changed 6 years ago by NeilMitchell

Cc: ndmitchell@… simonmar added

comment:4 Changed 6 years ago by augustss

Having something to call might be possible. It would certainly be a start.

comment:5 Changed 6 years ago by NeilMitchell

Having to call in to dealloc has some issues:

  1. When destroying a thread, you need to know if it was ever used for GHC or incurr some overhead. As long as the dealloc is cheap and doesn't require you to have done the alloc, that's OK.
  2. You need to know when the thread goes away. If you are using a thread pool library, that's often very tricky. In our particular case it's feasible.
  3. If a thread dies unexpectedly the memory will be leaked. That's hopefully not a big issue in practice.

So an explicit dealloc is OK for us, with the caveat that is must work successfully on a thread that was never used by GHC.

comment:6 Changed 6 years ago by simonmar

Milestone: 7.8.2
Owner: set to simonmar
Priority: normalhighest

Thinking about it a bit more, it ought to be possible to clean these up automatically. The tricky bit is knowing when to do so - too eager and we penalize threads that are doing frequent in-calls, too lazy and we have the memory leak again. Perhaps just sweeping the table during a major GC would be good enough. Better might be to only free Tasks that have survived one major GC cycle without being used, but that's a bit more work.

I'll look into this when I've got some time.

comment:7 Changed 6 years ago by Simon Marlow <marlowsd@…>

In af6746fb6b5adb5ba5be6e0f647c4ebe767ce084/ghc:

Add hs_thread_done() (#8124)

See documentation for details.

comment:8 Changed 6 years ago by simonmar

Milestone: 7.8.27.8.1
Status: newmerge
Version: 7.6.37.8.1-rc1

Let's get this into 7.8.1 RC2. I don't think it will break anything that isn't using it.

comment:9 Changed 6 years ago by nomeata

Simon’s commit broke the CI builds, see: https://s3.amazonaws.com/archive.travis-ci.org/jobs/19838358/log.txt

=====> T8124(normal) 2080 of 3905 [0, 0, 0] 
Compile failed (status 256) errors were:
<command line>: does not exist: T8124_c.c

*** unexpected failure for T8124(normal)

comment:10 Changed 6 years ago by thoughtpolice

Merged in 7.8 for RC2.

I'm probably going to go ahead and cut RC2 - it seems like a missing file was simply not committed, not a huge error.

comment:11 Changed 6 years ago by thoughtpolice

Priority: highesthigh

comment:12 Changed 6 years ago by Simon Marlow <marlowsd@…>

In 3fba87599378afbcf425a0fc2a5a61d21e3719d4/ghc:

add missing files (#8124)

comment:13 Changed 6 years ago by simonmar

Sorry about this. It should be fixed now: 3fba87599378afbcf425a0fc2a5a61d21e3719d4/ghc and 176205cf0b89f76d904d381bdcd61e8685116bb7/ghc

comment:14 Changed 6 years ago by nomeata

Thanks, travis is green again for master. This still needs to be merged into ghc-7.8.

comment:15 Changed 6 years ago by nomeata

Sorry, me again. On one of my development machines, I get

Compile failed (status 256) errors were:
[1 of 1] Compiling T8124            ( T8124.hs, T8124.o )
Linking T8124 ...
Warning: -rtsopts and -with-rtsopts have no effect with -no-hs-main.
    Call hs_init_ghc() from your main() function to set these options.
/usr/bin/ld: T8124_c.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
/lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

*** unexpected failure for T8124(normal)

Any ideas?

comment:16 Changed 6 years ago by simonmar

Could you try adding -optl-lpthread after -no-hs-main and see if that helps?

comment:17 Changed 6 years ago by nomeata

No, doesn’t help, sorry. Note that it does succeeds in the threaded ways:

====> Scanning ./all.T
=====> T8124(normal) 47 of 48 [0, 0, 0] 
cd . && $MAKE -s --no-print-directory T8124_setup
cd . && '/data1/breitner/ghc/ghc-T7994/inplace/bin/ghc-stage2' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db -rtsopts -fno-ghci-history -o T8124 T8124.hs  T8124_c.c -no-hs-main -optl-lpthread  >T8124.comp.stderr 2>&1
Compile failed (status 256) errors were:
[1 of 1] Compiling T8124            ( T8124.hs, T8124.o )
Linking T8124 ...
Warning: -rtsopts and -with-rtsopts have no effect with -no-hs-main.
    Call hs_init_ghc() from your main() function to set these options.
/usr/bin/ld: T8124_c.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
/lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

*** unexpected failure for T8124(normal)
=====> T8124(hpc) 47 of 48 [0, 1, 0] 
cd . && $MAKE -s --no-print-directory T8124_setup
cd . && '/data1/breitner/ghc/ghc-T7994/inplace/bin/ghc-stage2' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db -rtsopts -fno-ghci-history -o T8124 T8124.hs -O -fhpc -hpcdir .hpc.T8124 T8124_c.c -no-hs-main -optl-lpthread  >T8124.comp.stderr 2>&1
Compile failed (status 256) errors were:
[1 of 1] Compiling T8124            ( T8124.hs, T8124.o )
Linking T8124 ...
Warning: -rtsopts and -with-rtsopts have no effect with -no-hs-main.
    Call hs_init_ghc() from your main() function to set these options.
/usr/bin/ld: T8124_c.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
/lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

*** unexpected failure for T8124(hpc)
=====> T8124(optasm) 47 of 48 [0, 2, 0] 
cd . && $MAKE -s --no-print-directory T8124_setup
cd . && '/data1/breitner/ghc/ghc-T7994/inplace/bin/ghc-stage2' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db -rtsopts -fno-ghci-history -o T8124 T8124.hs -O -fasm T8124_c.c -no-hs-main -optl-lpthread  >T8124.comp.stderr 2>&1
Compile failed (status 256) errors were:
[1 of 1] Compiling T8124            ( T8124.hs, T8124.o )
Linking T8124 ...
Warning: -rtsopts and -with-rtsopts have no effect with -no-hs-main.
    Call hs_init_ghc() from your main() function to set these options.
/usr/bin/ld: T8124_c.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
/lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

*** unexpected failure for T8124(optasm)
=====> T8124(threaded1) 47 of 48 [0, 3, 0] 
cd . && $MAKE -s --no-print-directory T8124_setup
cd . && '/data1/breitner/ghc/ghc-T7994/inplace/bin/ghc-stage2' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db -rtsopts -fno-ghci-history -o T8124 T8124.hs -threaded -debug T8124_c.c -no-hs-main -optl-lpthread  >T8124.comp.stderr 2>&1
cd . && ./T8124    </dev/null >T8124.run.stdout 2>T8124.run.stderr
=====> T8124(threaded2) 47 of 48 [0, 3, 0] 
cd . && $MAKE -s --no-print-directory T8124_setup
cd . && '/data1/breitner/ghc/ghc-T7994/inplace/bin/ghc-stage2' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db -rtsopts -fno-ghci-history -o T8124 T8124.hs -O -threaded -eventlog T8124_c.c -no-hs-main -optl-lpthread  >T8124.comp.stderr 2>&1
cd . && ./T8124  +RTS -N2 -ls -RTS  </dev/null >T8124.run.stdout 2>T8124.run.stderr
=====> T8124(dyn) 47 of 48 [0, 3, 0] 
cd . && $MAKE -s --no-print-directory T8124_setup
cd . && '/data1/breitner/ghc/ghc-T7994/inplace/bin/ghc-stage2' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db -rtsopts -fno-ghci-history -o T8124 T8124.hs -O -dynamic T8124_c.c -no-hs-main -optl-lpthread  >T8124.comp.stderr 2>&1
Compile failed (status 256) errors were:
[1 of 1] Compiling T8124            ( T8124.hs, T8124.o )
Linking T8124 ...
Warning: -rtsopts and -with-rtsopts have no effect with -no-hs-main.
    Call hs_init_ghc() from your main() function to set these options.
/usr/bin/ld: T8124_c.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
/lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

*** unexpected failure for T8124(dyn)

comment:18 Changed 6 years ago by nomeata

Set ways appropriately in [0014fb3]; please merge that as well so that we have a reliable test suite on the stable branch.

comment:19 Changed 6 years ago by thoughtpolice

Version: 7.8.1-rc17.8.1-rc2

comment:20 Changed 6 years ago by nomeata

Can we get a merge of this soon? I think it would be desireable if we get useful information about the stable branch again on https://travis-ci.org/nomeata/ghc-complete/builds.

comment:21 Changed 6 years ago by simonpj

It seems to be on the list of things to merge: https://ghc.haskell.org/trac/ghc/wiki/Status/GHC-7.8/RC2.

Austin?

Simon

comment:22 Changed 6 years ago by thoughtpolice

Yes, it's on my queue. Sorry, I forgot this was breaking the travis build...

comment:23 Changed 6 years ago by thoughtpolice

Resolution: fixed
Status: mergeclosed
Note: See TracTickets for help on using tickets.