Opened 7 months ago

Closed 6 months ago

#16150 closed bug (fixed)

Data races in itimer_thread_func reported by ThreadSanitizer

Reported by: gnezdo Owned by:
Priority: normal Milestone: 8.6.4
Component: Runtime System Version: 8.4.4
Keywords: Cc: dvyukov@…
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Our local environment allows running Haskell programs under ThreadSanitizer. Even though the GHC runtime and compiled Haskell code are not instrumented, ThreadSanitizer still found a bug through interceptor instrumentation. I'll try to minimize the case and create a reproducer, but in case somebody wants to look at this with just the report, here it is (mildly massaged to remove irrelevant addresses):

==================
WARNING: ThreadSanitizer: data race (pid=2367)
  Write of size 1 at 0x55986a3e4f10 by thread T1:
    #0 pthread_mutex_destroy llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1245:3 
    #1 itimer_thread_func vendor_src/v8_4_4/rts/posix/itimer/Pthread.c:152:5 

  Previous atomic read of size 1 at 0x55986a3e4f10 by main thread:
    #0 pthread_mutex_lock llvm/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:4097:3 
    #1 startTicker vendor_src/v8_4_4/rts/posix/itimer/Pthread.c:184:5 
    #2 exitTicker vendor_src/v8_4_4/rts/posix/itimer/Pthread.c:206 
    #3 __libc_start_main <null> (libc.so.6+0x38bbc)

  Location is global 'mutex' of size 40 at 0x55986a3e4f10 

  Thread T1 'ghc_ticker' (tid=2369, running) created by main thread at:
    #0 pthread_create llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:980:3 
    #1 initTicker vendor_src/v8_4_4/rts/posix/itimer/Pthread.c:171:11 
    #2 __libc_start_main <null> (libc.so.6+0x38bbc)

SUMMARY: ThreadSanitizer: data race vendor_src/v8_4_4/rts/posix/itimer/Pthread.c:152:5 in itimer_thread_func
==================
==================
WARNING: ThreadSanitizer: data race (pid=2367)
  Write of size 8 at 0x55986a3e4ee0 by thread T1:
    #0 pthread_cond_destroy llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1216:3 
    #1 itimer_thread_func vendor_src/v8_4_4/rts/posix/itimer/Pthread.c:153:5 

  Previous read of size 8 at 0x55986a3e4ee0 by main thread (mutexes: write M146179626018688784):
    #0 pthread_cond_signal llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1202:3 
    #1 signalCondition vendor_src/v8_4_4/rts/posix/OSThreads.c:111:11 
    #2 __libc_start_main <null> (libc.so.6+0x38bbc)

  Location is global 'start_cond' of size 48 at 0x55986a3e4ee0 

  Mutex M146179626018688784 is already destroyed.

  Thread T1 'ghc_ticker' (tid=2369, running) created by main thread at:
    #0 pthread_create llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:980:3 
    #1 initTicker vendor_src/v8_4_4/rts/posix/itimer/Pthread.c:171:11 
    #2 __libc_start_main <null> (libc.so.6+0x38bbc)

SUMMARY: ThreadSanitizer: data race vendor_src/v8_4_4/rts/posix/itimer/Pthread.c:153:5 in itimer_thread_func
==================
ThreadSanitizer: reported 2 warnings

Change History (7)

comment:1 Changed 7 months ago by bgamari

Ahhh, I think we have actually seen this in manifest in practice in fact (in CI).

comment:2 Changed 7 months ago by bgamari

I believe this should be fixed by https://gitlab.haskell.org/ghc/ghc/merge_requests/95.

comment:3 Changed 7 months ago by Phyx-

I believe #16147 is similar @bgamari, In that case there are multiple OS threads even on the non-threaded runtime. But the ACQUIRE_LOCK and RELEASE_LOCK macros expand to nothing on the non-threaded runtime, which introduces a race condition with shutdown of the loop and rts. Which is why I think it happens on the non-threaded but not the threaded.

See https://ghc.haskell.org/trac/ghc/attachment/ticket/16147/sample.txt

I think in this case we always want a lock, so ACQUIRE_LOCK and RELEASE_LOCK should be changed to OS_ACQUIRE_LOCK to OS_RELEASE_LOCK.

comment:4 Changed 7 months ago by Ben Gamari <ben@…>

In 7b12b3f0/ghc:

itimer: Don't free condvar until we know ticker is stopped

When we are shutting down the pthread ticker we signal the start_cond condition
variable to ensure that the ticker thread wakes up and exits in a reasonable
amount of time. Previously, when the ticker thread would shut down it was
responsible for freeing the start_cond condition variable. However, this would
lead to a race wherein the ticker would free start_cond, then the main thread
would try to signal it in an effort to wake the ticker (#16150).

Avoid this by moving the mutex destruction to the main thread.

comment:5 Changed 7 months ago by Ben Gamari <ben@…>

In ce11f6f/ghc:

rts: Use always-available locking operations in pthread Itimer implementation

Previously we ACQUIRE_LOCK and RELEASE_LOCK but these compile to a noop in the
non-threaded RTS, as noted in #16150. Use OS_ACQUIRE_LOCK and OS_RELEASE_LOCK
instead.

comment:6 Changed 7 months ago by bgamari

Milestone: 8.6.4
Status: newmerge

comment:7 Changed 6 months ago by bgamari

Resolution: fixed
Status: mergeclosed
Note: See TracTickets for help on using tickets.