Opened 9 years ago

Closed 9 years ago

#4850 closed bug (fixed)

Segfault when lots of blocked MVar messages

Reported by: NeilMitchell
Priority: high
Milestone: 7.0.2
Component: Runtime System
Version: 7.0.1
Operating System: Windows
Architecture: Unknown/Multiple
Type of failure: Runtime crash

Description

I reduced the original problem from #4835 to the following example, which spans three files. The test case could be reduced further, but I stopped once it was standalone, with no data files or package dependencies. I have reproduced this bug on GHC 7.0.2-rc1.

The program uses some parallelism, and because of a bug in the parallel-io library (since fixed) it produces many blocked-MVar messages. The program sometimes runs to completion, sometimes fails with a blocked-MVar error, and sometimes segfaults; I see the segfault more than 50% of the time.

The program is compiled and run with:

ghc --make Main.hs -threaded && Main.exe

Line 82 of Main.hs calls getDirectoryContents "C:/Windows". The program seems to crash if I put file or directory operations on this line, but not if I put a sleep or pure computation there. The path can be changed to any reasonably sized directory with the same effect; there is nothing special about that folder.

Attachments (3)

Main.hs (3.2 KB) - added by NeilMitchell 9 years ago.
Local.hs (13.3 KB) - added by NeilMitchell 9 years ago.
ConcurrentSet.hs (2.8 KB) - added by NeilMitchell 9 years ago.


Change History (7)

Changed 9 years ago by NeilMitchell

Attachment: Main.hs added

Changed 9 years ago by NeilMitchell

Attachment: Local.hs added

Changed 9 years ago by NeilMitchell

Attachment: ConcurrentSet.hs added

comment:1 Changed 9 years ago by igloo

Milestone: 7.0.3
Priority: normal → high

I can't reproduce this on amd64/Linux.

comment:2 Changed 9 years ago by simonmar

Milestone: 7.0.3 → 7.0.2
Status: new → merge

Fixed:

Tue Dec 21 11:49:11 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * releaseCapabilityAndQueueWorker: task->stopped should be false (#4850)

Tue Dec 21 11:58:07 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * boundTaskExiting: don't set task->stopped unless this is the last call (#4850)
  The bug in this case was that we had a worker thread making a foreign
  call which invoked a callback (in this case it was performGC, I
  think).  When the callback ended, boundTaskExiting() was setting
  task->stopped, but the Task is now per-OS-thread, so it is shared by
  the worker that made the original foreign call.  When the foreign call
  returned, because task->stopped was set, the worker was not placed on
  the queue of spare workers.  Somehow the worker woke up again, and
  found the spare_workers queue empty, which led to a crash.
  
  Two bugs here: task->stopped should not have been set by
  boundTaskExiting (this broke when I split the Task and InCall structs,
  in 6.12.2), and releaseCapabilityAndQueueWorker() should not be
  testing task->stopped anyway, because it should only ever be called
  when task->stopped is false (this is now an assertion).

We should get this in 7.0.2.

comment:3 Changed 9 years ago by igloo

Also

Tue Dec 21 12:27:31 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * Add test for #4850

comment:4 Changed 9 years ago by igloo

Resolution: fixed
Status: merge → closed

All merged.
