Opened 4 years ago

Closed 4 years ago

#10590 closed bug (fixed)

RTS failing with removeThreadFromDeQueue: not found message

Reported by: qnikst
Owned by: slyfox
Priority: normal
Milestone: 7.10.3
Component: Compiler
Version: 7.10.1
Keywords:
Cc: slyfox, simonmar
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime crash
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s): Phab:D1024
Wiki Page:

Description

Under certain circumstances I hit an RTS error. It happens when one thread is reading from a socket and another thread closes it. Something else seems to be involved as well, since I failed to trim this example any further.

Error that I see:

% ./test
nid://127.0.0.1:8080:0
1
test: internal error: removeThreadFromDeQueue: not found
    (GHC version 7.10.1 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
zsh: abort      ./test
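
The trigger described above (a descriptor that is closed while something is still waiting on it) can be observed at the OS level without GHC. The following is a minimal sketch, assuming the wait happens in select(), as the commit for this ticket indicates. A pipe stands in for the socket, and the function name is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Polls a descriptor that has already been closed and returns the errno
 * that select() reports (0 if select() unexpectedly succeeds).
 * Hypothetical helper, not part of the RTS. */
int errno_from_select_on_closed_fd(void)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);
    (void)rc;

    int stale = fds[0];
    close(fds[0]);   /* stands in for "another thread closes the socket" */

    fd_set readers;
    FD_ZERO(&readers);
    FD_SET(stale, &readers);

    struct timeval no_wait = {0, 0};
    errno = 0;
    int ready = select(stale + 1, &readers, NULL, NULL, &no_wait);
    int err = (ready == -1) ? errno : 0;   /* capture before close() can overwrite it */

    close(fds[1]);
    return err;
}
```

On a POSIX system the returned value should be EBADF, which is the condition the waiter on the closed descriptor ends up seeing.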

Attachments (1)

test.hs (1.3 KB) - added by qnikst 4 years ago.
example program

Change History (9)

Changed 4 years ago by qnikst

Attachment: test.hs added

example program

comment:1 Changed 4 years ago by slyfox

Cc: slyfox added

comment:2 Changed 4 years ago by slyfox

Owner: set to slyfox

Found a bug in excessive dequeueing. Fix is on the way.

comment:3 Changed 4 years ago by slyfox

Cc: simonmar added
Differential Rev(s): Phab:D1024
Milestone: 7.10.2

comment:4 Changed 4 years ago by slyfox

Status: new → patch
Type of failure: None/Unknown → Runtime crash

comment:5 Changed 4 years ago by Ben Gamari <ben@…>

In 5857e0afb5823987e84e6d3dd8d0b269b7546166/ghc:

fix EBADF unqueueing in select backend (Trac #10590)

Alexander found an interesting case:
1. We have a queue of two waiters in a blocked_queue
2. first file descriptor changes state to RUNNABLE,
   second changes to INVALID
3. awaitEvent function dequeued RUNNABLE thread to a
   run queue and attempted to dequeue INVALID descriptor
   to a run queue.

Unqueueing INVALID fails thusly:
        #3  0x000000000045cf1c in barf (s=0x4c1cb0 "removeThreadFromDeQueue: not found")
                               at rts/RtsMessages.c:42
        #4  0x000000000046848b in removeThreadFromDeQueue (...) at rts/Threads.c:249
        #5  0x000000000049a120 in removeFromQueues (...) at rts/RaiseAsync.c:719
        #6  0x0000000000499502 in throwToSingleThreaded__ (...) at rts/RaiseAsync.c:67
        #7  0x0000000000499555 in throwToSingleThreaded (..) at rts/RaiseAsync.c:75
        #8  0x000000000047c27d in awaitEvent (wait=rtsFalse) at rts/posix/Select.c:415

The problem here is the throwToSingleThreaded function, which tries
to unqueue a TSO from blocked_queue, while the awaitEvent function
leaves blocked_queue in an inconsistent state as it traverses
blocked_queue:

      case RTS_FD_IS_READY:
          IF_DEBUG(scheduler,
              debugBelch("Waking up blocked thread %lu\n",
                         (unsigned long)tso->id));
          tso->why_blocked = NotBlocked;
          tso->_link = END_TSO_QUEUE;              // Here we break the queue head
          pushOnRunQueue(&MainCapability,tso);
          break;

Signed-off-by: Sergei Trofimovich <siarheit@google.com>

Test Plan: tested on a sample from T10590

Reviewers: austin, bgamari, simonmar

Reviewed By: bgamari, simonmar

Subscribers: qnikst, thomie, bgamari

Differential Revision: https://phabricator.haskell.org/D1024

GHC Trac Issues: #10590, #4934
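
The failure mode described in the commit message above can be modelled outside the RTS. The sketch below is a toy illustration only: `Thread`, `Queue`, and `queue_remove` are hypothetical stand-ins for the TSO, blocked_queue, and removeThreadFromDeQueue, and the "proper unlink" variant is just one way to keep such a queue consistent, not the change made in Phab:D1024.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the RTS types; the names are illustrative only. */
typedef struct Thread {
    int id;
    struct Thread *link;   /* NULL plays the role of END_TSO_QUEUE */
} Thread;

typedef struct {
    Thread *head;
} Queue;

/* Walks from the head and unlinks `t`.
 * Returns false in the situation where the RTS would barf "not found". */
bool queue_remove(Queue *q, Thread *t)
{
    Thread *prev = NULL;
    for (Thread *cur = q->head; cur != NULL; prev = cur, cur = cur->link) {
        if (cur == t) {
            if (prev == NULL) q->head = cur->link;
            else prev->link = cur->link;
            cur->link = NULL;
            return true;
        }
    }
    return false;
}

/* Reproduces the hazard: the first waiter's link is cleared while the queue
 * head still points at it, then the second waiter is looked up by search. */
bool remove_second_after_breaking_head(void)
{
    Thread second = {2, NULL};
    Thread first = {1, &second};
    Queue blocked = {&first};

    Thread *next = first.link;   /* the traversal saved the successor */
    first.link = NULL;           /* "Here we break the queue head" */

    return queue_remove(&blocked, next);   /* search stops at `first` */
}

/* Keeps the queue consistent by unlinking through the queue itself
 * before the thread is handed elsewhere. */
bool remove_second_after_proper_unlink(void)
{
    Thread second = {2, NULL};
    Thread first = {1, &second};
    Queue blocked = {&first};

    Thread *next = first.link;
    bool ok = queue_remove(&blocked, &first);   /* head now points at `second` */
    return ok && queue_remove(&blocked, next);
}
```

In this model `remove_second_after_breaking_head()` returns false because the search from the head can no longer reach the second waiter, while `remove_second_after_proper_unlink()` returns true.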

comment:6 Changed 4 years ago by bgamari

Resolution: fixed
Status: patch → closed

comment:7 Changed 4 years ago by bgamari

Milestone: 7.10.2 → 7.10.3
Status: closed → merge

comment:8 Changed 4 years ago by bgamari

Status: merge → closed

This has been merged to ghc-7.10.