Opened 2 years ago

Closed 2 years ago

#13987 closed bug (fixed)

T13701 fails sporadically

Reported by: duog
Owned by: duog
Priority: normal
Milestone: 8.4.1
Component: Test Suite
Version: 8.3
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Other
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s): Phab:D3748
Wiki Page:

Description

T13701 fails sporadically. Comments in Phab:D3586 indicate that this happens while the system is under load.

I reproduced this issue by duplicating T13701 7 times, making tests T13701[A-H], then running

make test TEST="T13701A T13701B T13701C T13701D T13701E T13701F T13701G T13701H" THREADS=8

on my machine with 4 CPUs.

Comparing profiling reports of the test run with and without load, I found that the additional allocations are in SysTools.builderMainLoop.loop. This function busy-waits from the point at which 2 EOFs have been received until getProcessExitCode returns a Just. I verified that this is the cause by changing

| otherwise -> loop chan hProcess t p exitcode

to

| otherwise -> threadDelay 10000 >> loop chan hProcess t p exitcode
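To illustrate the difference outside of GHC, here is a minimal sketch of the two polling styles using the process library. The names pollBusy, pollWithDelay and runWith are hypothetical and are not taken from SysTools; only getProcessExitCode, createProcess, shell and threadDelay are real library functions. The sketch assumes that each call to getProcessExitCode on a still-running process returns Nothing, so the first loop spins and allocates for as long as the child is alive.

```haskell
import Control.Concurrent (threadDelay)
import System.Exit (ExitCode (..))
import System.Process (ProcessHandle, createProcess, getProcessExitCode, shell)

-- | Hypothetical helper: poll for the exit code without pausing.
-- While the child is still running, every iteration returns Nothing
-- and immediately retries, so a slow child means more iterations.
pollBusy :: ProcessHandle -> IO ExitCode
pollBusy ph = do
  mb <- getProcessExitCode ph
  case mb of
    Just code -> pure code
    Nothing   -> pollBusy ph

-- | Hypothetical helper: the same loop, but sleeping 10 ms between
-- polls, mirroring the threadDelay 10000 experiment on this ticket.
pollWithDelay :: ProcessHandle -> IO ExitCode
pollWithDelay ph = do
  mb <- getProcessExitCode ph
  case mb of
    Just code -> pure code
    Nothing   -> threadDelay 10000 >> pollWithDelay ph

-- | Hypothetical helper: start a shell command and wait for it with
-- the given strategy.
runWith :: (ProcessHandle -> IO ExitCode) -> String -> IO ExitCode
runWith wait cmd = do
  (_, _, _, ph) <- createProcess (shell cmd)
  wait ph
```

Both loops return the same exit code; they differ only in how much work is done while the child is still running, which is why the effect shows up in allocation counts only when the machine is loaded.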

Change History (4)

comment:1 Changed 2 years ago by duog

Owner: set to duog

comment:2 Changed 2 years ago by duog

Differential Rev(s): Phab:D3748
Status: new → patch

comment:3 Changed 2 years ago by Ben Gamari <ben@…>

In 194384f1/ghc:

Fix busy-wait in SysTools.builderMainLoop

Test T13701 was failing sporadically. The problem manifested while the
test was run on a system under load. Profiling showed the increased
allocations were in SysTools.builderMainLoop.loop, during calls to the
assembler. This was due to loop effectively busy-waiting from when both
stdin and stderr handles were closed, until getProcessExitCode
succeeded.

This is fixed by removing exit code handling from loop. We now wait for
loop to finish, then read the exit code with waitForProcess.

Some exception safety is added: the readerProc threads will now be
killed and the handles will be closed if an exception is thrown.

A TODO saying that threads dying is not accounted for is removed. I
believe that this case is handled by readerProc sending EOF in a finally
clause.

Test Plan:
Replicate test failures using procedure on the ticket, verify that they
do not occur with this patch.

Reviewers: austin, bgamari

Reviewed By: bgamari

Subscribers: rwbarton, thomie

GHC Trac Issues: #13987

Differential Revision: https://phabricator.haskell.org/D3748
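The shape of the fix described in the commit message can be sketched roughly as follows. This is not the code from the patch: Msg, drain and runCollect are hypothetical names, the reader here is a simplified stand-in for readerProc, and the child is assumed to be a shell command on a Unix-like system. The sketch only shows the three ideas from the message: readers send EOF in a finally clause, the loop merely counts EOFs, and the exit code is read with a blocking waitForProcess afterwards.

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Exception (finally)
import Control.Monad (unless)
import System.Exit (ExitCode (..))
import System.IO (Handle, hClose, hGetLine, hIsEOF)
import System.Process
  (CreateProcess (..), StdStream (..), createProcess, shell, waitForProcess)

-- | Hypothetical message type: one line of output, or end of stream.
data Msg = Line String | EOF

-- | Simplified reader thread: forwards lines to the channel and always
-- sends EOF, even if it dies with an exception.
readerProc :: Chan Msg -> Handle -> IO ()
readerProc chan h = go `finally` writeChan chan EOF
  where
    go = do
      eof <- hIsEOF h
      unless eof $ do
        l <- hGetLine h
        writeChan chan (Line l)
        go

-- | Collect lines until the given number of EOFs has arrived.
-- No exit code handling happens here, so nothing is polled.
drain :: Chan Msg -> Int -> IO [String]
drain _ 0 = pure []
drain chan n = do
  msg <- readChan chan
  case msg of
    EOF    -> drain chan (n - 1)
    Line l -> (l :) <$> drain chan n

-- | Run a command, collect stdout and stderr lines, then block in
-- waitForProcess. Readers are killed and handles closed on any exit
-- path once the readers have been started.
runCollect :: String -> IO (ExitCode, [String])
runCollect cmd = do
  (_, Just hOut, Just hErr, ph) <-
    createProcess (shell cmd) { std_out = CreatePipe, std_err = CreatePipe }
  chan <- newChan
  tids <- mapM (forkIO . readerProc chan) [hOut, hErr]
  let cleanup = mapM_ killThread tids >> hClose hOut >> hClose hErr
  (do ls <- drain chan 2
      code <- waitForProcess ph
      pure (code, ls)) `finally` cleanup
```

Because waitForProcess blocks until the child exits, the time the child takes to finish no longer translates into extra iterations or allocations in the parent.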

comment:4 Changed 2 years ago by bgamari

Milestone: 8.4.1
Resolution: fixed
Status: patch → closed