Opened 9 years ago

Closed 9 years ago

#4922 closed bug (worksforme)

Segfault / Assertion failed in RTS (Compact.c)

Reported by: dleuschner
Owned by: simonmar
Priority: high
Milestone: 7.2.1
Component: Runtime System
Version: 7.0.1
Keywords:
Cc: wehr@…
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Runtime crash
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

Our application terminates with a segfault or an internal RTS error in about 80% of our test runs when we use the following runtime flags:

+RTS -G4 -H1g -c -I0

Without them the application runs fine. We only discovered the problem after making many performance improvements to our code, while stress-testing on fast machines with many cores.
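A failure rate like this can be estimated by rerunning the program many times with and without the flags. The following is a minimal sketch of a hypothetical POSIX shell helper (`count_failures` is not part of GHC or of the reporters' setup). It runs a command a given number of times and reports how many runs exited unsuccessfully. Both a segfault and an RTS internal error should end the process with a non-zero status.

```shell
# Hypothetical helper: run a command N times and report how many runs failed.
# Any non-zero exit status (including death by a signal such as SIGSEGV)
# counts as a failure.
count_failures() {
  runs=$1
  shift
  fail=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Discard the program's output; only the exit status matters here.
    "$@" >/dev/null 2>&1 || fail=$((fail + 1))
    i=$((i + 1))
  done
  echo "$fail/$runs runs failed"
}
```

For example, `count_failures 20 ./Main +RTS -G4 -H1g -c -I0 -RTS` and `count_failures 20 ./Main` would compare the tuned flags with the defaults. `./Main` is a placeholder for the real binary. With GHC 7.0, the binary must be linked with `-rtsopts` before it accepts these RTS options on its command line.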

We compiled with the debugging runtime and got the following assertion failure:

SalviaDerivationGateway: internal error: ASSERTION FAILED: file rts/sm/Compact.c, line 171
    (GHC version 7.0.1.20110121 for x86_64_unknown_linux)
    Please report this as a GHC bug:
    http://www.haskell.org/ghc/reportabug

We're testing with a custom GHC build from the GHC 7.0 branch (including patches up to yesterday).

With the normal (non-debugging) runtime we get either segfaults or errors like:

SalviaDerivationGateway: internal error: scavenge_mark_stack: unimplemented/strange closure type 1970861226 @ 0x7f7578f488f8
    (GHC version 7.0.1.20110121 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

The last few system calls before a segfault are:

[pid 30727] rt_sigprocmask(SIG_BLOCK, [HUP INT], [], 8) = 0
[pid 30727] clock_gettime(0xfffffffa /* CLOCK_??? */, {147, 512463346}) = 0
[pid 30727] getrusage(RUSAGE_SELF, {ru_utime={126, 620000}, ru_stime={20, 890000}, ...}) = 0
[pid 30727] mmap(0x7fb643800000, 3145728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb643400000
[pid 30727] --- SIGSEGV (Segmentation fault) @ 0 (0) ---

We were very concerned about this, because an unstable runtime system makes it feel as though we would be better off using Java for "serious" applications. For now it is no problem at all, since we will simply stop using the tuned runtime flags. It might be a good idea to remove these flags entirely until they are known to work in busy applications (or at least to add a warning).

I don't understand the details, but perhaps the problem with retainer profiling (ticket #4820) has the same cause.

When testing new releases, it would probably be a good idea to also test various RTS flag combinations (maybe the GHC compiler binary could simply pick random values at startup if none are given ;-).

I hope this information is of some help. We haven't tried to reproduce the problem with a small test program, because we are in a hurry to get a release out. If there is anything we can do to help find the cause of the problem, please let us know.

Change History (8)

comment:1 Changed 9 years ago by simonmar

Thanks for the report. Of course it should not crash, but this combination of flags (4 generations and compacting GC) is much less well tested than the default settings. If you can provide a reproducible test case, that would let me find the bug. Failing that, I can try this combination on a few tests here and see whether I can make it crash.

comment:2 Changed 9 years ago by simonmar

Owner: set to simonmar

comment:3 Changed 9 years ago by igloo

Milestone: 7.0.3

comment:4 Changed 9 years ago by simonmar

I think this might be the same as bug #5086, which will be fixed in 7.0.4. Please test with the 7.0.4 release candidate when it is out (it should be available in the next few days).

comment:5 Changed 9 years ago by simonmar

Priority: normal → high

comment:6 Changed 9 years ago by simonmar

Status: new → infoneeded

comment:7 Changed 9 years ago by simonmar

I should add that I've tried my local GC benchmark suite with these options (+RTS -G4 -c -H1g) with GHC 7.1.latest, and didn't find any problems.

comment:8 Changed 9 years ago by igloo

Resolution: worksforme
Status: infoneeded → closed

No response from submitter, so closing.
