Opened 5 years ago

Closed 5 years ago

Last modified 3 years ago

#9078 closed bug (fixed)

Segfault with makeStableName

Reported by: robertce Owned by:
Priority: highest Milestone: 7.8.3
Component: Compiler Version: 7.8.2
Keywords: Cc: tmcdonell@…, chak@…, erkokl@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


Calling makeStableName on the same object many times causes a segfault. The following triggers this behaviour on both OSX and Linux-x86_64, provided it is compiled (it doesn't seem to occur if the call to makeStableName is interpreted).

module Main where

import Control.Monad
import System.Mem.StableName

main :: IO ()
main = replicateM_ 100000 (makeStableName foo)

foo :: Int
foo = 1

It seems to happen largely at random. The 100000 calls to makeStableName in the above example essentially guarantee its occurrence, but I have observed it happen with far fewer.

The reason this bug was discovered is it is causing frequent failures in Accelerate on 7.8.2. See In that case it seems to be causing more strange closure type errors than segfaults, but I assume it is the same root cause.

Change History (24)

comment:1 Changed 5 years ago by tmcdonell

Cc: tmcdonell@… added

comment:2 Changed 5 years ago by ezyang

Reproduced; when run with the debug build we get an assert failure:

Starting program: /home/ezyang/test +RTS -DS -C0 -V0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/".
cap 0: initialised
test: internal error: ASSERTION FAILED: file rts/Stable.c, line 337

    (GHC version 7.9.20140410 for x86_64_unknown_linux)
    Please report this as a GHC bug:

Program received signal SIGABRT, Aborted.
0x00007ffff6eaef79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6eaef79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6eb2388 in __GI_abort () at abort.c:89
#2  0x000000000067edef in rtsFatalInternalErrorFn (s=0x6e63c0 "ASSERTION FAILED: file %s, line %u\n", ap=0x7fffffff9878) at rts/RtsMessages.c:170
#3  0x000000000067ea27 in barf (s=0x6e63c0 "ASSERTION FAILED: file %s, line %u\n") at rts/RtsMessages.c:42
#4  0x000000000067ea8a in _assertFail (filename=0x6e6d44 "rts/Stable.c", linenum=337) at rts/RtsMessages.c:57
#5  0x0000000000682c03 in lookupStableName (p=0x9076b0) at rts/Stable.c:337
#6  0x00000000006a2f46 in stg_makeStableNamezh ()
#7  0x0000000000000000 in ?? ()
(gdb) f 5
#5  0x0000000000682c03 in lookupStableName (p=0x9076b0) at rts/Stable.c:337
337	    ASSERT(stable_name_table[sn].addr == p);

comment:3 Changed 5 years ago by ezyang

I believe the bug is when we free stable-name entries (in gcStableTables), we also need to remove the stable name from the hash table. updateStableTables will not catch this because it only iterates over live stable names. The bug is masked because if we immediately make another stable pointer, it will probably be put in the right slot, but the ASSERT catches the problem immediately.

comment:4 Changed 5 years ago by ezyang

This patch seems to do the trick.

diff --git a/rts/Stable.c b/rts/Stable.c
index ec74b0d..431b7c6 100644
--- a/rts/Stable.c
+++ b/rts/Stable.c
@@ -246,6 +246,7 @@ STATIC_INLINE void
 freeSnEntry(snEntry *sn)
   ASSERT(sn->sn_obj == NULL);
+  removeHashTable(addrToStableHash, (W_)sn->old, NULL);
   sn->addr = (P_)stable_name_free;
   stable_name_free = sn;

Alas, there are a bunch of validate failures at the moment.

comment:5 Changed 5 years ago by chak

@ezyang Thanks for chasing this, but I'm a bit surprised that the problem is with freeing stable names as the example program only allocated them (and they should never be freed by the GC).

@robertce Could you please build a GHC with this patch included and check whether it solves our problem?

comment:6 Changed 5 years ago by chak

Cc: chak@… added
Priority: normalhigh

comment:7 Changed 5 years ago by robertce

@chak I have just patched GHC and it does seem to solve our problem. I can run all of our testsuite without problem, something that was previously impossible.

comment:8 Changed 5 years ago by thoughtpolice

Milestone: 7.8.3
Priority: highhighest
Status: newpatch

comment:9 Changed 5 years ago by chak

@robertce Thanks!

@thoughtpolice What is the current timeline for 7.8.3? This bug is basically a killer for Accelerate as it leads to intermittent failures and it doesn't seem as if we can work around it. (Would be a pity if it ends up in the HP.)

comment:10 Changed 5 years ago by ezyang

@chak Stable names are GC'd as normal (as opposed to stable *pointers*), so they'll get freed upon GCs. This behavior is new from fixing #7674, I assume that is how the bug got introduced too.

comment:11 Changed 5 years ago by chak

@ezyang Yes, you are right.

comment:12 Changed 5 years ago by lerkok

Cc: erkokl@… added

A recent bug reported in SBV points to the same problem:

Only with GHC 7.8.2; heavy calls to SBV's sat/prove calls start causing seemingly random failures. We were able to track it down to the uses of makeStableName. We have not observed seg-faults, but we did observe erroneous behavior that we could not replicate with GHC 7.6.3 and earlier.

SBV uses makeStableName to implement Andy Gill's type-safe observable sharing idea (as described here:, which is a popular trick amongst DSL implementations these days. I'm not sure if Accelerate uses it for the same purpose, but I suspect this bug can impact a bunch of other DSL implementations out there, and would be really hard to debug for end-users. A timely fix before 7.8.2 becomes more widely adapted would be really good for the community.

comment:13 Changed 5 years ago by chak

@erkokl Yes, Accelerate uses stable names for observable sharing. I agree that it is a nasty bug.

comment:14 Changed 5 years ago by thoughtpolice

So we weren't sure when we wanted to release 7.8.3. But it looks like there are quite a few important bugfixes biting people at this point. I'll send an email to ghc-devs shortly...

comment:15 Changed 5 years ago by Austin Seipp <austin@…>

In b75d126e779e8690c675be84e8972dc023e04b9b/ghc:

rts: remove stable-names from hashtable upon free

This fixes #9078.

Signed-off-by: Austin Seipp <>

comment:16 Changed 5 years ago by thoughtpolice

Status: patchmerge

comment:17 Changed 5 years ago by chak

@thoughtpolice Is there any news on a timeline?

comment:18 Changed 5 years ago by simonpj

Manuel, you mean a timeline for 7.8.3? What would be helpful for you? (You can always build your own 7.8.3 working copy.)


comment:19 Changed 5 years ago by thoughtpolice

I have posted an update here, please let me know what you think:

comment:20 in reply to:  18 Changed 5 years ago by chak

Replying to simonpj:

Manuel, you mean a timeline for 7.8.3? What would be helpful for you? (You can always build your own 7.8.3 working copy.)


Simon, I'm worried that the Haskell Platform might ship with 7.8.2 (including this bug). It would mean that the Platform would be unusable for Accelerate (and possibly all other EDSLs that use Andy Gill's method to implement observable sharing). Hence, I would much appreciate if 7.8.3 could be pushed out of the door in time to get included in the forthcoming Haskell Platform.

comment:21 Changed 5 years ago by lerkok

@chak Looks like the next platform will indeed ship with 7.8.2; so maybe it's already too late:

comment:22 Changed 5 years ago by carter

I think the case could be made to mark re haskell platform using 7.8.3 assuming the timeline for 7.8.3 is soon enough...

comment:23 Changed 5 years ago by thoughtpolice

Resolution: fixed
Status: mergeclosed

Merged into 7.8 branch for 7.8.3.

comment:24 Changed 3 years ago by Thomas Miedema <thomasmiedema@…>

In 0eb0378/ghc:

Testsuite: do not add -debug explicitly in .T file

This prevents `cannot find -lHSrts_debug_p` when running `make
TEST=T9078 WAY=profasm` (#9078).
Note: See TracTickets for help on using tickets.