Opened 5 years ago

Closed 5 years ago

#10080 closed bug (fixed)

Recursive IO actions crash with segmentation fault

Reported by: nakal
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 7.8.3
Keywords:
Cc:
Operating System: FreeBSD
Architecture: x86_64 (amd64)
Type of failure: Runtime crash
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description (last modified by nakal)

I am learning Haskell and wrote a program that writes strings to stdout indefinitely by recursing in an IO action. Even such a simple construct crashes on FreeBSD when Ctrl+C is pressed to interrupt it.

I have been able to reduce it to a simple case like this:

-- Prints a line, then calls itself again, forever.
main :: IO ()
main = do
        putStrLn "hello world"
        main

I compiled it with

ghc --make hello.hs

I ran it, pressed Ctrl+C, and immediately got:

[...]
hello world
hello world
hello world
hello world
^CSegmentation fault (core dumped)
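
For comparison, here is a minimal sketch of what Ctrl+C is normally expected to do to such a program. It assumes GHC's default SIGINT handling, where the runtime raises a UserInterrupt exception in the main thread; the names loop and onInterrupt are illustrative and not part of the reported program. It is meant only to show the expected behaviour, not as a workaround: the crash reported here happens inside the runtime's own signal handler, so catching the exception is unlikely to avoid it on an affected system.

```haskell
import Control.Exception (AsyncException (UserInterrupt), catch, throwIO)
import Control.Monad (forever)
import System.IO (hPutStrLn, stderr)

-- Same behaviour as the reproducer: print the line over and over.
loop :: IO ()
loop = forever (putStrLn "hello world")

-- Illustrative handler: report Ctrl+C and return normally,
-- rethrow any other asynchronous exception unchanged.
onInterrupt :: AsyncException -> IO ()
onInterrupt UserInterrupt = hPutStrLn stderr "interrupted, exiting cleanly"
onInterrupt other = throwIO other

main :: IO ()
main = loop `catch` onInterrupt
```

On a system without the bug, pressing Ctrl+C should make this variant print the message to stderr and exit normally instead of dumping core.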

Here is the backtrace of the crash:

(gdb) bt
#0  0x000000000047cdd7 in generic_handler ()
#1  0x000000080147c467 in swapcontext () from /lib/libthr.so.3
#2  0x000000080147c062 in sigaction () from /lib/libthr.so.3
#3  <signal handler called>
#4  0x00000008017d7b7a in select () from /lib/libc.so.7
#5  0x0000000801479b32 in select () from /lib/libthr.so.3
#6  0x000000000043cb13 in fdReady ()
#7  0x000000000044751c in base_GHCziIOziFD_zdwa3_info ()
#8  0x0000000000000000 in ?? ()

Operating system version is: FreeBSD 10.1-RELEASE-p5 (GENERIC)

Change History (11)

comment:1 Changed 5 years ago by nakal

Description: modified (diff)

comment:2 Changed 5 years ago by pgj

What is the underlying hardware? Unfortunately, I have not yet been able to reproduce this problem on my Intel Core i5 (amd64), but this might be because that machine does not have 10.1-RELEASE-p5 installed; it runs a somewhat older 11-CURRENT instead. However, I was able to observe the same problem on a Hyper-V instance with 10.1, although I am still not sure whether this is due to the virtualization or the kernel itself. I have also tried the 11-CURRENT kernel with the 10.1 userland (libthr and libc), but I did not see any problems.

comment:3 Changed 5 years ago by nakal

This is not a virtualized system, and the segfaults are reproducible without exception. Here is some data about it:

CPU: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (3503.51-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 SMT threads
real memory = 17179869184 (16384 MB)

At work I have another system, also running FreeBSD 10.1-RELEASE, on an AMD 3-core processor with 8 GB RAM. It also segfaults.

Note:

I also see some other race conditions with signals on FreeBSD/amd64 in GHC-compiled programs; they happen in Xmonad and are described here: https://code.google.com/p/xmonad/issues/detail?id=576#c6start=100

I would like to add that to the bug tracker as a separate problem. I still don't know how to describe it precisely enough for a bug report, but I am almost sure it belongs here rather than in the Xmonad bug tracker. Looking at all this, it appears to me that GHC binaries have further problems with signals and threads on FreeBSD/amd64.

comment:4 Changed 5 years ago by pgj

Thanks for the details. I will keep trying to reproduce the problem. Based on my past experience, I still believe that the problem comes from the FreeBSD base system, not GHC itself -- in such situations, I usually bisect the (kernel + userland) revisions to see which commit might have introduced the bug. It may also be (as you wrote above) that it is all about race conditions, which may require a sufficiently fast system to become visible.

comment:5 Changed 5 years ago by pgj

All right, here is a more detailed backtrace, just for the record:

#0  0x00000000004801d7 in generic_handler ()
#1  0x00000008014810d6 in handle_signal (actp=0x7fffffffa418, sig=2, info=0x10006, ucp=0x7fffffffa480) at /usr/src/lib/libthr/thread/thr_sig.c:238
#2  0x0000000801480ad5 in thr_sighandler (sig=2, info=0x10006, _ucp=0x7fffffffa480) at /usr/src/lib/libthr/thread/thr_sig.c:183
#3  <signal handler called>
#4  select () at select.S:3
#5  0x000000080147ce95 in __select (numfds=2, readfds=0x7fffffffa930, writefds=0x7fffffffa9b0, exceptfds=0x0, timeout=0x7fffffffa920)
    at /usr/src/lib/libthr/thread/thr_syscalls.c:561
#6  0x00000000004534f3 in fdReady ()
#7  0x0000000000458e54 in base_GHCziIOziFD_zdwa3_info ()
#8  0x0000000000000000 in ?? ()

comment:6 Changed 5 years ago by nakal

I have various systems on which this small hello-world program crashes. In fact, it crashes reliably on every system I try, no matter how fast or slow.

Here are two Xeons of this type:

CPU: Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz (1800.04-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
real memory = 17179869184 (16384 MB)

And then the AMD triple-core (which I mentioned above) which is probably not as fast as your i5:

CPU: AMD Athlon(tm) II X3 460 Processor (3400.20-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
FreeBSD/SMP: 1 package(s) x 3 core(s)
real memory = 8589934592 (8192 MB)


comment:7 Changed 5 years ago by pgj

Yeah, I have also managed to reproduce the crash on my dual-core AMD C-50 netbook (with 10.1-RELEASE), which is probably much slower than all of these machines. In the meantime, I have contacted Kostik Belousov (kib@), who maintains libthr(3) these days, but unfortunately he has not yet been able to reproduce the problem on any of his machines. This might be because he runs 10-STABLE, though. Could you please try a recent revision of the stable/10 branch and report whether the problem persists there?

comment:8 Changed 5 years ago by nakal

I've built FreeBSD 10.1-STABLE (r278692) and I can confirm that the bug is gone.

comment:9 Changed 5 years ago by pgj

That is great! So may I close this ticket?

comment:10 Changed 5 years ago by nakal

Yes, it looks to me as if it is indeed a bug in FreeBSD 10.1-RELEASE and that it has been fixed for a later release. Thank you for your help.

comment:11 Changed 5 years ago by pgj

Resolution: fixed
Status: new → closed