Ticket #1183 (closed defect: fixed)

Opened 5 years ago

Last modified 4 years ago

Segfault in ghci 6.12.1

Reported by: guest
Owned by: axel
Priority: normal
Milestone: 0.11.0
Component: general (Gtk+, Glib)
Version: 0.11.0
Keywords: ghci segfault
Cc:

Description

Summary: I get a segmentation fault when running gtk2hs in interactive mode (ghci). I haven't (yet?) seen this happen with a compiled gtk2hs program. I suspect it is related to garbage collection.

Environment:

OS: Arch Linux (x86_64)
GHC: 6.12.1
gtk2hs: Darcs, last commit @ Fri Feb 12 00:23:26 PST 2010

Reproduce:

  • Open hello world demo in ghci
  • Run main
  • Everything works and looks normal
  • Wait a few moments (so far never more than 30 seconds)
  • Segmentation fault

Backtrace: Unfortunately, Arch Linux strips out a lot of debugging information, so my gdb backtrace is rather useless. I tried to recompile without debug stripping, but I haven't yet found the right incantation to get it working.

Garbage collection?: I ran ghci with +RTS -S and performed the same steps. On startup I get many lines, typically Gen 0 collections, e.g.:

524192 221720 12653616 0.00 0.00 0.46 3.52 0 0 (Gen: 0)

That is perfectly normal. But once the GUI starts up, there is silence for however many seconds, and then I get:

231704 7528416 11688904 0.04 0.04 0.95 9.02 0 0 (Gen: 1)

followed immediately by a crash. In compiled mode I see similar GCs (both Gen 0 and Gen 1) and I don't get a crash.

Change History

  Changed 5 years ago by axel

  • status changed from new to closed
  • resolution set to fixed

We have fixed two concurrency-related issues. The main difference between a compiled Gtk2Hs program and one run in ghci is that the latter is always multi-threaded. I don't think the issue you report still occurs, since objects are now properly collected and destroyed, even when compiled with -threaded.

  Changed 4 years ago by guest

Odd, I'm still getting this problem. Because no one else is complaining, I can only assume this is isolated to my system, so I'm leaving this as closed.

However, I would appreciate it if I could get help diagnosing this problem. Is there anything I can provide to find out what's wrong on my system? I'm still running ghc 6.12.1 but now I'm running gtk2hs 0.11.0 and the issue persists.

Thanks.

  Changed 4 years ago by guest

  • status changed from closed to reopened
  • resolution fixed deleted

I researched this a little more and now have some more data. I added tracing to System.Glib.FFI by changing newForeignPtr to:

-- Traced version: prints NP:<addr> on creation and GC:<addr> just before the finalizer runs.
newForeignPtr :: Ptr a -> FinalizerPtr a -> IO (ForeignPtr a)
newForeignPtr p finalizer
   = trace ("NP:" ++ show p) Foreign.Concurrent.newForeignPtr p (putStrLn ("GC:" ++ show p) >> (mkFinalizer finalizer p))

Then I added putStrLn calls around the widget creation in the Hello World example:

  putStrLn "Window going to be created"
  window <- windowNew
  putStrLn "Window was created"
... 
  putStrLn "Button will be created"
  button <- buttonNew
  putStrLn "Button was created"
...

Then I ran the following and waited:

$ date; echo main | ghci -package gtk World.hs; date
Thu Jun 10 19:15:03 PDT 2010
GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package array-0.3.0.0 ... linking ... done.
Loading package bytestring-0.9.1.5 ... linking ... done.
Loading package containers-0.3.0.0 ... linking ... done.
Loading package filepath-1.1.0.3 ... linking ... done.
Loading package old-locale-1.0.0.2 ... linking ... done.
Loading package old-time-1.0.0.3 ... linking ... done.
Loading package unix-2.4.0.0 ... linking ... done.
Loading package directory-1.0.1.0 ... linking ... done.
Loading package process-1.0.1.2 ... linking ... done.
Loading package time-1.1.4 ... linking ... done.
Loading package random-1.0.0.2 ... linking ... done.
Loading package haskell98 ... linking ... done.
Loading package glib-0.11.0 ... linking ... done.
Loading package mtl-1.1.0.2 ... linking ... done.
Loading package cairo-0.11.0 ... linking ... done.
Loading package gio-0.11.0 ... linking ... done.
Loading package pretty-1.0.1.1 ... linking ... done.
Loading package pango-0.11.0 ... linking ... done.
Loading package gtk-0.11.0 ... linking ... done.
Loading package ffi-1.0 ... linking ... done.
[1 of 1] Compiling Main             ( World.hs, interpreted )
Ok, modules loaded: Main.
*Main> Window going to be created
NP:0x00000000027662f0
Window was created
Button will be created
NP:0x000000000279c000
Button was created
GC:0x000000000279c000
Segmentation fault
Thu Jun 10 19:15:14 PDT 2010

Clearly, it's garbage-collecting something related to the button that it shouldn't be collecting. Unfortunately, I'm not familiar enough with concurrent garbage collection, gtk2hs, or ghc to fix this on my own. But if I find anything else, I'll be sure to let you know.

  Changed 4 years ago by axel

  • owner changed from somebody to axel
  • status changed from reopened to new

I cannot reproduce this using the current darcs head on Mac OS. It could be a Linux or x86_64 specific problem.

I'm not quite clear on when the error occurs. Are you saying that you close the Hello World window, then you get back to the prompt, and then the GC causes the segfault?

In gtk/Graphics/UI/Gtk/General/hsgthread.c there is a #undef DEBUG that you can change to #define DEBUG to get more information on when Gtk2Hs adds finalizers to a queue (this happens during GC) and when they are freed (this happens when the Gtk main loop runs).
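The #undef DEBUG / #define DEBUG switch is a compile-time toggle for trace output. A minimal sketch of the same idea as a single tracing macro follows; the macro name GTK2HS_TRACE and the function traced_enqueue are hypothetical and not part of hsgthread.c. The sketch writes to stderr and flushes it, so a trace line is less likely to be lost if the process crashes right after printing it.

```c
#include <stdio.h>

/* Change to #define DEBUG to enable tracing (hypothetical sketch,
 * not the actual hsgthread.c code). */
#undef DEBUG

#ifdef DEBUG
/* Write to stderr and flush immediately, which matters when the very
 * next thing that happens may be a segfault. */
#define GTK2HS_TRACE(...) \
    do { fprintf(stderr, __VA_ARGS__); fflush(stderr); } while (0)
#else
/* With DEBUG undefined the macro compiles to nothing. */
#define GTK2HS_TRACE(...) do { } while (0)
#endif

/* Hypothetical stand-in for a function that queues a finalizer:
 * it only traces and returns the new number of pending entries. */
static int traced_enqueue(int pending)
{
    GTK2HS_TRACE("adding finalizer!\n");
    return pending + 1;
}
```

Because the disabled branch expands to an empty statement, the trace calls can stay in the source permanently at no runtime cost.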

  Changed 4 years ago by guest

I run the command listed above, take my hands off the keyboard and wait. I neither press the button nor close the window. After a few seconds ghci segfaults (there is a pause of a few seconds between "Button was created" and the "GC" line).

I tried defining DEBUG, but nothing extra is printed in ghci; I get the same output as in my previous post. However, when I compile with -threaded, it is printed:

$ ./World 
Window going to be created
NP:0x000000000117e2f0
Window was created
Button will be created
NP:0x00000000011e1000
Button was created
GC:0x00000000011e1000
adding finalizer!
running 1 finalizers!

Strangely enough, there was no pause between "Button was created" and "GC".

So, to investigate further, I added a few more debugging statements at the entry points of the functions involved, and some specifically around the mutex lock:

#ifdef DEBUG
    printf("calling g_static_mutex_lock(0x%x)\n", gtk2hs_finalizer_mutex);
#endif
    g_static_mutex_lock(&gtk2hs_finalizer_mutex);
#ifdef DEBUG
    printf("returnd from g_static_mutex_lock(0x%x)\n", gtk2hs_finalizer_mutex);
#endif
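A caveat when reading the traces that follow: these printf calls pass gtk2hs_finalizer_mutex itself (a GStaticMutex struct) to a %x conversion, not its address. That is undefined behaviour in C, so the printed 0x0 says nothing reliable about the mutex. The usual way to print an address is %p with a void * argument, sketched below; fake_static_mutex and format_mutex_address are hypothetical stand-ins so the example compiles without GLib.

```c
#include <stdio.h>

/* Hypothetical stand-in for GLib's GStaticMutex; only its address
 * is used here, its contents are irrelevant. */
typedef struct {
    void *runtime_mutex;
    long  padding[4];
} fake_static_mutex;

static fake_static_mutex example_mutex;

/* Format the mutex's address (not its contents) into buf.
 * %p expects a void *, so the pointer is converted explicitly. */
static int format_mutex_address(char *buf, size_t len,
                                const fake_static_mutex *m)
{
    return snprintf(buf, len, "calling g_static_mutex_lock(%p)\n",
                    (void *)m);
}
```

Applied to the original debug code, the corresponding call would be printf("calling g_static_mutex_lock(%p)\n", (void *)&gtk2hs_finalizer_mutex).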

Here's the ghci output. Again, the pause remains between "Button was created" and "GC".

$ date; echo main | ghci -package gtk World.hs; date
Fri Jun 11 07:50:24 PDT 2010
GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
...
Ok, modules loaded: Main.
Prelude Main> gtk2hs_threads_initialise()
Window going to be created
NP:0x000000000300b2f0
Window was created
Button will be created
NP:0x0000000003041000
Button was created
GC:0x0000000003041000
gtk2hs_g_object_unref_from_mainloop(0x3041000)
calling g_static_mutex_lock(0x0)
Segmentation fault
Fri Jun 11 07:50:30 PDT 2010

And just in case it's important, here's the compiled version:

$ ghc --make World.hs -threaded
[1 of 1] Compiling Main             ( World.hs, World.o )
Linking World ...
$ ./World 
gtk2hs_threads_initialise()
Window going to be created
NP:0x00000000026a52f0
Window was created
Button will be created
NP:0x0000000002708000
Button was created
GC:0x0000000002708000
gtk2hs_g_object_unref_from_mainloop(0x2708000)
calling g_static_mutex_lock(0x0)
returnd from g_static_mutex_lock(0x0)
adding finalizer!
gtk2hs_run_finalizers(0x0)
running 1 finalizers!
Hello World
A "clicked"-handler to say "destroy"

I waited and there was no segfault, so I clicked the button.

It appears that everything is the same, except that ghci segfaults on the mutex lock, which is very strange to me.

Thanks for your help!

  Changed 4 years ago by guest

I noticed the comment in hsgthread.c saying that ghci also had problems with g_static_mutex_lock on Windows 7. So I settled on changing everything to use pthread mutexes, and now it seems to work great. I didn't do a particularly clean or portable job of it, so I won't share the code here, but it's a pretty trivial change.

  Changed 4 years ago by axel

I don't quite follow. You said this is on Linux. How can avoiding g_static_mutex "fix" the problem?

If this is an x86_64 issue, then we had better sort it out before the next release.

There was a bug in the released 0.11.0 version of Gtk2Hs in that the Gtk+ mutex was not initialized properly. However, this has nothing to do with the mutex that is used to protect the GC queue. Since your program works fine with ghc and ghc -threaded but not in ghci, it is probably not the Gtk+ mutex that is affected but actually the GC mutex.

So, assuming it is a ghci issue, I wonder if this is reproducible on any x86_64 machine. Have you encountered this problem on any other machine? I can try to install your version combination somewhere, but I'd like to use the correct hardware first.

  Changed 4 years ago by guest

Yes, I've reproduced this problem with the same setup (Arch Linux, x86_64) on a different machine. I've also duplicated it on the newest release of GHC (6.12.3).

In case it makes a difference, the uname -m -p of the two machines:

  • x86_64 Intel(R) Pentium(R) D CPU 3.20GHz
  • x86_64 Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz

I assumed it was GLib's mutex for three reasons: it was reported to be an issue on Windows 7, and only in ghci; my tests show the segfault happening inside g_static_mutex_lock(..); and replacing it with a pthread mutex hasn't given me any obvious errors yet. But I don't claim to be an expert on this, so I could very well be wrong.

The "fix" included adding:

#include <pthread.h>

Using the following definition of the finalizer mutex:

static pthread_mutex_t gtk2hs_finalizer_mutex = PTHREAD_MUTEX_INITIALIZER;

And locking/unlocking with:

    pthread_mutex_lock(&gtk2hs_finalizer_mutex);
    pthread_mutex_unlock(&gtk2hs_finalizer_mutex);
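Put together, these pieces amount to a queue of pending finalizers guarded by a statically initialized POSIX mutex: the GC side appends to the queue and the Gtk main loop later drains it (the "adding finalizer!" and "running 1 finalizers!" trace lines above). Below is a minimal sketch under that assumption. The function names, the fixed-size arrays, the void (*)(void *) finalizer type and the example finalizer count_finalizer are all hypothetical; this is not the actual hsgthread.c code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical finalizer type: a function taking the object pointer. */
typedef void (*finalizer_fn)(void *);

#define MAX_PENDING 64

/* Statically initialized, so no runtime init call is needed. */
static pthread_mutex_t finalizer_mutex = PTHREAD_MUTEX_INITIALIZER;
static finalizer_fn pending_fn[MAX_PENDING];
static void *pending_arg[MAX_PENDING];
static size_t pending_count = 0;

/* GC side: record a finalizer instead of running it right away.
 * Returns 0 on success, -1 if the queue is full. */
int enqueue_finalizer(finalizer_fn fn, void *arg)
{
    int result = -1;
    pthread_mutex_lock(&finalizer_mutex);
    if (pending_count < MAX_PENDING) {
        pending_fn[pending_count] = fn;
        pending_arg[pending_count] = arg;
        pending_count++;
        result = 0;
    }
    pthread_mutex_unlock(&finalizer_mutex);
    return result;
}

/* Main-loop side: run every queued finalizer and return how many ran.
 * The queue is copied out under the lock and the finalizers run after
 * unlocking, so a finalizer that enqueues more work cannot deadlock. */
size_t run_finalizers(void)
{
    finalizer_fn fns[MAX_PENDING];
    void *args[MAX_PENDING];
    size_t n, i;

    pthread_mutex_lock(&finalizer_mutex);
    n = pending_count;
    for (i = 0; i < n; i++) {
        fns[i] = pending_fn[i];
        args[i] = pending_arg[i];
    }
    pending_count = 0;
    pthread_mutex_unlock(&finalizer_mutex);

    for (i = 0; i < n; i++)
        fns[i](args[i]);
    return n;
}

/* Hypothetical example finalizer: increments the int it points to. */
static void count_finalizer(void *arg)
{
    int *hits = arg;
    (*hits)++;
}
```

In this design the mutex is held only while the queue is copied, never while a finalizer runs; the actual Gtk2Hs code may well make a different trade-off.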

  Changed 4 years ago by guest

Oh, and yes - I do run Linux. I was just mentioning the Win 7 thing because the comment in hsgthread.c:

 * Also g_static_mutex_lock and g_static_mutex_unlock cause linking problems
 * in ghci on Windows 7 (namely: HSgtk-0.10.5.o: unknown symbol
 * `__imp__g_threads_got_initialized'), so we use a Win32 critical section
 * instead.

seemed to indicate that Win 7 had a similar issue to mine (well, sort of), and that it was fixed with a similar solution (abandoning GLib's mutex in favour of a different implementation).

  Changed 4 years ago by axel

  • status changed from new to assigned

I would like to wrap this thing up before the new patch-level release.

So we know:

  • you get a segfault only in ghci and only on x86-64
  • you don't get a segfault if you rewrite hsgthread.c to use POSIX threads

The comment you quote refers only to a linking issue, not to any functional problem. Thus, replacing the GLib threads with POSIX threads may work, but it doesn't make us any wiser.

After the 0.11.0 release, I fixed an issue in which threads were initialized incorrectly. This fixed a problem on Solaris where the glib threads used a different implementation. I hope this also fixes your problem. Could you build the most recent darcs version of gtk2hs and check if the problem persists? If yes, I will investigate (if I can find the appropriate hardware/software combo).
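The ticket does not show what the post-0.11.0 initialization fix looked like. As a generic illustration only, one common way to guarantee that a threading layer is set up exactly once, before any lock is used, is pthread_once. The names threads_initialise, threads_initialise_body and threads_init_count below are hypothetical and are not the Gtk2Hs functions.

```c
#include <pthread.h>

static pthread_once_t threads_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;   /* how many times the init body actually ran */

/* Hypothetical one-time setup, standing in for whatever thread
 * initialization a binding needs before its first mutex is used. */
static void threads_initialise_body(void)
{
    init_calls++;
}

/* Safe to call from every entry point and from several threads:
 * pthread_once guarantees the body runs exactly once. */
void threads_initialise(void)
{
    pthread_once(&threads_once, threads_initialise_body);
}

int threads_init_count(void)
{
    return init_calls;
}
```

Calling threads_initialise() defensively at every entry point is then harmless, which avoids depending on whichever code path happens to run first.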

  Changed 4 years ago by axel

  • keywords ghci segfault added
  • status changed from assigned to closed
  • resolution set to fixed

This is a bug in ghci on x86-64: global variables are not resolved correctly. The workaround is to compile with -fPIC, which I've added to the .cabal file (this flag only affects the C files).
