Opened 5 years ago

Closed 4 years ago

Last modified 4 years ago

#10435 closed bug (fixed)

catastrophic exception-handling disablement on Windows Server 2008 R2

Reported by: malcolmw Owned by: simonmar
Priority: normal Milestone: 7.10.3
Component: Runtime System Version: 7.10.1
Keywords: windows, exceptions Cc: malcolm.wallace@…, ndmitchell@…, simonmar, Phyx-
Operating System: Windows Architecture: x86
Type of failure: Runtime crash Test Case:
Blocked By: Blocking:
Related Tickets: #9218 #10726 Differential Rev(s):
Wiki Page:

Description

We have found a very strange RTS bug that only manifests on some Windows machines, running Windows Server 2008 R2. It does not occur on our Windows 7 machines. We are not sure whether the installed version of Visual C runtime system matters.

The context is a ghc-compiled executable, that calls a function from a C++ DLL. The C++ function throws an exception internally, then catches it, and returns normally. The symptom of the bug is that, from ghc-7.8.1 onwards, including ghc-7.10.1, the C++ exception is not caught by the C++ code, but terminates the program catastrophically, with exit code 127. When the Haskell executable is compiled by ghc-7.2.3 or before, the bug does not happen.

If, instead of having the main function in Haskell, we write a wrapper main function in C++, that calls the Haskell from a DLL (and the Haskell then calls back into C++), the bug does not happen. Hence, we surmise there is some ghc RTS initialisation that is specific to Windows, that deals with exception handling, and that is incorrect for certain versions of Windows.

Attached to the ticket, please find (a) a C++ module, which exports a single function that throws an exception and catches it; (b) a Haskell module which imports the C++ via the FFI, and calls it; (c) a build script which compiles the C++ to a DLL, using the MSVC compiler, compiles the Haskell with ghc, and links them together.
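
For readers without the attachments to hand, here is a minimal sketch of what a module like Exception.cpp could look like. It is an illustrative guess, not the attached file: the exception type, the body of foo and its return value are assumptions. Only the name foo and the printed message are taken from the output and stack trace shown elsewhere in this ticket.

```cpp
// Hypothetical reconstruction; the attached Exception.cpp is not reproduced here.
#include <cstdio>
#include <stdexcept>

// In the real DLL this function would be exported (with MSVC, typically via
// __declspec(dllexport)); that is omitted so the sketch compiles anywhere.
extern "C" int foo(int bar) {
    try {
        // Throw an exception internally ...
        throw std::runtime_error("internal failure");
    } catch (const std::exception&) {
        // ... and catch it before returning, so the caller never sees it.
        std::printf("Called foo() with bar=%d\n", bar);
        return 1;  // assumed return value, chosen for illustration
    }
}
```

On an unaffected machine a call such as foo(0) prints the message and returns normally; the bug in this ticket is that the throw never reaches the catch block.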

Attachments (3)

Exception.cpp (351 bytes) - added by malcolmw 5 years ago.
TestExceptions.hs (948 bytes) - added by malcolmw 5 years ago.
exception.sh (461 bytes) - added by malcolmw 5 years ago.


Change History (23)

Changed 5 years ago by malcolmw

Attachment: Exception.cpp added

Changed 5 years ago by malcolmw

Attachment: TestExceptions.hs added

Changed 5 years ago by malcolmw

Attachment: exception.sh added

comment:1 Changed 5 years ago by malcolmw

Correction: I meant to say that the bug does not appear with ghc-7.6.3 and below.

comment:2 Changed 5 years ago by malcolmw

Here is the expected output, on a Windows 7 machine:

$ ./TestExceptions.exe
1
Called foo() with bar=0
$ echo $?
0

And here is the unexpected output, on a Windows 2008 R2 machine:

$ ./TestExceptions.exe
$ echo $?
127

comment:3 Changed 5 years ago by NeilMitchell

Cc: ndmitchell@… added

comment:4 Changed 4 years ago by simonmar

Speculation: perhaps there's a difference in the ABI for C++ exceptions between the MS C Compiler and the gcc that we ship with GHC 7.8 and later?

Did you try creating the DLL using gcc?

Possibly related: 5200bdeb26c5ec98739b14b10fc8907296bceeb9

comment:5 Changed 4 years ago by malcolmw

There is a possibility that this is a conjunction of some weirdness in the particular version of the MS VC++ runtime, together with a weirdness in the ghc initialisation. Unfortunately, we have not been able to check a different version of the MSVCRT on that machine.

Also, sadly, it is not a possibility for us to build our C++ DLLs with gcc on Windows.

comment:6 Changed 4 years ago by malcolmw

Do you know if the commitdiff you referenced has made it into a ghc distro that I could play with?

comment:7 Changed 4 years ago by simonmar

That commit isn't on the 7.10 branch. You could patch it into a local GHC build and see if it helps.

> If, instead of having the main function in Haskell, we write a wrapper main function in C++, that calls the Haskell from a DLL (and the Haskell then calls back into C++), the bug does not happen. Hence, we surmise there is some ghc RTS initialisation that is specific to Windows, that deals with exception handling, and that is incorrect for certain versions of Windows.

The only difference between these two setups is the main() function, which is pretty small (start with hs_main): https://phabricator.haskell.org/diffusion/GHC/browse/master/rts/RtsMain.c

Really the only thing in there that looks remotely suspicious is the SEH stuff that was touched by 5200bdeb26c5ec98739b14b10fc8907296bceeb9, so that looks like the most likely suspect. It only does a setjmp/longjmp, but perhaps that interacts badly with that particular version of MSVCRT. Or something.

comment:8 Changed 4 years ago by Phyx-

I only saw this ticket today, but the exit code has been bugging me. Getting a 127 on an SEH error is odd.

Building the files, I was able to reproduce the error on a fresh Windows Server 2008 R2 VM.

However, when I ran the application outside of a console, or under Dependency Walker, the reason for the 127 became clear: I was missing the proper Visual C++ runtime file. Hence the 127, which usually means "missing command" or "missing file".

Can you try running Dependency Walker to see if you have any missing dependencies? Installing the proper VCRedist solved the first problem for me. The application then threw an unhandled exception, resulting in an APPLICATION_FAULT_SEHOP and another crash, but the first part of the output is written. I have issues with remote debugging on the VM, so I didn't look further into this.

I currently get:

PS E:\> .\TestExceptions.exe
Called foo() with bar=0
PS E:\>

comment:9 Changed 4 years ago by Phyx-

Cc: Phyx- added

comment:10 Changed 4 years ago by malcolmw

I don't see any missing VC runtime. But running the .exe outside the Cygwin shell was a good idea. I then see a popup console which contains the expected (partial) output "Called foo() with bar=0", followed by a popup error dialogue telling me "TestExceptions.exe has stopped working", with the following extra info:

Problem signature:
  Problem Event Name:	APPCRASH
  Application Name:	TestExceptions.exe
  Application Version:	0.0.0.0
  Application Timestamp:	5587dbd2
  Fault Module Name:	KERNELBASE.dll
  Fault Module Version:	6.1.7601.23002
  Fault Module Timestamp:	5507b1dc
  Exception Code:	e06d7363
  Exception Offset:	0000c44d
  OS Version:	6.1.7601.2.1.0.274.10
  Locale ID:	2057
  Additional Information 1:	0271
  Additional Information 2:	02712a14bbb8bb18ca2af857dbc5b852
  Additional Information 3:	d1b4
  Additional Information 4:	d1b4e576b897fdc9bfef974c5ac3140c
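
A note on the Exception Code field: e06d7363 is the code the MSVC runtime passes to RaiseException for C++ exceptions, and its low three bytes are the ASCII characters "msc". The sketch below just decodes the value; the helper names are made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Exception code from the problem signature above.
constexpr std::uint32_t kReportedCode = 0xE06D7363u;

// Returns the low three bytes of the code as ASCII characters,
// most significant byte first.
std::string low_bytes_as_ascii(std::uint32_t code) {
    std::string tag;
    for (int shift = 16; shift >= 0; shift -= 8) {
        tag.push_back(static_cast<char>((code >> shift) & 0xFFu));
    }
    return tag;
}

// Bit 29 of an NTSTATUS-style code is the "customer" flag, set for codes
// defined by applications rather than by the operating system.
bool is_application_defined(std::uint32_t code) {
    return ((code >> 29) & 1u) != 0;
}
```

So the crash dialogue is reporting an ordinary C++ throw that nobody caught, not an access violation or other hardware fault.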

comment:11 Changed 4 years ago by Phyx-

So this crash happens because of SEHOP. On the Windows client versions (Vista and later) it is off by default (Windows 8 has it on for Microsoft processes), but on Windows Server 2008 SEHOP is on by default.

SEHOP is an SEH exploit mitigation technique which, among other things, checks that the chain of SEH registration records ends with the default handler in ntdll. See http://blogs.technet.com/b/srd/archive/2009/02/02/preventing-the-exploitation-of-seh-overwrites-with-sehop.aspx and https://www.exploit-db.com/docs/15379.pdf for how it works.
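
To make the check concrete, here is a rough, portable model of the chain validation. It is only a sketch of the idea, not the Windows implementation: the Registration struct, the use of nullptr as the end-of-chain marker and the max_records limit are all simplifications introduced for illustration.

```cpp
#include <cstdint>

// Simplified stand-in for an x86 SEH registration record.
struct Registration {
    const Registration* next;  // next outer record; nullptr ends the chain in this model
    std::uintptr_t handler;    // address of the handler function
};

// Walks the chain from `head` and returns true only if the last record's
// handler is `final_handler`. An empty chain, or one longer than
// `max_records` (for example a cycle), is treated as invalid.
bool chain_is_valid(const Registration* head,
                    std::uintptr_t final_handler,
                    int max_records = 1024) {
    const Registration* cur = head;
    for (int i = 0; cur != nullptr && i < max_records; ++i) {
        if (cur->next == nullptr) {
            return cur->handler == final_handler;
        }
        cur = cur->next;
    }
    return false;
}
```

With a check of this kind in place, a chain whose tail has been lost or overwritten is rejected before any handler runs, which is why a damaged chain turns a catchable exception into a process crash.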

The same error can be reproduced on Windows 7 (or 8) by opting in to SEHOP manually: http://blogs.technet.com/b/srd/archive/2009/11/20/sehop-per-process-opt-in-support-in-windows-7.aspx . This can be done globally or per process.

MinGW-w64 and MSVC++ don't seem to have this problem; they both preserve the exception chain properly.

But GHC i686 seems to be using MinGW, which has a different crtmain (http://sourceforge.net/p/mingw/mingw-org-wsl/ci/21762bb4a1bd0c88c38eead03f59e8d994349e83/tree/src/libcrt/crt/crt1.c#l212). Looking at it, I can't figure out why it's going wrong, unless SetUnhandledExceptionFilter clears the exception chain. But that's doubtful (and calling it from MinGW-w64 g++ didn't reproduce the error).

This looks like a bug in libcrt *somewhere*, though I am not entirely sure. As far as I know, nothing in GHC's RTS manually modifies the exception chain.

So in the short term, what can you do?

I can think of two options:

1) compile the code to x86_64
2) opt your binary out of SEHOP on Windows 2008 R2

The second one is the easiest.

For GHC, we should either find out what's causing the issue and report it upstream (though at first glance MinGW hasn't been maintained since 2012), or switch to MinGW-w64 for both x86 and x86_64, since they seem to do it right: http://sourceforge.net/p/mingw-w64/mingw-w64/ci/8a67ab4541226a80b3ec2047347890d915126de1/tree/mingw-w64-headers/crt/excpt.h#l102

A bit more detail on what's happening:

BUGCHECK_STR:  APPLICATION_FAULT_APPLICATION_FAULT_SEHOP

PRIMARY_PROBLEM_CLASS:  APPLICATION_FAULT_SEHOP

DEFAULT_BUCKET_ID:  APPLICATION_FAULT_SEHOP

LAST_CONTROL_TRANSFER:  from 74059339 to 74efb727

STACK_TEXT:  
00cedd78 74059339 e06d7363 00000001 00000003 KERNELBASE!RaiseException+0x58
00ceddb8 741e106a 00ceddd4 741e2280 026021b4 msvcr120!_CxxThrowException+0x5b
00ceddf0 004015b6 741e2104 00000000 00000000 Exception_cpp!foo+0x6a
WARNING: Stack unwind information not available. Following frames may be wrong.
00cedef0 76f9cbaf 00000000 00cee1a4 00000000 TestExceptions+0x15b6
74aba010 80000018 00000000 00000000 00000000 ntdll!LdrpResCompareResourceNames+0x1dc
74aba034 00000000 00010000 00000409 00000048 0x80000018

When RaiseException is called, it tries to walk the exception chain.

On load, the exception chain is:

00cefb08: ntdll!_except_handler4+0 (77b274a0)
  CRT scope  0, filter: ntdll!LdrpDoDebuggerBreak+2e (77b43bf0)
                func:   ntdll!LdrpDoDebuggerBreak+32 (77b43bf4)
00cefca8: ntdll!_except_handler4+0 (77b274a0)
  CRT scope  0, func:   ntdll!LdrpInitializeProcess+16d4 (77b18e85)
00cefcf8: ntdll!_except_handler4+0 (77b274a0)
  CRT scope  0, filter: ntdll!_LdrpInitialize+42ace (77b2d4a4)
                func:   ntdll!_LdrpInitialize+42ae1 (77b2d4b7)

But when the exception happens something's off:

0:000> gh
ModLoad: 75470000 75497000   C:\WINDOWS\SysWOW64\IMM32.DLL
ModLoad: 759b0000 75ac2000   C:\WINDOWS\SysWOW64\MSCTF.dll
(2171c.203c4): C++ EH exception - code e06d7363 (first chance)
(2171c.203c4): C++ EH exception - code e06d7363 (!!! second chance !!!)
eax=00cedd10 ebx=004e6670 ecx=00000003 edx=00000000 esi=62602200 edi=00ceddc4
eip=75b04598 esp=00cedd10 ebp=00cedd68 iopl=0         nv up ei pl nz ac po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000212
KERNELBASE!RaiseException+0x48:
75b04598 8b4c2454        mov     ecx,dword ptr [esp+54h] ss:002b:00cedd64=8a43e393

0:000> !exchain
00ceddd4: *** ERROR: Symbol file could not be found.  Defaulted to export symbols for H:\Exception.cpp.dll - 
Exception_cpp!foo+ae0 (62601ae0)
00cefe88: 00000000
Invalid exception stack at 02603bc4

Without SEHOP the invalid stack is ignored; with SEHOP, the fault (NTSTATUS) 0xe06d7363 is raised and you get the crash you reported.

For the record, the mingw-w64 compilers return:

00eafdd4: msvcrt!_except_handler4+0 (773c7220)
  CRT scope  0, func:   msvcrt!doexit+110 (773b3bcc)
00eaffcc: ntdll!_except_handler4+0 (77b274a0)
  CRT scope  0, filter: ntdll!__RtlUserThreadStart+54386 (77b3f076)
                func:   ntdll!__RtlUserThreadStart+543cd (77b3f0bd)
00eaffe4: ntdll!FinalExceptionHandlerPad50+0 (77ad0241)

Which also correctly ends in FinalExceptionHandler.

comment:12 Changed 4 years ago by Phyx-

comment:13 Changed 4 years ago by malcolmw

Thanks for tracking this down. I'm going to try some registry-hacking, as suggested at https://support.microsoft.com/en-us/kb/956607 to disable SEHOP on our W2008 R2 server, as a workaround.

comment:14 Changed 4 years ago by Phyx-

I would suggest doing it per-process instead of globally if this is an internet-facing server, since SEH exploits seem to be a fairly common attack vector. That's detailed at http://blogs.technet.com/b/srd/archive/2009/11/20/sehop-per-process-opt-in-support-in-windows-7.aspx
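
As a rough sketch of the per-process route: the helper below only builds the `reg add` command line for a given executable name, it does not touch the registry. The function name is made up, and the key path and the DisableExceptionChainValidation value name are written from memory of the linked article, so please check them against it before relying on this. The assumption is that a DWORD value of 1 under the executable's Image File Execution Options key disables SEHOP for that image only.

```cpp
#include <string>

// Builds a `reg add` command that opts one executable out of SEHOP.
// Assumption: the per-image setting lives under "Image File Execution
// Options\<exe name>" as a DWORD named DisableExceptionChainValidation,
// where 1 disables SEHOP for that image.
std::string sehop_opt_out_command(const std::string& exe_name) {
    return "reg add \"HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
           "Image File Execution Options\\" + exe_name +
           "\" /v DisableExceptionChainValidation /t REG_DWORD /d 1 /f";
}
```

For example, sehop_opt_out_command("TestExceptions.exe") yields a command that would need to be run from an elevated prompt on the affected server.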

Could you let me know if disabling SEHOP works for you too?

comment:15 Changed 4 years ago by malcolmw

Confirmed: disabling SEHOP in the registry makes C++ exceptions work again, when called from a ghc-compiled Haskell program.

comment:16 Changed 4 years ago by Phyx-

comment:17 Changed 4 years ago by Phyx-

This will be fixed when #10726 is committed.

comment:18 Changed 4 years ago by Ben Gamari <ben@…>

In 7b211b4/ghc:

Upgrade GCC to 5.2.0 for Windows x86 and x86_64

This patch does a few things

- Moved GHC x86 to MinGW-w64 (Using Awson's patch)
- Moves Both GHCs to MSYS2 toolchains
- Completely removes the dependencies on the git tarball repo
  - Downloads only the required tarball for the architecture for
    which we are building
  - Downloads the perl tarball if missing as well
  - Fixed a few bugs in the linker to fix tests on Windows

The links currently point to repo.msys2.org and GitHub, it might be
more desirable to mirror them on
http://downloads.haskell.org/~ghc/mingw/ as with the previous patch
attempt.

For more details on what the MSYS2 packages I include see #10726
(Awson's comment). but it should contain all we need
and no python or fortran, which makes the uncompressed tar a 1-2
hundreds mb smaller.

The `GCC 5.2.0` in the package supports `libgcc` as a shared library,
this is a problem since
when compiling with -shared the produced dll now has a dependency on
`libgcc_s_sjlj-1.dll`.
To solve this the flag `-static-libgcc` is now being used for all GCC
calls on windows.

Test Plan:
./validate was ran both on x86 and x86_64 windows and compared against
the baseline.

A few test were failing due to Ld no longer being noisy. These were
updated.

The changes to the configure script *should* be validated by the build
bots for the other platforms before landing

Reviewers: simonmar, awson, bgamari, austin, thomie

Reviewed By: thomie

Subscribers: #ghc_windows_task_force, thomie, awson

Differential Revision: https://phabricator.haskell.org/D1123

GHC Trac Issues: #10726, #9014, #9218, #10435

comment:19 Changed 4 years ago by Phyx-

Resolution: fixed
Status: new → closed

GHC x86 now uses MinGW-w64 as well; this problem no longer occurs with that toolchain.

comment:20 Changed 4 years ago by Phyx-

Milestone: 7.10.3

The fix for this has been merged back into 7.10.3.
