Opened 4 years ago

Closed 4 years ago

#10672 closed bug (fixed)

GHCi linker does not understand C++ exception tables on Windows

Reported by: lukexi
Owned by: Phyx-
Priority: high
Milestone: 7.10.3
Component: Runtime System (Linker)
Version: 7.10.1
Keywords:
Cc: ezyang, thomie, simonmar, thoughtpolice, igloo, JohnWiegley, Phyx-
Operating System: Windows
Architecture: Unknown/Multiple
Type of failure: Compile-time crash
Test Case: rts/T10672/T10672_x86 T10672_x64
Blocked By:
Blocking:
Related Tickets: #9297 #10563 #8237 #9907
Differential Rev(s): Phab:D1244
Wiki Page:

Description

When compiling an executable that uses Template Haskell against a library that contains C++ code, GHC crashes:

[2 of 2] Compiling Main             ( app\Main.hs, dist\build\main\main-tmp\Main.o )
ghc.exe: internal error: checkProddableBlock: invalid fixup in runtime linker: 0000000000360564
    (GHC version 7.10.1 for x86_64_unknown_mingw32)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

I've boiled this down into a minimal reproduction of a library that includes a .cpp file, and an executable that depends on it. To test:

git clone https://github.com/lukexi/cxx-link-fail-repro
cabal run

The crash does not occur in the repro unless I use C++ exceptions in the library, and use Template Haskell in the executable, but in the project I boiled this down from (http://github.com/lukexi/bullet-mini) the problem occurs even with cc-options: -fno-exceptions.

Some more details are at https://github.com/lukexi/cxx-link-fail-repro

The platform is Windows 8.1 under MSYS2 (GHC is still using its inbuilt mingw). I've also tried 7.10.2-RC1 with the same result.

Change History (22)

comment:1 Changed 4 years ago by lukexi

I've tested on GHC 7.8.4 now and get this error instead:

Loading package cxxylib-0.1.0.0 ... ghc.exe: Unknown PEi386 section name `.gcc_except_table' (while processing: C:\msys64\home\lukex_000\cxx-link-fail-repro\dist\build\HScxxylib-0.1.0.0.o)

<no location info>:
    ghc.exe: panic! (the 'impossible' happened)
  (GHC version 7.8.4 for x86_64-unknown-mingw32):
        loadObj "C:\\msys64\\home\\lukex_000\\cxx-link-fail-repro\\dist\\build\\HScxxylib-0.1.0.0.o": failed

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

so it looks like this is related to #9907 and friends.

comment:2 Changed 4 years ago by lukexi

Cc: thoughtpolice added

comment:3 Changed 4 years ago by lukexi

Results of 7.8.4 with bullet-mini, which is using -fno-exceptions to avoid linking libgcc_s_sjlj-1.dll

Loading package bullet-mini-0.1.0.0 ... ghc.exe: Unknown PEi386 section name `.text$_ZN21btBroadphaseInterfaceD1Ev' (while processing: C:\msys64\home\lukex_000\Projects\bullet-mini\dist\build\HSbullet-mini-0.1.0.0.o)

<no location info>:
    ghc.exe: panic! (the 'impossible' happened)
  (GHC version 7.8.4 for x86_64-unknown-mingw32):
        loadObj "C:\\msys64\\home\\lukex_000\\Projects\\bullet-mini\\dist\\build\\HSbullet-mini-0.1.0.0.o": failed

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

comment:4 Changed 4 years ago by lukexi

I'm able to get past the crash by commenting out the first call to checkProddableBlock in ocResolve_PEi386 (https://github.com/ghc/ghc/blob/master/rts/Linker.c#L4696). I'm not sure how to translate that into a proper fix yet. Does that give anyone familiar with the linker any clues?

comment:5 Changed 4 years ago by lukexi

Milestone: 7.10.3

comment:6 Changed 4 years ago by lukexi

Cc: igloo added

comment:7 Changed 4 years ago by lukexi

To anyone else bitten by this:

While this is figured out, I'm just stripping out all uses of Template Haskell in my software since I'm only using it for generating lenses.

I put up a simple preprocessor here to do that during the build phase https://github.com/lukexi/strip-ths, designed to work with stack or cabal.

(this workaround works because, as I mention in the README for the repro,

Everything also works [...] if I just call cabal exec -- ghc app/Main.hs. This is presumably because GHC doesn't try to link cxxylib when it is just compiling the Template Haskell splices, whereas Cabal asks it to.

)

comment:8 Changed 4 years ago by JohnWiegley

Cc: JohnWiegley added

comment:9 Changed 4 years ago by Phyx-

Cc: Phyx- added

Hmm, this seems to happen because in checkProddableBlock the oc->proddables is NULL. So it has no blocks to check and it errors out. Don't know why it errors out and doesn't skip that ObjectCode and continue when there are no proddables. I'm pretty new at the linker so hopefully someone who knows more about this part can explain it.

Also it won't work on anything prior to 7.10.x since you'll get the Unknown PE section error fixed in #9907

I don't think #9297 is related either, which seems to have more to do with the name manglings.
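
As a rough model of the behaviour described in this comment, the check can be pictured as a search over the address ranges recorded for an object file. This is only a sketch: `ProddableBlock`, `is_proddable`, and the 4-byte default width are illustrative assumptions, not the RTS's actual definitions, and the real linker aborts with an internal error rather than returning a boolean.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProddableBlock:
    """Hypothetical record of one region the linker is allowed to patch."""
    start: int  # first address of the region
    size: int   # length of the region in bytes


def is_proddable(blocks: List[ProddableBlock], addr: int, width: int = 4) -> bool:
    """Return True if [addr, addr + width) lies fully inside one recorded block."""
    for block in blocks:
        if block.start <= addr and addr + width <= block.start + block.size:
            return True
    # No block covers the fixup. With an empty list this is always reached,
    # which mirrors the NULL proddables case described above.
    return False
```

With no recorded blocks, `is_proddable([], 0x360564)` is false for any address, which is consistent with every fixup in such an object being rejected.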

comment:10 Changed 4 years ago by ezyang

Hello lukexi,

The problem is that your binary contains a section which (1) is pointed to by relocations, and (2) GHC doesn't understand. The pre-7.10.x error is better because, to make your code work, it looks like we really do need to understand the relocation. (We could paper over the problem by simply skipping relocations in sections we don't understand, but it is quite likely that you would get some even more bizarre error.)

Can you run your test program with +RTS -Dl (debug RTS) and tell us exactly what section is unknown?

UPDATE: Actually, it looks like the exception table...

Edward

Last edited 4 years ago by ezyang

comment:11 Changed 4 years ago by lukexi

Hi Edward, thanks for the reply!

Just to be sure, do you mean compiling a GHC with GhcDebugged=YES and then using cabal build --ghc-options="+RTS -Dl"?

Luke

comment:12 Changed 4 years ago by ezyang

Yeah, something like make re2 GhcDebugged=YES. But it looks like the sections in question are .gcc_except_table and .text$_ZN21btBroadphaseInterfaceD1Ev.

So two things we need to do:

  1. Handle the exception table correctly
  2. Match against the PREFIX so that anything starting with .text is loaded as a text segment.
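
The second point can be illustrated with a small sketch that contrasts exact-name matching with prefix matching. The function names, the one-entry `KNOWN_TEXT_NAMES` set, and the returned labels are hypothetical and are not taken from the linker source.

```python
# Hypothetical whitelist of exact names; the real linker's list is not shown here.
KNOWN_TEXT_NAMES = {".text"}


def classify_exact(section_name: str) -> str:
    """Exact-name matching: per-function sections fall through as unknown."""
    return "text" if section_name in KNOWN_TEXT_NAMES else "unknown"


def classify_by_prefix(section_name: str) -> str:
    """Prefix matching: anything starting with .text is treated as text."""
    return "text" if section_name.startswith(".text") else "unknown"
```

Under this sketch, `.text$_ZN21btBroadphaseInterfaceD1Ev` is unknown to `classify_exact` but is treated as text by `classify_by_prefix`. The exception table from point 1 still needs separate handling under either scheme.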

comment:13 Changed 4 years ago by ezyang

Component: Compiler → Runtime System (Linker)
Summary: checkProddableBlock crash during Template Haskell linking → GHCi linker does not understand C++ exception tables on Windows

comment:14 Changed 4 years ago by Phyx-

Owner: set to Phyx-

comment:15 Changed 4 years ago by lukexi

Hi Phyx-/all — let me know if there are any public repos with work-in-progress on this; I'd love to help out! Planning on giving it some time later this week.

comment:16 Changed 4 years ago by Phyx-

@lukexi You can find the work-in-progress code at https://github.com/Mistuke/ghc/tree/rework-windows-pe-linker. It should already fix the problem reported here (at least the code compiles and runs and prints three lines).

I've only tested it for x86_64 so far.

I'll be making a few other changes and testing x86 before sending this out for review :) I will keep the branch on GitHub up-to-date.

comment:17 Changed 4 years ago by lukexi

Awesome stuff Tamar!!

I confirmed your branch works on my test case and also backported the patches to 7.10.2 at https://github.com/lukexi/ghc/tree/ghc-7.10.2-release-plus-rework-windows-pe-linker (git did 99% of the work, they mostly applied cleanly) and was able to build all of bullet-mini flawlessly, hurray! Thanks so much for your hard work.

comment:18 Changed 4 years ago by Phyx-

Differential Rev(s): Phab:D1244
Status: new → patch

Thanks for verifying, @lukexi! Good to hear that it worked for your test cases as well. Patch has been submitted for review :)

comment:19 Changed 4 years ago by Thomas Miedema <thomasmiedema@…>

In 620fc6f9/ghc:

Make Windows linker more robust to unknown sections

The Windows Linker has 3 main parts that this patch changes.

1) Identification and classification of sections
2) Adding of symbols to the symbols tables
3) Relocation of sections

1.
Previously section identification used to be done on a whitelisted
basis. It was also exclusively being done based on the names of the
sections. This meant that there was a bit of a cat and mouse game
between `GCC` and `GHC`. Every time `GCC` added new sections there was a
good chance `GHC` would break. Luckily this hasn't happened much in the
past because the `GCC` versions `GHC` used were largely unchanged.

The new code instead treats all new sections as `CODE` or `DATA`
sections, and changes the classifications based on the `Characteristics`
flag in the PE header. By doing so we no longer have the fragility of
changing section names. The one exception to this is the `.ctors`
section, which has no differentiating flag in the PE header, but we know
we need to treat it as initialization data.
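
A minimal sketch of the flag-based classification described above, not the patch's actual code. The two `IMAGE_SCN_CNT_*` values are the standard PE/COFF section flags; the function name, the returned labels, and the choice to default to data are assumptions made for illustration.

```python
# Standard PE/COFF section Characteristics flags.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040


def classify_section(name: str, characteristics: int) -> str:
    """Classify a section from its flags, consulting the name only for .ctors."""
    if name == ".ctors":
        # No flag distinguishes this section, so it is matched by name.
        return "init-data"
    if characteristics & IMAGE_SCN_CNT_CODE:
        return "code"
    # Assumption: everything else is loaded as data.
    return "data"
```

With this approach a section such as `.gcc_except_table` needs no entry in a name list: it is classified by whatever flags the object file gives it.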

The check to see if the sections are aligned by `4` has been removed.
The reason is that debug sections are often `1 aligned` but do have
relocation symbols. In order to support relocations of `.debug` sections
this check needs to be gone. Crucially this assumption doesn't seem to
be in the rest of the code. We only check if there are at least 4 bytes
to realign further down the road.

2.
The second loop iterates over all the symbols in the file and tries
to add them to the symbols table. Because the classification of the
sections we did previously is (currently) not available in this phase,
we still have to exclude the sections by hand. If we don't, we will
load in symbols from sections we've explicitly ignored in step 1. This
whole part should be rewritten to avoid this, but I didn't want to do it
in this commit.

3.
Finally the sections are relocated. But for some reason the PE files
contain a Linux relocation constant in them, `0x0011`. This constant as
far as I can tell does not come from GHC (or I couldn't find where it's
being set). I believe this is probably a bug in GAS. But because the
constant is in the output we have to handle it. I am thus mapping it to
the constant I think it should be `0x0003`.
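
The remapping described above could be sketched as a small normalisation step applied before relocations are dispatched. Only the two values `0x0011` and `0x0003` come from the commit message; the table and function names are hypothetical.

```python
# Stray relocation type -> the type it is treated as (values from the text above).
RELOC_TYPE_REMAP = {0x0011: 0x0003}


def normalize_reloc_type(reloc_type: int) -> int:
    """Map the unexpected constant to its replacement; leave other types alone."""
    return RELOC_TYPE_REMAP.get(reloc_type, reloc_type)
```

Keeping the mapping in one table makes the workaround easy to find and remove if the stray constant is ever fixed at its source.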

Finally, static linking *should* work, but won't. At least not if you
want to statically link `libgcc` with exceptions support. Doing so would
require you to link `libgcc` and `libstdc++` but also `libmingwex`. The
problem is that `libmingwex` also defines a lot of symbols that the RTS
automatically injects into the symbol table, presumably because they're
symbols that it needs, like `coshf`. These symbols are not in a
section that is declared with weak symbols support. So if we ever want
to get this working, we should either a) ask mingw to declare the
section as such, or b) treat all imported symbols as being weak,
though this doesn't seem like a good idea.

Test Plan:
Running ./validate for both x86 and x86_64

Also running the specific test case for #10672

make TESTS="T10672_x86 T10672_x64"

Reviewed By: ezyang, thomie, austin

Differential Revision: https://phabricator.haskell.org/D1244

GHC Trac Issues: #9907, #10672, #10563

comment:20 Changed 4 years ago by thomie

Status: patch → merge
Test Case: rts/T10672/T10672_x86 T10672_x64

comment:22 Changed 4 years ago by bgamari

Resolution: fixed
Status: merge → closed