Opened 11 months ago

Last modified 9 months ago

#15808 new bug

Loading libraries with FFI exports may cause segfaults in the compiler if they are loaded far from the rts in memory.

Reported by: AndreasK Owned by:
Priority: normal Milestone:
Component: Compiler (Linking) Version: 8.7
Keywords: Cc:
Operating System: Windows Architecture: x86_64 (amd64)
Type of failure: Compile-time crash or panic Test Case:
Blocked By: Blocking:
Related Tickets: #16067 Differential Rev(s):
Wiki Page:

Description (last modified by AndreasK)

Original report below.

In this case we compile aeson, which uses TH, triggering dynamic loading of a number of libraries.

Some libraries (e.g. base) have FFI exports which require us to place a relative jump to the RTS in order to register a stable name. An issue arises if base is placed more than 2GB from the RTS, as relative jumps are limited to a 2GB range.

In this particular case this caused the jump target to underflow, resulting in a jump to unallocated memory and a segfault.
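To make the range limit concrete, here is a minimal C sketch of how a 32-bit relative displacement is computed and what happens when a value that does not fit is stored anyway. The function names and the example addresses are made up for illustration; this is not the RTS code.

```c
#include <stdint.h>

/* Displacement from the end of a 4-byte rel32 field located at `site`
   to `target`. Computed in unsigned arithmetic and reinterpreted as a
   signed value (two's complement, as on every platform GHC targets). */
int64_t rel32_displacement(uint64_t site, uint64_t target)
{
    return (int64_t)(target - (site + 4));
}

/* Address the CPU would actually reach if only the low 32 bits of
   `disp` were written into the rel32 field. */
uint64_t effective_target(uint64_t site, int64_t disp)
{
    uint32_t low = (uint32_t)(uint64_t)disp;          /* what fits in the field */
    uint64_t ext = (low & 0x80000000u)
                     ? (0xFFFFFFFF00000000ull | low)  /* sign-extend negative */
                     : (uint64_t)low;
    return site + 4 + ext;                            /* wraps modulo 2^64 */
}
```

With a hypothetical jump site at 0x100000000 and an intended target at 0x400000, the displacement is -4290772996, which does not fit in 32 bits. Storing only its low 32 bits makes the jump land at 0x100400000, exactly 4GB away from the intended target.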

In more detail, the PE linker (PEi386.c:ocResolve_PEi386) fails to detect, or properly deal with, the bounds violation.

There seems to be some code in place to deal with an overflow already, but it fails to detect this one.

I haven't had any luck reproducing it outside of building the aeson package with cabal yet, so for now I am just documenting the fact. Build settings used:

GhcLibHcOpts += -g3
GhcRtsHcOpts += -g3


SplitObjs          = NO
SplitSections      = NO
BUILD_MAN          = NO

Error log:

"E:/ghc_dwarf/inplace/bin/ghc-stage2.exe" "--make" "-fbuilding-cabal-package" "-O" "-outputdir" "C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build" "-odir" "C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build" "-hidir" "C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build" "-stubdir" "C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build""-i" "-iC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build" "-i." "-iattoparsec-iso8601/" "-ipure" "-iC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\autogen" "-iC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\global-autogen" "-IC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\autogen" "-IC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\global-autogen" "-IC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build" "-Iinclude" "-IC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\include" "-optP-include" "-optPC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\autogen\cabal_macros.h" "-this-unit-id" "aeson-" "-hide-all-packages" "-Wmissing-home-modules" "-no-user-package-db" "-package-db" "C:\Users\Andi\AppData\Roaming\cabal\store\ghc-8.7.20181025\package.db" "-package-db" "C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\packagedb\ghc-8.7.20181025" "-package-db" "C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\package.conf.inplace" "-package-id" "attoparsec-" "-package-id" "base-" "-package-id" 
"base-compat-0.10.5-34e11ceb2d98e0262d1d958bca2afc3184e70c60" "-package-id" "bytestring-" "-package-id" "containers-" "-package-id" "deepseq-" "-package-id" "dlist-" "-package-id" "ghc-prim-0.5.3" "-package-id" "hashable-" "-package-id" "primitive-" "-package-id" "scientific-" "-package-id" "tagged-0.8.6-d3cce1acba663b646f565adb64d80579664d8caa" "-package-id" "template-haskell-" "-package-id" "text-" "-package-id" "th-abstraction-" "-package-id" "time-" "-package-id" "time-locale-c_-" "-package-id" "unordered-con_-" "-package-id" "uuid-types-1.0.3-f68643250767dce83d2c227104d15a0aa9c3c77f" "-package-id" "vector-" "-XHaskell2010" "Data.Aeson" "Data.Aeson.Encoding" "Data.Aeson.Parser" "Data.Aeson.Text" "Data.Aeson.Types" "Data.Aeson.TH" "Data.Aeson.QQ.Simple" "Data.Aeson.Encoding.Internal" "Data.Aeson.Internal" "Data.Aeson.Internal.Time" "Data.Aeson.Parser.Internal" "Data.Aeson.Encode" "Data.Aeson.Compat" "Data.Aeson.Encoding.Builder" "Data.Aeson.Internal.Functions" "Data.Aeson.Parser.Unescape" "Data.Aeson.Parser.Time" "Data.Aeson.Types.FromJSON" "Data.Aeson.Types.Generic" "Data.Aeson.Types.ToJSON" "Data.Aeson.Types.Class" "Data.Aeson.Types.Internal" "Data.Attoparsec.Time" "Data.Attoparsec.Time.Internal" "Data.Aeson.Parser.UnescapePure" "-Wall" "-O2" "-hide-all-packages" "-g13"
[ 2 of 25] Compiling Data.Aeson.Internal.Functions ( Data\Aeson\Internal\Functions.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Internal\Functions.o ) [Data.HashMap.Strict changed]
[ 5 of 25] Compiling Data.Aeson.Types.Generic ( Data\Aeson\Types\Generic.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Types\Generic.o ) [Prelude.Compat changed]
[ 6 of 25] Compiling Data.Aeson.Types.Internal ( Data\Aeson\Types\Internal.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Types\Internal.o ) [Data.Vector changed]
[ 7 of 25] Compiling Data.Aeson.Parser.Internal ( Data\Aeson\Parser\Internal.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Parser\Internal.o ) [Data.Scientific changed]
[ 8 of 25] Compiling Data.Aeson.Parser ( Data\Aeson\Parser.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Parser.o ) [Data.Aeson.Parser.Internal changed]
[ 9 of 25] Compiling Data.Attoparsec.Time.Internal ( attoparsec-iso8601\Data\Attoparsec\Time\Internal.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Attoparsec\Time\Internal.o ) [Prelude.Compat changed]

attoparsec-iso8601\Data\Attoparsec\Time\Internal.hs:24:1: warning: [-Wunused-imports]
    The import of `Unsafe.Coerce' is redundant
      except perhaps to import instances from `Unsafe.Coerce'
    To import instances alone, use: import Unsafe.Coerce()
24 | import Unsafe.Coerce (unsafeCoerce)
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[10 of 25] Compiling Data.Attoparsec.Time ( attoparsec-iso8601\Data\Attoparsec\Time.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Attoparsec\Time.o ) [Data.Attoparsec.Text changed]
[11 of 25] Compiling Data.Aeson.Parser.Time ( Data\Aeson\Parser\Time.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Parser\Time.o ) [Data.Attoparsec.Text changed]
[12 of 25] Compiling Data.Aeson.Types.FromJSON ( Data\Aeson\Types\FromJSON.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Types\FromJSON.o ) [Data.Primitive.PrimArray changed]
[13 of 25] Compiling Data.Aeson.Internal ( Data\Aeson\Internal.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Internal.o ) [Data.Aeson.Types.FromJSON changed]
[14 of 25] Compiling Data.Aeson.Internal.Time ( Data\Aeson\Internal\Time.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Internal\Time.o ) [Data.Attoparsec.Time.Internal changed]
[15 of 25] Compiling Data.Aeson.Encoding.Builder ( Data\Aeson\Encoding\Builder.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Encoding\Builder.o ) [Data.Vector changed]
[16 of 25] Compiling Data.Aeson.Encoding.Internal ( Data\Aeson\Encoding\Internal.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Encoding\Internal.o ) [Data.Scientific changed]
[17 of 25] Compiling Data.Aeson.Encoding ( Data\Aeson\Encoding.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Encoding.o ) [Data.Aeson.Encoding.Internal changed]
[18 of 25] Compiling Data.Aeson.Types.ToJSON ( Data\Aeson\Types\ToJSON.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Types\ToJSON.o ) [Data.Primitive.PrimArray changed]

Access violation in generated code when executing data at 0x103fec440

 Attempting to reconstruct a stack trace...

   Frame        Code address
 * 0x845d9c0    0x103fec440
 * 0x845da20    0x400c0f8 E:\ghc_dwarf\inplace\bin\ghc-stage2.exe+0x3c0c0f8
 * 0x845da80    0x3fec9a1 E:\ghc_dwarf\inplace\bin\ghc-stage2.exe+0x3bec9a1
 * 0x845dab0    0x3feca31 E:\ghc_dwarf\inplace\bin\ghc-stage2.exe+0x3beca31
 * 0x845dab8    0x34c8934 E:\ghc_dwarf\inplace\bin\ghc-stage2.exe+0x30c8934
 * 0x845dac0    0xfa340
 * 0x845dac8    0x2a940b78
 * 0x845dad0    0x2a98cd69
 * 0x845dad8    0x2a98d7d0

CallStack (from HasCallStack):
  die', called at .\\Distribution\\Client\\ProjectOrchestration.hs:977:55 in main:Distribution.Client.ProjectOrchestration
cabal.exe: Failed to build aeson- The build process terminated
with exit code 11

I could only reproduce it with master on Windows so far. It always triggers but under very specific circumstances:

  • GHC built with the flags above; adding dwarf info to the ghc executable, or removing dwarf info, eliminates the issue.
  • Only on a complete rebuild of aeson. Restarting the crashed build finishes without an error.

Change History (13)

comment:1 Changed 11 months ago by AndreasK

The error happens during linking (for TH I assume):

The exact moment it happens differs when the memory usage of the compiler changes. If I pass different -H values, different verbosity flags and so on it either fails earlier, later or not at all.

When it fails it's always during linking. So I assume there is some issue that arises when GC kicks in during linking, where we then access a pointer that hasn't been updated properly.

!!! ByteCodeGen [Ghci1]: finished in 0.00 milliseconds, allocated 0.049 megabytes
*** gcc:
"E:\ghc_dwarf\inplace\lib\../mingw/bin/gcc.exe" "-fno-stack-protector" "-DTABLES_NEXT_TO_CODE" "--print-search-dirs"
Loading package ghc-prim-0.5.3 ... linking ... done.
Loading package integer-gmp- ... linking ... done.
Loading package base- ... linking ... done.
Loading package array- ... linking ... done.
Loading package deepseq- ... linking ... done.
Loading package transformers- ... linking ... done.
Loading package primitive- ... linking ... done.
Loading package vector- ... linking ... done.
Loading package bytestring- ... linking ... done.
Loading package containers- ... linking ... done.
Loading package binary- ... linking ... done.
Loading package text- ... linking ... done.
Loading package hashable- ... linking ... done.
Loading package filepath- ... linking ... done.
Loading package Win32- ... linking ... done.
Loading package time- ... linking ... done.
Loading package random-1.1 ... linking ... done.
Loading package uuid-types-1.0.3 ... linking ... done.
Loading package unordered-containers- ... linking ... done.
Loading package time-locale-compat- ... linking ... done.
Loading package ghc-boot-th-8.7 ... linking ... done.
Loading package pretty- ... linking ... done.
Loading package template-haskell- ... linking ... done.
Loading package th-abstraction- ... linking ... done.
Loading package tagged-0.8.6 ... linking ... done.
Loading package dlist- ... linking ... done.
Loading package base-compat-0.10.5 ... linking ... done.
Loading package integer-logarithms- ... linking ... done.
Loading package scientific- ... linking ... done.
Loading package attoparsec- ... linking ... done.
Search directories (user):
Search directories (gcc):
Loading object (static archive) E:/ghc_dwarf/inplace/mingw/bin/../lib/gcc/x86_64-w64-mingw32/7.2.0/../../../../x86_64-w64-mingw32/lib/../lib/libpthread.dll.a ... done
final link ... done

Access violation in generated code when executing data at 0xffffffff800ba2c8

 Attempting to reconstruct a stack trace...

   Frame        Code address
 * 0x845dae0    0xffffffff800ba2c8

Another variant:

!!! CorePrep [Data.Aeson.Encoding]: finished in 0.00 milliseconds, allocated 0.061 megabytes
*** Stg2Stg:
*** CodeGen [Data.Aeson.Encoding]:
!!! CodeGen [Data.Aeson.Encoding]: finished in 0.00 milliseconds, allocated 0.514 megabytes
*** Assembler:
"E:\ghc_dwarf\inplace\lib\../mingw/bin/gcc.exe" "-fno-stack-protector" "-DTABLES_NEXT_TO_CODE" "-iquote.\Data\Aeson" "-IC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build" "-IC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build" "-IC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\autogen" "-IC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\global-autogen" "-IC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build" "-Iinclude" "-IC:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\include" "-no-pie" "-x" "assembler" "-c" "C:\ghc\msys64\tmp\ghc173148_0\ghc_72.s" "-o" "C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Encoding.o"
*** Deleting temp files:
Deleting: C:\ghc\msys64\tmp\ghc173148_0\ghc_71.s C:\ghc\msys64\tmp\ghc173148_0\ghc_72.s C:\ghc\msys64\tmp\ghc173148_0\ghc_73.c
Warning: deleting non-existent C:\ghc\msys64\tmp\ghc173148_0\ghc_71.s
Warning: deleting non-existent C:\ghc\msys64\tmp\ghc173148_0\ghc_73.c
compile: input file C:\ghc\msys64\tmp\ghc173148_0\ghc_15.hscpp
*** Checking old interface for Data.Aeson.Types.ToJSON (use -ddump-hi-diffs for more details):
[18 of 25] Compiling Data.Aeson.Types.ToJSON ( Data\Aeson\Types\ToJSON.hs, C:\ghc\msys64\home\Andi\aeson_repro\dist-newstyle\build\x86_64-windows\ghc-8.7.20181025\aeson-\build\Data\Aeson\Types\ToJSON.o )
*** Parser [Data.Aeson.Types.ToJSON]:
!!! Parser [Data.Aeson.Types.ToJSON]: finished in 15.63 milliseconds, allocated 109.276 megabytes
*** Renamer/typechecker [Data.Aeson.Types.ToJSON]:
*** Simplify [expr]:
!!! Simplify [expr]: finished in 0.00 milliseconds, allocated 0.554 megabytes
*** CorePrep [expr]:
!!! CorePrep [expr]: finished in 0.00 milliseconds, allocated 0.011 megabytes
*** ByteCodeGen [Ghci1]:
!!! ByteCodeGen [Ghci1]: finished in 0.00 milliseconds, allocated 0.049 megabytes
*** gcc:
"E:\ghc_dwarf\inplace\lib\../mingw/bin/gcc.exe" "-fno-stack-protector" "-DTABLES_NEXT_TO_CODE" "--print-search-dirs"
Loading package ghc-prim-0.5.3 ... linking ... done.
Loading package integer-gmp- ... linking ... done.
Loading package base- ... linking ...
Access violation in generated code when executing data at 0x103fec440

 Attempting to reconstruct a stack trace...

   Frame        Code address
 * 0x845d9c0    0x103fec440
 * 0x845da20    0x400c0f8 E:\ghc_dwarf\inplace\bin\ghc-stage2.crash.exe+0x3c0c0f8
 * 0x845da80    0x3fec9a1 E:\ghc_dwarf\inplace\bin\ghc-stage2.crash.exe+0x3bec9a1
 * 0x845dab0    0x3feca31 E:\ghc_dwarf\inplace\bin\ghc-stage2.crash.exe+0x3beca31
 * 0x845dab8    0x34c8934 E:\ghc_dwarf\inplace\bin\ghc-stage2.crash.exe+0x30c8934
 * 0x845dac0    0x1
 * 0x845dac8    0xa5c2030
 * 0x845dad0    0xc
 * 0x845dad8    0x231c3190

comment:2 Changed 11 months ago by AndreasK

I've traced it back to ocRunInit_PEi386 so far.

There in the call to (*init) we trigger an access exception by calling into the target address.

I've seen some changes in related linker code were made recently. I will try to bisect that.

bool
ocRunInit_PEi386 ( ObjectCode *oc )
{
  if (!oc || !oc->info || !oc->info->init) {
    return true;
  }

  int argc, envc;
  char **argv, **envv;

  getProgArgv(&argc, &argv);
  getProgEnvv(&envc, &envv);

  Section section = *oc->info->init;

  uint8_t *init_startC = section.start;
  init_t *init_start   = (init_t*)init_startC;
  init_t *init_end     = (init_t*)(init_startC + section.size);

  // ctors are run *backwards*!
  for (init_t *init = init_end - 1; init >= init_start; init--)
      (*init)(argc, argv, envv);

  freeProgEnvv(envc, envv);
  releaseOcInfo (oc);
  return true;
}
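
For readers unfamiliar with the pattern, the following is a self-contained sketch of running a table of initializer function pointers from last to first, as the loop above does. The type and function names (init_fn, run_inits_backwards, init_a, init_b, demo_order) are hypothetical and only mimic the shape of the RTS code.

```c
#include <stddef.h>

typedef void (*init_fn)(int argc, char **argv, char **envv);

/* Small trace buffer so the call order can be observed. */
static char   trace[8];
static size_t trace_len;

static void init_a(int argc, char **argv, char **envv)
{
    (void)argc; (void)argv; (void)envv;
    trace[trace_len++] = 'a';
}

static void init_b(int argc, char **argv, char **envv)
{
    (void)argc; (void)argv; (void)envv;
    trace[trace_len++] = 'b';
}

/* Call every entry of the table, starting with the last one.
   An index-based loop avoids forming a pointer before the table start. */
void run_inits_backwards(init_fn *start, size_t count,
                         int argc, char **argv, char **envv)
{
    for (size_t i = count; i > 0; i--) {
        start[i - 1](argc, argv, envv);
    }
}

/* Runs a two-entry table {init_a, init_b} and returns the recorded order. */
const char *demo_order(void)
{
    init_fn table[] = { init_a, init_b };
    trace_len = 0;
    run_inits_backwards(table, 2, 0, NULL, NULL);
    trace[trace_len] = '\0';
    return trace;
}
```

Each entry is called through a pointer read from the loaded section, so a bad relocation inside one of the initializers only shows up once that initializer runs.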

comment:3 Changed 11 months ago by AndreasK

I've spent some quality time with gdb over the last few days and am documenting progress here:

  • aeson uses TH which via the GHCi machinery links in base (among other things).
  • During linking of base we find initialization code in base originating from foreign exports
  • The initialization code contains relocations.
  • In particular we want to jump to foreignExportStablePtr at the end of this function.
  • But something during relocation goes wrong and we jump to a wrong address.

comment:4 Changed 11 months ago by AndreasK

Architecture: changed from Unknown/Multiple to x86_64 (amd64)
Component: changed from Compiler to Compiler (Linking)
Description: modified
Operating System: changed from Unknown/Multiple to Windows
Priority: changed from normal to high
Summary: changed from "Master segfaults on windows during aeson build when stage2 libs have dwarf enabled." to "Loading libraries with FFI exports may cause segfaults in the compiler if they are loaded far from the rts in memory."
Type of failure: changed from None/Unknown to Compile-time crash or panic

Marking as high as we should try to get this into 8.8.

comment:5 Changed 11 months ago by bgamari

I recently noticed that e019ec94f12268dd92ea5d5204e9e57e7ebf10ca broke i386 (and is now reverted). This might be a good commit to check.

comment:6 Changed 11 months ago by AndreasK

The bug was already present in a commit I tried from the second half of September. So sadly a different issue.

comment:7 Changed 11 months ago by AndreasK

Owner: set to AndreasK

I have an idea for a fix; if it works out I will put up a patch shortly.

We should be able to just use the large memory model for the stubs and things should (mostly) work out. Or we run into other strange bugs on the way :)

Last edited 11 months ago by AndreasK

comment:8 Changed 11 months ago by AndreasK

Owner: AndreasK deleted
Priority: changed from high to normal

Fixing this won't be easy, so for the moment I will leave it as is.

The main issue is that we try to jump from a loaded library to the rts.

Now we have the problem that we end up with a memory layout of something like this: <compiler+rts> <... 2+GB of things ...> <libraries loaded for TH>

Fixing the C stubs was reasonably easy. We can just tell gcc to use the large memory model, making jumps 64-bit sized (by passing -mcmodel=large to gcc when compiling them).
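
As a rough picture of what a "64-bit sized" jump looks like on x86_64, the sketch below writes the bytes of a stub that loads a full 64-bit address into a register and jumps through it. The function name is hypothetical, and this is only one common encoding; it is not claimed to be what gcc's large model or the RTS emits.

```c
#include <stddef.h>
#include <stdint.h>

enum { ABS_JUMP_STUB_SIZE = 12 };

/* Writes `movabs rax, target; jmp rax` into buf, which must have room
   for ABS_JUMP_STUB_SIZE bytes. Returns the number of bytes written.
   The bytes are only produced here, never executed. */
size_t write_abs_jump_stub(uint8_t *buf, uint64_t target)
{
    buf[0] = 0x48;                       /* REX.W prefix            */
    buf[1] = 0xB8;                       /* mov rax, imm64          */
    for (int i = 0; i < 8; i++) {        /* imm64, little-endian    */
        buf[2 + i] = (uint8_t)(target >> (8 * i));
    }
    buf[10] = 0xFF;                      /* jmp rax                 */
    buf[11] = 0xE0;
    return ABS_JUMP_STUB_SIZE;
}
```

Such a stub can reach any address, but it takes 12 bytes where a `jmp rel32` takes 5, which gives a feel for the code-size cost of avoiding relative jumps.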

What I had failed to consider is that we also jump into the rts from regular haskell code. Now making sure all THIS code works well with a 64-bit address space sounds bad enough. But it also comes with performance costs as we increase code size. Granted, probably not a big one, but still.

So even IF we fix this we would incur a performance penalty for pretty much ALL haskell programs out there. So this seems like it's not worth it at the moment unless there are somehow a lot of "random" crashes out there which can be traced back to this issue.

For what it's worth we support doing the required things already when using -fPIC on Linux so it might not be THAT much effort. But I'm not going to be the one to do it for the foreseeable future.

Last edited 11 months ago by AndreasK

comment:9 Changed 11 months ago by AndreasK

If anyone ever wants to pick this up: since 8.6 there is also -fexternal-dynamic-refs, which goes most of the way already. Currently building GHC with the flag fails with linker errors, but it seems like a good starting point.

comment:10 Changed 10 months ago by AndreasK

As it came up in a discussion:

The overflow check present in the RTS fails for a value of -2287728808.
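
For reference, a straightforward range check rejects this value: -2287728808 lies below INT32_MIN (-2147483648) by 140245160 bytes, roughly 134 MiB. The sketch below is a generic check with a hypothetical function name, not the check used in the RTS.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `v` can be stored in a signed 32-bit relocation field
   without losing information. */
bool fits_rel32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

Here `fits_rel32(-2287728808LL)` is false, while the boundary values INT32_MIN and INT32_MAX are accepted.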

comment:11 Changed 10 months ago by AndreasK

The code causing the failure is at rts/linker/PEi386.c:1960 where the overflow check is not correctly implemented.

comment:13 Changed 9 months ago by AndreasK
