Opened 3 years ago

Last modified 6 months ago

#13786 new bug

GHCi linker is dependent upon object file order

Reported by: ppelleti
Owned by:
Priority: high
Milestone: 8.8.1
Component: Runtime System (Linker)
Version: 8.0.2
Keywords:
Cc: angerman, simonmar, rwbarton
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Compile-time crash or panic
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

Using this package, I tried doing "stack repl" with GHC 8.0.2:

whiteandnerdy:hs-mercury-api ppelleti$ stack repl
The following GHC options are incompatible with GHCi and have not been passed to it: -threaded

* * * * * * * *
The main module to load is ambiguous. Candidates are: 
1. Package `mercury-api' component exe:tmr-firmware with main-is file: /Users/ppelleti/programming/haskell/hs-mercury-api/examples/tmr-firmware.hs
2. Package `mercury-api' component exe:tmr-gpio with main-is file: /Users/ppelleti/programming/haskell/hs-mercury-api/examples/tmr-gpio.hs
3. Package `mercury-api' component exe:tmr-lock with main-is file: /Users/ppelleti/programming/haskell/hs-mercury-api/examples/tmr-lock.hs
4. Package `mercury-api' component exe:tmr-params with main-is file: /Users/ppelleti/programming/haskell/hs-mercury-api/examples/tmr-params.hs
5. Package `mercury-api' component exe:tmr-read with main-is file: /Users/ppelleti/programming/haskell/hs-mercury-api/examples/tmr-read.hs
6. Package `mercury-api' component exe:tmr-write with main-is file: /Users/ppelleti/programming/haskell/hs-mercury-api/examples/tmr-write.hs
You can specify which one to pick by: 
 * Specifying targets to stack ghci e.g. stack ghci mercury-api:exe:tmr-firmware
 * Specifying what the main is e.g. stack ghci --main-is mercury-api:exe:tmr-firmware
 * Choosing from the candidate above [1..6]
* * * * * * * *

Specify main module to use (press enter to load none): 4
Loading main module from cadidate 4, --main-is /Users/ppelleti/programming/haskell/hs-mercury-api/examples/tmr-params.hs

Configuring GHCi with the following packages: mercury-api
GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
ghc: panic! (the 'impossible' happened)
  (GHC version 8.0.2 for x86_64-apple-darwin):
	Loading temp shared object failed: dlopen(/var/folders/d1/v9ptqpx12mdcxj77509440rc0000gn/T/ghc91859_0/libghc_5.dylib, 5): Symbol not found: _TMR_SR_cmdStopReading
  Referenced from: /var/folders/d1/v9ptqpx12mdcxj77509440rc0000gn/T/ghc91859_0/libghc_5.dylib
  Expected in: flat namespace
 in /var/folders/d1/v9ptqpx12mdcxj77509440rc0000gn/T/ghc91859_0/libghc_5.dylib

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

This bug appears to have been around a while, because it also happens with GHC 7.8.3:

whiteandnerdy:hs-mercury-api ppelleti$ cabal repl
Preprocessing library mercury-api-0.1.0.0...
GHCi, version 7.8.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package array-0.5.0.0 ... linking ... done.
Loading package deepseq-1.3.0.2 ... linking ... done.
Loading package bytestring-0.10.4.0 ... linking ... done.
Loading package containers-0.5.5.1 ... linking ... done.
Loading package binary-0.7.1.0 ... linking ... done.
Loading package text-1.2.2.1 ... linking ... done.
Loading package hashable-1.2.6.0 ... linking ... done.
Loading package unordered-containers-0.2.8.0 ... linking ... done.
Loading package clock-0.7.2 ... linking ... done.
Loading package old-locale-1.0.0.6 ... linking ... done.
Loading package time-1.4.2 ... linking ... done.
Loading package unix-2.7.0.1 ... linking ... done.
Loading package ansi-terminal-0.6.2.3 ... linking ... done.
Loading object (static) dist/build/cbits/api/tmr_strerror.o ... done
Loading object (static) dist/build/cbits/api/tmr_param.o ... done
Loading object (static) dist/build/cbits/api/hex_bytes.o ... done
Loading object (static) dist/build/cbits/api/tm_reader.o ... ghc: panic! (the 'impossible' happened)
  (GHC version 7.8.3 for x86_64-apple-darwin):
	Loading temp shared object failed: dlopen(/var/folders/d1/v9ptqpx12mdcxj77509440rc0000gn/T/ghc91576_0/ghc91576_4.dylib, 9): Symbol not found: _TMR_SR_SerialTransportNativeInit
  Referenced from: /var/folders/d1/v9ptqpx12mdcxj77509440rc0000gn/T/ghc91576_0/ghc91576_4.dylib
  Expected in: flat namespace
 in /var/folders/d1/v9ptqpx12mdcxj77509440rc0000gn/T/ghc91576_0/ghc91576_4.dylib

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

I'm using Mac OS X 10.9.5:

whiteandnerdy:hs-mercury-api ppelleti$ uname -a
Darwin whiteandnerdy.lan 13.4.0 Darwin Kernel Version 13.4.0: Mon Jan 11 18:17:34 PST 2016; root:xnu-2422.115.15~1/RELEASE_X86_64 x86_64

Change History (16)

comment:1 Changed 3 years ago by RyanGlScott

Architecture: changed from x86_64 (amd64) to Unknown/Multiple
Operating System: changed from MacOS X to Unknown/Multiple

This is also reproducible on Linux:

$ cabal repl
Preprocessing library mercury-api-0.1.0.0...
GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
ghc: panic! (the 'impossible' happened)
  (GHC version 8.0.2 for x86_64-unknown-linux):
        Loading temp shared object failed: /tmp/ghc25635_0/libghc_7.so: undefined symbol: TMR_SR_SerialTransportNativeInit

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

comment:2 Changed 3 years ago by RyanGlScott

Summary: changed from 'GHC panic on Mac OS X with "cabal repl" / "stack repl" on GHC 8.0.2 and 7.8.3' to 'GHC panic with "cabal repl" / "stack repl" on GHC 8.0.2 and 7.8.3'

comment:3 Changed 3 years ago by bgamari

Did this package ever work? It looks to me like the object files aren't being passed to ghc in dependency-order, hence the link errors.

comment:4 Changed 3 years ago by bgamari

For the record, I played around with the GHC command line for a bit and got close to getting it to build with the following link order:

api/tmr_strerror.o
api/tmr_param.o
api/hex_bytes.o
api/serial_transport_posix.o
api/serial_reader_l3.o
api/serial_reader.o
api/tm_reader_async.o
api/tm_reader.o
api/tmr_utils.o
glue/glue.o
api/osdep_posix.o

Unfortunately this also doesn't quite work since isSecureAccessEnabled is defined in serial_reader but used in both serial_reader_l3 and serial_reader, whereas the latter appears to have dependencies on serial_reader_l3, creating a circular dependency.
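The ordering problem described above can be sketched as a topological sort: a valid load order exists only if the "references symbols from" relation between object files has no cycle. This is a minimal illustration, not GHC or Cabal code. The dependency maps are assumptions made up for the example (only the isSecureAccessEnabled edge comes from this ticket), and `load_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter, CycleError


def load_order(deps):
    """Return an order in which every object follows the objects it needs,
    or None if the dependencies are cyclic and no such order exists."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# Hypothetical map: object file -> object files whose symbols it references.
acyclic = {
    "tm_reader.o": {"serial_reader.o"},
    "serial_reader.o": {"serial_reader_l3.o"},
    "serial_reader_l3.o": set(),
}

# Same map, plus the edge from this ticket: serial_reader_l3 uses
# isSecureAccessEnabled, which is defined in serial_reader.
cyclic = dict(acyclic)
cyclic["serial_reader_l3.o"] = {"serial_reader.o"}

print(load_order(acyclic))
print(load_order(cyclic))
```

With the cyclic map no ordering can satisfy a loader that insists on seeing definitions before uses, which is why reordering the objects alone was not enough here.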

comment:5 Changed 3 years ago by RyanGlScott

Since you seem to know more about what's going on here than I do (and because there are several other tickets of this flavor that I'd like to characterize), can you explain what you mean by "dependency-order"? And why that would make a difference as to whether runtime linking would succeed (but apparently not compilation)?

comment:6 Changed 3 years ago by ppelleti

The package works fine when compiled (i.e. "cabal build" or "stack build"), but this was my first time trying it with GHCi (i.e. "cabal repl" / "stack repl"). I hadn't known the object files needed to be passed in dependency order.

Perhaps the error message could be improved, to indicate it is an error on the user's part, rather than a GHC bug?

comment:7 Changed 3 years ago by ppelleti

I fixed the circular dependency for isSecureAccessEnabled, and got this order to work:

  c-sources:           cbits/api/serial_transport_posix.c
                     , cbits/api/osdep_posix.c
                     , cbits/api/tmr_strerror.c
                     , cbits/api/tmr_utils.c
                     , cbits/api/tmr_param.c
                     , cbits/api/hex_bytes.c
                     , cbits/api/serial_reader_l3.c
                     , cbits/api/serial_reader.c
                     , cbits/api/tm_reader_async.c
                     , cbits/api/tm_reader.c
                     , cbits/glue/glue.c

However, the unpleasant part is that Cabal seems to always put the conditional c-sources after the unconditional c-sources, even if I put the conditional c-sources first in the Cabal file. So, the only workaround seems to be to list all the sources twice: once for POSIX and once for Windows.

comment:8 Changed 3 years ago by bgamari

Cc: angerman Jaffacake rwbarton added
Component: changed from GHCi to Runtime System (Linker)

I actually wasn't quite correct. GNU ld requires that archives be provided on the command line in dependency order: if libA refers to libB, they must be given in the order libA libB, because ld searches an archive only for symbols that are still undefined when it reaches that archive. (Alternatively, you can explicitly flag a set of archives as a recursive group using the --start-group/--end-group flags.)

However, it turns out that there is no such requirement of object files: it seems that you can give them in any order and ld will link them without any trouble. Unfortunately, this isn't true of GHCi's linker. We sequentially load one object file after another, failing if we encounter any undefined reference. Perhaps what we should instead do is load the objects as a group, first slurping in the symbol tables of each of them, and only afterwards try to resolve references. This will add some complexity, but will ensure that we offer the same ordering guarantees that ld currently provides.
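The difference between the two strategies in the previous paragraph can be sketched with a toy model. This is not the GHCi linker's code: each object is reduced to a hypothetical `(name, defined symbols, referenced symbols)` triple, and the assignment of symbols to files is an assumption made for illustration (only the symbol name TMR_SR_SerialTransportNativeInit is taken from the panic above).

```python
def load_sequentially(objects):
    """Load one object at a time; every reference must be defined by this
    object or an earlier one. Returns (object, symbol) for the first
    unresolved reference, or None on success."""
    defined = set()
    for name, defines, references in objects:
        defined |= defines
        for sym in sorted(references):
            if sym not in defined:
                return (name, sym)
    return None


def load_as_group(objects):
    """First collect the symbol tables of all objects, then resolve
    references. Order no longer matters and cycles are fine."""
    defined = set()
    for _, defines, _ in objects:
        defined |= defines
    for name, _, references in objects:
        for sym in sorted(references):
            if sym not in defined:
                return (name, sym)
    return None


# Hypothetical symbol tables; which file defines what is assumed.
objects = [
    ("tm_reader.o", {"TMR_create"}, {"TMR_SR_SerialTransportNativeInit"}),
    ("serial_transport_posix.o", {"TMR_SR_SerialTransportNativeInit"}, set()),
]

print(load_sequentially(objects))
print(load_as_group(objects))
```

In this model the sequential loader fails on tm_reader.o unless the list is reversed, while the grouped loader accepts either order.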

comment:9 Changed 3 years ago by bgamari

Summary: changed from 'GHC panic with "cabal repl" / "stack repl" on GHC 8.0.2 and 7.8.3' to 'GHCi linker is dependent upon object file order'

comment:10 Changed 3 years ago by Phyx-

The complexity shouldn't be all that much. Just delaying the calls to ocResolve until all of the ocGetNames calls are done should be enough. Off the top of my head, there's no real reason that we have to call the three stages sequentially.

comment:11 Changed 3 years ago by bgamari

Actually, I hadn't previously noticed that the error isn't from the runtime linker at all. Rather, it is from dlopen. It seems that we are linking each individual object file into a separate shared library and dlopening them individually. In our defense, we do add the previously loaded objects as dependencies, but this nevertheless precludes resolution of cycles.

Why do we do this? Well, I don't know, but it sure seems silly. Presumably we should at the very least compile and link all of the C sources into a single dynlib.

Last edited 3 years ago by bgamari

comment:12 Changed 17 months ago by recursion-ninja

Can we prioritize fixing this so I can use GHCi again? It's been years since I could use it with our project at work, and I really miss the ability to test functions in the REPL environment rather than recompiling with Debug.Trace statements inserted.

comment:13 Changed 17 months ago by bgamari

Cc: simonmar added; Jaffacake removed
Milestone: 8.8.1
Priority: changed from normal to high

We can try.

comment:14 Changed 12 months ago by recursion-ninja

Wondering if there's been any progress on this issue. The ability to use GHCi again would be much appreciated!

As you can see from the date opened on this ticket, we have not been able to use GHCi for well over a year: https://ghc.haskell.org/trac/ghc/ticket/14713

Last edited 10 months ago by recursion-ninja

comment:15 Changed 6 months ago by Marge Bot <ben+marge-bot@…>

In ade3db53/ghc:

testsuite: Test for #13786

comment:16 Changed 6 months ago by Marge Bot <ben+marge-bot@…>

In 5a502cd1/ghc:

ghci: Load static objects in batches

Previously in the case where GHC was dynamically linked we would load
static objects one-by-one by linking each into its own shared object and
dlopen'ing each in order. However, this meant that the link would fail
in the event that the objects had cyclic symbol dependencies.

Here we fix this by merging each "run" of static objects into a single
shared object and loading this.

Fixes #13786 for the case where GHC is dynamically linked.
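
The batching idea in the commit message above can be sketched as follows. This is only an illustration in Python, not the actual GHC change: the `(kind, path)` representation, the kind labels, and the `batch_static_runs` helper are all assumptions. Each consecutive "run" of static objects becomes one batch that would be linked into a single shared object, while other entries are left as individual loads.

```python
from itertools import groupby


def batch_static_runs(items):
    """Merge each consecutive run of 'static' entries into one batch;
    every other entry stays in a batch of its own, preserving order."""
    batches = []
    for kind, run in groupby(items, key=lambda item: item[0]):
        paths = [path for _, path in run]
        if kind == "static":
            batches.append(("static", paths))
        else:
            batches.extend((kind, [path]) for path in paths)
    return batches


# Hypothetical load list; the file names are illustrative only.
items = [
    ("static", "serial_reader_l3.o"),
    ("static", "serial_reader.o"),
    ("dynamic", "libfoo.so"),
    ("static", "glue.o"),
]

for batch in batch_static_runs(items):
    print(batch)
```

Because both objects of the first run end up in the same batch, a cyclic reference between them no longer depends on load order.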