Opened 5 years ago

Last modified 3 years ago

#9498 new feature request

GHC links against unversioned .so files

Reported by: Kritzefitz Owned by:
Priority: normal Milestone:
Component: Compiler (Linking) Version: 7.6.3
Keywords: Debian Cc: trommler, simonmar
Operating System: Linux Architecture: Unknown/Multiple
Type of failure: Other Test Case:
Blocked By: #11238 Blocking: #9237
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Greetings,

GHC tries to load unversioned dynamic libraries instead of versioned ones (i.e. libfoo.so instead of libfoo.so.1.2.3). This is a problem, since distributions like Debian (I don't know about other distributions) don't include unversioned .so files in their runtime packages; the unversioned files are only available in the -dev packages, as symlinks to the versioned ones. This means FFI libraries depend on the -dev packages, even though they don't really need them. It would be nice if GHC would try to load the versioned libraries as well.

Regards Sven

Change History (28)

comment:1 Changed 5 years ago by thomie

On Ubuntu 14.04, libbsd.so.0 (unclear whether versioned or unversioned) is a symbolic link to the file libbsd.so.0.6.0 (versioned). Both this symbolic link and the library file are created by the libbsd0 (runtime) package.

$ ll /lib/x86_64-linux-gnu/libbsd.so.0
... /lib/x86_64-linux-gnu/libbsd.so.0 -> libbsd.so.0.6.0

$ dpkg -S /lib/x86_64-linux-gnu/libbsd.so.0
libbsd0:amd64: /lib/x86_64-linux-gnu/libbsd.so.

$ dpkg -S /lib/x86_64-linux-gnu/libbsd.so.0.6.0
libbsd0:amd64: /lib/x86_64-linux-gnu/libbsd.so.0.6.0

There is another symbolic link to libbsd.so.0.6.0, from libbsd.so (definitely unversioned, this may have been the link you were referring to), which is created by the libbsd-dev (dev) package.

$ ll /usr/lib/x86_64-linux-gnu/libbsd.so
... /usr/lib/x86_64-linux-gnu/libbsd.so -> /lib/x86_64-linux-gnu/libbsd.so.0.6.0

$ dpkg -S /usr/lib/x86_64-linux-gnu/libbsd.so
libbsd-dev: /usr/lib/x86_64-linux-gnu/libbsd.so

Here is a small program (called testbsd.hs) that calls a function in the libbsd library via the foreign function interface:

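module Main where

import Foreign.C.Types (CUInt)

-- arc4random(3) from libbsd returns a fresh random number on every call,
-- so it is imported in IO rather than as a pure value.
foreign import ccall "arc4random"
  arc4random :: IO CUInt

main :: IO ()
main = arc4random >>= print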

GHC links this program to libbsd.so.0 from the runtime package, NOT to the libbsd.so from the dev package.

$ ghc --make -lbsd testbsd.hs
[1 of 1] Compiling Main             ( testbsd.hs, testbsd.o )
Linking testbsd ...

$ ldd testbsd
        ...
	libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007faef5c2c000)
	...

$ ./testbsd
1131902945
Last edited 4 years ago by thomie

comment:2 Changed 5 years ago by thomie

Kritzefitz: thank you once again for the report. My first version of the above text was wrong. This is the corrected version.

I don't understand the problem your proposal is trying to solve. Could you please give an example of the situation on your system?

comment:3 Changed 5 years ago by Kritzefitz

I will demonstrate with the curl package:

Assume I have installed the libcurl3 package but not the libcurl-dev package. This means I have the following shared libraries belonging to curl installed:

/usr/lib/i386-linux-gnu/libcurl-gnutls.so.3
/usr/lib/i386-linux-gnu/libcurl.so.3

(I actually have some curl version 4 libraries installed, but those shouldn't matter now.)

Then I start ghci and try to use curl:

$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> import Network.Curl
Prelude Network.Curl> curlGet "http://weltraumschlangen.de/" []
Loading package array-0.4.0.1 ... linking ... done.
Loading package deepseq-1.3.0.1 ... linking ... done.
Loading package bytestring-0.10.0.2 ... linking ... done.
Loading package containers-0.5.0.0 ... linking ... done.
Loading package curl-1.3.8 ... can't load .so/.DLL for: libcurl.so (libcurl.so: cannot open shared object file: No such file or directory)
Prelude Network.Curl>

So it seems it tries to load libcurl.so, which is contained in libcurl-dev, instead of libcurl.so.3. It's not working when compiling a file either:

test.hs:

import Network.Curl

main = curlGet "http://weltraumschlangen.de/" []

compiling:

$ ghc --make test.hs 
[1 of 1] Compiling Main             ( test.hs, test.o )
Linking test ...
/usr/bin/ld: cannot find -lcurl
collect2: error: ld returned 1 exit status

comment:4 Changed 4 years ago by thomie

Status: changed from new to infoneeded

Kritzefitz: thank you for your response.

I don't think there is anything for GHC to do here, and I'm inclined to close as invalid. What is your opinion?

Here's my understanding of what's happening. I'm keeping this Linux-only (run sudo apt-get install libcurl4-openssl-dev and cabal install curl first):

  • The .cabal file of the curl package specifies extra-libraries: curl
  • After cabal installing curl, this stanza ends up in curl's ghc package file (check with ghc-pkg describe curl | grep extra-libraries)
  • When you make test.hs, ghc figures out that curl is required. It parses curl's ghc package file, and extracts the extra-libraries field. When it's time for linking, it then passes -lcurl to the linker.
  • Since the linker wasn't told which version of curl it should find, it searches for a file libcurl.so in any of the system lib directories. On Debian/Ubuntu, as you mentioned, this unversioned file (symbolic link) is only present after installing a curl -dev package (1).
  • The linker then follows the symbolic link, and produces a binary that is linked to a versioned .so file (libcurl.so.4 in my case). This is good, because that means that users of the binary will not need the -dev package for curl to be installed, but just the runtime package will do.

The linker can also be told to look for a specific file, by using -l:filename instead of -lname (see man ld and search for --library=). This filename is necessarily platform specific (e.g. dll on Windows), but cabal and ghc support this format in principle (though there seems to be a bug when doing this via ghci).
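The two forms can be compared with a small model. This is only a sketch of the behaviour described in this comment, not code from GHC, Cabal, or ld: the names parseEntry, linkerFlag, and searchedFiles are hypothetical, and the real linker also honours -L flags, default search directories, and linker scripts.

```haskell
-- Simplified model of how one extra-libraries entry reaches the linker.
-- All names here are illustrative; none of them exist in GHC or Cabal.

-- | An entry is either a library name ("curl") or, with a leading colon,
-- an exact file name (":libcurl.so.4").
data Lookup = ByName String | ByFileName FilePath
  deriving (Eq, Show)

parseEntry :: String -> Lookup
parseEntry (':' : file) = ByFileName file
parseEntry name         = ByName name

-- | The flag handed to the linker: the entry is passed through after "-l".
linkerFlag :: String -> String
linkerFlag entry = "-l" ++ entry

-- | File names the linker will search for, in order of preference.
-- For a plain name, ld looks for the unversioned shared object first and
-- falls back to the static archive.
searchedFiles :: Lookup -> [FilePath]
searchedFiles (ByName name)     = ["lib" ++ name ++ ".so", "lib" ++ name ++ ".a"]
searchedFiles (ByFileName file) = [file]

main :: IO ()
main = mapM_ report ["curl", ":libcurl.so.4"]
  where
    report entry =
      putStrLn (linkerFlag entry ++ " searches for "
                ++ show (searchedFiles (parseEntry entry)))
```

With the plain name, the only shared object that satisfies the search is the unversioned libcurl.so from the -dev package; with the colon form, the versioned file from the runtime package is enough.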

Here's what I've tried:

  • cabal get curl && cd curl-1.3.8
  • # edit curl.cabal and change extra-libraries: curl to extra-libraries: :libcurl.so.4
  • cabal install
  • sudo apt-get remove libcurl4-openssl-dev
  • # verify that libcurl.so is no longer present
  • # change to directory with your test.hs
  • ghc test.hs

It works!

Responding to your issue, either:

  1. Debian should change its policy (1)
  2. or the curl library should be updated
  3. or users of the curl library should just install the curl -dev package (they are developers after all, not end users).

(1) https://www.debian.org/doc/debian-policy/ch-sharedlibs.html#s-sharedlibs-dev

Edit: the colon in front of :libcurl.so.4 is important.

Last edited 4 years ago by thomie

comment:5 Changed 4 years ago by nomeata

Status: changed from infoneeded to new

It is right that the cabal file specifies only curl; after all, no specific version is required here.

But after GHC has produced a .so file, this .so file, as you say, is tied to a specific version of the curl library. So the correct thing to do would be to store the precisely used version in the package data base.

This way, building the curl bindings requires the libcurl-dev package, but using the curl bindings, e.g. when building a Haskell executable or from ghci, would only require the specific runtime package. This would actually make packaging Haskell for Debian a bit simpler...

comment:6 Changed 4 years ago by nomeata

(But maybe this means that this is actually a Cabal bug?)

comment:7 Changed 4 years ago by thomie

You're right!

curl's cabal file would still contain extra-libraries: curl. But when installing curl, whoever decides the parameters for the ghc-pkg file (either cabal or ghc, I don't know at the moment) could ask the linker which version it actually found, and write extra-libraries: :libcurl.so.<version> to the ghc-pkg file.

That should work (at least on linux).
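One way to "ask which version was found" is to read the SONAME of the library the linker resolved, for example from the output of objdump -p. The sketch below only shows the parsing step and is an assumption about how this could be done, not something Cabal or GHC does today; sonameFromObjdump and extraLibrariesEntry are hypothetical names, and the sample text is a hand-written imitation of objdump output.

```haskell
import Data.Maybe (listToMaybe)

-- | Extract the SONAME from the text printed by "objdump -p <library>".
-- The dynamic section contains a line of the form "  SONAME   libcurl.so.4".
sonameFromObjdump :: String -> Maybe String
sonameFromObjdump out =
  listToMaybe [ name | ["SONAME", name] <- map words (lines out) ]

-- | The value that would be written to the extra-libraries field of the
-- installed package description (note the leading colon).
extraLibrariesEntry :: String -> String
extraLibrariesEntry soname = ':' : soname

main :: IO ()
main = do
  -- Hand-written sample, not captured from a real system.
  let sample = unlines
        [ "Dynamic Section:"
        , "  NEEDED               libc.so.6"
        , "  SONAME               libcurl.so.4"
        ]
  print (fmap extraLibrariesEntry (sonameFromObjdump sample))
```

Recording the SONAME rather than the symlink target matters because the symlink usually points at the fully versioned file (such as libbsd.so.0.6.0 above), while binaries are linked against the SONAME (libbsd.so.0).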

comment:8 in reply to:  7 Changed 4 years ago by trommler

Owner: set to trommler

Replying to thomie:

You're right!

curl's cabal file would still contain extra-libraries: curl. But when installing curl, whoever decides the parameters for the ghc-pkg file (either cabal or ghc, I don't know at the moment) could ask the linker which version it actually found, and write extra-libraries: :libcurl.so.<version> to the ghc-pkg file.

I think it is sufficient for GHCi to link against libHScurl<something>.so (the Haskell library) and omit -lcurl (the C library) from the ld command altogether. This would make the linking of dynamic libraries way more efficient as fewer shared libraries need to be opened and fewer symbol tables need to be read.

See p 41 of Ulrich Drepper's "How To Write Shared Libraries" http://www.akkadia.org/drepper/dsohowto.pdf.

I am going to look into this.

comment:9 Changed 4 years ago by rwbarton

That means we need to record the fact that -lcurl is necessary when linking against the static Haskell library but not when linking against the dynamic Haskell library, right? Or can we assume that this is always the case?

comment:10 in reply to:  9 Changed 4 years ago by trommler

Owner: trommler deleted

Replying to rwbarton:

That means we need to record the fact that -lcurl is necessary when linking against the static Haskell library but not when linking against the dynamic Haskell library, right?

You are right, the dynamic library is linked against the versioned C library and no extra -lcurl is required. At build time we need the development package for libcurl so the link editor (ld) can link the dynamic Haskell library. To use the dynamic Haskell library the unversioned library is not needed anymore.

To load a static library we use the RTS linker and the RTS linker needs to load dependent C libraries explicitly. Unfortunately we cannot assume that the C library with a specific version is installed. So the RTS linker looks for the unversioned library and tries to load that.

So yes, the unversioned C library is currently necessary to load a static Haskell library but not needed to load a dynamic Haskell library.
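The distinction can be written down as a tiny model. This is an illustration of the reasoning in this comment only; Way and cLibsToLocate are hypothetical names and do not correspond to anything in the RTS linker.

```haskell
-- | How the Haskell package itself is loaded.
data Way = StaticHaskell | DynamicHaskell
  deriving (Eq, Show)

-- | C library files that the loader of the Haskell library has to locate
-- by itself, given the package's extra-libraries.
--
-- Static:  the RTS linker loads each C dependency explicitly and only
--          knows the unversioned name.
-- Dynamic: the versioned dependency is already recorded in the Haskell
--          shared object, so the system dynamic linker resolves it.
cLibsToLocate :: Way -> [String] -> [FilePath]
cLibsToLocate StaticHaskell  libs = [ "lib" ++ l ++ ".so" | l <- libs ]
cLibsToLocate DynamicHaskell _    = []

main :: IO ()
main = do
  print (cLibsToLocate StaticHaskell  ["curl"])
  print (cLibsToLocate DynamicHaskell ["curl"])
```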

Perhaps we could make cabal record the version information for C libraries in the package database as suggested in comment:8 and change the RTS linker to use that information?

I don't have time to work on this now, so I am disowning the ticket.

comment:11 Changed 4 years ago by nomeata

To load a static library we use the RTS linker and the RTS linker needs to load dependent C libraries explicitly. Unfortunately we cannot assume that the C library with a specific version is installed. So the RTS linker looks for the unversioned library and tries to load that.

BTW, why is that? Is it a valid and supported use case to build against one version and run against another version of the library?

comment:12 Changed 4 years ago by ezyang

Component: changed from Compiler (FFI) to Compiler (Linking)

comment:13 Changed 4 years ago by rwbarton

Priority: changed from low to normal

comment:14 Changed 4 years ago by trommler

Cc: trommler added

comment:15 in reply to:  11 Changed 4 years ago by trommler

Replying to nomeata:

To load a static library we use the RTS linker and the RTS linker needs to load dependent C libraries explicitly. Unfortunately we cannot assume that the C library with a specific version is installed. So the RTS linker looks for the unversioned library and tries to load that.

BTW, why is that? Is it a valid and supported use case to build against one version and run against another version of the library?

You don't have to recompile for say a security fix in a shared library.

The official GHC binaries are built against a certain version of the standard C library (libc) but generally work with a newer libc as long as it is binary compatible.

comment:16 Changed 4 years ago by trommler

Cc: simonmar added

How about this for a fix in the static case:

  1. Prepare a dummy shared library libHSC<package>.so at build time where we use ld to deal with finding the right shared libraries. If the package depends on more than one C library still only one dummy shared library needs to be created.
  2. Install the dummy shared library with the static library libHS<package>.a
  3. Teach RTS linker to load the dummy shared library to satisfy C dependencies.
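Step 1 could look roughly like the following. This is a sketch under several assumptions that are not part of the proposal: that the dummy library is produced by invoking gcc, that the package identifier is used verbatim in the file name, and that --no-as-needed is passed so the C libraries are recorded as dependencies even though the dummy library references no symbols from them. dummyLibName and dummyLinkArgs are hypothetical names.

```haskell
-- | File name of the dummy shared library for a package, following the
-- libHSC<package>.so pattern from the proposal.
dummyLibName :: String -> FilePath
dummyLibName pkg = "libHSC" ++ pkg ++ ".so"

-- | Arguments for a gcc invocation that creates the dummy library.
-- The -l flags make ld resolve the unversioned names at build time; the
-- resulting file then carries the versioned names as dependencies.
dummyLinkArgs :: String -> [String] -> [String]
dummyLinkArgs pkg cLibs =
  ["-shared", "-Wl,--no-as-needed", "-o", dummyLibName pkg]
    ++ map ("-l" ++) cLibs

main :: IO ()
main = putStrLn (unwords ("gcc" : dummyLinkArgs "curl-1.3.8" ["curl"]))
```

One dummy library per package is enough, because all of the package's C libraries can be listed in the same invocation.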

I can look into this after #10458.

CC'ing @simonmar

comment:17 Changed 4 years ago by trommler

Blocked By: 10458 added

The dynamic case will be fixed by #10458.

comment:18 Changed 4 years ago by trommler

Blocking: 9237 added

comment:19 Changed 4 years ago by simonmar

All this scares me a lot. Why does the package database need to record the versioned .so dependency? It is not required, either for linking against the dynamic library (because it is already recorded as a dependency in the .so) or against the static library (because we use -lfoo).

We require the -dev versions of libraries for GHCi very deliberately, because GHCi is supposed to mimic the linking that would be done for a standalone executable.

comment:20 in reply to:  19 Changed 4 years ago by trommler

Replying to simonmar:

We require the -dev versions of libraries for GHCi very deliberately, because GHCi is supposed to mimic the linking that would be done for a standalone executable.

OK. So in the dynamic case GHCi SHOULD NOT (MUST NOT?) require the presence of the development versions of C libraries when using an installed Haskell package. This is currently not the case.

comment:21 Changed 4 years ago by simonmar

OK. So in the dynamic case GHCi SHOULD NOT (MUST NOT?) require the presence of the development versions of C libraries when using an installed Haskell package. This is currently not the case.

I think we could make it so that GHCi, if using dynamic libraries, doesn't require the -dev version of C library dependencies, but what's the gain from doing that? GHC would still require it when not using -dynamic, so you wouldn't be able to omit the dependency on -dev in the distro packages, for example.

comment:22 in reply to:  21 Changed 4 years ago by trommler

Replying to simonmar:

OK. So in the dynamic case GHCi SHOULD NOT (MUST NOT?) require the presence of the development versions of C libraries when using an installed Haskell package. This is currently not the case.

I think we could make it so that GHCi, if using dynamic libraries, doesn't require the -dev version of C library dependencies, but what's the gain from doing that? GHC would still require it when not using -dynamic, so you wouldn't be able to omit the dependency on -dev in the distro packages, for example.

No, if we implemented my proposal in comment:16 we would not need the dependency on C library -dev packages anymore. The -dev package would only be a build requirement.

This makes a difference on openSUSE's build service, where we build each package in a separate virtual machine and install only packages that are actually needed for the current build. So when we use such a package in another build we would not have to install the C -dev packages, and some of those can be fairly large.

Moreover, we could get rid of the code chasing C libraries in Linker.hs and remove the code that parses linker scripts in rts/Linker.c. #9237 fails because the parsing of linker scripts is incomplete. #10046 might be an issue with linker scripts, too.

comment:23 Changed 4 years ago by simonmar

No, if we implemented my proposal in comment:16 we would not need the dependency on C library -dev packages anymore. The -dev package would only be a build requirement.

I think maybe we're talking about different things, let me try to clarify.

  • If you install a Haskell package to be used with GHC, you still need the -dev version of any C library dependencies. That's independent of your proposal in comment:16, because we need the -dev libraries when building standalone executables.
  • If you install a package because it is a runtime dependency of an executable or another library, then you don't need the -dev version of C dependencies. So you could, if you wanted, split compiled Haskell packages into dev and non-dev distributions.
  • The case you seem to be referring to is somewhere between these two, where you're only using a package in GHCi, or only via the GHC API. In those cases you could avoid needing the -dev dependency, but I'm not sure how you would distinguish this kind of dependency. How can you tell that the user isn't going to try to use GHC?

comment:24 in reply to:  23 Changed 4 years ago by trommler

Replying to simonmar:

I think maybe we're talking about different things, let me try to clarify.

  • If you install a Haskell package to be used with GHC, you still need the -dev version of any C library dependencies. That's independent of your proposal in comment:16, because we need the -dev libraries when building standalone executables.

Is the C library going to be a static (archive) or a dynamic (shared object) library?

If it is the latter then I think it would still work with comment:16 and if we pass a linker flag (something along the lines of copy all DT_NEEDED tags for C libraries from the dummy SO into the executable) the performance impact would be small.

But in the completely static case you are right, the dev dependency is always required.

comment:25 Changed 4 years ago by simonmar

Is the C library going to be a static (archive) or a dynamic (shared object) library?

Usually both, so that you can use -optl-static if you want. The -dev package of a C library normally includes both the unversioned .so and the .a.

Remember that GHC links static Haskell packages but dynamic C libraries by default.

If it is the latter then I think it would still work with comment:16 and if we pass a linker flag (something along the lines of copy all DT_NEEDED tags for C libraries from the dummy SO into the executable) the performance impact would be small.

Did you know the code in Linker.lhs is only used for GHCi and not linking of standalone executables? I have a suspicion that there's some confusion here. There is no "dummy SO" when linking a standalone executable.

Since you have a design in mind, maybe it would be good to flesh it out in a wiki page so we can discuss with more precision?

comment:26 in reply to:  25 Changed 4 years ago by trommler

Replying to simonmar:

Is the C library going to be a static (archive) or a dynamic (shared object) library?

Usually both, so that you can use -optl-static if you want. The -dev package of a C library normally includes both the unversioned .so and the .a.

Remember that GHC links static Haskell packages but dynamic C libraries by default.

Static Haskell with dynamic C is the issue that I want to address in comment:16

[...]

Did you know the code in Linker.lhs is only used for GHCi and not linking of standalone executables? I have a suspicion that there's some confusion here. There is no "dummy SO" when linking a standalone executable.

Sorry, this is indeed confusing. This dummy SO is not the same as the dummy SO in #10458. Let me call the #10458 dummy SO "GHCi dummy SO" and the one in comment:16 "package dummy SO". The package dummy SO belongs to a Haskell package and is installed next to the static (libHS*.a) and the dynamic (libHS*.so) Haskell library.

Then the package dummy SO would be used in the static Haskell, dynamic C case in GHCi only. We could do this to help users that are in the situation described in this ticket. On the other hand, we could include the essence of your comment:19 "GHCi works like ld" in the user's guide (Section 2.6.2?). Perhaps we can discuss this in LinkingHaskell.

When linking a static executable, all C dev dependencies are required. That is no different from linking a C executable.

Since you have a design in mind, maybe it would be good to flesh it out in a wiki page so we can discuss with more precision?

I created LinkingHaskell

comment:27 Changed 4 years ago by trommler

Blocked By: 11238 added; 10458 removed

It was decided to not risk merging a first version of the redesign of dynamic linking for 8.0.1 and so the fix for #10458 is not going to fix the dynamic case in this ticket. #11238 is the ticket for a redesign of dynamic linking. Changing BlockedBy accordingly.

comment:28 Changed 3 years ago by nomeata

This, or a related problem, came up in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=834026#10

The summary is: Currently, if a Haskell library foo depends on bar, and bar uses a C library libbaz, then the shared library libHSfoo.so will be built with -lbaz, and hence link to libbaz.so.42, even though that dependency is already encoded in libHSbar.so. Now, if libbaz is upgraded and bar is rebuilt against the new version of libbaz, then bar's ABI stays the same (so foo is not going to be rebuilt), but now foo is broken, as it cannot find libbaz.so.42 any more.

@trommler writes the following on LinkingHaskell (emphasis mine), which should avoid this problem for us:

All Haskell libraries and all directly dependent C libraries must be passed to the linker.
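The difference between the current behaviour and the quoted rule can be illustrated with the foo/bar/libbaz example from the summary. The sketch below is a toy model, not GHC's package database API: Package, closure, transitiveCLibFlags, and directCLibFlags are hypothetical names, and the dependency graph is assumed to be acyclic.

```haskell
import Data.List (nub)

-- | Toy package description: name, Haskell dependencies, C libraries.
data Package = Package
  { pkgName  :: String
  , pkgDeps  :: [String]
  , pkgCLibs :: [String]
  }

lookupPkg :: [Package] -> String -> [Package]
lookupPkg db name = [ p | p <- db, pkgName p == name ]

-- | The package itself followed by everything it depends on, transitively.
-- Assumes the dependency graph has no cycles.
closure :: [Package] -> String -> [Package]
closure db name =
  concat [ p : concatMap (closure db) (pkgDeps p) | p <- lookupPkg db name ]

-- | Behaviour described in the summary: C libraries of all transitive
-- dependencies end up on the link line.
transitiveCLibFlags :: [Package] -> String -> [String]
transitiveCLibFlags db name =
  nub [ "-l" ++ l | p <- closure db name, l <- pkgCLibs p ]

-- | Behaviour under the quoted rule: only the package's own C libraries.
directCLibFlags :: [Package] -> String -> [String]
directCLibFlags db name =
  nub [ "-l" ++ l | p <- lookupPkg db name, l <- pkgCLibs p ]

main :: IO ()
main = do
  let db = [ Package "foo" ["bar"] [], Package "bar" [] ["baz"] ]
  print (transitiveCLibFlags db "foo")
  print (directCLibFlags db "foo")
```

Under the second function, libHSfoo.so never names libbaz directly, so upgrading libbaz and rebuilding bar leaves foo intact.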
