Opened 2 years ago

Last modified 2 years ago

#14031 new bug

Linker paths carry substantial N*M overhead when many libaries are used

Reported by: nh2 Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.0.2
Keywords: Cc: nh2, bgamari, hvr
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #11587 Differential Rev(s):
Wiki Page:


"The linker search path problem"

It's common these days to have 400 (transitive) Hackage dependencies (even my hobby projects have that many).

When linking, statically or dynamically, GHC passes -L /path/to/foo/dir and -L HSfoo... to the linker for each Haskell package foo.

That results in N many linker search paths (-L) and M many library names. That will translate in N * M many stat() syscalls (file existence checks) in the linker to check where each file is.

For 400 package dependencies, that translates to 160k stat() syscalls. On my machine (on SSD and hot file system cache in RAM), that takes 0.7 seconds (measured with strace -c -w).

That is a substantial overhead for linking (for example, the executable with 400 dependencies with which I found this problem can be linked with in just 1.5 seconds).

If there is tooling in between GHC and the linker that also parses linker flags and checks for file existence (for example Nix's cc-wrapper and ld-wrapper scripts, or other instrumentation or build system tooling, which are often not written in compiled languages and thus much slower at stat()ing than the linker is), then the problem can be easily blown up another 10x.

As a result, you may be waiting a lot longer for your link to finish than necessary.

For static linking, this could be resolved by passing the archive files (libHSfoo....a) directly to the linker instead of the -L/-l combination.

For this we would need to check that passing files directly really have the same semantics as -L and -l, but the man pages suggest so (and I'll check it with the binutils authors).

For dynamic linking, it cannot be fixed this way. It can be fixed at compile time, but with dynamic linking the same problem exists at run-time (startup): Here all the -Ls become RPATHs if not installed in a global location (for example, nix has an RPATH for each package), and thus program startup can take up to those 0.7 seconds I mentioned.

To solve the runtime problem, the only solution seems to be to place the libraries into the same directory (i.e. to have M = 1 essentially).

This approach (putting all libs into 1 directory) would also solve it for the static linking case, with the drawback of this being not compatible with how cabal sandboxes, stack build and snapshot dirs, nix, Debian and other distributions deliver and build libraries. And for some of them this approach goes straight against their nature.

Detail discussion:

See below the relevant discussion on #ghc:

nh2:        bgamari-: do you know why ghc uses -L and -l for linking libHS... things instead of passing the .a file directly?
hvr:        nh2: btw, fun fact: on IBM AIX, both static and DSOs are stored in .a archives
bgamari-:   nh2, I suspect just since it makes the dynamic and static linking codepaths consistent
hvr:        nh2: and if you use .a for linking, you end up hardwiring the DSO path into your executable
nh2:        bgamari-: I'm looking into whether it would make sense to change it, at least for the platforms that would support it, because the (-L * -l) searching that it does currently seems to dominate link times here
nh2:        cc hvr
hvr:        nh2: sounds reasonable; but then you end up with longer CLI arguments
hvr:        nh2: it could be a problem for when the tools don't support @
bgamari-:   nh2, seriously?
bgamari-:   the search is really that long?
bgamari-:   nh2, it seems like that's a linker bug
hvr:        bgamari-: I guess it depends on how many -L there are
bgamari-:   nh2, how many libraries are you linking against?
hvr:        bgamari-: as each -L and -l multiply w/ each other requiring several stat() calls iirc
bgamari-:   hvr, yes
hvr:        but otoh, I wouldn't expect this to *dominate* linker times
bgamari-:   right
bgamari-:   hvr, this is the compile-time equivalent to #11587
bgamari-:   which I still think we should do
hvr:        right
bgamari-:   hmm
bgamari-:   although comment:9 sort of suggests that this is already the case for user-installed libraries
***bgamari- forgot about that
hvr:        I mean, if you have really *many* -L-paths
hvr:        then the quadratic effect becomes probably significant
bgamari-:   right
bgamari-:   nh2, how many libraries are you linking against?
nh2:        bgamari-: I have approximately L=450 and l=350. There is some notoriously slow tooling "in between" that I'm currently investigating, namely nix's ld-wrapper, which does does the stat'ing in bash, where statting that many files easily takes 1 second, but I did a quick test with `xargs ls -1` on the "$DIR/lib${name}.a" expansion, an that too takes 1 second, so it might not be entirely its fault
hvr:        ok, ~400 is quite a bit
bgamari-:   sheesh
bgamari-:   yes, that is... impressive
geekosaur:  next they'll want this to work on macOS >..
geekosaur:  >.>
hvr:        and then windows!
nh2:        even my hobby projects have 300 recursive deps by now
nh2:        bgamari-: my strace says that `ls` did the 160k stat()s in 0.7 seconds, which is half of the time that `gold` needs to link my binary
nh2:        bgamari-: I have now strace'd `ld` itself, that does it slightly faster, 0.5 seconds, but this is just because it's my hobby project and has "only" that many deps; for a real project that'd be 2 seconds just stat()ing
bgamari-:   nh2, sheesh
nh2:        bgamari-: of course nix's ld wrapper, being written in bash, does the same thing but at least 5x slower
bgamari-:   nh2, do you just add all of Hackage to your build-depends? ;)
bgamari-:   nh2, Instead of passing explicit paths I think I would rather just put all of the archives in LIBDIR
nh2:        bgamari-: I blame it on my dependees ;) also, "Haskell has high-quality libraries and encourages code reuse" comes back at us here
bgamari-:   although, admittedly, I haven't really thought through the consequences of this yet
bgamari-:   this does sound like a worthwhile thing to fix though
bgamari-:   that is a significant amount of time
nh2:        bgamari-: personally I'd WAY prefer explicit args vs env vars because then I can see them in ps, I can easily replay the specific command myself to debug it etc
hvr:        nh2: not every library on Hackage is "high-quality" ;-)
bgamari-:   I had always assumed that our large linking times we due to, well, the "linking"
bgamari-:   not just the stating
bgamari-:   nh2, fair enough, but I'd rather not have to add even more platform-dependent behavior to our interaction with the system toolchain
bgamari-:   oh, wait
bgamari-:   nh2, I didn't mean to pass via env var
bgamari-:   nh2, I just meant pass one -L
bgamari-:   nh2, where the static and dynamic libraries can both be
bgamari-:   found
nh2:        bgamari-: I don't know that, where can I read about that? Is it a flag?
bgamari-:   nh2, -L?
bgamari-:   it's just the usual --library-path flag
nh2:        bgamari-: I got confused by LIBDIR in caps
bgamari-:   ah
bgamari-:   right, I should have been more specific
nh2:        bgamari-: so what do you mean by that, somehow put all the libraries into a single dir?
bgamari-:   nh2, treat static archives the same way we treat dynamic libraries
bgamari-:   which is to place them in, e.g., $HOME/.cabal/lib/$ghc_version/
bgamari-:   instead of where they are currently,  $HOME/.cabal/lib/$ghc_version/$library_abi_name
bgamari-:   then you can just pass -L$HOME/.cabal/lib/$ghc_version to ld
bgamari-:   and a -l for each library
bgamari-:   I think this is probably as much a Cabal change as it is a GHC change
nh2:        bgamari-: that doesn't seem to go well with approaches that package separately, like debian, nix, cabal sandboxes and stack work dirs
hvr:        which includes new-build
bgamari-:   I'm not sure it's fundamentally incompatible
hvr:        yeah, but it's a lot of work to change everyone :-)
bgamari-:   to take the case of new-build, all that matters AFAIK is that the packages are registered in separate package dbs
bgamari-:   there's nothing stopping you from throwing all of the libraries into one directory
hvr:        bgamari-: well, I kinda exploit that new-build has nicely self-contained folders
hvr:        this makes it easier to atomically move around artifacts
nh2:        bgamari-: do you have a dislike for passing them explicitly as files beyond cross-platform worries? Because I really find explicit arguments the cleanest, easiest and most debuggable approach
hvr:        as I can stage the new artifact, and then move it into place
hvr:        atomically
hvr:        by simply renaming/moving the folder which contains everyting
bgamari-:   nh2, I'm just currently really not sure that passing libraries by path is really equivalent to passing via -l
hvr:        unfortauntely filesytems don't offer many other facilities for atomic transactions
bgamari-:   I would need to run some experiments
nh2:        bgamari-: gcc's and ld's man pages suggest that to me, looking at their respective `-l`; we certainly should double check (or maybe simpler, ask the linker authors)
bgamari-:   if we can really show that the linker treats explicit paths no differently then I'd be okay with it
nh2:        suggests that they are equivalent)
bgamari-:   but I'm rather weary
bgamari-:   nh2, are you only suggesting this for static libraries?
geekosaur:  it's supposed to be equivalent. but shared objects have the added issue of runtime discoverability (i.e. passing by full path at link time does not install a full path for runtime access)
geekosaur:  except on macos or older aix where it's always a full path)
nh2:        bgamari-: no, at least the stat()-taking-time problem is the same problem for .so's (though I'm using static linking in many places and would be appreciate an improvement even if limited to static linking)
bgamari-:   nh2, the problem is that your approach doesn't help dynamic linkage at all
bgamari-:   since the runtime linker still needs to search N paths
bgamari-:   so I would argue that new-build, stack, etc. really should be putting libraries in the same directory regardless
bgamari-:   more work or not
bgamari-:   it's the only way to truly solve the problem
mpickering: Wonders if David saw this issue
alanz:      mpickering: what are you planning to use the source plugins for?
nh2:        bgamari-: yeah you're right, the start-up cost of the executable will always be there for dynamic linking, I was purely talking from the compiler's perspective
bgamari-:   nh2, my point was that I would prefer to solve the problem once and for all
bgamari-:   rather than have another sharp edge that we then still need to solve
bgamari-:   given that moving the libraries solves both problems, it feels like the right solution
mpickering: alanz: haskell-indexer was the plan
nh2:        bgamari-: I agree that for dynamic linking that is the only solution. But I'm not sure there exist a fits-it-all solution across static and dynamic. At least for my work, I'd prefer to have static libs that can be anywhere, in nix-style immutable locations, and give up on dynamic linking (of Haskell libs) to get that. It seems that hvr's thoughts are going in the same direction on the cabal side, and lots of build system and OS movements these days seem to converge away from global shared dirs to dir-prefixes.
nh2:        I also suspect that static linking will be even more attractive now that sensible -split-sections is in place, which can give smaller code sizes, which .so's can't deliver.
nh2:        So I'm not sure that a big shift to the fits-it-all solution that satisfies dynamic linking needs would be the most practical or used; it'd certainly be nice to have it for dynamic linking, but I'm not sure all the downstreams can be forced to do it that way for non-static if it quite fundamentally dictates how they have to work. If they can, it would probably take pretty long.

Change History (6)

comment:1 Changed 2 years ago by bgamari

This is essentially the compile-time equivalent of #11587.

comment:2 Changed 2 years ago by nh2

There is more info from sphalerite on #nixos on the topic of dynamic linking.

Apparently it is possible to give "default locations" for .so files, instead of using RPATHs:

sphalerite[m]: I opened an issue about that (linking through absolute paths rather than rpath) a couple of months back it's possible, just not implemented
sphalerite[m]: on linux that is -- on macOS it's already done through absolute paths
nh2:           that's excellent news, can you give me a short summary of what I have to do to give it an explicit path? Also, will the resulting link just be a "hint", so still be overridable with LD_LIBRARY_PATH?
sphalerite[m]: It's not implemented in nixpkgs's machinery yet, so you'd need to pass absolute paths to the linker manually instead of using `-l<library>`
sphalerite[m]: and no, it would not be overridable using LD_LIBRARY_PATH at that point. LD_PRELOAD would still work though.
sphalerite[m]: That stuff is all discussed in the issue comments though
nh2:           I'm getting a bit lost in the amount of comments; what feature/flags to ld is the magic that has this effect?
sphalerite[m]: passing absolute paths instead of -lXYZ
sphalerite[m]: so rather than -lreadline, /nix/store/...-readline-1.2.3/lib/
nh2:           ah, so passing explicit .so to `ld` doesn't do the same as `-l` does, but instead makes it do this "default path link"? I always assumed `-l` just finds files and then passes them normally (or so at least `man ld` suggests)
sphalerite[m]: Yeah no, it's not the same
sphalerite[m]: Huh, actually they seem to be producing the same result for me in a test I did just now

This nix issue discusses the mentioned approach.

comment:3 Changed 2 years ago by nh2

nh2:           and in the case where the "hinted" so location doesn't exist, what does it do, does it then search the search paths?
sphalerite[m]: no. It's not a hint, it's "the library is here". It's "if the library is not here this program will not run."
nh2:           hmm OK. So that approach only works if you really know ahead of time where the lib will be

comment:4 Changed 2 years ago by nh2

I forgot to mention, gold has an optimisation that trades the stat() against reading the directory contents; this makes it much faster at the problem and theoretically avoids the N * M complexity.

However, it's not an all-round solution because it can be slower when the directories are very large / contain files we don't care about -- we'd read those only to throw them away. So it's essentially changing the the problem into N * (number of total files in each N).

comment:5 Changed 2 years ago by nh2

I asked the binutils upstream question here:

comment:6 Changed 2 years ago by nh2

Result on the binutils mailing list:

It is mostly just lookup convenience. If you replace the -l option with the full path of the library that is found, the result will be almost identical. The only differences I know of are error messages and the soname that will be used for a shared library with no DT_SONAME dynamic tag.

Note: See TracTickets for help on using tickets.