Opened 9 years ago

Closed 9 years ago

#4828 closed bug (fixed)

ghci fails to load fat binary archives on OS X

Reported by: gwright Owned by: igloo
Priority: normal Milestone: 7.4.1
Component: Runtime System Version: 7.0.1
Keywords: Cc: pho@…
Operating System: MacOS X Architecture: Unknown/Multiple
Type of failure: GHCi crash Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

ghci crashes when loading fat binary archives (i.e., multi-architecture .a files) on OS X. From looking at rts/Linker.c, the fat header isn't parsed at all.

This isn't hard to fix, it's just a matter of examining the fat header structure, computing the offset to the host architecture in the archive, and continuing with the processing in loadArchive.

This bug affect people who build universal architecture libraries on OS X. The only workaround now is to rebuild the library for a single architecture.

It would be good to get this fixed for 7.0.2; I'll try to produce a patch within a week.

Attachments (3)

Linker.c.diff (7.1 KB) - added by gwright 9 years ago.
Patch for rts/Linker.c to load fat archives on OS X
fat-archive.dpatch (36.6 KB) - added by igloo 9 years ago.
Linker.c.dpatch (22.2 KB) - added by gwright 9 years ago.

Download all attachments as: .zip

Change History (23)

comment:1 Changed 9 years ago by PHO

Cc: pho@… added

comment:2 Changed 9 years ago by gwright

I sent a patch which fixes this on the ghc-7-branch. If accepted, it should also be applied to HEAD.

comment:3 Changed 9 years ago by gwright

Status: newpatch

Changed 9 years ago by gwright

Attachment: Linker.c.diff added

Patch for rts/Linker.c to load fat archives on OS X

comment:4 Changed 9 years ago by gwright

The attached patch also fixes some warnings in debug statements. No substantive changes, just casts to suppress mismatched type warnings.

comment:5 Changed 9 years ago by igloo

Owner: set to igloo

Changed 9 years ago by igloo

Attachment: fat-archive.dpatch added

comment:6 Changed 9 years ago by igloo

Milestone: 7.2.1

Thanks, I've applied the cast warning fixes.

For the fat archive support, I've attached an untested patch which merges bits of yours together, so there is less code duplication. Don't we need something to stop it carrying on past the end of this arch's section, though?

Milestoning for 7.2.1, as this isn't blocking anything AFAIK.

comment:7 in reply to:  6 Changed 9 years ago by gwright

Replying to igloo:

Thanks, I've applied the cast warning fixes.

For the fat archive support, I've attached an untested patch which merges bits of yours together, so there is less code duplication. Don't we need something to stop it carrying on past the end of this arch's section, though?

Milestoning for 7.2.1, as this isn't blocking anything AFAIK.

Thank you for cleaning up the patch; I'll give it a try.

I don't think we need anything to prevent reading past the host architecture's archive. The header for each archive within the fat archive contains the length of the archive in bytes. As long as we use that instead of just blindly searching for end-of-file, we should be OK. If we run off the end, we should fix the archive loader to read the archive length.

Also, the original patch would barf if it tried to read more than nfat_arch fat_header blocks, so it wouldn't run past the headers into the archives. I think your patch does the same.

comment:8 Changed 9 years ago by gwright

I've tested a version of your patch and it loads fat archives correctly. (The only difference between your patch and the one I've attached are a few typographical corrections.)

Building with GhcDebugged=YES, we can see the result:

plumbbob-franklin> inplace/bin/ghc-stage2 --interactive +RTS -Dl -RTS 2>&1 | grep loadArchive
initLinker: inserting rts symbol _loadArchive, 0x101f01bb8
loadArchive: Loading archive `/opt/local/lib/libiconv.a'
loadArchive: found a fat archive containing 2 architectures
loadArchive: found my archive in a fat archive
loadArchive: Found member file `__.SYMDEF SORTED'
loadArchive: Found member file `iconv.o'
loadArchive: Found member file `localcharset.o'
loadArchive: Found member file `relocatable.o'
loadArchive: Loading archive `/opt/local/lib/libcharset.a'
loadArchive: found a fat archive containing 2 architectures
loadArchive: found my archive in a fat archive
loadArchive: Found member file `__.SYMDEF SORTED'
loadArchive: Found member file `localcharset.o'
loadArchive: Found member file `relocatable.o'
Prelude> 

I checked the loadArchive routine and it does get the length of an archive from the archive header, so we should never run past the end.

comment:9 Changed 9 years ago by gwright

I forgot to mention that to test the loading of fat archives, I temporarily backed out Igloo's change to compiler/ghci/Linker.lhs. The default behavior is still to use a dynamic library and let the dynamic linker worry about all this.

Changed 9 years ago by gwright

Attachment: Linker.c.dpatch added

comment:10 Changed 9 years ago by gwright

I've attached the tested patch.

comment:11 in reply to:  8 ; Changed 9 years ago by igloo

Replying to gwright:

I checked the loadArchive routine and it does get the length of an archive from the archive header, so we should never run past the end.

Where does it do that?

As far as I can see the

    while(1) {
        n = fread ( fileName, 1, 16, f );
[...]
    }

loop will keep reading off the end of the arch's segment.

comment:12 in reply to:  11 Changed 9 years ago by gwright

Replying to igloo:

Replying to gwright:

I checked the loadArchive routine and it does get the length of an archive from the archive header, so we should never run past the end.

Where does it do that?

As far as I can see the

    while(1) {
        n = fread ( fileName, 1, 16, f );
[...]
    }

loop will keep reading off the end of the arch's segment.

Hmm. I see where it finds the length, but you're right that it looks as if it keeps reading until the end of the file, rather than the end of the archive. I'll add some debug statements so I can follow along. If it really is reading past the end of the archive, I'll fix it.

comment:13 Changed 9 years ago by gwright

Ugh. loadArchive is worse than I thought. On OS X (at least) it doesn't even load the objects modules in the archive. This is because we read from the archive header the width of the object file name field, not its length, so we miss the ".o" at the end of the name. I've fixed this, but there are more problems.

When loading the object modules we get multiply defined symbol errors. This is because we don't correctly distinguish external symbols (flags N_EXT and N_SECT set) and internal symbols (only N_SECT set).

I've added a number of debug statements so we can watch what's going on in detail and will attach that patch to this ticket.

The failure to load any object modules was concealing the error that should have occurred when reading past the end of an archive in a fat archive. I still need to fix that.

Good that we're targeting 7.2 for this one.

comment:14 Changed 9 years ago by gwright

There's something quite odd going on here. I've made changes (not tested fully) to fix the "run off the end of the archive" issue. I've added more debug statements to see what's going on with the duplicate symbols. The code in ocGetNames_MachO looks as if it checks N_EXT and N_SECT correctly, but symbols that shouldn't be inserted (because N_EXT isn't set) are inserted anyway.

This is probably just some C subtlety that is right before my eyes, but I don't see it yet.

comment:15 Changed 9 years ago by gwright

What is wrong is not a C subtlety, it's how the linker handles "private external" symbols (symbols with N_PEXT set in the type field). The OS X documentation is terse on this point, but perhaps private externals don't need to be inserted into the symbol table.

comment:16 Changed 9 years ago by gwright

If I simply do not insert symbols with the N_PEXT flag set, I'm able to load the archives on OS X 10.6 64-bit. I still need to test this on 32-bit. (In the 64-bit archives I've been using, the 64-bit entries are last, so my fixes to prevent "running off the end" aren't tested. I have access to a 32-bit machine running 10.6 to test this.)

After testing on 32-bit I'll run the full testsuite.

comment:17 Changed 9 years ago by hesselink

I think I've run into this bug, but not using GHCi. I'm building a package which uses HDBC-postgresql. During building, it tries to load a postgresql archive, and fails:

Loading package HDBC-postgresql-2.2.3.3 ... ghc: internal error: loadArchive: Not an archive: `/Library/PostgreSQL/8.4/lib/libpq.a'
    (GHC version 7.0.1 for i386_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

Examining the file in question shows it is a universal binary with two architectures:

$ file /Library/PostgreSQL/8.4/lib/libpq.a 
/Library/PostgreSQL/8.4/lib/libpq.a: Mach-O universal binary with 2 architectures
/Library/PostgreSQL/8.4/lib/libpq.a (for architecture ppc):	current ar archive random library
/Library/PostgreSQL/8.4/lib/libpq.a (for architecture i386):	current ar archive

If I read this ticket correctly, it seems this means HDBC-postgresql will remain unusable with GHC 7.0, and thus with the coming haskell platform. It that correct?

comment:18 Changed 9 years ago by igloo

No, it'll work due to this patch:

Mon Dec 13 12:49:30 GMT 2010  Ian Lynagh <igloo@earth.li>
  * GHCi linker: Assume non-Haskell libraries are dynamic libs
  Ignore-this: aa153a8f6e309c7b3dae7e46bb7a9583
  This works around a segfault we get when trying to load libiconv.a on
  some platforms.

comment:19 in reply to:  17 Changed 9 years ago by gwright

Replying to hesselink:

I think I've run into this bug, but not using GHCi. I'm building a package which uses HDBC-postgresql. During building, it tries to load a postgresql archive, and fails:

Loading package HDBC-postgresql-2.2.3.3 ... ghc: internal error: loadArchive: Not an archive: `/Library/PostgreSQL/8.4/lib/libpq.a'
    (GHC version 7.0.1 for i386_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

Examining the file in question shows it is a universal binary with two architectures:

$ file /Library/PostgreSQL/8.4/lib/libpq.a 
/Library/PostgreSQL/8.4/lib/libpq.a: Mach-O universal binary with 2 architectures
/Library/PostgreSQL/8.4/lib/libpq.a (for architecture ppc):	current ar archive random library
/Library/PostgreSQL/8.4/lib/libpq.a (for architecture i386):	current ar archive

If I read this ticket correctly, it seems this means HDBC-postgresql will remain unusable with GHC 7.0, and thus with the coming haskell platform. It that correct?

As igloo noted, this bug isn't you're problem. I've not finished testing my patches for this issue because it's been pushed back to 7.2.1 and there have been more urgent issues with the OS X linker (e.g., #4867).

comment:20 Changed 9 years ago by igloo

Resolution: fixed
Status: patchclosed
Note: See TracTickets for help on using tickets.