Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#4993 closed bug (invalid)

getDirectoryContents goes into an infinite loop

Reported by: bos
Owned by:
Priority: normal
Milestone:
Component: libraries/directory
Version: 7.0.1
Keywords:
Cc:
Operating System: Linux
Architecture: x86
Type of failure: Runtime crash
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

This is a really peculiar bug. Prepare for some fun!

I'm using a tool called boxgrinder to create a virtual machine image for running in Amazon's EC2 compute environment. The virtual machine is intended to contain GHC and a few libraries. I'm running boxgrinder itself in an EC2 virtual machine.

boxgrinder works fine when run on a 64-bit virtual machine to create a 64-bit image, but hangs forever when I run it in a 32-bit VM to create a 32-bit image.

The reason for boxgrinder hanging is that when installing GHC, ghc-pkg is going into an infinite loop. I've got a super simple reproduction:

import System.Directory

main :: IO ()
main = getDirectoryContents "." >>= print

When compiled, the program above runs fine in the regular 32-bit system image, but not in the chroot filesystem created by boxgrinder. There, it goes into an infinite loop right after reading the directory contents (the second getdents64 call returns 0 entries, i.e. end of directory):

... everything looks normal ...
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
getdents64(3, /* 25 entries */, 32768)  = 640
getdents64(3, /* 0 entries */, 32768)   = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
... and no more life! ...

I have verified that this infinite loop occurs with both 6.12.3 and 7.0.2 RC2. Investigating further at the moment.

Change History (3)

comment:1 Changed 9 years ago by bos

Resolution: invalid
Status: new → closed

I've got it figured out. It's not a GHC bug, and it's quite subtle.

So. On a 32-bit system, glibc uses tricks like %gs:-1234 to reach locations at negative offsets from the TLS (thread-local storage) base. But that address is really gsbase + (big positive number), and glibc relies on 32-bit wraparound to turn it into the intended negative offset. The CPU only permits that if the segment limit is the full 4 GB.
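To make the wraparound concrete, here is a small toy model in Haskell (not anything from glibc or GHC). It treats a segment as a base plus a limit, reinterprets the displacement -1234 as an unsigned 32-bit value, and lets `Word32` addition wrap modulo 2^32. The TLS base 0x1000 and the clipped limit 0xF57FFFFF are made-up values for illustration, and the limit check is simplified (expand-up segment, access size ignored).

```haskell
import Data.Word (Word32)

-- Toy model of an x86 segment: a base address and a limit (expand-up segment).
data Segment = Segment { segBase :: Word32, segLimit :: Word32 }

-- The displacement -1234 as the CPU sees it: an unsigned 32-bit value.
disp :: Word32
disp = negate 1234  -- 4294966062

-- Segment-relative offset to linear address. The offset must not exceed the
-- limit, otherwise the access faults (modeled as Nothing). Word32 addition
-- wraps modulo 2^32, which is what turns base + 4294966062 into base - 1234.
linearAddress :: Segment -> Word32 -> Maybe Word32
linearAddress seg off
  | off <= segLimit seg = Just (segBase seg + off)
  | otherwise           = Nothing

main :: IO ()
main = do
  -- Hypothetical TLS base; the limit covers the full 4 GB.
  let flat    = Segment { segBase = 0x1000, segLimit = 0xFFFFFFFF }
  -- Same base, but with a hypothetical clipped limit, as under Xen.
      clipped = Segment { segBase = 0x1000, segLimit = 0xF57FFFFF }
  print disp                          -- 4294966062
  print (linearAddress flat disp)     -- Just 2862, i.e. 0x1000 - 1234
  print (linearAddress clipped disp)  -- Nothing: the offset is past the limit
```

With a full 4 GB limit the "big positive number" lands 1234 bytes below the base; with any limit smaller than 2^32 - 1234 the same instruction is out of bounds.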

Under Xen, segments are clipped to protect the hypervisor, so we need a version of glibc which spends a couple more instructions to compute the negative offset without relying on segment wrapping.
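The "couple more instructions" can be sketched with the same kind of toy model: instead of one %gs-relative access at a negative displacement, first load the thread pointer from %gs:0 (a small, in-bounds offset), then subtract in ordinary 32-bit arithmetic and access the result through the flat data segment. This assumes glibc's i386 thread control block begins with a pointer to itself, and it matches what GCC's -mno-tls-direct-seg-refs option describes; all addresses and the clipped limit below are hypothetical.

```haskell
import Data.Word (Word32)

-- Toy model of an x86 segment: a base address and a limit (expand-up segment).
data Segment = Segment { segBase :: Word32, segLimit :: Word32 }

-- Offset to linear address, or Nothing if the offset exceeds the limit.
translate :: Segment -> Word32 -> Maybe Word32
translate seg off
  | off <= segLimit seg = Just (segBase seg + off)
  | otherwise           = Nothing

-- Hypothetical clipped limit (the real value depends on the hypervisor).
clippedLimit :: Word32
clippedLimit = 0xF57FFFFF

-- Hypothetical thread pointer (TLS base).
tlsBase :: Word32
tlsBase = 0x08001000

-- %gs is based at the thread pointer; %ds is flat (base 0). Both are clipped.
gs, ds :: Segment
gs = Segment tlsBase clippedLimit
ds = Segment 0 clippedLimit

-- Direct access, like %gs:-k: a single access that relies on wraparound.
direct :: Word32 -> Maybe Word32
direct k = translate gs (negate k)

-- Indirect access: read the self-pointer stored at %gs:0 (its value equals its
-- own linear address, so we use that address instead of modeling memory),
-- then subtract k with plain arithmetic and access through the flat segment.
indirect :: Word32 -> Maybe Word32
indirect k = do
  tp <- translate gs 0
  translate ds (tp - k)

main :: IO ()
main = do
  print (direct 1234)    -- Nothing: faults under the clipped limit
  print (indirect 1234)  -- Just 134220590, i.e. tlsBase - 1234
```

Both routes name the same byte when the limit is 4 GB; only the indirect one stays within a clipped segment, at the cost of an extra load and subtraction.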

This is done via an ld.so.conf entry:

hwcap 1 nosegneg

The above directs ldconfig and ld.so to use the Xen-friendly version of glibc.

So this isn't really a GHC bug at all, but perhaps a boxgrinder bug. In any case, the symptom is appearing in GHC :-(

comment:2 Changed 9 years ago by bos

I filed a bug against boxgrinder: https://issues.jboss.org/browse/BGBUILD-172

comment:3 Changed 9 years ago by simonmar

Wow, fascinating. Nice catch, glad it wasn't our fault :)
