Opened 16 months ago

Last modified 9 months ago

#15153 new bug

GHC uses O_NONBLOCK on regular files, which has no effect, and blocks the runtime

Reported by: nh2 Owned by:
Priority: normal Milestone: 8.10.1
Component: Runtime System Version: 8.2.2
Keywords: Cc: nh2, lehins, redneb
Operating System: Linux Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

This is the outcome of https://mail.haskell.org/pipermail/ghc-devs/2018-May/015749.html

Reading through the code of readRawBufferPtr the first line jumped to my eye:

  | isNonBlocking fd = unsafe_read -- unsafe is ok, it can't block

This looks suspicious.

On Linux, if fd is a a descriptor to a regular file (on disk, a networked filesystem, or a block device), then O_NONBLOCK will have no effect, yet unsafe_read is used which will block the running OS thread.

You can read more about O_NONBLOCK not working on regular files on Linux here:

And indeed, the following program does NOT keep printing things in the printing thread, and instead blocks for 30 seconds:

module Main where

import           Control.Concurrent
import           Control.Monad
import qualified Data.ByteString as BS
import           System.Environment

main :: IO ()
main = do
  args <- getArgs
  case args of
    [file] -> do

      forkIO $ forever $ do
        putStrLn "still running"
        threadDelay 100000 -- 0.1 s
      bs <- BS.readFile file
      putStrLn $ "Read " ++ show (BS.length bs) ++ " bytes"

    _ -> error "Pass 1 argument (a file)"

when compiled with

~/.stack/programs/x86_64-linux/ghc-8.2.2/bin/ghc --make -O -threaded blocking-regular-file-read-test.hs

on my Ubuntu 16.04 and on a 2GB file like

./blocking-regular-file-read-test /mnt/images/ubuntu-18.04-desktop-amd64.iso

And strace -f -e open,read on it shows:

open("/mnt/images/ubuntu-18.04-desktop-amd64.iso", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 11
read(11,  <unfinished ...>

So GHC is trying to use O_NONBLOCK on regular files, which cannot work and will block when used through unsafe foreign calls like that.

Change History (7)

comment:1 Changed 16 months ago by nh2

This is mainly for Linux but may also apply to other OSs.

I've read that FreeBSD has a better story for reading files asynchronously, but I don't know whether GHC has any special code to use those features.

Input from FreeBSD users would be appreciated.

Otherwise, likely the fix to this for Linux will also improve things over the status quo on FreeBSD.

comment:2 Changed 16 months ago by nh2

Also funny but perhaps not too surprising:

If in my code, you replace forkIO by e.g. forkOn 2, then nondeterministically, sometimes the program hangs and sometimes it works with +RTS -N2.

The higher you set -N, the more likely it is to work.

If you put both the putStrLn loop and the readFile into forkOn 0 and forkOn 1 each, and run with +RTS -N3, then it always works as expected.

comment:3 Changed 16 months ago by simonmar

Yeah, this is well known. O_NONBLOCK doesn't do anything for local disk files, but it also doesn't do any harm. It's not usually a big problem except when the filesystem actually has significant latency, like with NFS or an arbitrary fuse filesystem.

I think I was worried about the overhead of using safe calls here, the problem is that if you have lots of threads all reading from Handles (e.g. sockets) then if those reads are safe calls we'll get a large number of real OS threads created.

comment:4 in reply to:  3 Changed 16 months ago by nh2

Replying to simonmar:

It's not usually a big problem except when the filesystem actually has significant latency, like with NFS or an arbitrary fuse filesystem.

What about my example though?

I'm just reading a 3GB file using BS.readFile and it blocks other threads for 30 seconds.

(My file system doens't have unexpected latency in terms of access time, it's a normal spinning disk, and the problem is the time it takes to read the data. Though I find it reasonablye to argue that the ~8ms it may take to seek this disk is also a lot of wasted CPU time, that's not the key problem in my example.)

I think I was worried about the overhead of using safe calls here, the problem is that if you have lots of threads all reading from Handles (e.g. sockets) then if those reads are safe calls we'll get a large number of real OS threads created.

The concern about overhead makes sense to me, but I find it easy to argue that there's more overheard in having all other resources of a machine (CPU, network, other disks) idle while a single disk arm is moving. And the time for a pthread_create is rather small compared to a read of any physical storage device.

Another argument is that if examples like the one I gave don't work, it is very hard to build reasonable heartbeating of long-running IO tasks in Haskell.

O_NONBLOCK doesn't do anything for local disk files, but it also doesn't do any harm

What I meant is that from a quick look at the code, it looked like if O_NONBLOCK isn't set, a safe call would be made instead.

In the spirit of https://ghc.haskell.org/trac/ghc/ticket/13296#comment:3 and the followup comment, would it be a good default to have reads to be safe, and offer unsafe equivalents when you know that the read will be very fast?

comment:5 Changed 15 months ago by bgamari

Milestone: 8.6.18.8.1

These won't be addressed for GHC 8.6.

comment:6 Changed 13 months ago by redneb

Cc: redneb added

comment:7 Changed 9 months ago by osa1

Milestone: 8.8.18.10.1

Bumping milestones of low-priority tickets.

Note: See TracTickets for help on using tickets.