Ticket #9 (closed defect: fixed)
Race/Async Exception issue in Network.Socket.connect
|Reported by:||sclv||Owned by:|
Submitted here as well: http://hackage.haskell.org/trac/ghc/ticket/3225
import Control.Concurrent import Control.Monad import Network.Socket import Control.Exception as C import System.Timeout import Network.BSD(hostAddresses, getHostByName) import System.IO.Error import Data.Maybe -- someHostName should be replaced by a real host that gives -- "connection refused" errors on connection to ports in the range. -- The latter ip is a junk one that should cause connections to -- hang indefinitely. -- More hostnames with either characteristic can be added to taste -- if that helps to reproduce the bug. servers = [ "someHostName", "18.104.22.168"] ports = [9001..9099] :: [Int] conns = [(h,p) | h <- servers, p <- ports] connectSock :: Integral a => String -> a -> IO Socket connectSock host port = do hn <- maybe (ioError . mkIOError doesNotExistErrorType "No Host Address" Nothing $ Just host) return . listToMaybe . hostAddresses =<< getHostByName host sk <- socket AF_INET Stream 6 connect sk (SockAddrInet (fromIntegral port) hn) `C.onException` sClose sk return sk pMapM f xs = mapM (\x -> forkIO $ f x) xs mapM' f xs = mapM (\x -> (C.try :: IO a -> IO (Either C.SomeException a)) (f x)) xs main = do -- This is the canary thread in the bugmine forkIO $ forever $ putStrLn "chirp" >> threadDelay 100000 -- This is the bug thread forever $ pMapM (\(h,p) -> timeout 1000000 (connectSock h p) >> return ()) conns >> threadDelay 2000000
The above code, compiled with the threaded runtime, causes a race condition. After roughly one to two cycles of the bug thread, the canary thread stops running, indicating that the program has become somehow trashed. (The bug thread stops running as well).
In experiments, this race condition is best triggered with at least two servers, one of which yields "connection refused" on connection, and the other of which simply hangs -- the nonsense ip address provided above works for the latter.
If the pMapM is replaced by mapM' (i.e. we switch from parallel to serial connection) then the bug does not appear to be triggered.
Wrapping the call to sClose in a mutex didn't seem to help, so it seems the race condition is in the connect call.