Opened 23 months ago

Last modified 18 months ago

#14407 new feature request

rts: Threads/caps affinity

Reported by: pacak Owned by:
Priority: normal Milestone:
Component: Runtime System Version: 8.3
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D4143
Wiki Page:

Description

Currently GHC supports two kinds of threads with respect to thread migration - pinned to a specific capability and those it can migrate between any capabilities. For purposes of achieving lower latency in Haskell applications it would be nice to have something in between - threads GHC rts can migrate but within a certain subset of capabilities only.

I'm developing a program that contains several kinds of threads - those that do little work and sensitive to latency and those that can spend more CPU time and less latency sensitive. I looked into several cases of increased latency in those sensitive threads (using GHC eventlog) and in all cases sensitive threads were waiting for non-sensitive threads to finish working. I was able to reduce worst case latency by factor of 10 by pinning all the threads in the program to specific capability but manually distributing threads (60+ of them) between capabilities (several different machines with different numbers of cores available) seems very fragile. World stopping GC is still a problem but at least in my case is much less frequently so.

I have a patch for rts that implements this proposal

{- | 'setThreadAffinity' limits RTS ability to migrate thread to
capabilities with numbers that matches set bits of affinity mask, thus
mask of `0b101` (5) will allow RTS to migrate this thread to caps
0 (64, 128, ..) and 3 (64 + 3 = 67, 128 + 3 = 131, ...).

Setting all bits to 0 or 1 will disable the restriction.
-}
setThreadAffinity :: ThreadId -> Int -> IO ()

This allows to define up to 64 distinct groups and allow user to break down their threads into bigger number of potentially intersecting groups by specifying things like capability 0 does latency sensitive things, caps 1..5 - less sensitive things, caps 5-7 bulk things.

Sample program using this API

{-# LANGUAGE LambdaCase #-}

import Data.Time
import Control.Monad
import Control.Concurrent
import System.Environment (getArgs)
import GHC.Conc


wastetime :: Bool -> IO ()
wastetime affine = do
    tid <- forkIO $ do

        myThreadId >>= \tid -> labelThread tid "timewaster"

        forever $ do
            when (sum [1..1000000] < (0 :: Integer)) $
                print "impossible"
            threadDelay 100
            yield
    when affine $ setThreadAffinity tid (255 - 2)

client :: Bool -> IO ()
client affine = do
    myThreadId >>= \tid -> labelThread tid "client"
    when affine $ myThreadId >>= \tid -> setThreadAffinity tid 2
    before <- getCurrentTime
    replicateM_ 10 $ do
        threadDelay 10000
    after <- getCurrentTime
    print $ after `diffUTCTime` before

startClient :: Bool -> IO ()
startClient = {- replicateM_ 10 . -} client


main :: IO ()
main = do

    getArgs >>= \case
        [wno's, aff's] -> do
            let wno = read wno's
                aff = read aff's
            putStrLn $ unwords ["Affinity:", show aff, "Timewasters:", show wno]
            replicateM_ wno (wastetime aff)
            startClient aff
        _ -> putStrLn "Usage: <progname> <number of time wasters> <enable affinity: True/False>"


Compiled with -threaded and running with rts -N8 on 6 core (12 threads) machine.

Results are noisy but repeatable

Affinity: False Timewasters: 24
0.42482036s
Affinity: True Timewasters: 24
0.111743474s

Change History (4)

comment:1 Changed 23 months ago by pacak

Differential Rev(s): D4143

comment:2 Changed 23 months ago by pacak

Differential Rev(s): D4143Phab:D4143

comment:3 Changed 18 months ago by YitzGale

How does this relate to the setThreadAfinity mentioned in #10229, that was implemented 3 years ago in GHC 7.10?

comment:4 Changed 18 months ago by pacak

How does this relate to the setThreadAfinity mentioned in #10229, that was implemented 3 years ago in GHC 7.10?

They serve slightly different purpose. #10229 deals with distributing Haskell OS thread between underlying CPU cores in order to avoid when possible running code on different logical CPUs that gets mapped to the same physical CPU, while this setAffinity deals with distributing Haskell light threads between different capabilities - to allow giving some threads priority over some other threads but without having to manually pin them to specific capabilities.

Note: See TracTickets for help on using tickets.