Opened 3 years ago

Last modified 2 years ago

#12181 new bug

Multi-threaded code on ARM64 GHC runtime doesn't use all available cores

Reported by: varosi Owned by:
Priority: normal Milestone:
Component: Runtime System Version: 7.10.3
Keywords: Cc: simonmar
Operating System: Unknown/Multiple Architecture: arm
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


This is the machine:

A Haskell ray tracer that uses Control.Parallel.Strategies and parBuffer works well on an x64 machine and uses all available cores, but on this ARM machine it uses only 2 of the 4 cores. The machine normally runs on two cores only; when it sees that they are heavily used, it enables two more, for a total of four. If I pass "+RTS -N4" it works just fine. So I think the problem is that the runtime checks only for enabled cores, not for all available ones.

In the first link you can see that "lscpu" reports 4 cores in total.

Change History (11)

comment:1 Changed 3 years ago by thomie

varosi: you are reporting an issue with +RTS -N, correct?

When compiling the following program with -threaded, and running it with ./Main +RTS -N, it prints 2 instead of 4 on your machine:

import Control.Concurrent

main = getNumCapabilities >>= print

Here's the code that gets the number of processors when using +RTS -N, from rts/posix/OSThreads.c:

uint32_t
getNumberOfProcessors (void)
{
    static uint32_t nproc = 0;

    if (nproc == 0) {
#if defined(HAVE_SYSCONF) && defined(_SC_NPROCESSORS_ONLN)
        nproc = sysconf(_SC_NPROCESSORS_ONLN);
#elif defined(HAVE_SYSCONF) && defined(_SC_NPROCESSORS_CONF)
        nproc = sysconf(_SC_NPROCESSORS_CONF);
#elif defined(darwin_HOST_OS)
        size_t size = sizeof(uint32_t);
        if(sysctlbyname("hw.logicalcpu",&nproc,&size,NULL,0) != 0) {
            if(sysctlbyname("hw.ncpu",&nproc,&size,NULL,0) != 0)
                nproc = 1;
        }
#elif defined(freebsd_HOST_OS)
        size_t size = sizeof(uint32_t);
        if(sysctlbyname("hw.ncpu",&nproc,&size,NULL,0) != 0)
            nproc = 1;
#else
        nproc = 1;
#endif
    }

    return nproc;
}

From man sysconf:

       - _SC_NPROCESSORS_CONF
              The number of processors configured.

       - _SC_NPROCESSORS_ONLN
              The number of processors currently online (available).
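To check which of the two values a given machine actually reports, both can be queried side by side. The following is only a sketch: the struct and function names are made up for illustration and are not part of the RTS, and it assumes a libc that defines both constants (they are common extensions, not guaranteed everywhere).

```c
#include <unistd.h>

/* Hypothetical helper type for illustration; not part of the GHC RTS. */
struct cpu_counts {
    long configured; /* sysconf(_SC_NPROCESSORS_CONF), or 0 if unavailable */
    long online;     /* sysconf(_SC_NPROCESSORS_ONLN), or 0 if unavailable */
};

/* Query both counts. sysconf returns -1 on failure, which is mapped to 0. */
struct cpu_counts query_cpu_counts(void)
{
    struct cpu_counts c = {0, 0};
#if defined(_SC_NPROCESSORS_CONF)
    long conf = sysconf(_SC_NPROCESSORS_CONF);
    if (conf > 0)
        c.configured = conf;
#endif
#if defined(_SC_NPROCESSORS_ONLN)
    long onln = sysconf(_SC_NPROCESSORS_ONLN);
    if (onln > 0)
        c.online = onln;
#endif
    return c;
}
```

On a machine like the one in this report, one would expect `configured` to be 4 while `online` is lower whenever the kernel has taken cores offline.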

comment:2 Changed 3 years ago by rwbarton

It really doesn't seem sensible to me to have GHC assume by default that CPUs that are off-line will magically become available under load. Though admittedly I don't know what the use of taking CPUs off-line is supposed to be.

This seems like a deficiency in the operating system, that there isn't a way to ask it "how many CPUs will my program run on". It's not for GHC to work around this I think. You could do so yourself by starting a thread that periodically checks the number of currently available processors and calls setNumCapabilities. Or just run with -N4.
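The periodic check could look roughly like the sketch below, written in C so that it can call sysconf directly (the getNumberOfProcessors code quoted in comment:1 caches its first result in a static variable, so it would not notice later changes). All names here are hypothetical. The callback is a stand-in for whatever reacts to the change; in a Haskell program that would be a call to setNumCapabilities. It assumes a POSIX system with nanosleep and default compiler feature macros.

```c
#include <time.h>
#include <unistd.h>

/* Hypothetical callback type: receives the new online CPU count. */
typedef void (*cpu_change_fn)(long online, void *ctx);

/* Example callback state: last reported count and number of reports. */
struct change_log {
    long last_online;
    int  calls;
};

/* Example callback: only records the change. A real program would
   resize its pool of workers here. */
void record_change(long online, void *ctx)
{
    struct change_log *log = ctx;
    log->last_online = online;
    log->calls++;
}

/* Check the online CPU count `rounds` times, sleeping `interval_ms`
   milliseconds between checks, and report each change. The first
   successful observation always counts as a change. */
void watch_online_cpus(int rounds, long interval_ms,
                       cpu_change_fn on_change, void *ctx)
{
    long seen = 0;
    struct timespec pause = { interval_ms / 1000,
                              (interval_ms % 1000) * 1000000L };

    for (int i = 0; i < rounds; i++) {
        long now = sysconf(_SC_NPROCESSORS_ONLN);
        if (now > 0 && now != seen) {
            seen = now;
            on_change(now, ctx);
        }
        if (i + 1 < rounds)
            nanosleep(&pause, NULL);
    }
}
```

A long-running program would run this loop in its own thread with an interval of about a second rather than for a fixed number of rounds.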

comment:3 Changed 3 years ago by varosi

The problem seems to go deeper. We are currently running a C program that profiles matrix multiplication, and it runs on all cores. When we run the Haskell program with "+RTS -N8" (on a similar machine, but with heterogeneous 4+4 cores), it runs 8 OS threads, but they occupy only half of the available cores, and the program runs much slower than with "+RTS -N4".

This is the program for reference:


comment:4 Changed 3 years ago by varosi

@thomie, yes, initially this 4-core machine reports only 1 active core. And as @rwbarton said, it is not a GHC problem. So -N will not work correctly on that machine, and we have to tell it explicitly to use 4 cores. It is actually a tablet, so it saves power by turning off cores.

comment:5 Changed 3 years ago by dobenour

What about having a Haskell API to tell the RTS to re-detect the number of CPUs, looking for the number of available processors?

comment:6 Changed 3 years ago by simonmar

You can already do this, with GHC.Conc.getNumProcessors followed by GHC.Conc.setNumCapabilities.

comment:7 Changed 3 years ago by varosi

Wouldn't it be a good idea for the GHC runtime to do this once per second or so? That way it would work automatically on similar machines.

comment:8 Changed 3 years ago by simonmar

Yes, perhaps +RTS -N should automatically readjust at some regular interval.

comment:9 Changed 3 years ago by varosi

Or maybe operating systems already have mechanisms to signal such changes, without the need to poll for them regularly.

comment:10 Changed 2 years ago by varosi

Will a fix for this issue make it into 8.2 or 8.4?

comment:11 Changed 2 years ago by bgamari

At the moment there is no milestone, meaning we aren't targeting any particular release for a fix. I will say that I'm not terribly keen on the idea of polling to get CPU information. I tend to agree with Reid that this is a distribution issue: bringing CPUs entirely offline for the sake of power management seems a bit crazy. Don't the Linux cpufreq, cpuidle, clock management, and runtime PM subsystems exist precisely to avoid this sort of heavy-weight power management?

Why not just implement the RTS's -N logic yourself, but using _SC_NPROCESSORS_CONF instead of _SC_NPROCESSORS_ONLN?
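A minimal sketch of that idea, assuming a libc that provides _SC_NPROCESSORS_CONF: the function below mirrors the shape of the RTS lookup from comment:1 but asks for the configured count first and only falls back to the online count. The name configured_processors is hypothetical and not part of GHC.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical replacement for the RTS lookup; not part of GHC.
   Prefers the number of configured processors over those online. */
uint32_t configured_processors(void)
{
    long n = -1;
#if defined(_SC_NPROCESSORS_CONF)
    n = sysconf(_SC_NPROCESSORS_CONF);
#endif
#if defined(_SC_NPROCESSORS_ONLN)
    if (n < 1)
        n = sysconf(_SC_NPROCESSORS_ONLN); /* fall back to the online count */
#endif
    /* Like the RTS code, report a single processor if nothing else works. */
    return n < 1 ? 1 : (uint32_t)n;
}
```

A Haskell program could bind this through the FFI and pass the result to setNumCapabilities at startup instead of relying on +RTS -N; that Haskell side is not shown here.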
