Opened 9 years ago

Closed 9 years ago

#4391 closed bug (fixed)

forkIO threads do not properly save/restore the floating point environment

Reported by: draconx
Owned by:
Priority: normal
Milestone: 7.4.1
Component: Runtime System
Version: 6.12.3
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: x86_64 (amd64)
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

When using forkIO threads, the floating point environment is not correctly saved and/or restored on context switches. This causes floating point state to leak between the threads.

For example, if two threads (call them A and B) are created, and thread A changes the rounding direction, computations in thread B will be carried out under the rounding direction set by A, and vice versa. Likewise, any exceptions raised by computations in thread A are visible in thread B.

If forkIO threads are not intended to be used with floating point, this should be spelled out explicitly in the documentation, rather than left to some vague notion of avoiding "thread local state".

I've attached a short test program which demonstrates this issue. For portability, it uses standard C library functions to manipulate the floating point environment. It prints the following when compiled with GHC (parenthetical statements added):

A: ToNearest   (default rounding mode)
A: True
B: ToNearest
B: TowardZero
B: Infinity    (result of division by 0)
A: [DivByZero] (A sees B's exception)
A: TowardZero  (A sees B's rounding direction)
A: False

If we use OS threads (which properly save/restore the floating point environment):

A: ToNearest
A: True
B: ToNearest
B: TowardZero
B: Infinity   (result of division by zero)
A: []         (no new exceptions are visible)
A: ToNearest  (rounding direction is unchanged)
A: True

Attachments (1)

fenvfail.tar.gz (1.4 KB) - added by draconx 9 years ago.
Test case

Change History (18)

Changed 9 years ago by draconx

Attachment: fenvfail.tar.gz added

Test case

comment:1 Changed 9 years ago by igloo

Milestone: 7.2.1

Thanks for the report; we'll take a look.

comment:2 Changed 9 years ago by simonmar

Component: Compiler → Runtime System

I doubt we'll fix this. The floating-point environment is 9 words of stuff (according to my rough count), whereas a Haskell thread is 17 words. Typically the amount of stuff saved and restored over a context switch is only a handful of words, which is why Haskell threads are so cheap. If we had to save the floating point unit state as well, that would significantly impact thread performance.

Is it terrible to require the use of OS threads for this? Or is there some way we can avoid saving and restoring the full FP state?

comment:3 Changed 9 years ago by draconx

Using bound threads is an option, but this needs to be explicitly spelled out in the documentation: every single floating point operation across the entire program must be performed from bound threads. It would be interesting to see how much code on hackage is broken in this regard...

Currently, if any unbound thread performs any floating point operation (perhaps deep in an external library or foreign call), floating point operations in _all_ threads (even bound ones!) are subject to subtle and rare races.

This behaviour is not at all obvious or discoverable.

comment:4 Changed 9 years ago by simonmar

Can you give me an example of something that goes wrong? As far as I know, this doesn't affect ordinary floating point operations; it is only an issue if you want to change the floating point rounding mode or exception mode.

comment:5 Changed 9 years ago by draconx

The thing is, it _does_ affect "ordinary" floating point operations.

Ordinary floating point operations can raise exceptions, which will then leak to other threads. Similarly, ordinary floating point operations depend on the rounding mode, which is leaked from other threads.

For example, if any unbound thread anywhere in the program does something innocuous like a single floating point addition, then any attempt to handle exceptions elsewhere in the program is subject to races with that unbound thread.

comment:6 Changed 9 years ago by rwbarton

Perhaps a better question is whether it affects "ordinary" programs, where by "ordinary" programs I mean ones that don't invoke the fe* family of functions (via the FFI) or the equivalent. My understanding is that it does not, but I am not an expert on the subject; please correct me if I am wrong.

If your program does write the rounding mode register or read from the exception register, then this behavior of forkIO is not the only problem you'll have: the Prelude exports the floating point arithmetic operations as pure functions, even though they depend on the rounding mode and can raise exceptions. This obviously breaks referential transparency and compiler optimizations, so you could say that even the Prelude is "broken" under these conditions. Asking "how much code on hackage is broken in this regard" is meaningless, unless you think there is a lot of code on hackage which goes and mucks with the floating point state on unsuspecting programs (seems unlikely to me).

It would be nice to mention in the documentation, maybe in section "12.1.2. GHC's interpretation of undefined behaviour in Haskell 98" under "Unchecked float arithmetic", that GHC does not manage the floating point rounding or exception registers in any way; in particular, they are not saved or restored by Haskell thread switches, and accessing them directly in conjunction with using built-in floating point operations causes undefined behavior.

comment:7 in reply to: 6 Changed 9 years ago by draconx

Replying to rwbarton:

> Perhaps a better question is whether it affects "ordinary" programs, where by "ordinary" programs I mean ones that don't invoke the fe* family of functions (via the FFI) or the equivalent. My understanding is that it does not, but I am not an expert on the subject; please correct me if I am wrong.

Since I don't know _exactly_ what parts of the CPU's floating point state are not preserved on context switches, it's hard to say what _won't_ break. All that I know from the test program is that rounding modes and exceptions are not preserved. If this is the only thing that is not preserved, then a program that (a) never changes the rounding direction, and (b) never inspects the floating exceptions should be unaffected.

But note: many common operations change the rounding direction and you might not even realize it! For example: a common way to implement "round to integer towards negative infinity" (i.e. the "floor" function in C, or "roundToIntegralTowardNegative" in IEEE parlance) efficiently on the x87 is by code somewhat like the following:

  x <- get rounding mode
  set rounding mode to towards negative infinity
  round to integer (according to the current rounding direction)
  set rounding mode to x

On these systems, a foreign call to "floor" would probably do something like the above. On the other hand, I *think* the runtime won't pre-empt foreign calls so we may be OK (the documentation is not clear what happens if a signal arrives in the middle of this, though...).

> If your program does write the rounding mode register or read from the exception register, then this behavior of forkIO is not the only problem you'll have: the Prelude exports the floating point arithmetic operations as pure functions, even though they depend on the rounding mode and can raise exceptions. This obviously breaks referential transparency

You are right that the "pure" nature of the prelude functions is problematic. However, in this case we can, with some care, control the evaluation order: I don't think that the problems are much worse than those of unsafePerformIO.

> and compiler optimizations

I would consider any compiler optimization that prevents floating point from working reliably to be a separate bug orthogonal to this issue.

Note that such optimizations are usually problematic even if the program does not touch the rounding mode or exceptions, because they can affect the stability of an algorithm.

> so you could say that even the Prelude is "broken" under these conditions.

The Prelude has an awful lot of problems with respect to floating point, but I don't think they have anything to do with this particular issue.

> Asking "how much code on hackage is broken in this regard" is meaningless, unless you think there is a lot of code on hackage which goes and mucks with the floating point state on unsuspecting programs (seems unlikely to me).

Almost every floating point operation potentially mucks with the floating point state, because almost every operation can raise exceptions. If an unbound thread _anywhere_ in the program raises an exception, it becomes impossible to reliably inspect the floating point exception state _anywhere_ in the program. I suspect that there may be packages on Hackage which use unbound threads and floating point, and the implications are that any program which uses those packages (perhaps transitively) would be unable to rely on floating point exceptions working, even if their code is otherwise perfect.

Unlike the case with secretly-impure functions, I'm not aware of any way to control the way in which threads are pre-empted or migrated.

> It would be nice to mention in the documentation, maybe in section "12.1.2. GHC's interpretation of undefined behaviour in Haskell 98" under "Unchecked float arithmetic", that GHC does not manage the floating point rounding or exception registers in any way; in particular, they are not saved or restored by Haskell thread switches, and accessing them directly in conjunction with using built-in floating point operations causes undefined behavior.

That would be a very unfortunate limitation as it goes well beyond "don't use floating point in unbound threads".

comment:8 Changed 9 years ago by isaacdupree

An example which actually does computation with Doubles or Floats and gets the "wrong" answer would be helpful. I imagine that different-rounding-than-expected should be pretty easy to demonstrate. If you can, also make an example that does something worse than that due to a Float arithmetic-operation being executed with an unfortunate floating-point environment. (say, crashing the program, or getting a totally bogus result that isn't even close). This can help as both a test-case and a measure of practical severity.

comment:9 in reply to:  7 Changed 9 years ago by rwbarton

Replying to draconx:

Replying to rwbarton:

> > It would be nice to mention in the documentation, maybe in section "12.1.2. GHC's interpretation of undefined behaviour in Haskell 98" under "Unchecked float arithmetic", that GHC does not manage the floating point rounding or exception registers in any way; in particular, they are not saved or restored by Haskell thread switches, and accessing them directly in conjunction with using built-in floating point operations causes undefined behavior.

> That would be a very unfortunate limitation as it goes well beyond "don't use floating point in unbound threads".

But that's just the way it is, due to the types of the built-in floating point operations not involving IO. For example, your program produces different output ("A: True" on the last line) when compiled with -O2, due to constant folding.

comment:10 Changed 9 years ago by simonmar

Ok, so the simple fact is that if you use functions from fenv.h then floating-point operations are rendered impure, and all bets are off.

We can document exactly what GHC does, so those who really know what they're doing might be able to use fenv.h functions reliably.

Regarding the FFI point: GHC does not preempt a foreign call, so it's safe to temporarily change the rounding mode during a foreign call, as long as it is changed back before returning. Both safe and unsafe foreign calls are fine in this respect.

comment:11 Changed 9 years ago by duncan

I've occasionally thought about what it would take to do full IEEE floating point in Haskell. Assuming one can come up with a high level pure API (e.g. modeling rounding mode etc as extra input parameters) then the next big problem is at the ABI / codegen layer. With lazy evaluation you're calling unknown functions all the time and these expect the default FP state. I think you'd have to make the FP state part of the function call ABI. That is, for normal calls to normal functions you'd have to save/restore the FP state around the call. For functions that take the rounding/exception mode as a parameter then you can call them without save/restore. The danger is that it'd be rather expensive.

comment:12 Changed 9 years ago by simonmar

Status: new → merge

I added a section to the FFI part of the User's Guide:

Fri Nov 26 12:53:36 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * Document the behaviour of fenv.h functions with GHC (#4391)

The text I added was

    <sect2 id="ffi-floating-point">
      <title>Floating point and the FFI</title>

      <para>
        On POSIX systems, the <literal>fenv.h</literal> header
        provides operations for inspecting and modifying the state of
        the floating point unit.  In particular, the rounding mode
        used by floating point operations can be changed, and the
        exception flags can be tested.
      </para>

      <para>
        In Haskell, floating-point operations have pure types, and the
        evaluation order is unspecified.  So strictly speaking, since
        the <literal>fenv.h</literal> functions let you change the
        results of, or observe the effects of floating point
        operations, use of <literal>fenv.h</literal> renders the
        behaviour of floating-point operations anywhere in the program
        undefined.
      </para>

      <para>
        Having said that, we <emphasis>can</emphasis> document exactly
        what GHC does with respect to the floating point state, so
        that if you really need to use <literal>fenv.h</literal> then
        you can do so with full knowledge of the pitfalls:
        <itemizedlist>
          <listitem>
            <para>
              GHC completely ignores the floating-point
              environment, the runtime neither modifies nor reads it.
            </para>
          </listitem>
          <listitem>
            <para>
              The floating-point environment is not saved over a
              normal thread context-switch.  So if you modify the
              floating-point state in one thread, those changes may be
              visible in other threads.  Furthermore, testing the
              exception state is not reliable, because a context
              switch may change it.  If you need to modify or test the
              floating point state and use threads, then you must use
              bound threads
              (<literal>Control.Concurrent.forkOS</literal>), because
              a bound thread has its own OS thread, and OS threads do
              save and restore the floating-point state.
            </para>
          </listitem>
          <listitem>
            <para>
              It is safe to modify the floating-point unit state
              temporarily during a foreign call, because foreign calls
              are never pre-empted by GHC.
            </para>
          </listitem>
        </itemizedlist>
      </para>
    </sect2>

Please re-open if you disagree or want to suggest alternative text.

comment:13 Changed 9 years ago by draconx

The fenv.h functions are standard C, not limited to POSIX systems. Also, this IMO needs to be in the Control.Concurrent manual, because this has everything to do with threads and nothing to do with the FFI.

About purity: the thing is, even if we had the perfect pure API for floating point, you'd _still_ be bitten by this issue! That's because the issue is not about purity at all: it's about the runtime clobbering CPU registers on context switches. Note that integer operations on a certain popular CPU architecture are just as "impure" as floating point: there is a register which stores flags such as overflow state that is modified by operations (this register performs essentially identical function to the floating point control word on that same popular architecture). But nobody would posit that "all bets are off" because a program uses conditional branches! No amount of purity will save you from silent data corruption.

Nevertheless, I overestimated the damage: using bound threads appears to be a suitable workaround. I was under the impression that a bound thread could be pre-empted by an unbound thread, but some testing reveals this to not be the case.

comment:14 in reply to: 13 Changed 9 years ago by simonmar

Replying to draconx:

> The fenv.h functions are standard C, not limited to POSIX systems.

True, I'll fix that.

> Also, this IMO needs to be in the Control.Concurrent manual, because this has everything to do with threads and nothing to do with the FFI.

I don't think I agree. You have to use the FFI to access the fenv.h functions, and if you do, you have problems even without threads.

> About purity: the thing is, even if we had the perfect pure API for floating point, you'd _still_ be bitten by this issue! That's because the issue is not about purity at all: it's about the runtime clobbering CPU registers on context switches. Note that integer operations on a certain popular CPU architecture are just as "impure" as floating point: there is a register which stores flags such as overflow state that is modified by operations (this register performs essentially identical function to the floating point control word on that same popular architecture). But nobody would posit that "all bets are off" because a program uses conditional branches! No amount of purity will save you from silent data corruption.

Are you claiming we have a problem with the overflow flag, or other CPU state?

I get the impression from your comments that you think GHC preempts threads at arbitrary points, and therefore has to save the entire CPU state, like OS threads do. We don't do that - threads are preempted at safe points only, and we know exactly what state needs to be saved and restored (it doesn't include the overflow flag, for instance, because we know that a safe point never occurs between an instruction that sets the overflow flag and one that tests it). Using safe points means that we have much less state to save than OS threads, which is why Haskell threads are much cheaper. There are costs of course - for example it's harder to preempt tight loops without sacrificing performance, and GHC doesn't currently attempt to do that.

We do handle certain global state specially. A good example is the errno variable: we save the value of errno over a context switch. We could do the same thing with the FPU state, but we've decided not to, for the reasons already explained. I'm sure this isn't the perfect solution for everyone, but I think it's the best compromise.

comment:15 in reply to: 14 Changed 9 years ago by draconx

Replying to simonmar:

Replying to draconx:

> > Also, this IMO needs to be in the Control.Concurrent manual, because this has everything to do with threads and nothing to do with the FFI.

> I don't think I agree. You have to use the FFI to access the fenv.h functions

Sure, but the fenv.h functions aren't the only way to access the floating point unit (the fact that they're standard C helps with portability, though). Maybe it's impossible to access it without using the FFI, but that's incidental: I don't think you can access the internet without using the FFI either, but that doesn't mean we should put issues related to IP fragmentation in the FFI section of the user guide.

> and if you do, you have problems even without threads.

The reason why this is purely a threading issue is because it affects *all* floating point, not just the built-in floating point ops. You will get no argument from me in saying that the built-in ops are problematic. To illustrate this, let's consider an example. Suppose we had a floating point API which captures all impurity. For simplicity, we'll do this by putting everything in IO. Our API therefore looks something like the following:

dblAdd :: Double -> Double -> IO Double
dblMul :: Double -> Double -> IO Double
fpSetRoundingMode :: FPRoundingMode -> IO ()
fpTestExceptions  :: IO [FPException]
etc.

We can easily implement this API, today, using the FFI (ignoring all issues related to the marshalling of floating point data). Further suppose that the program never uses any of the built-in floating point ops, thus there are trivially no problems related to impurity. This API might even come from a library which hides the fact that it uses the FFI internally.

A well-intentioned application developer has a correct single-threaded program using this API. She realizes that she can make it faster by using threads, so she turns to the threading documentation. The thread documentation tells her that forkOS threads are needed if you use "thread local state", otherwise forkIO threads are much faster (the docs emphasize this last point *very* strongly). It makes no mention of floating point, so she (quite reasonably) assumes that floating point (which doesn't depend on "thread local state" in the usual sense of that term) is OK to use with forkIO.

Little does she know that her program is now subject to subtle, rare races. Despite extensive testing, these races are never encountered in the lab. The issue remains hidden until the maiden launch of the spacecraft on which her code is running, at which point a mishandled floating point exception causes the craft to break apart shortly after takeoff.

> > About purity: the thing is, even if we had the perfect pure API for floating point, you'd _still_ be bitten by this issue! That's because the issue is not about purity at all: it's about the runtime clobbering CPU registers on context switches. Note that integer operations on a certain popular CPU architecture are just as "impure" as floating point

> Are you claiming we have a problem with the overflow flag, or other CPU state?

No, I didn't mean to suggest that there was any problem with the handling of the integer overflow flag. I just wanted to draw a parallel to show that the issues are the same; that floating point is not somehow special in this regard.

> I get the impression from your comments that you think GHC preempts threads at arbitrary points

From the perspective of the application developer, this is exactly what happens, since it's essentially impossible to know in advance when memory allocations will or will not occur. Furthermore, the wording in the docs suggests that it's not even safe to rely on this. Statements such as "threads are interleaved in a random fashion" and "GHC doesn't *currently* attempt [to preempt tight loops]" (emphasis mine) suggest that threads might be preempted for other reasons in the future.

> We don't do that - threads are preempted at safe points only, and we know exactly what state needs to be saved and restored (it doesn't include the overflow flag, for instance, because we know that a safe point never occurs between an instruction that sets the overflow flag and one that tests it).

That's fine, but AFAIK there's no way for an application developer to make the same guarantee that a safe point never occurs between an instruction that sets the floating point overflow flag and one that tests it. Please correct me if I'm wrong.

comment:16 in reply to:  15 Changed 9 years ago by simonmar

Ok, I'll add a cross-ref somewhere in the concurrency docs.

Replying to draconx:

> > > About purity: the thing is, even if we had the perfect pure API for floating point, you'd _still_ be bitten by this issue! That's because the issue is not about purity at all: it's about the runtime clobbering CPU registers on context switches. Note that integer operations on a certain popular CPU architecture are just as "impure" as floating point

> > Are you claiming we have a problem with the overflow flag, or other CPU state?

> No, I didn't mean to suggest that there was any problem with the handling of the integer overflow flag. I just wanted to draw a parallel to show that the issues are the same; that floating point is not somehow special in this regard.

> > I get the impression from your comments that you think GHC preempts threads at arbitrary points

> From the perspective of the application developer, this is exactly what happens, since it's essentially impossible to know in advance when memory allocations will or will not occur. Furthermore, the wording in the docs suggests that it's not even safe to rely on this. Statements such as "threads are interleaved in a random fashion" and "GHC doesn't *currently* attempt [to preempt tight loops]" (emphasis mine) suggest that threads might be preempted for other reasons in the future.

That's not the point I was getting at. Your comments about saving registers suggested to me that you were under the impression that GHC could preempt a thread between two arbitrary instructions, when the implementation doesn't do that, so I was trying to clarify. Yes this is an implementation detail and irrelevant to the application programmer, but that's beside the point.

> That's fine, but AFAIK there's no way for an application developer to make the same guarantee that a safe point never occurs between an instruction that sets the floating point overflow flag and one that tests it. Please correct me if I'm wrong.

You're absolutely right - the application programmer has no way to write code that interacts with the low-level details of the CPU such as the floating point overflow flag (or the integer overflow flag, or the carry flag etc.). This holds regardless of safe points or threads; it's just not something you can do.

Or perhaps I'm not understanding what you're getting at here. A concrete example might help? Can you show me some code that you would expect to work, but GHC doesn't implement correctly?

comment:17 Changed 9 years ago by igloo

Resolution: fixed
Status: merge → closed

Merged.
