Opened 9 years ago

Closed 9 years ago

Last modified 4 years ago

#4914 closed bug (fixed)

FPU initialization required again

Reported by: aruiz Owned by: simonmar
Priority: high Milestone: 7.2.1
Component: Compiler (NCG) Version: 7.0.1
Keywords: Cc: fryguybob@…
Operating System: Unknown/Multiple Architecture: x86
Type of failure: Incorrect result at runtime Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

I think that the bug 2724, fixed in ghc-6.10.x, has reappeared in ghc-7.0.1. When I run hmatrix tests I get the same kind of random errors, which disappear if I insert the asm finit call. I don't have a small test case without dependencies, but I hope that this problem can be easily reproduced as follows:

$ cabal update
$ cabal unpack hmatrix
$ cd hmatrix-0.11.0.0
$ cabal install [-f-vector] [-f-binary]

(these flags can be given to minimize dependencies)

$ cabal test

All tests must pass. Now we will try to produce the error:

Comment out line 172 of hmatrix.cabal to remove the workaround for ghc-7.0.1:

-       cpp-options: -DFINIT
+    -- cpp-options: -DFINIT
$ cabal clean
$ cabal install [-f-vector] [-f-binary]
$ cabal test

Now we will probably get several errors.

Please let me know if you need more information. I hope I have not made some big mistake.

Attachments (4)

hook.c (344 bytes) - added by aruiz 9 years ago.
bug.hs (216 bytes) - added by aruiz 9 years ago.
hmatrix-debug-0.11.0.4.tar.gz (113.5 KB) - added by aruiz 9 years ago.
hook.2.c (323 bytes) - added by aruiz 9 years ago.
corrected check for empty fpu stack

Download all attachments as: .zip

Change History (25)

comment:1 Changed 9 years ago by igloo

Milestone: 7.0.3

Thanks for the report. If you're able to make a standalone testcase, then that'll make it easier to reproduce, and also make it easy to add a testcase so that we can prevent future regressions.

comment:2 in reply to:  1 Changed 9 years ago by aruiz

This problem seems to be exposed by some unpredictable combination of foreign calls and external libraries, so it is difficult to make a standalone testcase. I can try to prepare it, if it is acceptable that the testcase compiles several modules and links -lgsl -llapack.

I'm sorry that it requires installation of packages, but could you reproduce the problem with the above method?

comment:3 Changed 9 years ago by simonmar

In the patch

    Implement SSE2 floating-point support in the x86 native code generator (#594)

I "optimised" the x87 backend to avoid some of the ffrees that seemed unnecessary. It's entirely possible that I got this wrong (but I just took a quick look and did't see any obvious problems).

What we ought to do is make a small C function that will fail if the x87 stack is non-empty, and have the RTS call it at shutdown time. Then running the testsuite should find a smaller failure case. I'm a little busy to do this right now, but if anyone else wants to have a go please go ahead.

comment:4 Changed 9 years ago by aruiz

I have done as suggested and I think that I have found a small test case that only requires QuickCheck.

I have defined OnExitHook() in file hook.c to check the FPU status. The test case is in bug.hs.

$ gcc -c hook.c
$ ghc --make bug.hs hook.o -fforce-recomp
[1 of 1] Compiling Main             ( bug.hs, bug.o )
Linking bug ...
$ ./bug 
True
140
+++ OK, passed 100 tests.
OnExitHook: NONEMPTY FPU Stack (7)

The stack seems to be ok after normal operations, but using functions in QuickCheck we get the problem. This can be observed if we comment out line 8 in bug.hs:

$ ghc --make bug.hs hook.o -fforce-recomp
[1 of 1] Compiling Main             ( bug.hs, bug.o )
Linking bug ...
$ ./bug 
True
140
OnExitHook: FPU STACK OK

I know very little assembler, so please verify that this makes sense. I have some confidence in the stack check used in the hook for the following reason: in the hmatrix tests which expose the problem I have inserted the check at the beginning of the foreign calls and only call finit if the stack is non-empty. Doing this all errors disappear.

Please let me know if you need more info.

I have used ghc-7.0.2-i386-unknown-linux.tar.bz2 and QuickCheck-2.4.0.1.

Changed 9 years ago by aruiz

Attachment: hook.c added

Changed 9 years ago by aruiz

Attachment: bug.hs added

comment:5 Changed 9 years ago by simonmar

Owner: set to simonmar
Priority: normalhigh

comment:6 Changed 9 years ago by fryguybob

Cc: fryguybob@… added

I poked around a little to see if I could make a smaller deterministic test case and came up with this:

main = print $ encodeFloat 2 6

Resulting in:

PS C:\hg\haskell\trac\4914> ghc --make bug.hs .\hook.o -fforce-recomp
[1 of 1] Compiling Main             ( bug.hs, bug.o )
Linking bug.exe ...
PS C:\hg\haskell\trac\4914> .\bug.exe
128.0
OnExitHook: NONEMPTY FPU Stack (7)

comment:7 in reply to:  6 Changed 9 years ago by aruiz

Good! For some reason in my computer this test case only shows the problem using Float:

main = print (encodeFloat 2 6 :: Float)

comment:8 Changed 9 years ago by simonmar

That's perfect, thanks guys. I'll look into it.

comment:9 Changed 9 years ago by simonmar

I took a quick look: GHC sometimes leaves the FPU stack pointer at a value other than 0, but the FPU stack is empty. I think this happens because we're using ffree rather than popping the stack sometimes. This shouldn't cause a problem, should it?

comment:10 Changed 9 years ago by aruiz

I have run the hmatrix tests checking the fpu state at the beginning of the foreign calls and the presence of NaN's in the results. In a large number of cases the stack pointer is not 0 and there is no problem.

But in a few cases in which the foreign call receives a nonzero stack pointer the foreign computation produces a wrong NaN. I have discovered that I must only call finit to prevent the error if the fpu is in one of the following three states:

14624 : 0011 1001 0010 0000
14629 :   =    =    =  0101
14631 :   =    =    =  0111

In this particular experiment any other state is harmless. I know very little about this, but from the info in this page it looks like the only common pattern is a top of stack = 7 and precision = 1, which is not problematic in many other cases.

I will try to find a small testcase producing a NaN.

comment:11 Changed 9 years ago by simonmar

Could you also check to see whether any of those calls have a non-empty FPU stack? Checking the stack pointer isn't enough. I think an fsave instruction should get you the tag word, which should be 0xffff if the stack is empty.

comment:12 Changed 9 years ago by aruiz

Thanks for your help! I am now checking the tag word and also if a NaN is produced (calling frstor after fsave to let the computation go on in the original state). In an extremely large number of cases the stack is empty and there is no problem. But in a few rare cases the stack is non empty and then the computation produces a wrong NaN!

Two examples of unexpected tag word:

tag : 255 31 = 11 11 11 11 00 01 11 11
tag : 252 63 = 11 11 11 00 00 11 11 11

This can be observed in the hmatrix quickCheck tests. I have also found a small deterministic test case not using quickCheck, but it still requires the hmatrix package. Curiously, when I try to simplify the problem defining locally some functions the problem disappears.

Note also that the problem disappears if cabal install with --disable-optimization.

I will simplify the testcase as much as possible.

comment:13 Changed 9 years ago by aruiz

It is very difficult to get a small test case without dependencies. (For instance, the problem disappears just when a more specialized signature of a function is added, or if I try to reduce dependencies just copying the required definitions). I think that this problem is very unlikely to happen in normal programs, but here is a test case using the attached debugging version of hmatrix:

$ sudo apt-get install libgsl0-dev libatlas-base-dev

$ ls
hmatrix-debug-0.11.0.4.tar.gz
$ tar -xzf hmatrix-debug-0.11.0.4.tar.gz 
$ cd hmatrix-debug-0.11.0.4/

$ cabal install -f-vector -f-binary -fdebugfpu -fdebugnan
Resolving dependencies...
[1 of 2] Compiling Config           ( Config.hs, dist/setup/Config.o )
[2 of 2] Compiling Main             ( Setup.lhs, dist/setup/Main.o )
Linking ./dist/setup/setup ...
Configuring hmatrix-debug-0.11.0.4...
Checking foreign libraries... OK gsl lapack 
Preprocessing library hmatrix-debug-0.11.0.4...
Building hmatrix-debug-0.11.0.4...
[ 1 of 36] Compiling Data.Packed.Internal.Signatures (...)
...
[36 of 36] Compiling Numeric.LinearAlgebra.Tests (...)
Registering hmatrix-debug-0.11.0.4...
Installing library in /home/brutus/.cabal/lib/hmatrix-debug-0.11.0.4/ghc-7.0.2
Registering hmatrix-debug-0.11.0.4...

The problem is exposed by a simple deterministic program:

$ ghc --make bug.hs  -fforce-recomp -o bug1
[1 of 1] Compiling Main             ( bug.hs, bug.o )
Linking bug1 ...

$ ./bug1 
Warning: Nonempty FPU Stack. TAG = ff 1f
NaN multR Output
 13 x 13: -nan 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 
Warning: Nonempty FPU Stack. TAG = ff df
Warning: Nonempty FPU Stack. TAG = ff df
False

If we ran cabal test we will get lots of errors.

All problems disappear if we disable optimization:

$ cabal clean
cleaning...
$ cabal install -f-vector -f-binary -fdebugfpu -fdebugnan --disable-optimization
Resolving dependencies...
...
Registering hmatrix-debug-0.11.0.4...

$ ghc --make bug.hs  -fforce-recomp -o bug2
[1 of 1] Compiling Main             ( bug.hs, bug.o )
Linking bug2 ...
$ ./bug2
True

Now all tests pass.

The flag -fdebugfpu checks for the fpu tag word at the beginning of the foreign functions and shows a message if the stack is nonempty, but the fpu is initialized so there will be no computation errors.

The flag -fdebugnan restores the fpu after fsave to let the computation go on with the possibly bad fpu stack and checks if a NaN is produced in the foreign calls for the matrix product.

The fpu checks are in asm_finit() in lib/Numeric/LinearAlgebra/LAPACK/lapack-aux.c and are called by the general wrapper infrastructure in Data.Packed.Internal.Common.hs. The function producing the bug is defined in lib/Numeric/LinearAlgebra/Tests.hs using lib/Numeric/LinearAlgebra/Tests/Properties.hs.

I hope this is useful. Any suggestion to simplify the test case will be welcome.

Changed 9 years ago by aruiz

Changed 9 years ago by aruiz

Attachment: hook.2.c added

corrected check for empty fpu stack

comment:14 Changed 9 years ago by igloo

I can't reproduce this, on Linux/x86, Linux/amd64, or OS X 64.

comment:15 Changed 9 years ago by simonmar

Ok, I'll look into this. Assuming that it's not easily reproducible, I'll add an assertion to the RTS to check the state of the FPU stack and run as many tests as I can find until it triggers.

comment:16 Changed 9 years ago by simonmar

I wasn't able to reproduce it with bug.hs either. However, I have instrumented the RTS with calls to check the FPU stack all over the place, and one of them randomly triggers in my stage2 compiler while compiling hmatrix. So there is definitely something wrong, but whether I can catch it happening is another matter.

comment:17 Changed 9 years ago by aruiz

It is strange, I can reproduce it in different x86 computers... Also, I am now running all my programs based on hmatrix with the stack check and it triggers frequently and reproducibly in some of them. But they are not useful testcases because they have too many dependencies.

I don't want you to waste more time in this, but when I run

$ cabal clean
$ cabal install -f-vector -f-binary -fdebugfpu -fdebugnan
$ cabal test

I get something like this. Don't you get any error?

They are probably not very useful, but the executables built from bug.hs with and without optimization are here: bug1 bug2.

comment:18 Changed 9 years ago by simonmar

I think I've found it, Thanks for all your help in tracking this one down, it was a very tricky one to find!

Fri Mar 25 16:12:34 GMT 2011  Simon Marlow <marlowsd@gmail.com>
  * Fix #4914 (I hope)
  
  Here's a bit of erroneous code:
  
  00000c5c <s1ad_info>:
       c5c:       8b 45 08                mov    0x8(%ebp),%eax
       c5f:       d9 46 03                flds   0x3(%esi)
       c62:       dd d9                   fstp   %st(1)
       c64:       d9 55 08                fsts   0x8(%ebp)
       c67:       89 c6                   mov    %eax,%esi
       c69:       c7 45 00 24 0c 00 00    movl   $0xc24,0x0(%ebp)
       c70:       f7 c6 03 00 00 00       test   $0x3,%esi
       c76:       75 ac                   jne    c24 <s1ac_info>
  
  So we should be doing some ffrees before the jne.  The code that
  inserts the ffrees wasn't expecting to do it for a conditional jump,
  because they are usually local, but we have a late optimisation that
  shortcuts jumps-to-jumps, and that can result in a non-local
  conditional jump.
  
  This at least fixes an instance of the bug that I was able to
  reproduce, let's hope there aren't any more.

I'll leave the bug open until you can confirm that the bug is fixed.

comment:19 Changed 9 years ago by aruiz

Excellent!!

I have tested the more recent snapshot ghc-7.1.20110325-i386-unknown-linux.tar.bz2 of 26-Mar-2011, 05:18, which I assume it includes the fix, and all problems have disappeared. I have run the hmatrix tests a few times and the other complex test cases without finding any single case of nonempty fpu stack or wrong nans.

I think the bug is fixed.

Thanks!

comment:20 Changed 9 years ago by igloo

Resolution: fixed
Status: newclosed

Thanks for testing!

comment:21 Changed 4 years ago by Herbert Valerio Riedel <hvr@…>

In 34eaf2b/ghc:

Fix two occurences of `x86_HOST_ARCH`

The proper name for the define is `i386_HOST_ARCH`

One was introduced back in 2011 via
035b8ebb5405efbcbfd3474821a877add1feca1e / #4914
and the other one more recently via
4905b83a2d448c65ccced385343d4e8124548a3b

We may want to add some validation to catch such typos early on...

Reviewed By: erikd

Differential Revision: https://phabricator.haskell.org/D1664
Note: See TracTickets for help on using tickets.