Opened 5 years ago

Closed 4 years ago

#9304 closed bug (duplicate)

Floating point woes; Different behavior on 32- vs 64-bit x86

Reported by: lerkok
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 7.8.3
Keywords: floating point
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets: #9276
Differential Rev(s):
Wiki Page:

Description

I have the following snippet:

x, y, r  :: Double
x = -4.4
y = 2.4999999999999956
r = x * y

Using GHC 7.8.3, on a Mac, I get the following response from ghci:

*Main> decodeFloat r
(-6192449487634421,-49)

Using GHC 7.8.3, on a Linux machine, I get the following response:

*Main> decodeFloat r
(-6192449487634422,-49)

Note that off-by-one difference in the first component of the output.

I'm not 100% sure which one is actually correct, but the point is that these are IEEE floating-point numbers running on the same architecture (Intel x86), and thus should decode in precisely the same way.

While I observed this with 7.8.3, I don't think this is a new regression; I suspect it'll show up in older versions as well.

Also, for full disclosure: I ran the Mac version natively, but the Linux version on top of a VirtualBox image. I'd very much doubt that would make a difference, but there might be 32/64-bit concerns. So, if someone can validate the Linux output on a true 64-bit Linux machine, that would really help track down the issue further.
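
For reference, re-encoding the two decoded results (encodeFloat is the inverse of decodeFloat here) shows that they are adjacent Doubles; that is, the two answers differ by exactly one unit in the last place:

Prelude> encodeFloat (-6192449487634421) (-49) :: Double
-10.99999999999998
Prelude> encodeFloat (-6192449487634422) (-49) :: Double
-10.999999999999982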

Change History (31)

comment:1 Changed 5 years ago by schyler

Blocked By: 9276 added

comment:2 Changed 5 years ago by carter

Could you try building the 32-bit Linux code with -msse2? Otherwise, on a 32-bit system I BELIEVE the x87 (80-bit) floats will be used.

It could simply be that you're using the 80-bit floats! (That would result in these differences in the least significant bit.)

Last edited 5 years ago by carter

comment:3 Changed 5 years ago by lerkok

Carter: I compiled with the -msse2 flag, and I still see the same response on Linux; it still differs from the one I see on the Mac.

If someone can try on a true 64-bit linux machine, that'd be really good.

comment:4 Changed 5 years ago by carter

I ran it on a 64 bit linux server i have using ghci

Prelude> :set -XScopedTypeVariables
Prelude> let x :: Double = -4.4
Prelude> let y :: Double = 2.4999999999999956
Prelude> decodeFloat (x*y)
(-6192449487634421,-49)

So if anything, it looks like it's 32-bit vs 64-bit.

Could you try running the above snippet in GHCi on your 32-bit machine?

comment:5 Changed 5 years ago by lerkok

So it appears that the one ending in 21 is likely the correct result, as opposed to 22.

Is this an issue with some underlying library (glibc etc.), or an issue with GHC itself?

comment:6 Changed 5 years ago by carter

It's the last bit; it could just be different rounding! Let's not assume anything is wrong; it could be that both are compliant.

What is your Linux distro, hardware, and CPU? Could you give me more information?

comment:7 Changed 5 years ago by hvr

FYI, decodeFloat is based on a somewhat hacky C implementation inside integer-gmp: source:ghc/libraries/integer-gmp/cbits/float.c#L191

I wouldn't be surprised if the C implementation was slightly unsound...

comment:8 Changed 5 years ago by ekmett

I'm not 100% sure which one is actually correct, but the point is that these are IEEE floating-point numbers running on the same architecture (Intel x86), and thus should decode in precisely the same way.

I personally wouldn't expect that at all.

I'd expect it all comes down to whether the optimizer decided to keep the intermediate result in a larger temporary because it ran it through the old x87 hardware, or whether it decided to go through SSE, etc. This will happen.

It will give different answers, even on the same machine and OS at different call sites, when the optimizer spits out different code for different inlinings, etc.

Floating point answers are very fragile. A difference of only one ulp is actually pretty good. ;)

There is a relevant note in: http://www.haskell.org/ghc/docs/latest/html/users_guide/bugs.html#bugs-ghc

On 32-bit x86 platforms when using the native code generator, the -fexcess-precision option is always on. This means that floating-point calculations are non-deterministic, because depending on how the program is compiled (optimisation settings, for example), certain calculations might be done at 80-bit precision instead of the intended 32-bit or 64-bit precision. Floating-point results may differ when optimisation is turned on. In the worst case, referential transparency is violated, because for example let x = E1 in E2 can evaluate to a different value than E2[E1/x].

One workaround is to use the -msse2 option (see Section 4.16, “Platform-specific Flags”), which generates code to use the SSE2 instruction set instead of the x87 instruction set. SSE2 code uses the correct precision for all floating-point operations, and so gives deterministic results. However, note that this only works with processors that support SSE2 (Intel Pentium 4 or AMD Athlon 64 and later), which is why the option is not enabled by default. The libraries that come with GHC are probably built without this option, unless you built GHC yourself.

Also note that I'd suspect a profound lack of interaction between that flag and ghci's bytecode interpreter. The -msse2 thing is probably doing you no good in the REPL. Also, as noted above, the libraries that ship with GHC aren't built that way, so if base is doing your multiplication for you, you're probably just smacking into generated code that had -fexcess-precision turned on.
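
If you want to see which unit the compiled multiply actually goes through (for compiled code only; this won't tell you anything about the GHCi bytecode path), one rough check is to dump the generated assembly and grep for mulsd (SSE2) versus fmul/fmulp (x87). If the multiply has been constant-folded away, neither will show up:

ghc -S -fforce-recomp main.hs
grep -E 'mulsd|fmul' main.s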

comment:9 Changed 5 years ago by lerkok

@ekmett: I do agree that expecting consistency is a hard sell, but I think this particular case is a real bug on 32-bit Linux.

@carter: I think you are spot on that the Linux 32-bit version is doing something funky, and producing garbage as a result.

I've used an infinite precision calculator to compute the result of the multiplication. The real answer is -10.9999999999999806. It turns out that this gets rounded to -10.99999999999998 when run on Mac/Linux 64-bit; I have not double-checked it, but I'm willing to believe this is the correct rounding under the default round-nearest-even mode. At least it looks reasonable without inspecting the low-order bits, and I also have some indirect evidence for it: the failing test case came from an interaction with the Z3 SMT solver, which provided a model for a problem that turned out to be false in the 32-bit Linux realm. (The test passes on 64-bit Mac.)

However, on 32-bit Linux, I get the following result for the multiplication: -10.999999999999982.

That is clearly incorrectly rounded, even if the intermediate result is taken to be infinitely precise. This leads me to think that the 32-bit Linux implementation is buggy somehow.
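
To cross-check that claim within Haskell itself, here is a small sketch of mine (it assumes the inputs are exactly the Doubles nearest the two literals): multiply the two stored Doubles exactly via Rational and round the exact product back to a Double:

module Main where

main :: IO ()
main = do
   let x = -4.4               :: Double
       y = 2.4999999999999956 :: Double
       -- exact product of the two already-rounded inputs
       exact = toRational x * toRational y
   -- fromRational rounds to nearest, so this should be the correctly
   -- rounded 64-bit product; expected output: (-6192449487634421,-49)
   print (decodeFloat (fromRational exact :: Double))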

I think Carter's earlier experiment suggests 64-bit works just fine (both Mac and Linux), and we have evidence that the result on 32-bit Linux is buggy. I have no way of getting to a 32-bit Mac with a modern GHC, but if someone can replicate it there, that would provide more evidence as to whether this is a true 32-bit issue or whether Linux plays a role too.

This may not be a huge deal, as 32-bit machines are becoming more and more obsolete, but we should not cop out by hiding behind floating-point inconsistencies and saying "whatever happens happens." I think Haskell has to play a leading role here, and 'Double' should really mean an IEEE 754 64-bit double-precision value regardless of the architecture; but that's a whole other can of worms, obviously.

comment:10 Changed 5 years ago by carter

@lerkok, why is it "clearly incorrectly rounded"? I'm having trouble believing that claim. It could simply be that your input literals are encoded with more bits of precision in the x87 case.

Could you please try the original program as a self-contained program, like

module Main where


main = do 
   x <- return ((-4.4)::Double)
   y <- return ((2.4999999999999956):: Double)
   putStrLn $ show $ decodeFloat (x*y)

compile with

ghc main.hs -O1 -msse2 -fforce-recomp ; ./main ; ghc main.hs -O1 -fexcess-precision -fforce-recomp ; ./main

and tell us if there's a difference in the output in those two cases?

Last edited 5 years ago by carter

comment:11 Changed 5 years ago by lerkok

@carter: Good point; I hadn't considered the implications of the encoding of the input literals. Nonetheless, the experiment you suggested prints precisely the same results:

[ubuntu-VirtualBox]~/precision>ghc main.hs -o main -msse2 -fforce-recomp ; ./main ; ghc main.hs -o main -fexcess-precision -fforce-recomp; ./main
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
(-6192449487634422,-49)
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
(-6192449487634422,-49)

which still differ from the behavior on 64-bit platforms. So, I'm still inclined to think something is off here.

comment:12 Changed 5 years ago by lerkok

Another point: calling decodeFloat x and decodeFloat y, where x and y are the multiplicand and the multiplier in question, returns exactly the same results on both 32-bit Linux and 64-bit Mac, which suggests the encoding of the inputs is not to blame here.
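
Concretely, the check amounts to running the following and comparing the output (it is identical on 32-bit Linux and 64-bit Mac):

main :: IO ()
main = do
   print (decodeFloat (-4.4 :: Double))
   print (decodeFloat (2.4999999999999956 :: Double))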

comment:13 Changed 5 years ago by carter

OK, let's try again a different way!

{-# LANGUAGE MagicHash #-}
module Main where
import GHC.Types
import GHC.Prim 

main = do 
   x <- return ((-4.4)::Double)
   y <- return ((2.4999999999999956):: Double)
   putStrLn $ show $ decodeFloat (myTimes x y)

{-# NOINLINE myTimes #-}
myTimes (D# a) (D# b) =  D# res 
   where 
       res = a  *## b

If the results are the same, then the culprit is decodeFloat, not the floating-point arithmetic.

comment:14 in reply to:  11 Changed 5 years ago by carter

Whoops, you did -o rather than -O! Could you do the version that's in the edit of my comment please :) (just in case)

Replying to lerkok:

@carter: Good point; I hadn't considered the implications of the encoding of the input literals. Nonetheless, the experiment you suggested prints precisely the same results:

[ubuntu-VirtualBox]~/precision>ghc main.hs -o main -msse2 -fforce-recomp ; ./main ; ghc main.hs -o main -fexcess-precision -fforce-recomp; ./main

comment:15 Changed 5 years ago by lerkok

@carter: Ah, that's better! With -O, I get the correct result (i.e., the one that matches the Mac).

In fact, no other flags are needed. With -O I get the correct behavior regardless of -msse2/-fexcess-precision, and without -O I get the incorrect one.

So, when compiled with optimizations it's correct, but not without? I'm not sure what to make of this.

comment:16 Changed 5 years ago by carter

Did you include the -fforce-recomp flag?!

comment:17 Changed 5 years ago by lerkok

yes.. (just double-checked now to make double-sure!)

optimization considered harmful?

comment:18 Changed 5 years ago by carter

OK, good :)

Could you test the MagicHash version I linked to also, please? :)

comment:19 Changed 5 years ago by lerkok

Result for magic hash:

no flags: the 22 result
-msse2: the 21 result
-fexcess-precision: the 22 result

The presence of -O doesn't seem to change anything.

Last edited 5 years ago by lerkok

comment:20 Changed 5 years ago by carter

Could you paste the command AND the results? It's hard to know precisely which you mean.

e.g.

$ shell command

$ results

comment:21 Changed 5 years ago by lerkok

Sure.. Here you go:

[ubuntu-VirtualBox]~/precision/magichash>cat main.hs                                       
{-# LANGUAGE MagicHash #-}
module Main where
import GHC.Types
import GHC.Prim

main = do
   x <- return ((-4.4)::Double)
   y <- return ((2.4999999999999956):: Double)
   putStrLn $ show $ decodeFloat (myTimes x y)

{-# NOINLINE myTimes #-}
myTimes (D# a) (D# b) =  D# res
   where 
       res = a  *## b
[ubuntu-VirtualBox]~/precision/magichash>rm main.hi; rm main.o; ghc main.hs -o main; ./main
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
(-6192449487634422,-49)
[ubuntu-VirtualBox]~/precision/magichash>rm main.hi; rm main.o; ghc main.hs -msse2 -o main; ./main
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
(-6192449487634421,-49)
[ubuntu-VirtualBox]~/precision/magichash>rm main.hi; rm main.o; ghc main.hs -fexcess-precision -o main; ./main
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
(-6192449487634422,-49)

comment:22 Changed 5 years ago by lerkok

And here are the results for the original program:

[ubuntu-VirtualBox]~/precision/regular>cat main.hs
x, y, r :: Double
x = -4.4
y = 2.4999999999999956
r = x * y

main :: IO ()
main = print $ decodeFloat r
[ubuntu-VirtualBox]~/precision/regular>rm main.hi; rm main.o; ghc main.hs -o main; ./main
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
(-6192449487634422,-49)
[ubuntu-VirtualBox]~/precision/regular>rm main.hi; rm main.o; ghc main.hs -msse2 -o main; ./main
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
(-6192449487634422,-49)
[ubuntu-VirtualBox]~/precision/regular>rm main.hi; rm main.o; ghc main.hs -fexcess-precision -o main; ./main
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
(-6192449487634422,-49)

comment:23 Changed 5 years ago by carter

Could you please run the exact command I asked about earlier, exactly as given :-)

I've been traveling all day; I'll try to spin up a 32-bit VM later.

comment:24 Changed 5 years ago by lerkok

@carter: I've lost track of what to run on which program.. it might be easiest if you could spin up a 32-bit VM; nothing really urgent on this anyhow. If you can't, however, I'm happy to give it a go.. Maybe we should find some other way of resolving this outside the Trac so people don't get spammed on this any further.

comment:25 Changed 5 years ago by rwbarton

Priority: high → normal
Summary: Floating point woes; Different behavior on Mac vs Linux → Floating point woes; Different behavior on 32- vs 64-bit x86

The optimizer is constant-folding the multiplication in Carter's first test program, but not the second (thanks to NOINLINE). The one mystery to me is why this constant folding is apparently performed using 64-bit floating point while multiplication in GHCi is apparently performed using 80-bit floating point. Edward's comments explain the rest of the observed behavior. You can see the same behavior in C:

#include <stdio.h>

int main(void)
{
  double x, y;
  x = -4.4;
  y = 2.4999999999999956;
  printf("%.30lf\n", x*y);
  return 0;
}
rwbarton@morphism:/tmp/fp$ gcc -m32 mul.c -o mul
rwbarton@morphism:/tmp/fp$ ./mul
-10.999999999999982236431605997495
rwbarton@morphism:/tmp/fp$ gcc -m64 mul.c -o mul
rwbarton@morphism:/tmp/fp$ ./mul
-10.999999999999980460074766597245

The Haskell Report makes no guarantee that Double is even IEEE floating-point arithmetic of any kind, and GHC's behavior is documented: https://www.haskell.org/ghc/docs/7.6.1/html/users_guide/bugs.html#bugs-ghc.

comment:26 Changed 5 years ago by simonpj

Is this really a bug at all? As Reid says, there are no guarantees about the precision of Double. A difference in the least significant bit of a Double on 64 bit vs 32 bit seems a remarkably small divergence to me.

However, I don't understand Reid's comment that "the optimiser is constant-folding the multiplication". Which optimiser? The user-guide link just says that some calculations are done at 80-bit precision without rounding back to 64 bits at each intermediate step.

Simon

comment:27 Changed 5 years ago by lerkok

@simonpj: I'm not sure whether it's a real bug or not. There's an observed difference in behavior on 32-bit where compiling with -O produces a different result than compiling without -O, and that's at least mildly disconcerting.

@carter has all the context, and I think his expertise will come in handy when he gets around to spinning up a 32-bit VM and looking at it in some more detail. If he concludes this is par for the course, then we can remove the ticket.

I might be in the minority, but I do not think that a difference in the ulp for Double on 64-bit vs 32-bit is a remarkably small divergence.. I wish Double meant precisely the same thing regardless of architecture, i.e., a 64-bit IEEE double-precision value. But I also understand that this is a minor enough concern for most Haskell users that it won't trickle up the chain of importance.

Last edited 5 years ago by lerkok

comment:28 in reply to:  26 Changed 5 years ago by rwbarton

Replying to simonpj:

However I don't understand Reid's comment that "the optimiser is constant-folding the multiplication". Which optimiser?

I mean the rule

primOpRules nm DoubleMulOp   = mkPrimOpRule nm 2 [ binaryLit (doubleOp2 (*)), ... ]

which we can see firing by compiling with -ddump-rule-firings:

...
Rule fired: *##
...

The constant folder represents literal Double values with Rational, but (unless excess precision is enabled) truncates the result of each operation by converting to Double and back, so it should match the answer obtained by using 64-bit Double throughout, as it does in this case.
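
In other words, the folding of a literal Double multiply behaves roughly like this (a paraphrase of the behaviour described above, not the actual PrelRules code; the name is mine):

-- Operands are exact Rationals; the multiply itself is exact; narrowing to
-- Double and back is the "converting to Double and back" truncation step,
-- so the folded literal matches a correctly rounded 64-bit multiply.
constantFoldMulD :: Rational -> Rational -> Rational
constantFoldMulD a b = toRational (fromRational (a * b) :: Double)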

comment:29 Changed 5 years ago by carter

@lerkok I think @rwbarton has the problem nailed.

If I'm understanding everything correctly, the culprit here is that GHC's constant-evaluation tooling doesn't respect the implicit floating-point model induced by -msse2 vs -fexcess-precision?

comment:30 Changed 5 years ago by lerkok

@carter: Yes; I think @rwbarton is quite right in his analysis!

@rwbarton: Thanks for nailing this one!

I'd consider this a bug, but I'm not sure it's important enough to spend cycles on. @simonpj can definitely make that call. Maybe it can be merged with @carter's ticket #9276, and addressed within that context.

comment:31 in reply to:  30 Changed 4 years ago by thomie

Blocked By: 9276 removed
Resolution: duplicate
Status: new → closed

Replying to lerkok:

Maybe it can be merged with @carter's ticket #9276, and addressed within that context.

Ok. #9276 contains a link back to this ticket.
