#4019 closed bug (fixed)
deriving Ord can produce incorrect and inefficient instances
Reported by: | rl | Owned by: | |
---|---|---|---|
Priority: | low | Milestone: | |
Component: | Compiler | Version: | 6.13 |
Keywords: | deriving | Cc: | gracjanpolak@… |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Incorrect result at runtime | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | ||
Wiki Page: |
Description
This bug was spotted by Barak Pearlmutter in http://www.haskell.org/pipermail/haskell-cafe/2010-April/076762.html.
data T = T Double deriving( Eq, Ord ) *Main> T (0/0) > T (0/0) True *Main> (0/0) > (0/0) False
This happens because the derived Ord instance only defines compare and relies on default method definitions for everything else. Comparisons involving NaNs always return False, however, compare (arbitrarily) returns GT in this case.
Irrespective of this particular wart, this is what GHC ultimately produces for (<=):
T.$fOrdT_$c<= = \ (x_ahF :: T.T) (y_ahG :: T.T) -> case x_ahF of _ { T.T a1_afL -> case y_ahG of _ { T.T b1_afM -> case a1_afL of _ { GHC.Types.I# x#_ah1 -> case b1_afM of _ { GHC.Types.I# y#_ah5 -> case GHC.Prim.<# x#_ah1 y#_ah5 of _ { GHC.Bool.False -> GHC.Prim.==# x#_ah1 y#_ah5; GHC.Bool.True -> GHC.Bool.True } } } } }
Note that the definition uses two comparisons even though (<=) for Double uses just one: (<=##). In general, relying on default method definitions when deriving Ord can be inefficient because the individual comparison operators might very well be faster than compare for the wrapped types.
Attachments (1)
Change History (17)
comment:1 Changed 10 years ago by
comment:2 Changed 10 years ago by
It's easy enough for me to generate whatever code we want for Ord instances. Would someone like to write down what GHC should generate? There appear to be two issues:
- Efficiency: this may just be a question of generating explicit code rather than relying on default methods. That's fine with me: can someone please specify what would do the job? You can see what GHC does right now with
-ddump-deriv
- Correctness: if comparisons involving NaNs always False, then it is possible for a>b and b>a to both be true, even though a and b differ, which breaks one of the properties of a total order. I'm not sure what to do here. Does anyone else know?
comment:3 Changed 10 years ago by
You are quite right, these are two separate issues. I originally thought that they are symptoms of the same problem but this is obviously not the case.
With respect to efficiency, ideally automatically derived Ord methods would use appropriate comparison operators instead of compare for comparing the last field of a constructor. That is, deriving Ord should generate this:
(a,b) > (c,d) = case compare a c of GT -> True EQ -> b > d -- instead of compare b d == GT LT -> False
But perhaps this is too minor (although if would be interesting to find out if this has a measurable effect on nofib).
As to correctness, IEEE floating point comparisons simply aren't a total ordering. I don't know what to do here, either, but I think this is more of a language definition problem than a GHC issue. For the moment, I would perhaps simply say that comparisons involving NaNs produce unspecified results (i.e., it is an unchecked precondition of compare and friends that they aren't applied to a !NaN).
comment:4 Changed 10 years ago by
Milestone: | → 6.14.1 |
---|
This ticket seems to be mixing several issues. Based on the original description, I thought this was the problem you were reporting:
This derived Ord instance:
data T = T Double deriving( Eq, Ord )
looks like this (after Haskellifying the generated code):
instance Ord T where compare a b = cmp_eq a b where cmp_eq (T a1) (T b1) = compare a1 b1 }
so it only lifts a minimum number of methods. But it ought to lift all of them, rather than relying on the default methods being both correct and efficient:
instance Ord T where compare a b = cmp_eq a b where cmp_eq (T a1) (T b1) = compare a1 b1 } (<=) a b = cmp_eq a b where cmp_eq (T a1) (T b1) = (<=) a1 b1 } (<) a b = cmp_eq a b where cmp_eq (T a1) (T b1) = (<) a1 b1 } (>=) a b = cmp_eq a b where cmp_eq (T a1) (T b1) = (>=) a1 b1 } (>) a b = cmp_eq a b where cmp_eq (T a1) (T b1) = (>) a1 b1 } max a b = cmp_eq a b where cmp_eq (T a1) (T b1) = max a1 b1 } min a b = cmp_eq a b where cmp_eq (T a1) (T b1) = min a1 b1 }
which makes sense to me and I imagine is easy to fix.
I suggest we leave this ticket to deal with that issue, and open separate tickets for other issues.
Changed 10 years ago by
Attachment: | more-precise-eq-and-ord-derived-instances.dpatch added |
---|
comment:5 Changed 10 years ago by
Cc: | gracjanpolak@… added |
---|
I implemented what has been said in this thread i. e. "automatically derived Ord methods would use appropriate comparison operators instead of compare for comparing the last field of a constructor".
Before:
$ ghc.exe -ddump-deriv ../dumpderiv.hs -fforce-recomp ==================== Derived instances ==================== InstInfo: GHC.Classes.Eq M.T { GHC.Classes./= a_adD b_adE = GHC.Classes.not ((GHC.Classes.==) a_adD b_adE) GHC.Classes.== (M.T a1_adB) (M.T b1_adC) = ((a1_adB GHC.Classes.== b1_adC)) } InstInfo: GHC.Classes.Ord M.T { GHC.Classes.compare a_adF b_adG = cmp_eq_adH a_adF b_adG where cmp_eq_adH (M.T a1_adI) (M.T b1_adJ) = GHC.Classes.compare a1_adI b1_adJ }
now:
$ inplace/bin/ghc-stage1.exe -ddump-deriv ../dumpderiv.hs -fforce-recomp [1 of 1] Compiling M ( ..\dumpderiv.hs, ..\dumpderiv.o ) ==================== Derived instances ==================== InstInfo: GHC.Classes.Eq M.T { GHC.Classes./= (M.T a1_a7q) (M.T b1_a7r) = ((a1_a7q GHC.Classes./= b1_a7r)) GHC.Classes.== (M.T a1_a7o) (M.T b1_a7p) = ((a1_a7o GHC.Classes.== b1_a7p)) } InstInfo: GHC.Classes.Ord M.T { GHC.Classes.>= a_a7M b_a7N = cmp_eq_a7O a_a7M b_a7N where cmp_eq_a7O (M.T a1_a7P) (M.T b1_a7Q) = (GHC.Classes.>=) a1_a7P b1_a7Q GHC.Classes.> a_a7H b_a7I = cmp_eq_a7J a_a7H b_a7I where cmp_eq_a7J (M.T a1_a7K) (M.T b1_a7L) = (GHC.Classes.>) a1_a7K b1_a7L GHC.Classes.<= a_a7C b_a7D = cmp_eq_a7E a_a7C b_a7D where cmp_eq_a7E (M.T a1_a7F) (M.T b1_a7G) = (GHC.Classes.<=) a1_a7F b1_a7G GHC.Classes.< a_a7x b_a7y = cmp_eq_a7z a_a7x b_a7y where cmp_eq_a7z (M.T a1_a7A) (M.T b1_a7B) = (GHC.Classes.<) a1_a7A b1_a7B GHC.Classes.compare a_a7s b_a7t = cmp_eq_a7u a_a7s b_a7t where cmp_eq_a7u (M.T a1_a7v) (M.T b1_a7w) = GHC.Classes.compare a1_a7v b1_a7w }
Patch attached. As this is my first stab at GHC hacking, don't hesitate to comment on my code. I'll be happy to redo the patch if necessary.
comment:6 Changed 10 years ago by
Thank you very much for producing a patch. By a rare coincidence I spent some time on Friday night doing the job myself! Moreover, I completely re-engineered the general scheme for Ord
comparisons, as you'll see when you look at my patch, which should give a useful performance boost.
So, with apologies to you, I'll incorporate my patch, but with many thanks for your work. Please don't be discouraged. This doesn't happen often. There are quite a few tickets of a similar nature that we'd like help with, such as
- #3676: realToFrac conversions
- #3744: comparisons against minBound and maxBound are not optimised away
- #3065: quot is sub-optimal
- #2269: Word type to Double or Float conversions
- #2271: floor, ceiling, round :: Double -> Int are awesomely slow
- #1434: slow conversion Double to Int
Simon
comment:7 Changed 10 years ago by
It is OK, Simon! It was good programming exercise and learning from the best:)
I tried to get my head around #2269 some time ago, but without useful result. I'll look into other issues soon!
comment:8 Changed 9 years ago by
Milestone: | 7.0.1 → 7.0.2 |
---|
comment:9 Changed 9 years ago by
Milestone: | 7.0.2 → 7.2.1 |
---|
comment:10 Changed 8 years ago by
Milestone: | 7.2.1 → 7.4.1 |
---|
comment:11 Changed 8 years ago by
Milestone: | 7.4.1 → 7.6.1 |
---|---|
Priority: | normal → low |
comment:12 Changed 7 years ago by
Milestone: | 7.6.1 → 7.6.2 |
---|
comment:14 Changed 5 years ago by
difficulty: | → Unknown |
---|---|
Milestone: | 7.10.1 |
This was fixed in 2010. A regression test is missing.
commit f3c7ab8dbd5a46ef5a7aeeb398a6d4bc1482e606
Author: Simon Peyton Jones <> Date: Mon May 10 13:33:33 2010 +0000 Re-engineer the derived Ord instance generation code (fix Trac #4019) As well as fixing #4019, I rejigged the way that Ord instances are generated, which should make them faster in general. See the Note [Generating Ord instances]. I tried to measure the performance difference from this change, but the #4019 fix only removes one conditional branch per iteration, and I couldn't measure a consistent improvement. But still, tihs is better than before.
comment:15 Changed 4 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Let's skip that regression test on a bug that was fixed 5 years ago.
comment:16 Changed 2 years ago by
Keywords: | deriving added |
---|
Here is another comment of mine from that thread.
Let's look at pairs as an example. At the moment, (>) is implemented basically like this:
Of course, this means that (0/0,'a') > (0/0,'a'). So we could change the implementation:
But now we compare a to c twice which is very bad for, say, ([Int],Int). Clearly, we want to use the first definition but it leads to inconsistent results for Doubles. I don't see how to solve this while keeping IEEE semantics of silent NaNs.