#4344 closed proposal (invalid)
Better toRational for Float and Double
Reported by: | daniel.is.fischer | Owned by: | daniel.is.fischer |
---|---|---|---|
Priority: | normal | Milestone: | Not GHC |
Component: | Compiler | Version: | 6.12.3 |
Keywords: | toRational, performance | Cc: | |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Runtime performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | ||
Wiki Page: |
Description
The implementation of toRational
in the Real
instances of Float
and Double
is less than ideal.
instance Real Float where toRational x = (m%1)*(b%1)^^n where (m,n) = decodeFloat x b = floatRadix x
The propsed implementation of powers for Rational
s (#4337) would (when (^^)
is included) alone yield a great boost, but here we can do even better.
I have benchmarked three versions of toRational
against the current implementation, first, an inlined version of the proposed power modification:
{-# SPECIALISE toRat :: Float -> Rational, Double -> Rational #-} toRat :: RealFloat a => a -> Rational toRat x = case decodeFloat x of (m,e) -> case floatRadix x of b -> if e < 0 then (m % (b^(negate e))) else (m * b^e) :% 1
If the exponent is nonnegative, we need not reduce (even though that reduction would be comparatively cheap since the denominator is 1, it's not free).
Next, in GHC.Float there is the condition that the floatRadix
be 2, hence we can eliminate the call to floatRadix
and inline. That allows to skip the reduction also in some cases where the exponent is negative:
{-# SPECIALISE toRat2 :: Float -> Rational, Double -> Rational #-} toRat2 :: RealFloat a => a -> Rational toRat2 x = case decodeFloat x of (m,e) | e < 0 -> if even m then m % (2 ^ (-e)) else m :% (2 ^ (-e)) | otherwise -> (m * 2^e) :% 1
Finally, powers of 2 can be more efficiently calculated via bit-shifting and the test for evenness is usually faster as a bit-test:
{-# SPECIALISE toRat3 :: Float -> Rational, Double -> Rational #-} toRat3 :: RealFloat a => a -> Rational toRat3 x = case decodeFloat x of (m,e) | e < 0 -> case 1 `shiftL` (-e) of !d -> if fromInteger m .&. (1 :: Int) == 0 then m % d else m :% d | otherwise -> (m `shiftL` e) :% 1
The results vary of course depending on the sample of numbers one converts, but the trend is clear:
Current: 100 ms - 123 ms Inlined: 32 ms - 40 ms Specialised: 25 ms - 31 ms Shifting: 15 ms - 23 ms
Of course, the value of that is limited, the real bottleneck in realToFrac
is fromRational
, which raises the times for the benchmarks by about 450 ms when added instead of a dummy conversion Rational -> Float
. And using realToFrac
for the conversion Double -> Float
, per the rewrite rule, the benchmarks are done in about 1 ms.
Attachments (7)
Change History (24)
comment:1 Changed 9 years ago by
comment:2 Changed 9 years ago by
Well, I would replace the body of toRational
in the Real
instances of Float
and Double
with more or less the body of toRat3
.
The bang on d
makes no difference with optimisations and little difference without, so that'll go. If testing also for e == 0
makes a difference at all, it's minuscule (less than 1%), so I'd leave that out.
I would like to test whether something can be gained by determining the number of trailing 0-bits of m
to avoid all gcd
s before I propose a specific patch.
Since GHC.Float is GHC-internal and maintainer is cvs-ghc and not libraries, should the proposal go through the process nevertheless or would I just attach the patch here and you validate and push directly?
comment:3 Changed 9 years ago by
Okay, how ugly are you willing to accept?
Counting zeros and shifting instead of calling out to gcd
in the case of negative exponent and even significand reduces the time needed for my benchmarking loops by a factor of a bit above 2.
The code passed over 1.5 million QuickCheck tests, so I'm pretty confident I've got it right.
So far, I've used GHC.Integer.GMP.Internals (for the ByteArray#), which makes it not portable (and it doesn't work for Int
s shorter than 32 bits).
But the significand is at most 53 bits, so I could use Int64 for the work and stay portable. I'll try whether that is competitive. It would make the code simpler, so it could even be faster.
comment:4 Changed 9 years ago by
Very nice. The simpler code is indeed faster. With
floatRat :: Float -> Rational floatRat x | x == 0 = 0 :% 1 | otherwise = case decodeFloat x of (m, e) | e >= 0 -> (m `shiftL` e) :% 1 | otherwise -> case fromInteger m :: Int of m' | m' .&. 1 == 0 -> shiftOutZeros m' (-e) | otherwise -> m :% (1 `shiftL` (-e)) doubRat :: Double -> Rational doubRat x | x == 0 = 0 :% 1 | otherwise = case decodeFloat x of (m,e) | e >= 0 -> (m `shiftL` e) :% 1 | otherwise -> shiftOutZeros (fromInteger m :: Int64) (-e) {-# SPECIALISE shiftOutZeros :: Int -> Int -> Rational, Int64 -> Int -> Rational #-} -- Precondition: first arg nonzero, second arg > 0 -- To be called only from 'toRational' shiftOutZeros :: (Integral a, Bits a) => a -> Int -> Rational shiftOutZeros m e | e <= t = fromIntegral (m `shiftR` e) :% 1 | t < 8 = fromIntegral (m `shiftR` t) :% (1 `shiftL` (e-t)) | otherwise = shiftOutZeros (m `shiftR` 8) (e-8) where !t = zeros `unsafeAt` (fromIntegral m .&. 255) -- Cache of number of trailing 0-bits zeros :: Array Int Int zeros = listArray (0,255) [ 8, 0, 1, 0, 2, 0, 1, 0 ...
I'm even faster than with indexIntArray#
(5-10%). With an unboxed array for the cache, it's a few percent faster again. Is any kind of unboxed array available in GHC.Float?
Note that on my 32-bit system, for Float
it's faster to test for evenness before going to shiftOutZeros
while for Double
it's not (not if I (.&.)
with an Int64
at least, must try whether (.&.)
-ing with an Int
is faster).
For 64-bit systems, Double
should use the exact same code as Float
, presumably.
comment:6 Changed 9 years ago by
not if I (.&.) with an Int64 at least, must try whether (.&.)-ing with an Int is faster
Indeed it is. Not much, about 3%, but hey.
comment:7 Changed 9 years ago by
Thanks very much for your work on this numeric stuff. It's a real help. Plese sing out when you think you've converged, make a patch (or write out the specific final version), attach to this ticket, and submit to the libraries list.
Simon
comment:8 Changed 9 years ago by
Owner: | set to daniel.is.fischer |
---|
I think I have it (both, toRational and fromRational), currently compiling. When tests and benchmarks pass, I'll submit. Previous benchmarks promise a good speedup.
comment:9 Changed 9 years ago by
Above is an archive containing new implementations for toRational and fromRational from/to Double and Float and QuickCheck tests and benchmarks.
To build the modules, the integer-gmp package is required, the patch will include the logarithm functions for integer-simple too.
The QuickCheck programme accepts (optional) parameters
- number of tests to perform for each property
- size parameter for the generator
- maximal number of tests to discard (irrelevant)
If you use QuickCheck-2.3.*, don't set the number of test cases too high, it eats memory.
The simple* benchmarks require only the core libs shipped with GHC, the test size can be set with "-bd num" on the command line.
The criterion* benchmarks require criterion, you can set the test size with "-bd num" as the first two command line args, all subsequent args are passed to criterion's defaultMain.
All benchmarks measure Float -> Double resp. Double -> Float conversion, I've forgotten to write tests for Rational -> Float/Double conversions for Rational's whose denominator is not a power of 2. I intend to add that soon.
Please test extensively on 64-bits to make sure I haven't made a glitch there.
comment:10 Changed 9 years ago by
Type: | bug → proposal |
---|
Proposal: include faster implementations for
- toRational :: Float -> Rational
- toRational :: Double -> Rational
- fromRational :: Rational -> Float
- fromRational :: Rational -> Double
The semantics of these functions shall remain the same as it is now, only their speed will be affected.
For fromRational, a fast integer logarithm is essential. Without access to the internal representation of Integer, it would be significantly slower, so this proposal includes the addition of modules defining the needed integer logarithm functions to the packages integer-gmp and integer-simple.
Since Int
is not available in the integer-* packages, those functions would return Int#
values. I suggest adding wrappers integerLog2 :: Integer -> Int and integerLogBase :: Integer -> Integer -> Int to some module in base because those functions are far more generally useful. Currently integerLogBase is available from GHC.Float, so we might stick them there, although it's not very intuitive. Suggestions welcome.
Discussion period: one week, until 29^{th} October 2010 (I'd like to see it in 7.0).
Changed 9 years ago by
Attachment: | criterionFromRational.hs added |
---|
Criterion Benchmark for fromRational
Changed 9 years ago by
Attachment: | simpleFromRational.hs added |
---|
Benchmark for fromRational without criterion
Changed 9 years ago by
Attachment: | FloatingRationalConversions.tar.gz added |
---|
Benchmarks and QC tests for new fromRational and toRational implementations
comment:12 Changed 9 years ago by
Included the fromRational benchmarks in the bundle and fixed a glitch in the simple benchmarks (by which double2Float and float2Double were always reported as taking zero time).
Changed 9 years ago by
Attachment: | logarithms-integer-gmp.dpatch added |
---|
Changed 9 years ago by
Attachment: | logarithms-integer-simple.dpatch added |
---|
Changed 9 years ago by
Attachment: | fix4344.dpatch added |
---|
comment:13 Changed 9 years ago by
Milestone: | → Not GHC |
---|
comment:14 Changed 9 years ago by
Cc: | gideon@… added |
---|
comment:15 Changed 9 years ago by
Whoops. Out of sight, out of mind. Let me add comments and see what I can do to get the logarithm stuff fast for integer-simple, then I'll open a new ticket and re-propose. There should be enough time for 7.0.2 without a rush.
comment:16 Changed 9 years ago by
Resolution: | → invalid |
---|---|
Status: | new → closed |
Proposal tickets are no longer needed as part of the library submissions process. Instead, a normal ticket should be created once consensus has been achieved. Please see the process description for details.
comment:17 Changed 9 years ago by
Cc: | gideon@… removed |
---|
Sounds great. Do you have a specific patch to propose?
Simon