Ticket #20 (closed defect: wontfix)

Opened 3 years ago

Last modified 3 years ago

Optimization strategy unclear (-Odph / -O2)

Reported by: choener Owned by:
Priority: major Milestone: 0.7
Version: 0.6 Keywords: documentation, optimization
Cc: alexey.skladnoy@…

Description

The vector tutorial states that -Odph should be used. On ghc 6.12.1 this leads to abysmal performance with some functions. Simple program attached. n=20000 has 60s (dph) vs. 5s (O2).

"g n" is for comparison only and requires 12s (O2) if you are interested. -fno-method-sharing is required for good vector performance (otherwise: 17s).

{-# OPTIONS_GHC -fno-method-sharing #-}
module Main where

import qualified Data.Vector.Unboxed as V
import System.Environment (getArgs)

f :: Int -> Int
f n = V.sum $ V.concatMap (\k -> V.enumFromN 1 k) $ V.enumFromN 1 n

g :: Int -> Int
g n = sum $ concatMap (\k -> enumFromTo 1 k) $ enumFromTo 1 n

main = do
  (a:_) <- getArgs
  let n = read a :: Int
  print $ f n

Change History

  Changed 3 years ago by rl

In general, I'd recommend -Odph for GHC 6.12 and -O2 -fno-method-sharing for 6.13. Unfortunately, concatMap is rather badly implemented in the 0.6.0 branch of vector which might cause this effect. The current version in darcs seems to perform about the same with -O2 and -Odph.

Unfortunately, GHC 6.12 is still a bit shaky when it comes to the kind of optimisations that vector needs so it might well be the case that -Odph is worse than -O2 in some cases. I would be very interested in other examples!

  Changed 3 years ago by rl

See also #16.

follow-up: ↓ 5   Changed 3 years ago by Khudyakov

  • cc alexey.skladnoy@… added

From experience -Odph results in poor perfomance and failsafe choice is -O2.

Here is benchmark results for functions from ticket #18. GHC 6.12 was used. G column is for functions which use foldl from Data.Vector.Generic. G' use strict foldl. Us are the same but use foldls from Data.Vector.Unboxed.

All values are in ms.

Comp. flag  |   G      G'   |   U    U'   |
------------|---------------|-------------|
            | 190     44    | 190   46    |
-O2         |   1.08   1.08 |  47.7  6.53 |
-O2 -fn.m.s |   1.09   1.08 | 178   41.1  |
-Odph       | 126     35    | 177   41.0  |

Perfomance of code compiled with -Odph is comparable with unoptimized code. And -fno-method-sharing flag only makes thing worse instead of improving them as was suggested in #18.

  Changed 3 years ago by choener

So, a set of small but useful functions that are to be tested. That should be possible. It is basically what I want to have for my libraries, too. And they use vector heavily.

(The above is useful, in a slightly different setting, I use it to generate all sets of (k,l) pairs with 0<k+l<=30).

in reply to: ↑ 3   Changed 3 years ago by rl

I forgot to mention that in the program from #18, you have to remove the INLINE pragmas to get valid results. This is because the functions aren't getting inlined anyway (GHC quite understandably sees no reason to do so) but since they are marked as INLINE, they aren't getting optimised, either. However, GHC specialises the function that is based on Data.Vector.Generic. The specialisation doesn't have an INLINE pragma and is getting optimised. The final result is that you are benchmarking a completely unoptimised Unboxed version vs. a fully specialised and optimised Generic version.

GHC's behaviour is clearly quite confusing here but it won't change in 6.12. It is all fixed in 6.13 which has a completely different inlining mechanism and optimises INLINE functions.

  Changed 3 years ago by rl

  • priority changed from minor to major

  Changed 3 years ago by Khudyakov

Indeed. Here are results for functions without INLINE pragma. Difference between generic and unboxed versions are gone.

Comp. flag  |   G      G'   |   U     U'   |
------------|---------------|--------------|
            | 223     44    | 191    45    |
-O2         |   1.09   1.08 |   1.09  1.08 |
-O2 -fn.m.s |   1.09   1.08 |   1.09  1.08 |
-Odph       | 126     25.6  | 125    26.7  |

As far I understand function without INLINE pragma won't get fused if imported from another module.

  Changed 3 years ago by rl

Gosh, -Odph is really slow here. That can be fixed by manually eta-expanding the functions. Change

mean = fini . U.foldl go (T 0 0)

to

mean xs = fini (U.foldl go (T 0 0) xs)

This is because -Odph turns on -finline-if-enough-args which prevents functions from being inlined if they aren't applied to enough arguments. This, in turn, leads (.) to not being inlined in this case which is rather bad. I'm not sure why GHC doesn't eta-expand automatically here. As usual, this works fine in 6.13.

You are quite right that no fusion will happen if the functions don't get inlined and that it is thus desirable to give those functions an INLINE pragma. With 6.12, you then have to make sure that GHC really does inline them everywhere they are used. This is a bit hard to achieve sometimes, as evidenced by this discussion. Unfortunately, there is nothing the library can do about it - it's a compiler problem.

In this particular example, there is a way to keep the functions as they are in your original program, including INLINE pragmas, by simply changing

main = defaultMain [ bench "mean 1e5" $ nf mean  vec1e5
                   , bench "mean'1e5" $ nf mean' vec1e5
                   ]

to

main = defaultMain [ bench "mean 1e5" $ nf (\xs -> mean  xs) vec1e5
                   , bench "mean'1e5" $ nf (\xs -> mean' xs) vec1e5
                   ]

Now both functions do get inlined and everything works as expected, both with -O2 and with -Odph. I should have thought of this earlier.

  Changed 3 years ago by rl

  • status changed from new to closed
  • resolution set to wontfix

I'm closing this ticket since 7.0 is about to come out. There, -O2 should always be sufficient.

Note: See TracTickets for help on using tickets.