id,summary,reporter,owner,description,type,status,priority,milestone,version,resolution,keywords,cc
18,folds for unboxed vectors are slow.,Khudyakov,rl,"I found that fold-based functions written against Data.Vector.Unboxed are much slower (~7-50 times) than the same functions written against the Data.Vector.Generic interface, even when both are applied to the same unboxed vector. The benchmark is below.

{{{
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as G
import Criterion.Main

data T = T {-# UNPACK #-}!Double {-# UNPACK #-}!Int

-- Slow function
mean :: U.Vector Double -> Double
mean = fini . U.foldl go (T 0 0)
  where
    fini (T a _) = a
    go (T m n) x = T m' n'
        where m' = m + (x - m) / fromIntegral n'
              n' = n + 1
{-# INLINE mean #-}

-- Fast function (same code, but uses the generic interface)
mean' :: (G.Vector v Double) => v Double -> Double
mean' = fini . G.foldl go (T 0 0)
  where
    fini (T a _) = a
    go (T m n) x = T m' n'
        where m' = m + (x - m) / fromIntegral n'
              n' = n + 1
{-# INLINE mean' #-}

vec1e5 :: U.Vector Double
vec1e5 = U.replicate (10^5) 0

main :: IO ()
main = defaultMain [ bench ""mean 1e5"" $ nf mean  vec1e5
                   , bench ""mean'1e5"" $ nf mean' vec1e5
                   ]
}}}

Here are the benchmark results. The '''U''' column is for mean, the '''G''' column is for mean'. Each row replaces foldl in both functions with the listed fold.
{{{
       | U(ms) | G(ms) |
foldl  | 47.3  |  1.08 |
foldl' |  6.54 |  1.09 |
foldr  | 44.3  |  1.99 |
foldr' |  7.44 |  1.02 |
}}}
It looks like the monomorphic variant isn't being optimized properly. The results are consistent between GHC 6.10.4 and GHC 6.12.1.
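
As a stopgap, the strict fold can be used on the unboxed vector. In the measurements above it is about 7 times faster than the lazy foldl, though still several times slower than the generic version. A minimal sketch of that variant follows; meanStrict is a name introduced here for illustration only, and the small main just exercises it on a four-element vector.

```haskell
import qualified Data.Vector.Unboxed as U

-- Strict pair holding the running mean and the element count.
data T = T {-# UNPACK #-} !Double {-# UNPACK #-} !Int

-- meanStrict is a hypothetical name used for this sketch.
-- It performs the same running-mean update as mean above,
-- but accumulates with the strict left fold U.foldl'.
meanStrict :: U.Vector Double -> Double
meanStrict = fini . U.foldl' go (T 0 0)
  where
    fini (T a _) = a
    go (T m n) x = T m' n'
      where m' = m + (x - m) / fromIntegral n'
            n' = n + 1

main :: IO ()
main = print (meanStrict (U.fromList [1, 2, 3, 4]))
```

This does not explain the gap between the two interfaces; it only avoids the worst case shown in the foldl row.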
",defect,closed,major,,0.6,wontfix,,
