Opened 6 years ago

Last modified 5 years ago

#7977 new feature request

Optimization: Shift dropped list heads by coeffecient to prevent thunk generation

Reported by: schyler Owned by:
Priority: low Milestone:
Component: Compiler Version: 7.4.2
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Consider the following snippet(s) equivalent to ([a..b] !! n), the source of (!!) and the source of drop:

normal_list :: Int -> Int
normal_list n = head $ drop n [a..b]

shifted_list :: Int -> Int
shifted_list n = head $ drop (n-n) [(a+n)..b]
xs     !! n | n < 0 = undefined
[]     !! _         = undefined
(x:_)  !! 0         = x
(_:xs) !! n         = xs !! (n-1)
drop n xs     | n <= 0 =  xs
drop _ []              =  []
drop n (_:xs)          =  drop (n-1) xs

Notice the (_:xs) matching in these functions as a result of WHNF.

In the first case, normal_list, thunks are generated for x in ({-x-}_:xs) of the target list and overhead is seen in the pattern matching/guard of n in drop.

In the second case, shifted_list, this overhead can be completely removed by adding a coefficient such that the list starts at the programmatically defined lower bound, a, plus the known fact that the head is dropped n times.

Hence, given the example above, consider:

[x * x + 3 | x <- [1..]] !! n
-- versus
[x * x + 3 | x <- [(1+n)..]] !! (n-n)
-- which is optimized into
[x * x + 3 | x <- [(n+1)..]] !! 0
-- which is effectively
head [x * x + 3 | x <- [(n+1)..]]

The operation is turned from O(n) into O(1).

Consider benchmark proving GHC 7.4.2 does not make this optimization under -O2:

import Criterion.Main

normal_list :: Int -> Int
normal_list n = head $ drop n [1..]

shifted_list :: Int -> Int
shifted_list n = head $ drop (n-n) [(1+n)..]

main = defaultMain
    [ bench "normal_list 1000" $ whnf normal_list 1000
    , bench "shifted_list 1000" $ whnf shifted_list 1000
    ]
C:\Users\Kyle\Desktop>ghc -O2 listco.hs
[1 of 1] Compiling Main             ( listco.hs, listco.o )
Linking listco.exe ...

C:\Users\Kyle\Desktop>listco.exe
warming up
estimating clock resolution...
mean is 4.644044 us (160001 iterations)
found 319255 outliers among 159999 samples (199.5%)
  159256 (99.5%) low severe
  159999 (100.0%) high severe
estimating cost of a clock call...
mean is 310.3118 ns (34 iterations)

benchmarking normal_list 1000
Warning: Couldn't open /dev/urandom
Warning: using system clock for seed instead (quality will be lower)
mean: 7.352463 us, lb 7.058339 us, ub 7.646574 us, ci 0.950
std dev: 1.478087 us, lb 1.478066 us, ub 1.478200 us, ci 0.950
variance introduced by outliers: 94.651%
variance is severely inflated by outliers

benchmarking shifted_list 1000
mean: 46.42819 ns, lb 45.44244 ns, ub 47.21689 ns, ci 0.950
std dev: 4.495757 ns, lb 4.035428 ns, ub 4.832396 ns, ci 0.950
variance introduced by outliers: 77.960%
variance is severely inflated by outliers

Change History (4)

comment:1 Changed 6 years ago by simonpj

difficulty: Unknown

I believe that you are proposing adding an optimising transformation to GHC, but I don't know exactly what is the general transformation you propose. Certainly one might optimise some special cases, but are they common? Are they easy to spot in Core?

comment:2 Changed 6 years ago by schyler

For any list produced with ranges or comprehensions using ranges, it's possible to determine the number of heads dropped. As such, it is thereby possible to perform a transformation on this list such that the operation is performed in constant time (as opposed to the current linear time).

The following Prelude functions (all containing _:xs match cases) would benefit:

  • tail
  • last
  • length
  • drop
  • (!!)


The work required in heavy high-level analysis of _:xs may outweigh just writing specific rules for these transformations. One such high-level transformation is outlined in the list comprehension example in the ticket.

comment:3 Changed 6 years ago by schyler

difficulty: UnknownDifficult (2-5 days)
Milestone: _|_
Priority: normallow

comment:4 Changed 5 years ago by thomie

Type of failure: None/UnknownRuntime performance bug

Not technically a "Runtime performance bug", but easier to find this way.

Note: See TracTickets for help on using tickets.