Opened 6 years ago

Last modified 5 years ago

## #7977 new feature request

# Optimization: Shift dropped list heads by coeffecient to prevent thunk generation

Reported by: | schyler | Owned by: | |
---|---|---|---|

Priority: | low | Milestone: | ⊥ |

Component: | Compiler | Version: | 7.4.2 |

Keywords: | Cc: | ||

Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |

Type of failure: | Runtime performance bug | Test Case: | |

Blocked By: | Blocking: | ||

Related Tickets: | Differential Rev(s): | ||

Wiki Page: |

### Description

Consider the following snippet(s) equivalent to ([a..b] !! n), the source of (!!) and the source of drop:

normal_list :: Int -> Int normal_list n = head $ drop n [a..b] shifted_list :: Int -> Int shifted_list n = head $ drop (n-n) [(a+n)..b]

xs !! n | n < 0 = undefined [] !! _ = undefined (x:_) !! 0 = x (_:xs) !! n = xs !! (n-1)

drop n xs | n <= 0 = xs drop _ [] = [] drop n (_:xs) = drop (n-1) xs

Notice the (_:xs) matching in these functions as a result of WHNF.

In the first case, normal_list, thunks are generated for x in ({-x-}_:xs) of the target list and overhead is seen in the pattern matching/guard of n in drop.

In the second case, shifted_list, this overhead can be completely removed by adding a coefficient such that the list starts at the programmatically defined lower bound, a, plus the known fact that the head is dropped n times.

Hence, given the example above, consider:

[x * x + 3 | x <- [1..]] !! n -- versus [x * x + 3 | x <- [(1+n)..]] !! (n-n) -- which is optimized into [x * x + 3 | x <- [(n+1)..]] !! 0 -- which is effectively head [x * x + 3 | x <- [(n+1)..]]

The operation is turned from O(n) into O(1).

Consider benchmark proving GHC 7.4.2 does not make this optimization under -O2:

import Criterion.Main normal_list :: Int -> Int normal_list n = head $ drop n [1..] shifted_list :: Int -> Int shifted_list n = head $ drop (n-n) [(1+n)..] main = defaultMain [ bench "normal_list 1000" $ whnf normal_list 1000 , bench "shifted_list 1000" $ whnf shifted_list 1000 ]

C:\Users\Kyle\Desktop>ghc -O2 listco.hs [1 of 1] Compiling Main ( listco.hs, listco.o ) Linking listco.exe ... C:\Users\Kyle\Desktop>listco.exe warming up estimating clock resolution... mean is 4.644044 us (160001 iterations) found 319255 outliers among 159999 samples (199.5%) 159256 (99.5%) low severe 159999 (100.0%) high severe estimating cost of a clock call... mean is 310.3118 ns (34 iterations) benchmarking normal_list 1000 Warning: Couldn't open /dev/urandom Warning: using system clock for seed instead (quality will be lower) mean: 7.352463 us, lb 7.058339 us, ub 7.646574 us, ci 0.950 std dev: 1.478087 us, lb 1.478066 us, ub 1.478200 us, ci 0.950 variance introduced by outliers: 94.651% variance is severely inflated by outliers benchmarking shifted_list 1000 mean: 46.42819 ns, lb 45.44244 ns, ub 47.21689 ns, ci 0.950 std dev: 4.495757 ns, lb 4.035428 ns, ub 4.832396 ns, ci 0.950 variance introduced by outliers: 77.960% variance is severely inflated by outliers

### Change History (4)

### comment:1 Changed 6 years ago by

difficulty: | → Unknown |
---|

### comment:2 Changed 6 years ago by

For any list produced with ranges or comprehensions using ranges, it's possible to determine the number of heads dropped. As such, it is thereby possible to perform a transformation on this list such that the operation is performed in constant time (as opposed to the current linear time).

The following Prelude functions (all containing _:xs match cases) would benefit:

- tail

- last

- length

- drop

- (!!)

The work required in heavy high-level analysis of _:xs may outweigh just writing specific rules for these transformations. One such high-level transformation is outlined in the list comprehension example in the ticket.

### comment:3 Changed 6 years ago by

difficulty: | Unknown → Difficult (2-5 days) |
---|---|

Milestone: | → _|_ |

Priority: | normal → low |

### comment:4 Changed 5 years ago by

Type of failure: | None/Unknown → Runtime performance bug |
---|

Not technically a "Runtime performance bug", but easier to find this way.

**Note:**See TracTickets for help on using tickets.

I believe that you are proposing adding an optimising transformation to GHC, but I don't know exactly what is the general transformation you propose. Certainly one might optimise some special cases, but are they common? Are they easy to spot in Core?