Opened 9 years ago

Last modified 4 years ago

#4096 new feature request

New primops for indexing: index*OffAddrUsing# etc

Reported by: simonpj Owned by:
Priority: low Milestone:
Component: Compiler Version: 6.12.2
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Here's an email from Roman, and response from Simon. Making a ticket so it's kept as an idea.

In package vector, primitive vectors (the ones that Data.Vector.Unboxed is built on top of) are represented as follows (ByteArray and friends are wrappers for various GHC primitives provided by package primitive):

data Vector a = Vector Int               -- offset into the ByteArray
                       Int               -- length
                       ByteArray         -- data

This representation supports cheap slicing which is quite crucial. However, indexing into such vectors is a bit more expensive than necessary:

index (Vector i _ arr) j = indexByteArray arr (i+j)

Ultimately, this requires 2 additions to get the element's address:

  <base address off the ByteArray> + ((i + j) * <size of element>)

I'd like to always allocate pinned ByteArrays and store the starting address of the vector instead of the offset:

data Vector a = Vector Addr
                       Int
                       ByteArray

This would make indexing cheaper as it would require only one addition:

index (Vector addr i _) = indexOffAddr addr i

This is quite a big deal if indexing happens in an inner loop (some algorithms become up to 20% faster). Of course, the backend could optimise the offset-based version by performing partial redundancy elimination but it doesn't and it probably wouldn't get all interesting cases even if it did. So the second version is better.

The problem is that I can't implement it because I must touch the ByteArray after accessing the memory. This results in code like this which hides the constructor, breaking various optimisations:

  case indexIntOffAddr# addr# i# of { n# ->
  case touch# arr# realWorld# of { _ -> I# n# }}

After thinking about this for a while, I came up with two possible solutions. One is to provide a "pure" version of touch#:

  use# :: o -> o' -> o'

such that use# x y = y. This would change the code above to:

  I# (use# arr# (indexIntOffAddr# addr# i#))

I don't know how to implement this, though, because use# would have to be able to return arbitrary (unboxed) types and the code generator doesn't really seem to support this.

A perhaps simpler solution is to add a new set of primitives:

  indexIntOffAddrUsing# :: o -> Addr# -> Int# -> Int#
  ...

These would take an additional argument which they'd touch and otherwise ignore. The code would then become:

  I# (indexIntOffAddrUsing# arr# addr# i#)

Incidentally, the index*OffAddr# primitives don't seem to be used anywhere. [Not true: there are a handful of uses of index*OffAddr# in libraries/base, mainly GHC.Base.]

Simon replies: indexIntOffAddrUsing# seems like the way to go, I can't think of a better alternative.

Attachments (1)

patch.gz (131.1 KB) - added by rl 9 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 Changed 9 years ago by igloo

Milestone: 6.16.1

Changed 9 years ago by rl

Attachment: patch.gz added

comment:2 Changed 9 years ago by rl

I implemented this and it gives very nice speedups (up to 1.5) for the vector benchmarks. The patch is attached. It has one big problem, though. The new primops must be ok-for-speculation regardless of their first argument (the boxed term that is being touched). I hacked this by explicitly testing for these primops (all 16 of them) in exprOkForSpeculation but that's really just a hack. I'm not sure how to handle this properly. The situation is somewhat similar to division primops which are also hardwired in exprOkForSpeculation.

comment:3 Changed 9 years ago by simonmar

Here's a thought. One alternative you suggested was

use# :: o -> o' -> o'

which is problematic because the o' argument/result must have open kind, and you said the code generator doesn't seem to support this. I don't know if this really is problematic or whether it just looks that way, but even if it is, we could specialise use# to each unboxed type:

useInt# :: o -> Int# -> Int#
uneFloat# :: o -> Float# -> Float#
...

this seems slightly preferable to the indexBlahOffAddrUsing# family, because there are fewer new primops, they're more generic, and they're trivial to implement. Can you see any problems with this?

comment:4 Changed 8 years ago by igloo

Milestone: 7.4.17.6.1
Priority: normallow

comment:5 Changed 7 years ago by igloo

Milestone: 7.6.17.6.2

comment:6 Changed 5 years ago by thoughtpolice

Milestone: 7.6.27.10.1

Moving to 7.10.1.

comment:7 Changed 5 years ago by thomie

difficulty: Unknown
Type of failure: None/UnknownRuntime performance bug

comment:8 Changed 5 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:9 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:10 Changed 4 years ago by thomie

Milestone: 8.0.1
Note: See TracTickets for help on using tickets.