Opened 2 years ago

Last modified 19 months ago

#14196 new feature request

Replace ArrayArray# with either UnliftedArray# or Array#

Reported by: andrewthad Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.2.1
Keywords: LevityPolymorphism Cc: akio
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


Just to be clear, there currently is nothing named UnliftedArray#. I would like to propose that such a thing possibly be added, with the long-term goal of deprecating and removing ArrayArray#. The interface for it would look like this:

data UnliftedArray# (a :: TYPE 'UnliftedRep)
data MutableUnliftedArray# s (a :: TYPE 'UnliftedRep)

indexUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). UnliftedArray# a -> Int# -> a
writeUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). MutableUnliftedArray# s a -> Int# -> a -> State# s -> State# s
readUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). MutableUnliftedArray# s a -> Int# -> State# s -> (# State# s, a #)
unsafeFreezeUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). MutableUnliftedArray# s a -> State# s -> (#State# s, UnliftedArray# a#)
newUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). Int# -> a -> State# s -> (# State# s, MutableUnliftedArray# s a #)

You would also have a few other things like sameMutableUnliftedArray#, sizeofMutableArray#, unsafeThawUnliftedArray#, copyUnliftedArray#, etc. The main difficulty that I see in doing this is that I'm not sure if you can tell a primop that it takes a polymorphic argument whose kind is something other than TYPE LiftedRep. The bodies of all of the functions I listed above could simply be copied from the ArrayArray# functions.

There are a few alternatives I've heard discussed. One is to make Array# accepted both boxed or unboxed types. There is a brief discussion of this in the UnliftedDataTypes proposal (0). Indeed, it appears that some have managed to smuggle unlifted data into Array# already, with the caveats one would expect (1). This interface would look like:

data Array# (a :: TYPE k)
data MutableArray# s (a :: TYPE k)
indexArray# :: Array# a -> Int -> a -- same as before
indexUnliftedArray# :: forall (a :: TYPE UnliftedRep). Array# a -> Int -> a

So instead of having Array# and UnliftedArray# as separate data types, we could have Array# handle both cases, but we would need to make a duplicate of every function that operates on Array#. This follows all of the appropriate rules for when levity polymorphism is allowed, and it should be backwards-compatible with the previous non-levity-polymorphic definition of Array#. I'm not sure how many things in the GHC runtime assume that the values inside an array are lifted, so this might cause a bunch of problems. If it is possible, this approach would be kind of neat because then SmallArray# could be similarly adapted.

Anyway, I just wanted to get this out there. I would be happy to try implementing the first proposal (involving UnliftedArray#) at some point in the next six months. Any feedback on the idea would be appreciated. Also, any comments on how appropriate this is for someone new to contributing to GHC would be appreciated as well. Thanks.

Change History (9)

comment:1 Changed 2 years ago by simonpj

What is the problem you are trying to solve here? I'm lost.

comment:2 Changed 2 years ago by andrewthad

It looks like I forgot to provide the motivation for this. There are two problems with the current ArrayArray# interface: (1) Duplicated code and (2) lack of expressiveness, which pushes unsafeCoerce# onto end users. I'll go into both of these.

Several of the primops for ArrayArray# have two variants. For example:

indexByteArrayArray# :: ArrayArray# -> Int# -> ByteArray#
indexArrayArrayArray# :: ArrayArray# -> Int# -> ArrayArray#

We see the same thing for the read and write operations on MutableArrayArray# except that now we've got four variants:

readByteArrayArray# :: MutableArrayArray# s -> Int# -> State# s -> (#State# s, ByteArray##)
readMutableByteArrayArray# :: MutableArrayArray# s -> Int# -> State# s -> (#State# s, MutableByteArray# s#)
readArrayArrayArray# :: MutableArrayArray# s -> Int# -> State# s -> (#State# s, ArrayArray##)
readMutableArrayArrayArray# :: MutableArrayArray# s -> Int# -> State# s -> (#State# s, MutableArrayArray# s#)

Under the hood, all four of these have the exact same implementation as we can see in

emitPrimOp _      [res] ReadArrayArrayOp_ByteArray          [obj,ix]   = doReadPtrArrayOp res obj ix
emitPrimOp _      [res] ReadArrayArrayOp_MutableByteArray   [obj,ix]   = doReadPtrArrayOp res obj ix
emitPrimOp _      [res] ReadArrayArrayOp_ArrayArray         [obj,ix]   = doReadPtrArrayOp res obj ix
emitPrimOp _      [res] ReadArrayArrayOp_MutableArrayArray  [obj,ix]   = doReadPtrArrayOp res obj ix

I consider this duplication a minor problem. It's not very costly, and it's easy to see what's going on. The real problem is that, despite the all the duplication, this interface still only captures a fraction of what ArrayArray# can really offer. The end user must explicitly use unsafeCoerce# to do a bunch of things (storing MVar#, Array#, etc.), and even when they want a ByteArray# (which the interface does safely allow), they must implicitly perform an unsafe coercion on every access (read/write/index) because the type system never really tells us what's in an ArrayArray#. In the primitive package, there is a lot of unsafeCoerce# that is required to make it work:

Basically, the current interface interface doesn't take advantage of the type system. What we have are:

  • implicit unsafe coercion on every access
  • copying functions (copyArrayArray#, etc.) that don't ensure that the elements in both arrays are of the same type.
  • explicit unsafeCoerce# required whenever you want to: store Array# a inside of ArrayArray#, store MutableByteArray# s inside of ArrayArray#, store MutVar# s a inside of ArrayArray#, etc.

For the most part, the advantages to having UnliftedArray# a are similar to the advantages of having Array# having the type variable a. Array# doesn't suffer from any of the aforementioned problems. Imagine if the interface for Array# looked like this:

data Array#
data MutableArray# s
indexArray# :: Array# -> Int -> a
readMutableArray# :: MutableArray# s -> Int -> a -> State# s -> State# s

This would be terrible, and I'm glad it wasn't done that way. ArrayArray# was done that way because prior to GHC 8.0, type variables could only have lifted runtime representations. But now that we've had this for a while, I think it's time to look at cleaning up this interface. Plus, in the future, it might be worth having something like SmallArray# but for unlifted data. I wouldn't want to see the same bulky-but-inexpressive interface show up again.

comment:3 Changed 2 years ago by simonpj

OK now I get it. I had to look at the documentation for ArrayArray#. If I may say it like this:

The current primitive data type ArrayArray# is a heap-allocated array of pointers to unlifted objects. Any kind of unlifted objects would be fine, provide they are represented by a pointer. Specifically, we can put both Array# and ByteArray# (and I suppose another ArrayArray#) inside an ArrayArray#. But doing so is jolly awkwerd because ArrayArray# is not a parameterised type.

Why isn't it parameterised? Becuase previously we had no way to quantify over unlifted types. But now we do. So we can make ArrayArray# into a paremterised type, namely your new

data UnliftedArray# (a :: TYPE 'UnliftedRep)

Your idea here is that UniftedArray# is an array of pointers to unlifted values.

Cool, I like it. Levity polymorphism is great.

PS: I'm not expressing an opinion about naming, or about transition plans. Just about the principle.

Last edited 2 years ago by simonpj (previous) (diff)

comment:4 Changed 2 years ago by akio

Cc: akio added

comment:5 Changed 19 months ago by bgamari

This would probably make for a good proposal since it will affect a relatively large number of users. That being said, given that UnliftedArray# is a strict generalization of ArrayArray# I expect it won't be *that* hard to expose the existing ArrayArray# interface built on top of UnliftedArray#. The only annoying issue is that users often import GHC.Prim directly, which has no Haskell module associated with it.

comment:6 Changed 19 months ago by andrewthad

If UnliftedNewtypes, described in, is accepted, it will be trivial to reimplement ArrayArray#, and its related functions:

newtype ArrayArray# = ArrayArray# (UnliftedArray# ByteArray#)

It's not important to use ByteArray# here. You could use any unlifted type since you're just going to unsafe coerce it when you pull in out or put it in. I guess these operators would need to go in GHC.Exts or something like that, which would probably still break some people's code since we all love to import GHC.Prim so much, but at least it provides a good backwards-compatibility option.

I'll wait to see if UnliftedNewtypes is accepted. If so, I'll write this up as a proposal.

comment:7 Changed 19 months ago by bgamari

For what it's worth, I wouldn't be opposed to breaking users of GHC.Prim; this module really is an internal implementation detail of the compiler and I think we would be unduly tying our hands by guaranteeing compatibility.

comment:8 Changed 19 months ago by andrewthad

Agreed. I think that requiring people to import GHC.Exts instead of GHC.Prim is acceptable.

comment:9 Changed 19 months ago by andrewthad

UnliftedNewtypes was accepted. I'll begin working on a proposal for UnliftedArray#.

Note: See TracTickets for help on using tickets.