Ticket #15 (new enhancement)

Opened 3 years ago

Last modified 21 months ago

Add "generalised scan"

Reported by: rl Owned by:
Priority: minor Milestone:
Version: 0.7 Keywords:
Cc:

Description

In any case, vector could well provide an operation like this:

cant_think_of_a_name :: Vector v a => Int -> (v a -> a) -> v a

The function would take the initialised prefix of the vector (starting with empty) and produce the next element. This would require a bit of hackery underneath but the interface would be safe and pure. Would something like this be useful?

See http://www.haskell.org/pipermail/glasgow-haskell-users/2010-May/018816.html

Note that we can't unsafeFreeze the vector until we are done because that would break GC for boxed vectors. Also, we can't recycle the vector because a could be a thunk with references to the intermediate v a.

Change History

Changed 3 years ago by Khudyakov

I want to point out that this function quite similar to unfold and three more variant exist. I don't advocate their addition though.

general_unfoldr1 :: Vector v a => Int → (b → v a → (a,b)) → b → v a
general_unfoldr1 :: Vector v a => Int → (b → v a → Maybe (a,b)) → b → v a
general_unfoldr1 :: Vector v a => (b → v a → Maybe (a,b)) → b → v a

Changed 3 years ago by choener

To give a generalised argument why this is useful: a number of dynamic programming algorithms work on n-dimensional data structures where all calculations happen on a hyperplane that moves toward a point. In 2d, we calculate successive diagonal entries, for example.

Some hackary with ST and unsafeFreeze gives the same result but requires ST (or IO) and just doesn't look cool (and see rl's comment).

The fact, that we calculated diagonals is unimportant, one can always transform the indices.

Changed 3 years ago by rl

  • priority changed from major to critical

Changed 3 years ago by choener

cant_think_of_a_name :: Vector v a => Int -> (v a -> Int -> a) -> v a

This is actually how it should be. The function needs the current vector and the index where we are at. If you tell me, that in terms of speed there will be no difference because I can just query the length, that's ok. too. ;-)

I can't decide if the function (eg f below) is supposed to be a small kernel that would typically be only a few lines of assembly, or if it is more like half a program. Guessing from my code, both will be true.

Example

cant_think_of_a_name 100 f

f v k
  | k == 0    = 1
  | k == 1    = 1
  | otherwise = v!(k-1) + v!(k-2)

Boring example, I know...

Changed 3 years ago by rl

Indeed, querying the length is just as fast in this case so there is no reason to pass it separately. I'm still not entirely sure how to best implement this for boxed vectors which is the only reason it isn't in the repository yet. The next vector release will include this, though.

Changed 3 years ago by anonymous

More of a note for me...

Lets say I have

do
  (tx,txM) <- mkTable 0 n
  (ty,tyM) <- mkTable 0 n
  forM_ [n,n-1..0] $ \i -> do
    write txM i $ opA tx ty i n
    write tyM i $ opB tx    i n
  return (tx,ty)

mkTable creates space for some elements in 'snd' and unsafefreezes everything in 'fst'. With the generalized scan, I would not have 2 tables of 1-tuples but 1 table of a 2-tuple per cell.

This badly breaks some algorithms. Consider that I want to extend stuff to include a third table. Now the functions opA and opB would receice a table of 3-tuples and would have to do indexing in another way.

This is no problem, if we can unzip the tuples in O(1). It is O(n) but if fusion fires correctly, everything should come down to loops that correctly index the tuple elements.

Otherwise, I have to take a closer look at all this anyway, as sometimes fusion does not happen. I have, for example, two function doing exactly the same (one does enumFromN, the other enumFromStepN) and one produces a nice loop, the other generates an intermediate array. But that will be a report, once I know more...

Changed 3 years ago by rl

  • version changed from 0.6 to 0.7
  • milestone changed from 0.7 to 0.8

Changed 3 years ago by rl

FWIW, unzip is O(1) for Data.Vector.Unboxed.

Changed 3 years ago by choener

It's worth a lot as it means I can unzip the input vector to get the individual components and use the functions on them without big problems -- of course, right now I use PrimitiveArrays? as they have less overhead in some cases (I think).

Changed 3 years ago by rl

I would be very interested to see the cases where vector is slower! Perhaps I can fix it :-)

Changed 3 years ago by choener

Actually, I think you have already written a bug report about that. I just do not know the ID. It was s.th. like have to do an extra addition with vector instead of just having a pointer to the current position (or s.th. like that?).

Anyways, while not having found that bug, I saw that there is one with data families and no unpacking which means that I might have to re-check some of my assumptions in PrimitiveArray?. If that is the case I could as well base them (again) on vector.

I have to think about my backend stuff for matrices, anyway. Maybe I should just employ criterion and ghc7 and see what is the fastest way...

Changed 2 years ago by choener

I have had some time and measured 1-dimensional PrimitiveArrays? vs. Vector. In both cases the game was to read from a NOINLINE data store (to simulate a large array i would be working on) and sum the elements.

Both backends now perform the same performancewise (using Vector.Unboxed.unsafeIndex). This means I'll be switching PrimitiveArray? again to act as a simple wrapper for multi-dimensional arrays on top of vector.

At least that is the plan.

Gruss, Christian

Changed 21 months ago by rl

  • priority changed from critical to minor
  • milestone 0.8 deleted

This is finally implemented by:

Sat Aug 20 00:40:52 BST 2011 Roman Leshchinskiy <rl@…>

  • Add constructN and constructrN

I wonder if I also should implement:

construct :: (v a -> Maybe a) -> v a
constructr :: (v a -> Maybe a) -> v a

I'll leave this open for now because of this.

Note: See TracTickets for help on using tickets.