48 | | The vector version uses the same recursive radix-2 decimation in time (DIT) algorithm as the Repa version, but is not rank generalised. It applies a recursive 1d FFT to each row and then transposes the matrix, twice each. Recursive FFT algorithms tend to be slower than in-place ones because ~~the data is copied~~ into new vectors at each recursion. A 512 point FFT is built from two 256 point FFTs, which are build from 4 128 point FFTs and so on. The result of each FFT is a new vector which needs to be allocated, filled, then unboxed again during the next recursion. The base case is a 2 point vector, so there are 256 of them to allocate then unbox at the lowest step. |

| 48 | The vector version uses the same recursive radix-2 decimation in time (DIT) algorithm as the Repa version, but is not rank generalised. It applies a recursive 1d FFT to each row and then transposes the matrix, twice each. Recursive FFT algorithms tend to be slower than in-place ones because intermediate data is written into new vectors at each recursion. A 512 point FFT is built from two 256 point FFTs, which are build from 4 128 point FFTs and so on. The result of each FFT is a new vector which needs to be allocated, filled, then unboxed again during the next recursion. The base case is a 2 point vector, so there are 256 of them to allocate then unbox at the lowest step. |