Version 30 (modified by benl, 4 years ago)

--

fft2d-highpass

Applies a highpass filter to a BMP image, using a recursive, radix-2 Decimation In Time (DIT) Fast Fourier Transform. The first argument is the lower cutoff frequency.

Each of the RGB channels is converted to the frequency domain, the lower frequencies are set to zero, then the channels are converted back to the image domain.

 repa-fft2d-highpass 2 lena.bmp lena-high2.bmp

Code

The main algorithm is at FFT.hs

The wrapper is at Main.hs

Test Data

http://en.wikipedia.org/wiki/Lenna is a standard test image.

lena.bmp lena-high2.bmp
Error: Macro Image(WikiStart:lena-high2-thumb.jpg) failed
Attachment 'wiki:WikiStart: lena-high2-thumb.jpg' does not exist.
full size full size

Runtime

Head version. Compiled with GHC 6.13.20100309. 512x512 image.

Running on a Intel i7 iMac. 2.8Ghz, 4 cores x 2 threads/core. 256k L1, 8MB L2, 8GB main memory.

Times stated include IO.

Threads Time(s)
1 6.93
2 4.07
3 3.42
4 3.13
5 3.15
6 3.08
7 3.08
8 3.39

Comparisons

Version Time(s) Source
Using Data.Vector.Unboxed 2.88 FFT.hs
Using Jones's inplace C implementation 0.24 Jones.c
Using FFTW using Estimate mode 0.09 FFTW.c

The vector version uses the same recursive radix-2 decimation in time (DIT) algorithm as the Repa version, but is not rank generalised. It applies a recursive 1d FFT to each row and then transposes the matrix, twice each. Recursive FFT algorithms tend to be slower than in-place ones because intermediate data is written into new vectors at each recursion. A 512 point FFT is built from two 256 point FFTs, which are built from 4 128 point FFTs and so on. The result of each FFT is a new vector which needs to be allocated, filled, then unboxed again during the next recursion. The base case is a 2 point vector, so there are 256 of them to allocate then unbox at the lowest step.

Jones's version also uses a 1d radix-2 DIT FFT kernel, but it first reorders the values then performs an in-place transform using three nested loops. Using an in-place algorithm gives better locality, and avoids the need to allocate and unbox all the intermediate vectors.

FFTW contains deep magic, and is comparable with vendor optimised versions. "Estimate" mode is not the most magical. The "Measure" mode of FFTW performs runtime profiling to determine the best way to do an FFT on your hardware. However, this process can require several seconds of startup time, which is why I haven't enabled it.

The FFTW version also uses a precomputed wavetable of sin and cos values, but the others (including Jones's) do not.

Attachments