How one apply correctly FFT in image denoising

Question

I'm writing program (Qt widgets/c++) for removing noise from images. As denoising method, i selected non local means method. This method has incredible quality of restored images (that's why it's the only denoising method in OpenCV), but has huge computation cost, so i made a lot of modified variants of this method (some with multithreading, some algorithmic). But, i'm having problem with the one, involving FFT

I followed all the steps of this article (only one page, 1430) and all works perfectly, except for FFT part, there just 2 lines about it in the paper and i can't understand, HOW should one use fft

This problem has bothered me for months, any help or insight would be greatly appriciated.

Shortened version of question: How can i get summed squared difference of two arrays on the image (the one at top and one in the middle, values are colors) quickly? ( O(n^2) is huge cost, there are lots of this kind of operations, paper above states, that it can be done via FFT with O(n*log n) (says that this 2 arrays forming circular convolution somehow))

enter image description here

What did you finally end up doing for computing FFT ? Even if FFT is precomputed, the point-wise multiplication and addition of all patch elements takes $O(|P|)$ time where $|P|$ is the size of the patch. How did you overcome this ? — curryage, Oct 28 '14 at 04:49

sansuiso · Accepted Answer · 2016-11-23T13:21:21.217

5

The trick inside the paper is the following:

What you want to compute is $\sum_{i \in W} |I(x+i)-I(y+i)|^2$, where $I$ is an image, $x$ and $y$ two noisy pixels and $i$ is a 2D offset used to define a patch.
Expanding the expression yields: $\sum_i I^2(x+i) + \sum_i I^2(y+i) - 2 \sum_i I(x+i)I(y+i) = A + B - 2C$.
$A$ and $B$ are computed using a squared integral image, i.e., an integral image from the squared original image.
$C$ is the convolution between the two patches centered on $x$ and $y$. Thus, it can be computed in the Fourier domain, where it becomes a multiplication. You get the value of $C$ by computing the Fourier transform of the patch around $x$, the patch around $y$, pointwise-multiplying these results and taking the inverse Fourier transform of the multplication result.

The Fourier transform is obviously a 2D transform since you are working with 2D data. What you obtain for a given patch is a 2D array of complex values.

Additional notes

In my opinion this article is not the best NL-means speedup strategy. Experiments I did way back in 2007/2008 show that pre-selection of patches are better (both in terms of speed and quality of the results). I have started blogging about these here, but unfortunately I am looking for time to finish the posts.

The original NL-means papers mention blockwise implementations that can be interesting. There are fundamentally 2 ways in implementing NL-means:

writing a denoising loop for every pixel in the image
writing a denoising loop for each patch, then back-project the patches to form an image.

The first impolementation is the original approach, because in 2005 memory and multicore CPUs were expensive. I chose on the other hand number 2 on recent hardware in the past 2 years. It depends on your typical image size and if you want to be able to compute domain transforms like DFT/DCT (as in the proposed paper and in BM3D).

edited Nov 23 '16 at 13:21

answered May 27 '13 at 15:27

sansuiso

3,869
16
26

Thanks a lot for your answer, that's exactly what i needed, all was ready and working a long time ago, except for 4th item in that list, but now it is a lot clearer. Though one more question, if you don't mind: what will return Fourier transform of patch x or y? Array, vector, or single value? And what is required to use inverse transform? Becouse i am thinking about precomputing fft for every pixel (patches centered around it) and write results to 2d array before denoising and then just use those matrix to get inverse fft, but i don't know if this will be enough for inverse fft – Shf May 27 '13 at 20:54
oh, and should i use 2d fft, or translate patch to 1d array? by the way, i was planning to write after this patchwise implementation anyway, thanks for an advise :) something similar to this also a long time ago- http://www.ipol.im/pub/art/2011/bcm_nlm/ – Shf May 27 '13 at 20:58
I've updated the answer. – sansuiso May 28 '13 at 07:36
ok, so i can precompute FFT for patches, centered around every pixel before donoising though it will take a lot of memory (m*n*size_of_patch*size_of_patch*sizeof(double)), but when i will count weights, i would still need to pointwise multiply 2 complex arrays and after that do inverse fft on received 2d array, it's even more then O(n^2) if i'm not mistaken – Shf May 28 '13 at 08:38
The complexity of the FFT depends on the size oif the patches which is much smaller than the size of the image. Thus, you have something like N (pixels) * P log(P) where P is the size of a patch and that is much smaller than the original N^2. – sansuiso May 28 '13 at 10:55
direct approach (loop through array) will give O(P^2), P is patch size, though, i don't see, how will fft give O(P*log(P)) even with precomputed arrays for every pixel. It still will need to multiply pointwise complex arrays (seems like exactly the same O(P^2) if done directly) and after that do inverse fft, how will this reduce computation difficulty? – Shf May 28 '13 at 11:15
FFT is logarithmic wrt data size (that's why it's fast). Then, pointwise multiplying means one multiplication per array element, while direct convolution means to perform all the multiplications per source pixel, which basically means a lot. – sansuiso May 28 '13 at 11:22
yep, got that, misunderstood pointwise multiplication term, thought that it is direct loop through array. Well, then, looks like everithing is sorted out now. Though that precomputed 2d array of complex forwarf fft arrays still bothers me, well, anyway, this still helped really a lot, thank you – Shf May 28 '13 at 11:49
You're welcome. Depending on your platform, you will find optimized libraries / dedicated hardware for FFT (see fftw, cuda, vendor-specific libs on embedded...).Anyway, when I did extensive tests with A. Buades back in 2006, the pre-selection trick brought a huge speedup and it is far less complicated than FFT. – sansuiso May 28 '13 at 11:54
target is all platforms, that qt supports :) but for now, i'm happy with linux and windows, selected fftw a long time ago fo this – Shf May 28 '13 at 12:15
it seems, that inverse Fourier transform in fftw and other libraries returns not a single value, but also 2d array. Am i misunderstanding something and there is a way to force return of only 1 element? – Shf May 29 '13 at 12:17
2

Good answer, but how are you deriving that $C$ is a convolution? The way it is written it is a point wise element-by-element product. Where is the convolution? – TheGrapeBeyond Oct 20 '13 at 03:00

How one apply correctly FFT in image denoising

1 Answers1