I have read some papers about distance measures like Euclidean, Manhattan, or Chi-Square for matching gradient-based image descriptors such as those computed by the SIFT algorithm (128-D vectors). Most of them state that one or the other measure is more suitable for this task, and that the decision depends on the assumed noise distribution (e.g. L2 is appropriate if Gaussian noise is assumed, L1 for Laplacian noise).
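For context, here is how I understand the three measures on a pair of descriptors; this is just a minimal sketch with random placeholder vectors, not descriptors from a real detector:

```python
import numpy as np

def euclidean(a, b):
    # L2 distance: said to be suitable under a Gaussian noise assumption
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # L1 distance: said to be suitable under a Laplacian noise assumption
    return np.sum(np.abs(a - b))

def chi_square(a, b, eps=1e-10):
    # Chi-square distance, often used for histogram-like descriptors
    # such as SIFT (entries are non-negative bin counts)
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

# Two stand-in 128-D "descriptors" for illustration only
rng = np.random.default_rng(0)
d1 = rng.random(128)
d2 = rng.random(128)
print(euclidean(d1, d2), manhattan(d1, d2), chi_square(d1, d2))
```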
So my question is: how do I determine the distribution of my descriptors, and how does that affect the choice of distance measure?
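My naive idea of how one might check this empirically would be something like the following sketch: collect descriptor pairs from keypoints that are known to correspond, look at the per-dimension residuals, and compare a Gaussian fit against a Laplacian fit. The arrays `matched_a` and `matched_b` are hypothetical, filled with synthetic data just so the snippet runs:

```python
import numpy as np
from scipy import stats

# Hypothetical: matched_a and matched_b would be (N, 128) arrays of SIFT
# descriptors from correctly matching keypoints (e.g. obtained via
# ground-truth homographies). Faked here with synthetic Laplacian noise.
rng = np.random.default_rng(1)
matched_a = rng.random((500, 128))
matched_b = matched_a + rng.laplace(scale=0.02, size=(500, 128))

# Per-dimension residuals between matched descriptor pairs
residuals = (matched_a - matched_b).ravel()

# Fit both candidate noise models and compare log-likelihoods
mu, sigma = stats.norm.fit(residuals)
loc, scale = stats.laplace.fit(residuals)
ll_gauss = stats.norm.logpdf(residuals, mu, sigma).sum()
ll_laplace = stats.laplace.logpdf(residuals, loc, scale).sum()
print("Gaussian log-likelihood: ", ll_gauss)
print("Laplacian log-likelihood:", ll_laplace)
# The higher log-likelihood would suggest the better noise model,
# pointing to L2 (Gaussian) or L1 (Laplacian) respectively.
```

Is something along these lines the right approach, or am I misunderstanding what "distribution" refers to here?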
In one paper the authors stated that the relevant distribution is that of the distances between SIFT descriptors of correctly matching keypoints. But doesn't that mean the distribution is biased by the choice of the distance metric itself?