I have read some papers about distance measures like Euclidean, Manhattan, or Chi-Square for matching gradient-based image descriptors such as those computed by the SIFT algorithm (128-D vectors). Most of them state that one or the other measure is more suitable for this task, and that the decision depends on the assumed noise distribution (e.g. L2 is appropriate if Gaussian noise is assumed, L1 for Laplacian noise).
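For context, here is how I understand the three measures on a pair of descriptors; this is just a minimal sketch with random placeholder vectors, not descriptors from a real detector:

```python
import numpy as np

def euclidean(a, b):
    # L2 distance: said to be suitable under a Gaussian noise assumption
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # L1 distance: said to be suitable under a Laplacian noise assumption
    return np.sum(np.abs(a - b))

def chi_square(a, b, eps=1e-10):
    # Chi-square distance, often used for histogram-like descriptors
    # such as SIFT (entries are non-negative bin counts)
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

# Two stand-in 128-D "descriptors" for illustration only
rng = np.random.default_rng(0)
d1 = rng.random(128)
d2 = rng.random(128)
print(euclidean(d1, d2), manhattan(d1, d2), chi_square(d1, d2))
```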
So my question is: how do I determine the distribution of my descriptors, and how does that affect the choice of distance measure?
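My naive idea of how one might check this empirically would be something like the following sketch: collect descriptor pairs from keypoints that are known to correspond, look at the per-dimension residuals, and compare a Gaussian fit against a Laplacian fit. The arrays `matched_a` and `matched_b` are hypothetical, filled with synthetic data just so the snippet runs:

```python
import numpy as np
from scipy import stats

# Hypothetical: matched_a and matched_b would be (N, 128) arrays of SIFT
# descriptors from correctly matching keypoints (e.g. obtained via
# ground-truth homographies). Faked here with synthetic Laplacian noise.
rng = np.random.default_rng(1)
matched_a = rng.random((500, 128))
matched_b = matched_a + rng.laplace(scale=0.02, size=(500, 128))

# Per-dimension residuals between matched descriptor pairs
residuals = (matched_a - matched_b).ravel()

# Fit both candidate noise models and compare log-likelihoods
mu, sigma = stats.norm.fit(residuals)
loc, scale = stats.laplace.fit(residuals)
ll_gauss = stats.norm.logpdf(residuals, mu, sigma).sum()
ll_laplace = stats.laplace.logpdf(residuals, loc, scale).sum()
print("Gaussian log-likelihood: ", ll_gauss)
print("Laplacian log-likelihood:", ll_laplace)
# The higher log-likelihood would suggest the better noise model,
# pointing to L2 (Gaussian) or L1 (Laplacian) respectively.
```

Is something along these lines the right approach, or am I misunderstanding what "distribution" refers to here?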
In one paper the authors stated that the relevant distribution is that of the distances between SIFT descriptors of correctly matching keypoints. But doesn't that mean the distribution is biased by the choice of the distance metric itself?