Replacement for angular distance metric

Question

I am looking for a distance metric that could be used instead of cosine/angular distance for high-dimensional data. A metric that is bounded in the same way as cosine/angular distance would be ideal.

The problem I have with cosine/angular distance is that it ignores magnitude.

Given the vectors (2,2), (3,3), or even (100,100), cosine/angular distance says that these vectors are all similar (the distance between any pair is zero). L2 distance says they are not similar. But L2 is not suited for high-dimensional data. Based on this Q
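To make the example concrete, here is a minimal sketch in plain Python that computes both distances for the three vectors above. The helper names `cosine_distance` and `euclidean_distance` are my own and are not taken from any library; cosine distance is assumed to mean 1 minus cosine similarity.

```python
import math

def cosine_distance(a, b):
    # Assumed definition: 1 - cos(angle between a and b).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Plain L2 distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

u, v, w = (2, 2), (3, 3), (100, 100)

for name, (p, q) in {"u-v": (u, v), "u-w": (u, w), "v-w": (v, w)}.items():
    print(name,
          "cosine:", round(cosine_distance(p, q), 6),
          "L2:", round(euclidean_distance(p, q), 3))
```

All three vectors point in the same direction, so the cosine distance is zero (up to floating-point error) for every pair, while the L2 distance grows with the difference in length.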

In my case, the vectors, with dimensions from 500 to 4k, are generated by a CNN, and I need to be able to cluster them. While training the network, I will be using triplet loss, and when the model "finishes" learning.

I will also be using the same metric for the baseline models (histograms, static features, SURF).

Welcome to our site. If you could explain the objective of your clustering, or at least elaborate on the sense in which Euclidean distance is "not suited for high dimensional data," then we would have some information about how to answer this question. — whuber, Feb 09 '19 at 13:51
Cosine is *not* more robust for high dimensional data. This has long been refuted (cosine is also affected by the curse of dimensionality). In fact, the non-length ignorant equivalent to cosine is (squared) Euclidean. — Has QUIT--Anony-Mousse, Feb 09 '19 at 17:34
Instead of choosing a distance based on some vague "recommendation" from some random internet posts, try to understand what the proper metric is. Here probably: how does the CNN use/optimize similarity. Probably dot product. — Has QUIT--Anony-Mousse, Feb 09 '19 at 17:35
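
The relationship between cosine, dot product, and squared Euclidean distance mentioned in the comments can be checked numerically. This is only a sketch: the example vectors and helper names are made up for illustration. It relies on the standard identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b, which for unit-length vectors reduces to 2 - 2 cos(a, b).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    # Scale a to unit length.
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

# Hypothetical example vectors, chosen arbitrarily.
a = (2.0, 1.0, 0.5)
b = (0.5, 3.0, 1.0)

# General identity: squared Euclidean distance keeps the length terms.
lhs = squared_euclidean(a, b)
rhs = dot(a, a) + dot(b, b) - 2 * dot(a, b)

# After normalizing to unit length, only the cosine term remains.
unit_lhs = squared_euclidean(normalize(a), normalize(b))
unit_rhs = 2 - 2 * cosine_similarity(a, b)

print(math.isclose(lhs, rhs), math.isclose(unit_lhs, unit_rhs))
```

In other words, squared Euclidean distance on unnormalized vectors carries the same angular information as cosine plus the two length terms, and the two agree up to an affine transformation once the vectors are normalized.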

0 Answers