Distance functions are functions that quantify the notion of distance between members of a set, or between objects.
Questions tagged [distance-functions]
326 questions
328 votes, 8 answers
Why is Euclidean distance not a good metric in high dimensions?
I read that 'Euclidean distance is not a good distance in high dimensions'. I guess this statement has something to do with the curse of dimensionality, but what exactly? Besides, what is 'high dimensions'? I have been applying hierarchical…

teaLeef
84 votes, 6 answers
Why does k-means clustering algorithm use only Euclidean distance metric?
Is there a specific reason, in terms of efficiency or functionality, why the k-means algorithm does not use, for example, cosine (dis)similarity as a distance metric, but can only use the Euclidean norm? In general, will the K-means method comply and be…

curious
80 votes, 6 answers
Choosing a clustering method
When using cluster analysis on a data set to group similar cases, one needs to choose among a large number of clustering methods and measures of distance. Sometimes, one choice might influence the other, but there are many possible combinations of…

Brett
51 votes, 4 answers
Kullback–Leibler vs Kolmogorov-Smirnov distance
I can see that there are a lot of formal differences between the Kullback–Leibler and Kolmogorov-Smirnov distance measures.
However, both are used to measure the distance between distributions.
Is there a typical situation where one should be used…

Greg
49 votes, 3 answers
What is the distribution of the Euclidean distance between two normally distributed random variables?
Assume you are given two objects whose exact locations are unknown, but are distributed according to normal distributions with known parameters (e.g. $a \sim N(m, s)$ and $b \sim N(v, t)$). We can assume these are both bivariate normals, such that…

Nick
47 votes, 2 answers
Hierarchical clustering with mixed type data - what distance/similarity to use?
In my dataset we have both continuous and naturally discrete variables. I want to know whether we can do hierarchical clustering using both types of variables. And if so, what distance measure is appropriate?

Beta
35 votes, 5 answers
Measuring the "distance" between two multivariate distributions
I'm looking for some good terminology to describe what I'm trying to do, to make it easier to look for resources.
So, say I have two clusters of points A and B, each associated with two values, X and Y, and I want to measure the "distance" between A…

Emile
33 votes, 1 answer
Comparing hierarchical clustering dendrograms obtained by different distances & methods
[The initial title "Measurement of similarity for hierarchical clustering trees" was later changed by @ttnphns to better reflect the topic]
I am performing a number of hierarchical cluster analyses on a dataframe of patient records (e.g. similar to…

Wouter
21 votes, 4 answers
Euclidean distance score and similarity
I'm just working with the book Collective Intelligence (by Toby Segaran) and came across the Euclidean distance score. In the book the author shows how to calculate the similarity between two recommendation arrays (i.e. $\textrm{person} \times…

navige
21 votes, 2 answers
Is there an unbiased estimator of the Hellinger distance between two distributions?
In a setting where one observes $X_1,\ldots,X_n$ distributed from a distribution with density $f$, I wonder if there is an unbiased estimator (based on the $X_i$'s) of the Hellinger distance to another distribution with density $f_0$,…

Xi'an
20 votes, 1 answer
When to use weighted Euclidean distance and how to determine the weights to use?
I have a set of data where each data point consists of $n$ different measures. For each measure, I have a benchmark value. I would like to know how close each data point is to the benchmark value.
I thought of using the Weighted Euclidean Distance like…

Sara
19 votes, 4 answers
Is it ok to use Manhattan distance with Ward's inter-cluster linkage in hierarchical clustering?
I am using hierarchical clustering to analyze time series data. My code is implemented using the Mathematica function DirectAgglomerate[...], which generates hierarchical clusters given the following inputs:
a distance matrix D
the name of the…

Rachel
16 votes, 1 answer
Clustering: Should I use the Jensen-Shannon Divergence or its square?
I am clustering probability distributions using the Affinity Propagation algorithm, and I plan to use Jensen-Shannon Divergence as my distance metric.
Is it correct to use JSD itself as the distance, or JSD squared? Why? What differences would…

AlcubierreDrive
16 votes, 1 answer
What is the optimal distance function for individuals when attributes are nominal?
I do not know which distance function between individuals to use in case of nominal (unordered categorical) attributes.
I was reading some textbooks; some suggest the Simple Matching function, but others suggest that I should change the nominal to…

Jane Doe
15 votes, 5 answers
Best distance measure to use to compare vectors of angles
Context
I have two sets of data that I want to compare. Each data element in both sets is a vector containing 22 angles (all between $-\pi$ and $\pi$). The angles relate to a given human pose configuration, so a pose is defined by 22 joint…

Josh