Highest Voted 'distance-functions' Questions - Statistical Analysis Stack Exchange

328

votes

8 answers

Why is Euclidean distance not a good metric in high dimensions?

I read that 'Euclidean distance is not a good distance in high dimensions'. I guess this statement has something to do with the curse of dimensionality, but what exactly? Besides, what is 'high dimensions'? I have been applying hierarchical…

asked May 18 '14 at 17:50

teaLeef

3,497
3
12
11

84

votes

6 answers

Why does k-means clustering algorithm use only Euclidean distance metric?

Is there a specific purpose in terms of efficiency or functionality why the k-means algorithm does not use for example cosine (dis)similarity as a distance metric, but can only use the Euclidean norm? In general, will K-means method comply and be…

clustering k-means distance-functions euclidean

asked Jan 07 '14 at 11:53

curious

971
1
7
7

80

votes

6 answers

Choosing a clustering method

When using cluster analysis on a data set to group similar cases, one needs to choose among a large number of clustering methods and measures of distance. Sometimes, one choice might influence the other, but there are many possible combinations of…

clustering distance-functions methodology

asked Oct 18 '10 at 15:58

Brett

5,708
3
29
41

51

votes

4 answers

Kullback–Leibler vs Kolmogorov-Smirnov distance

I can see that there are a lot of formal differences between the Kullback–Leibler and Kolmogorov-Smirnov distance measures. However, both are used to measure the distance between distributions. Is there a typical situation where one should be used…

distributions distance-functions kolmogorov-smirnov-test kullback-leibler

asked Apr 07 '11 at 11:39

Greg

613
1
5
7

49

votes

3 answers

What is the distribution of the Euclidean distance between two normally distributed random variables?

Assume you are given two objects whose exact locations are unknown, but are distributed according to normal distributions with known parameters (e.g. $a \sim N(m, s)$ and $b \sim N(v, t)$). We can assume these are both bivariate normals, such that…

normal-distribution distance-functions

asked Apr 05 '11 at 19:10

Nick

3,327
6
28
24

47

votes

2 answers

Hierarchical clustering with mixed type data - what distance/similarity to use?

In my dataset we have both continuous and naturally discrete variables. I want to know whether we can do hierarchical clustering using both types of variables. And if yes, what distance measure is appropriate?

clustering similarities distance-functions mixed-type-data gower-similarity

asked Sep 07 '11 at 16:18

Beta

5,784
9
33
44

35

votes

5 answers

Measuring the "distance" between two multivariate distributions

I'm looking for some good terminology to describe what I'm trying to do, to make it easier to look for resources. So, say I have two clusters of points A and B, each associated to two values, X and Y, and I want to measure the "distance" between A…

multivariate-analysis terminology distance-functions

asked Oct 28 '10 at 13:06

Emile

1,057
1
10
16

33

votes

1 answer

Comparing hierarchical clustering dendrograms obtained by different distances & methods

[The initial title "Measurement of similarity for hierarchical clustering trees" was later changed by @ttnphns to better reflect the topic] I am performing a number of hierarchical cluster analyses on a dataframe of patient records (e.g. similar to…

r clustering distance-functions similarities dendrogram

asked Jul 07 '13 at 07:57

Wouter

2,102
3
17
26

21

votes

4 answers

Euclidean distance score and similarity

I'm just working with the book Collective Intelligence (by Toby Segaran) and came across the Euclidean distance score. In the book the author shows how to calculate the similarity between two recommendation arrays (i.e. $\textrm{person} \times…

distance-functions similarities

asked Mar 23 '13 at 12:05

navige

325
1
2
6

21

votes

2 answers

Is there an unbiased estimator of the Hellinger distance between two distributions?

In a setting where one observes $X_1,\ldots,X_n$ distributed from a distribution with density $f$, I wonder if there is an unbiased estimator (based on the $X_i$'s) of the Hellinger distance to another distribution with density $f_0$,…

density-function unbiased-estimator distance-functions functional-data-analysis hellinger

asked Jun 01 '12 at 09:36

Xi'an

90,397
9
157
575

20

votes

1 answer

When to use weighted Euclidean distance and how to determine the weights to use?

I have a set of data where each data point consists of $n$ different measures. For each measure, I have a benchmark value. I would like to know how close each data point is to the benchmark value. I thought of using the weighted Euclidean distance like…

distance-functions

asked Sep 07 '11 at 17:00

Sara

1,347
4
13
16

19

votes

4 answers

Is it ok to use Manhattan distance with Ward's inter-cluster linkage in hierarchical clustering?

I am using hierarchical clustering to analyze time series data. My code is implemented using the Mathematica function DirectAgglomerate[...], which generates hierarchical clusters given the following inputs: a distance matrix D; the name of the…

clustering distance-functions ward

asked Apr 08 '11 at 07:47

Rachel

191
1
5

16

votes

1 answer

Clustering: Should I use the Jensen-Shannon Divergence or its square?

I am clustering probability distributions using the Affinity Propagation algorithm, and I plan to use Jensen-Shannon Divergence as my distance metric. Is it correct to use JSD itself as the distance, or JSD squared? Why? What differences would…

machine-learning clustering entropy distance-functions

asked Feb 25 '11 at 18:01

AlcubierreDrive

263
1
2
6

16

votes

1 answer

What is the optimal distance function for individuals when attributes are nominal?

I do not know which distance function between individuals to use in the case of nominal (unordered categorical) attributes. I was reading some textbooks, and they suggest the Simple Matching function, but some books suggest that I should change the nominal to…

distance-functions distance similarities association-measure categorical-data

asked Apr 11 '13 at 04:59

Jane Doe

311
1
2
6

15

votes

5 answers

Best distance measure to use to compare vectors of angles

Context: I have two sets of data that I want to compare. Each data element in both sets is a vector containing 22 angles (all between $-\pi$ and $\pi$). The angles relate to a given human pose configuration, so a pose is defined by 22 joint…

measurement distance-functions circular-statistics

asked Feb 04 '11 at 21:33

Josh

595
1
4
14
