Questions tagged [distance]

Measure of distance between distributions or variables, such as Euclidean distance between points in n-space.

Mathematically, a distance $d$, or metric, is a function that satisfies the following properties for all points $x, y, z$:

  1. $d(x,y) \geq 0$
  2. $d(x,y) = 0 \implies x = y$
  3. $d(x,y) = d(y,x)$
  4. $d(x,z) \leq d(x,y) + d(y,z)$

Note that certain notions of "distance" in probability theory do not satisfy all of these properties. In particular, the Kullback–Leibler (KL) divergence between two distributions is not symmetric, so it violates property 3 above, and it also fails the triangle inequality (property 4).
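The asymmetry is easy to check numerically; a minimal sketch in pure Python, using a hypothetical pair of two-point distributions:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(P || Q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]
q = [0.5, 0.5]

# D(P||Q) and D(Q||P) differ, so KL divergence fails the symmetry property.
d_pq = kl(p, q)  # ≈ 0.368
d_qp = kl(q, p)  # ≈ 0.511
```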

Euclidean distance, Manhattan distance and Hamming distance are all common metrics.
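Each of the three metrics named above fits in a line or two; a quick sketch with made-up points:

```python
import math

def euclidean(x, y):
    """Straight-line distance in n-space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def hamming(x, y):
    """Number of positions at which the two sequences differ."""
    return sum(a != b for a, b in zip(x, y))

x, y = (0, 0, 1), (3, 4, 1)
# euclidean(x, y) -> 5.0, manhattan(x, y) -> 7, hamming(x, y) -> 2
```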

672 questions
160 votes · 9 answers

Bottom to top explanation of the Mahalanobis distance?

I'm studying pattern recognition and statistics, and in almost every book I open on the subject I bump into the concept of the Mahalanobis distance. The books give sort of intuitive explanations, but still not good enough ones for me to actually really…
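For readers skimming the listing: the distance itself is just Euclidean distance after rescaling by the inverse covariance, $d_M(x) = \sqrt{(x-\mu)^\top \Sigma^{-1} (x-\mu)}$. A minimal 2-D sketch (pure Python, hypothetical numbers, not from the question):

```python
import math

def mahalanobis_2d(x, mu, cov):
    """sqrt((x - mu)^T Sigma^{-1} (x - mu)) for a symmetric 2x2 covariance."""
    (a, b), (_, c) = cov            # cov = [[a, b], [b, c]]
    det = a * c - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Apply the inverse covariance [[c, -b], [-b, a]] / det to the deviation.
    return math.sqrt((c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)

# With the identity covariance, Mahalanobis reduces to Euclidean distance:
d = mahalanobis_2d((3.0, 4.0), (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]])  # 5.0
```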
72 votes · 5 answers

Intuition on the Kullback–Leibler (KL) Divergence

I have learned that the intuition behind the KL divergence is that it measures how much a model distribution differs from the theoretical/true distribution of the data. The source I am reading goes on to say that the intuitive understanding of 'distance'…
cgo (7,445)
51 votes · 2 answers

Choosing the right linkage method for hierarchical clustering

I am performing hierarchical clustering on data I've gathered and processed from the reddit data dump on Google BigQuery. My process is the following: get the latest 1000 posts in /r/politics; gather all the comments; process the data and compute an…
33 votes · 4 answers

Maximum Mean Discrepancy (distance distribution)

I have two data sets (source and target data) which follow different distributions. I am using MMD, a non-parametric distribution distance, to measure the distance between the marginal distributions of the source and target data. source data, Xs target data,…
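MMD compares mean kernel embeddings; the biased sample estimate is $\widehat{\mathrm{MMD}}^2 = \frac{1}{m^2}\sum_{i,j} k(x_i,x_j) + \frac{1}{n^2}\sum_{i,j} k(y_i,y_j) - \frac{2}{mn}\sum_{i,j} k(x_i,y_j)$. A minimal 1-D sketch with an RBF kernel (illustrative data, not the asker's):

```python
import math

def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) kernel k(u, v) = exp(-gamma * |u - v|^2) in 1-D."""
    return math.exp(-gamma * (u - v) ** 2)

def mmd2(xs, ys, gamma=1.0):
    """Biased squared-MMD estimate between two 1-D samples."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

# Identical samples give 0; well-separated samples give a large value.
```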
28 votes · 1 answer

Earth Mover's Distance (EMD) between two Gaussians

Is there a closed-form formula for (or some kind of bound on) the EMD between $x_1\sim N(\mu_1, \Sigma_1)$ and $x_2 \sim N(\mu_2, \Sigma_2)$?
ifog (381)
28 votes · 1 answer

Converting similarity matrix to (euclidean) distance matrix

In the random forest algorithm, Breiman (the author) constructs the similarity matrix as follows: send all learning examples down each tree in the forest; if two examples land in the same leaf, increment the corresponding element of the similarity matrix by 1; normalize…
Uros K (467)
27 votes · 3 answers

Distribution of difference between two normal distributions

I have two probability density functions of normal distributions: $$f_1(x_1 \; | \; \mu_1, \sigma_1) = \frac{1}{\sigma_1\sqrt{2\pi} } \; e^{ -\frac{(x-\mu_1)^2}{2\sigma_1^2} }$$ and $$f_2(x_2 \; | \; \mu_2, \sigma_2) = \frac{1}{\sigma_2\sqrt{2\pi} }…
Martijn (395)
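For independent $X_1$ and $X_2$, the answer is a standard result: the difference of independent normals is again normal,

```latex
X_1 - X_2 \sim \mathcal{N}\!\left(\mu_1 - \mu_2,\; \sigma_1^2 + \sigma_2^2\right)
```

(the means subtract, but the variances add). If $X_1$ and $X_2$ are correlated with correlation $\rho$, the variance becomes $\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$.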
27 votes · 1 answer

Using correlation as distance metric (for hierarchical clustering)

I would like to hierarchically cluster my data, but rather than using Euclidean distance, I'd like to use correlation. Also, since the correlation coefficient ranges from -1 to 1, with both -1 and 1 denoting "co-regulation" in my study, I am…
Megatron (373)
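One common convention matching the asker's requirement (treating $r = 1$ and $r = -1$ as equally similar) is $d = 1 - |r|$; note this is a dissimilarity rather than a true metric, since the triangle inequality can fail. A pure-Python sketch:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def corr_dist(x, y):
    """1 - |r|: perfect correlation AND perfect anti-correlation both give 0."""
    return 1.0 - abs(pearson(x, y))
```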
26 votes · 1 answer

Can the Mantel test be extended to asymmetric matrices?

The Mantel test is usually applied to symmetric distance/difference matrices. As far as I understand, an assumption of the test is that the measure used to define differences must be at least a semi-metric (meet the standard requirements of a metric…
Tom Seaton (361)
25 votes · 8 answers

Perform K-means (or its close kin) clustering with only a distance matrix, not points-by-features data

I want to perform K-means clustering on objects I have, but the objects aren't described as points in space, i.e., by an objects × features dataset. However, I am able to compute the distance between any two objects (it is based on a similarity…
mouse (253)
23 votes · 4 answers

Why are mixed data a problem for euclidean-based clustering algorithms?

Most classical clustering and dimensionality reduction algorithms (hierarchical clustering, principal component analysis, k-means, self-organizing maps...) are designed specifically for numeric data, and their input data are seen as points in a…
22 votes · 1 answer

Link between variance and pairwise distances within a variable

Please prove that if we have two variables (of equal sample size) $X$ and $Y$, and the variance of $X$ is greater than that of $Y$, then the sum of squared differences (i.e., squared Euclidean distances) between data points within $X$ is also greater than…
ttnphns (51,648)
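The claimed link comes down to the identity $\sum_{i<j}(x_i - x_j)^2 = n\sum_i (x_i - \bar{x})^2$, so the pairwise squared distances within a variable scale directly with its sum of squares (and hence its variance). A quick numerical check with made-up data:

```python
def sum_sq_pair_diffs(x):
    """Sum of (x_i - x_j)^2 over all unordered pairs i < j."""
    return sum((a - b) ** 2 for i, a in enumerate(x) for b in x[i + 1:])

def n_times_ss(x):
    """n times the sum of squared deviations from the mean."""
    m = sum(x) / len(x)
    return len(x) * sum((a - m) ** 2 for a in x)

# Both sides of the identity agree, e.g. for [1, 2, 3] both equal 6.
```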
21 votes · 4 answers

Comparing two histograms using Chi-Square distance

I want to compare two images of faces. I calculated their LBP histograms, so now I need to compare these two histograms and get something that tells me how similar they are (0 to 100%). There are many ways of solving this task, but…
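One common form of the chi-squared histogram distance (widely used with LBP histograms) is $\chi^2(H_1, H_2) = \frac{1}{2}\sum_i \frac{(h_{1i} - h_{2i})^2}{h_{1i} + h_{2i}}$; a minimal sketch, with a small epsilon to guard against empty bins:

```python
def chi2_distance(h1, h2, eps=1e-10):
    """0.5 * sum (a - b)^2 / (a + b); identical histograms give 0."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))
```

Normalizing both histograms to sum to 1 first makes the result comparable across images of different sizes.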
20 votes · 9 answers

Pairwise Mahalanobis distances

I need to calculate the sample Mahalanobis distance in R between every pair of observations in a $n \times p$ matrix of covariates. I need a solution that is efficient, i.e. only $n(n-1)/2$ distances are calculated, and preferably implemented in…
ahfoss (1,289)
20 votes · 5 answers

How can I convert (Euclidean) distance to a similarity score

I am using $k$-means clustering to cluster speaker voices. When I compare an utterance with clustered speaker data I get a (Euclidean distance-based) average distortion. This distance lies in the range $[0,\infty)$. I want to convert this distance to…
Muhammad (331)
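Two common conventions for turning an unbounded distance into a $(0, 1]$ similarity are $s = 1/(1+d)$ and $s = e^{-d/\text{scale}}$; which is appropriate depends on how fast similarity should decay. A minimal sketch (the `scale` parameter is an illustrative knob, not from the question):

```python
import math

def sim_inverse(d):
    """Maps [0, inf) onto (0, 1]; d = 0 gives similarity 1."""
    return 1.0 / (1.0 + d)

def sim_exp(d, scale=1.0):
    """Exponential decay; larger `scale` makes similarity fall off more slowly."""
    return math.exp(-d / scale)
```

Both are monotone decreasing in the distance, so rankings by distance and by similarity agree.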