Questions tagged [metric]

A metric is a function that outputs a distance between two elements of a set and meets certain strict criteria (some 'distance' functions are not metrics).

A metric is a function that outputs a distance between two elements of a set. To meet the definition of a metric, a distance function must fulfill the following criteria (sketched in code after the list):

  1. The distance between an element and itself is zero: $d(x_i,x_i)=0$.
  2. If the distance between two elements is $0$, those elements are identical: $d(x_i,x_j)=0\implies x_i=x_j$.
  3. All distances are non-negative: $d(x_i,x_j)\ge0$.
  4. The distance between two elements is the same in either direction: $d(x_i,x_j)=d(x_j,x_i)$.
  5. The distance between two elements is less than or equal to the sum of their distances to a third element: $d(x_i,x_j)\le d(x_i,x_k)+d(x_k,x_j)$.
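As an illustration, here is a minimal Python sketch (assuming NumPy; the function name and sample points are made up) that spot-checks criteria 1 and 3–5 for the Euclidean distance on random points. Criterion 2 cannot be confirmed by sampling alone, since it quantifies over all pairs of elements.

```python
import numpy as np
from itertools import combinations

def spot_check_metric(d, points, tol=1e-9):
    """Spot-check metric criteria 1, 3, 4 and 5 for a distance function d on sample points."""
    for x in points:
        assert abs(d(x, x)) <= tol                    # 1. d(x, x) = 0
    for x, y in combinations(points, 2):
        assert d(x, y) >= -tol                        # 3. non-negativity
        assert abs(d(x, y) - d(y, x)) <= tol          # 4. symmetry
    for x, y, z in combinations(points, 3):
        assert d(x, y) <= d(x, z) + d(z, y) + tol     # 5. triangle inequality
    return True

euclidean = lambda a, b: float(np.linalg.norm(a - b))
rng = np.random.default_rng(0)
points = [rng.standard_normal(3) for _ in range(15)]
print(spot_check_metric(euclidean, points))           # True: Euclidean distance passes
```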
343 questions
328 votes · 8 answers

Why is Euclidean distance not a good metric in high dimensions?

I read that 'Euclidean distance is not a good distance in high dimensions'. I guess this statement has something to do with the curse of dimensionality, but what exactly? Besides, what is 'high dimensions'? I have been applying hierarchical…
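A rough illustration of the phenomenon the question asks about (my own sketch, assuming NumPy/SciPy; the sample sizes are arbitrary): for i.i.d. uniform points, the contrast between the nearest and farthest pairwise Euclidean distances shrinks as the dimension grows.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    X = rng.random((200, dim))            # 200 points uniform in the unit hypercube
    d = pdist(X)                          # all pairwise Euclidean distances
    # relative contrast (max - min) / min tends to shrink as dim grows
    print(dim, (d.max() - d.min()) / d.min())
```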
60 votes · 5 answers

What are the advantages of the Wasserstein metric compared to Kullback-Leibler divergence?

What is the practical difference between Wasserstein metric and Kullback-Leibler divergence? Wasserstein metric is also referred to as Earth mover's distance. From Wikipedia: Wasserstein (or Vaserstein) metric is a distance function defined between…
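A small numeric sketch of the two quantities being compared (assuming SciPy; the distributions are made up), using SciPy's 1-D earth mover's distance and its KL divergence via `entropy`:

```python
import numpy as np
from scipy.stats import wasserstein_distance, entropy

support = np.arange(4)                    # common support {0, 1, 2, 3}
p = np.array([0.1, 0.4, 0.4, 0.1])
q = np.array([0.3, 0.2, 0.2, 0.3])

# 1-D Wasserstein (earth mover's) distance between the two distributions
print(wasserstein_distance(support, support, u_weights=p, v_weights=q))
# Kullback-Leibler divergence KL(p || q); unlike Wasserstein, it is not symmetric
print(entropy(p, q), entropy(q, p))
```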
50 votes · 6 answers

Percentage of overlapping regions of two normal distributions

I was wondering, given two normal distributions with $\sigma_1,\ \mu_1$ and $\sigma_2, \ \mu_2$, how can I calculate the percentage of overlapping regions of the two distributions? I suppose this problem has a specific name; are you aware of any…
Ali Salehi · 603
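One way to compute the quantity this question asks for (a sketch of my own, assuming SciPy; not necessarily the closed-form answer the poster was after) is to integrate the pointwise minimum of the two densities:

```python
from scipy.stats import norm
from scipy.integrate import quad

def overlap_area(mu1, sigma1, mu2, sigma2):
    """Overlapping area of two normal densities: the integral of min(f1, f2)."""
    f = lambda x: min(norm.pdf(x, mu1, sigma1), norm.pdf(x, mu2, sigma2))
    lo = min(mu1 - 10 * sigma1, mu2 - 10 * sigma2)   # integration limits far in the tails
    hi = max(mu1 + 10 * sigma1, mu2 + 10 * sigma2)
    area, _ = quad(f, lo, hi, limit=200)
    return area

print(overlap_area(0, 1, 1, 1.5))   # fraction of area shared by N(0,1) and N(1,1.5)
```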
40 votes · 4 answers

Recall and precision in classification

I read some definitions of recall and precision, though every time in the context of information retrieval. I was wondering if someone could explain this a bit more in a classification context and maybe illustrate some examples. Say for…
Olivier_s_j · 1,055
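In a classification context the two quantities reduce to counts from the confusion matrix; a tiny sketch with made-up labels (not from the question):

```python
# Made-up binary labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)   # of everything predicted positive, how much really is
recall = tp / (tp + fn)      # of everything truly positive, how much was found
print(precision, recall)     # 0.75 0.75 for these labels
```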
28 votes · 5 answers

Loss function and evaluation metric

When building a learning algorithm we are looking to maximize a given evaluation metric (say accuracy), but the algorithm will try to optimize a different loss function during learning (say MSE/entropy). Why are the evaluation metrics not used as…
Jesús Ros · 408
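The distinction the question draws can be seen directly in scikit-learn (a hedged sketch on synthetic data): the model is fit by minimizing a smooth loss, while the reported evaluation metric is computed separately afterwards.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)       # training minimizes log loss, not accuracy

print(log_loss(y_te, clf.predict_proba(X_te)))   # the (differentiable) training-style loss
print(accuracy_score(y_te, clf.predict(X_te)))   # the evaluation metric actually reported
```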
23 votes · 5 answers

How to control the cost of misclassification in Random Forests?

Is it possible to control the cost of misclassification in the R package randomForest? In my own work false negatives (e.g., missing in error that a person may have a disease) are far more costly than false positives. The package rpart allows the…
user5944 · 231
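The question is about the R package randomForest; as a rough analogue only (not the R API), scikit-learn's random forest exposes per-class weights that make errors on a rare class count more heavily:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Imbalanced synthetic data: class 1 (the "disease" class) is rare
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Weight errors on class 1 ten times more than on class 0 (the weights are illustrative)
clf = RandomForestClassifier(class_weight={0: 1, 1: 10}, random_state=0).fit(X, y)
print(clf.score(X, y))
```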
22 votes · 2 answers

Comparing clusterings: Rand Index vs Variation of Information

I was wondering if anybody had any insight or intuition behind the difference between the Variation of Information and the Rand Index for comparing clusterings. I have read the paper "Comparing Clusterings - An Information Based Distance" by Marina…
Amelio Vazquez-Reina · 17,546
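For concreteness (a sketch of my own, assuming scikit-learn and NumPy), both quantities can be computed on a pair of toy clusterings; the Variation of Information is assembled from entropies and mutual information:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, mutual_info_score

a = [0, 0, 1, 1, 2, 2]        # two clusterings of the same six items
b = [0, 0, 1, 2, 2, 2]

def label_entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

# Variation of Information: VI(a, b) = H(a) + H(b) - 2 I(a, b), in nats
vi = label_entropy(a) + label_entropy(b) - 2 * mutual_info_score(a, b)
print(adjusted_rand_score(a, b), vi)
```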
19 votes · 3 answers

Jensen Shannon Divergence vs Kullback-Leibler Divergence?

I know that KL divergence is not symmetric and cannot strictly be considered a metric. Given that, why is it used when JS divergence satisfies the required properties for a metric? Are there scenarios where KL divergence can be used but not JS…
user2761431 · 309
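A quick numeric check of the asymmetry mentioned in the question (assuming SciPy; the distributions are arbitrary). Note that `jensenshannon` returns the Jensen-Shannon distance, i.e. the square root of the divergence:

```python
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.4, 0.4, 0.2])

print(entropy(p, q), entropy(q, p))              # KL divergence: the two directions differ
print(jensenshannon(p, q), jensenshannon(q, p))  # JS distance: symmetric in p and q
```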
14 votes · 2 answers

How do you compare two Gaussian Processes?

Kullback-Leibler divergence is one measure for comparing two probability density functions, but what metric is used to compare two GP's $X$ and $Y$?
pushkar · 169
13 votes · 4 answers

Is there any probability distance that preserves all properties of a metric?

In studying the Kullback–Leibler distance, we learn very quickly that it respects neither the triangle inequality nor symmetry, two required properties of a metric. My question is whether there is any metric of…
Jorge Leitao · 1,219
13 votes · 4 answers

Is triangle inequality fulfilled for these correlation-based distances?

For hierarchical clustering I often see the following two "metrics" (they aren't, strictly speaking) for measuring the distance between two random variables $X$ and $Y$: $\newcommand{\Cor}{\mathrm{Cor}}$ \begin{align} d_1(X,Y) &= 1-|\Cor(X,Y)|, …
Linda · 133
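For the first of the two distances, one can at least search empirically for triangle-inequality violations (a sketch of my own with simulated data; finding none is of course not a proof):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))          # six random variables, 200 observations each
C = np.corrcoef(X, rowvar=False)           # 6 x 6 correlation matrix

d1 = 1 - np.abs(C)                         # d_1(X, Y) = 1 - |Cor(X, Y)|

violations = [(i, j, k) for i, j, k in combinations(range(6), 3)
              if d1[i, j] > d1[i, k] + d1[k, j] + 1e-12]
print(violations)                          # any triple listed here violates the inequality
```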
12 votes · 1 answer

Clustering inertia formula in scikit learn

I would like to code k-means clustering in Python using pandas and scikit-learn. In order to select a good k, I would like to code the Gap Statistic from Tibshirani et al. 2001 (pdf). I would like to know if I could use the inertia_ result from…
Scratch · 754
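A sketch of how `inertia_` is read off in scikit-learn (synthetic data; whether it matches the $W_k$ term of the gap statistic exactly is precisely what the question is asking):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# inertia_ is the within-cluster sum of squared distances to the nearest centroid
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)
```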
11 votes · 2 answers

Metrics for covariance matrices: drawbacks and strengths

What are the "best" metrics for covariance matrices, and why? It is clear to me that Frobenius&c are not appropriate, and angle parametrizations have their issues too. Intuitively one might want a compromise between these two, but I would also like…
Quartz · 878
9 votes · 3 answers

Distance metric and curse of dimensions

Somewhere I read a note that if you have many parameters $(x_1, x_2, \ldots, x_n)$ and you try to find a "similarity metric" between these vectors, you may run into the "curse of dimensionality". I believe it meant that most similarity scores will be…
Gerenuk · 1,833
9 votes · 2 answers

Does a distance have to be a "metric" for a hierarchical clustering to be valid on it?

Let us say that we define a distance, which is not a metric, between N items. Based on this distance we then use agglomerative hierarchical clustering. Can we use each of the known algorithms (single/maximum/average linkage etc.) to get…
Tal Galili · 19,935
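Mechanically, SciPy's agglomerative clustering will accept any symmetric dissimilarity matrix without checking the metric axioms (a sketch with random dissimilarities); whether the resulting dendrogram is meaningful is the substance of the question.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)
D = rng.random((6, 6))                 # an arbitrary "distance" among 6 items
D = (D + D.T) / 2                      # make it symmetric
np.fill_diagonal(D, 0.0)               # zero self-distance; triangle inequality not enforced

Z = linkage(squareform(D), method="average")     # average linkage on the condensed matrix
print(fcluster(Z, t=2, criterion="maxclust"))    # cut the tree into two clusters
```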