20

I am using $k$-means clustering to cluster speaker voices. When I compare an utterance with the clustered speaker data, I get a (Euclidean distance-based) average distortion. This distance lies in the range $[0,\infty)$. I want to convert this distance to a similarity score in $[0,1]$. Please guide me on how I can achieve this.

Sycorax
  • 76,417
  • 20
  • 189
  • 313
Muhammad
  • 331
  • 1
  • 2
  • 5

5 Answers

23

If $d(p_1,p_2)$ represents the Euclidean distance from point $p_1$ to point $p_2$,

$$\frac{1}{1 + d(p_1, p_2)}$$

is commonly used.
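
A minimal sketch of this transform in Python with NumPy (the function name and test points are illustrative, not from the answer):

```python
import numpy as np

def inverse_distance_similarity(p1, p2):
    """Map a Euclidean distance in [0, inf) to a similarity in (0, 1]."""
    d = np.linalg.norm(np.asarray(p1, dtype=float) - np.asarray(p2, dtype=float))
    return 1.0 / (1.0 + d)

# Identical points give similarity 1; similarity decays toward 0 as d grows.
print(inverse_distance_similarity([0, 0], [0, 0]))  # 1.0
print(inverse_distance_similarity([0, 0], [3, 4]))  # 1 / (1 + 5) ≈ 0.167
```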

TrynnaDoStat
  • 7,414
  • 3
  • 23
  • 39
  • Please correct me if I am wrong: if we have $X = (x_1,x_2,x_3,\dots,x_t)$ and $Y = (y_1,y_2,y_3,\dots,y_n)$, where each $x$ and $y$ is of dimension $D$, then we can define the similarity as $$\text{Similarity} = \frac{1}{t} \sum\limits_{i=1}^t \frac{1}{1 + \text{minDistance}(x_i, Y)}.$$ – Muhammad Jun 23 '15 at 22:27
  • 1
    I understand that the plus 1 in the denominator is there to avoid a divide-by-zero error. But I have found that it disproportionately affects $d(p_1,p_2)$ values greater than 1 and ultimately reduces the similarity score significantly. Is there another way to do this? Maybe $s = 1-d(p_1,p_2)$? – aamir23 Aug 08 '18 at 02:13
13

You could also use: $\frac{1}{e^{\mathrm{dist}}}$, where $\mathrm{dist}$ is your desired distance function.
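
A quick sketch of the same idea, assuming NumPy and a Euclidean distance (illustrative names only):

```python
import numpy as np

def exp_similarity(p1, p2):
    """Similarity 1 / e^d = exp(-d), which lies in (0, 1]."""
    d = np.linalg.norm(np.asarray(p1, dtype=float) - np.asarray(p2, dtype=float))
    return np.exp(-d)

print(exp_similarity([0, 0], [0, 0]))  # 1.0
print(exp_similarity([0, 0], [3, 4]))  # exp(-5) ≈ 0.0067
```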

Danica
  • 21,852
  • 1
  • 59
  • 115
NeuroMorphing
  • 525
  • 2
  • 12
  • Can you please give a reference book or documentation for this equation, i.e., where you found it? @Dougal – Justlife Jun 11 '17 at 12:55
  • @AnimeshKumarPaul I didn't write this answer, just improved its formatting. But it's frequently used as a version of e.g. a "generalized RBF kernel"; see e.g. [here](https://stats.stackexchange.com/q/64126/9964). That question concerns whether the output is a positive definite kernel; if you don't care about that, though, it at least satisfies an intuitive notion of similarity that more distant points are less similar. – Danica Jun 11 '17 at 13:02
  • @Justlife : Google for this one "encyclopedia of distances" and pick the result with the pdf document. – NeuroMorphing Jun 12 '17 at 23:04
  • Which is the same as $\mathrm{sim} = e^{-\mathrm{dist}}$. – user3352632 Feb 16 '21 at 20:19
10

It sounds like you want something akin to cosine similarity, a similarity score that lies in $[-1,1]$ (and in $[0,1]$ when the vectors have non-negative entries). In fact, a direct relationship between Euclidean distance and cosine similarity exists!

Observe that $$ ||x-x^\prime||^2=(x-x^\prime)^T(x-x^\prime)=||x||^2+||x^\prime||^2-2x^T x^\prime. $$

While cosine similarity is $$ f(x,x^\prime)=\frac{x^T x^\prime}{||x||||x^\prime||}=\cos(\theta) $$ where $\theta$ is the angle between $x$ and $x^\prime$.

When $||x||=||x^\prime||=1,$ we have $$ ||x-x^\prime||^2=2(1-f(x,x^\prime)) $$ and $$ f(x,x^\prime)=x^T x^\prime, $$

so

$$ 1-\frac{||x-x^\prime||^2}{2}=f(x,x^\prime)=\cos(\theta) $$ in this special case.

From a computational perspective, it may be more efficient to compute the cosine directly, rather than computing the Euclidean distance and then applying the transformation.
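
A short numerical check of the identity for unit vectors, assuming NumPy (the helper name is illustrative):

```python
import numpy as np

def cosine_similarity(x, xp):
    """Cosine of the angle between x and x'."""
    return x @ xp / (np.linalg.norm(x) * np.linalg.norm(xp))

rng = np.random.default_rng(0)
x = rng.normal(size=5);  x /= np.linalg.norm(x)    # normalize so ||x|| = 1
xp = rng.normal(size=5); xp /= np.linalg.norm(xp)  # normalize so ||x'|| = 1

# For unit vectors, 1 - ||x - x'||^2 / 2 equals the cosine similarity.
lhs = 1 - np.linalg.norm(x - xp) ** 2 / 2
print(np.isclose(lhs, cosine_similarity(x, xp)))  # True
```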

Sycorax
  • 76,417
  • 20
  • 189
  • 313
8

How about a Gaussian kernel?

$K(x, x') = \exp\left( -\frac{\| x - x' \|^2}{2\sigma^2} \right)$

The distance $\|x - x'\|$ is used in the exponent. The kernel value is in the range $(0, 1]$. There is one tuning parameter, $\sigma$. Basically, if $\sigma$ is high, $K(x, x')$ will be close to 1 for any $x, x'$; if $\sigma$ is low, even a slight distance from $x$ to $x'$ will push $K(x,x')$ close to 0.
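
A brief sketch, assuming NumPy, that also illustrates the effect of $\sigma$ (the function name is illustrative):

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """Similarity exp(-||x - x'||^2 / (2 sigma^2)), which lies in (0, 1]."""
    d2 = np.sum((np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Larger sigma flattens the kernel toward 1; smaller sigma pushes it toward 0.
for sigma in (0.5, 1.0, 5.0):
    print(sigma, gaussian_kernel([0, 0], [1, 1], sigma=sigma))
```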

wij
  • 1,893
  • 11
  • 18
  • 1
    Note that this answer and [@Unhandled exception's](http://stats.stackexchange.com/a/158303/9964) are very related: this is $\exp\left( - \gamma d(x, x')^2 \right)$, where that one [introducing a scaling factor] is $\exp\left( - \gamma d(x, x') \right)$, a Gaussian kernel with $\sqrt{d}$ as the metric. This will [still be a valid kernel](http://stats.stackexchange.com/a/148031/9964), though the OP doesn't necessarily care about that. – Danica Jun 23 '15 at 16:08
1

If you are using a distance metric that is naturally between 0 and 1, like the Hellinger distance, then you can use $1 - \text{distance}$ to obtain a similarity.
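
A small sketch, assuming NumPy and discrete probability distributions as inputs; it uses the standard form $H(p,q)=\frac{1}{\sqrt{2}}\|\sqrt{p}-\sqrt{q}\|_2$, which lies in $[0,1]$ (the function name is illustrative):

```python
import numpy as np

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions; always in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)

p = [0.2, 0.5, 0.3]
q = [0.1, 0.4, 0.5]
print(1.0 - hellinger_distance(p, q))  # similarity in [0, 1]
```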

Brad
  • 522
  • 4
  • 10