
I'm currently working through the book Collective Intelligence (by Toby Segaran) and came across the Euclidean distance score. In the book the author shows how to calculate the similarity between two recommendation arrays (i.e. $\textrm{person} \times \textrm{movie} \mapsto \textrm{score}$).

He calculates the Euclidean distance between two persons $p_1$ and $p_2$ as $$d(p_1, p_2) = \sqrt{\sum_{i\,\in\,\textrm{items}} (s_{p_1,i} - s_{p_2,i})^2}$$ where $s_{p,i}$ is person $p$'s score for item $i$.

This makes complete sense to me. What I don't really understand is why, at the end, he calculates the following to get a "distance based similarity":

$$ \frac{1}{1 + d(p_1, p_2)} $$

So I somehow get that this must be the conversion from a distance to a similarity (right?). But why does the formula look like this? Can someone explain that?
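In Python, the calculation I mean looks roughly like this (my own minimal sketch; the `ratings` structure and names are mine, not necessarily the book's exact code):

```python
from math import sqrt

# Hypothetical person -> {movie: score} ratings (illustrative values)
ratings = {
    "p1": {"Movie A": 4.5, "Movie B": 1.0, "Movie C": 3.0},
    "p2": {"Movie A": 4.0, "Movie B": 2.0, "Movie C": 5.0},
}

def euclidean_distance(prefs, p1, p2):
    """d(p1, p2) over the items both persons have scored."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    return sqrt(sum((prefs[p1][item] - prefs[p2][item]) ** 2 for item in shared))

def distance_similarity(prefs, p1, p2):
    """The conversion I am asking about: 1 / (1 + d)."""
    return 1 / (1 + euclidean_distance(prefs, p1, p2))

print(distance_similarity(ratings, "p1", "p2"))  # ~0.30
```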

navige
  • There can be many ways to convert dissimilarities and similarities into each other; the specific formula depends on what makes sense to you and on the analysis to follow. In that textbook the author preferred the formula you show for some reason; someone else in a different situation might choose another formula. The most _geometrically correct_ way to convert _euclidean_ distance into a similarity would follow from the [cosine theorem](http://stats.stackexchange.com/a/36158/3277) under the data-are-centered condition and is described [here](http://stats.stackexchange.com/a/12503/3277) in par. 1. – ttnphns Mar 23 '13 at 12:51
  • Ok! But if I understand right, you don't really convert the euclidean distance into a similarity; you just use a different function that returns values within 0 and 1 (because of the cosine), right? I mean, it seems different to me from calculating all the distances and then converting them to a similarity by e.g. interpolating between the smallest and the largest distance. Right? – navige Mar 23 '13 at 13:26
  • If you have a square symmetric matrix of squared euclidean distances and you perform the "double centering" operation on it, then you get the matrix of the scalar products which would be observed when you put the origin of the euclidean space at the centre of your configuration of objects. These scalar products _are_ angle-type similarities. They are much like _covariances_. They are not bound within the range 0-1, they can be negative or positive, and diagonal elements are not necessarily 1. Still, they are similarities. – ttnphns Mar 23 '13 at 13:55
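For concreteness, a minimal NumPy sketch (my own, not from the thread) of the double-centering operation ttnphns describes above:

```python
import numpy as np

# Three objects described by two coordinates each (illustrative values)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])

# Square symmetric matrix of squared Euclidean distances
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

# Double centering: B = -1/2 * J @ D2 @ J, where J = I - (1/n) * 11'
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J

# B holds the scalar products of the configuration centred at its mean:
# angle-type similarities that can be negative, with diagonal != 1.
Xc = X - X.mean(axis=0)
print(np.allclose(B, Xc @ Xc.T))  # True
```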

4 Answers


The inverse is what changes the distance into a similarity.

The 1 in the denominator is there so that the maximum value is 1 (attained when the distance is 0).

The square root - I am not sure about. If the distance is usually larger than 1, the root will make large distances less important; if the distance is less than 1, it will make large distances more important.
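As a quick numeric illustration (values are mine, not from the book): the square root compresses large distances and inflates small ones before the inversion.

```python
# Compare 1/(1+d) with 1/(1+sqrt(d)) for a few distances
for d in [0.0, 0.25, 1.0, 4.0, 100.0]:
    print(f"d={d:6.2f}  1/(1+d)={1/(1+d):.4f}  1/(1+sqrt(d))={1/(1+d**0.5):.4f}")
```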

Peter Flom
  • Sorry! The square root was wrong: the author actually put it in the second formula, but left it out in the first, so it shouldn't be there. – navige Mar 23 '13 at 12:20
  • Yes, but your hint about setting the maximum value to 1 makes sense! Thanks! – navige Mar 23 '13 at 12:21

To measure distance and similarity (in the semantic sense), the first thing to check is whether you are moving in a Euclidean space or not. An empirical way to verify this is to estimate the distance for a pair of values whose meaning you already know.
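One way to read that check, as a small sketch with hypothetical ratings (names and numbers are mine, purely illustrative): a pair of users you know to be alike should come out closer than a pair you know to disagree.

```python
import numpy as np

alice = np.array([5.0, 4.0, 1.0])   # ratings for three movies
bob   = np.array([4.5, 4.0, 1.5])   # known to share Alice's tastes
carol = np.array([1.0, 2.0, 5.0])   # known to have opposite tastes

print(np.linalg.norm(alice - bob))    # ~0.71: small, as expected
print(np.linalg.norm(alice - carol))  # 6.0: large, as expected
```

If the computed distances contradict what you already know about the pairs, a plain Euclidean distance on these coordinates is probably not the right dissimilarity for your data.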

gung - Reinstate Monica

As you mentioned, you already know how the Euclidean distance is calculated, so I will explain the second formula. The Euclidean formula measures distance, which is smaller for people or items that are more similar: if they are identical the distance is 0, and the more they differ the larger it gets.

However, we need a function that gives a higher value for more similar items. This can be done by adding 1 to the distance (so you don't get a division-by-zero error and the maximum value is 1) and inverting it: if the distance is 0, the similarity score is 1/1 = 1.

Say the Euclidean distance between item 1 and item 2 is 4, and between item 1 and item 3 is 0 (meaning they are 100% similar). These are the distances between items in a virtual space: the smaller the distance, the nearer the items are to each other, and the more likely they are to be similar. Now we want a numerical value that is higher when items are more similar, so we invert the distance. But what if the distance is 0? That is why we add 1 in the denominator. So the similarity score for items 1 and 2 is 1/(1+4) = 0.2, and for items 1 and 3 it is 1/(1+0) = 1.
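The same arithmetic as a tiny Python check (the function name is my own):

```python
def similarity(distance):
    # 1/(1 + d): the +1 avoids division by zero and caps the score at 1
    return 1.0 / (1.0 + distance)

print(similarity(4))  # item 1 vs item 2 -> 0.2
print(similarity(0))  # item 1 vs item 3 -> 1.0
```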

Jay Patel
  • I don't understand this answer. – Michael R. Chernick Aug 06 '18 at 05:05
  • OK, say the Euclidean distance between item 1 and item 2 is 4, and between item 1 and item 3 is 0 (meaning they are 100% similar). These are the distances between items in a virtual space: the smaller the distance, the nearer the items, and the more likely they are to be similar. Now we want a numerical value that is higher when items are more similar, so we invert the distance. But what if the distance is 0? That is why we add 1 in the denominator. So the similarity score for items 1 and 2 is 1/(1+4) = 0.2, and for items 1 and 3 it is 1/(1+0) = 1. – Jay Patel Aug 08 '18 at 10:33
  • Maybe you are talking about some sort of distance measure, but Euclidean distance follows a specific formula defined on a vector space. – Michael R. Chernick Aug 08 '18 at 18:36
  • I am explaining why we calculate, at the end, the following to get a "distance based similarity": $1/(1 + d(p_1, p_2))$. – Jay Patel Aug 11 '18 at 04:46

Euclidean distance basically measures the dissimilarity of two vectors: it returns 0 when the two vectors are identical, whereas cosine similarity returns 1 for identical vectors. It seems the writer of that book wants a similarity-based measure but still wants to use Euclidean distance. So, in order to get a similarity out of the distance, he flipped the formula and added 1 to the denominator, so that it gives 1 when two vectors are identical. Go give it a check: try it with two vectors containing the same values.
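A quick check of that, assuming NumPy (the vectors are mine, purely illustrative): for two identical vectors, both the Euclidean-based score and cosine similarity come out as 1.

```python
import numpy as np

a = np.array([2.0, 3.0, 4.0])
b = a.copy()  # two vectors containing the same values

d = np.linalg.norm(a - b)          # Euclidean distance: 0.0
euclid_sim = 1.0 / (1.0 + d)       # flipped-and-shifted formula -> 1.0
cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # -> 1.0

print(euclid_sim, cos_sim)  # both report full similarity: 1.0 1.0
```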