17

Recently I have started looking for the definition of normalized Euclidean distance between two real vectors $u$ and $v$. So far, I have discovered two apparently unrelated definitions:

http://en.wikipedia.org/wiki/Mahalanobis_distance

and

http://reference.wolfram.com/language/ref/NormalizedSquaredEuclideanDistance.html

I am familiar with the context of the Wikipedia definition. However, I have yet to discover any context for the Wolfram.com definition:

NormalizedSquaredEuclideanDistance[u,v] is equivalent to 1/2*Norm[(u-Mean[u])-(v-Mean[v])]^2/(Norm[u-Mean[u]]^2+Norm[v-Mean[v]]^2)

$$ NED^2[u,v] = 0.5 \frac{ Var[u-v] }{ Var[u] + Var[v] }$$
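As a quick sanity check, the Norm-based form and the variance-based form above do agree numerically. Here is a minimal NumPy sketch (the variable names are my own, and it assumes the population variance, i.e. division by $n$, which is NumPy's default):

import numpy as np

rng = np.random.default_rng(0)
u, v = rng.normal(size=10), rng.normal(size=10)

# Wolfram form: 1/2 * ||(u - Mean[u]) - (v - Mean[v])||^2 / (||u - Mean[u]||^2 + ||v - Mean[v]||^2)
uc, vc = u - u.mean(), v - v.mean()
ned2_wolfram = 0.5 * np.sum((uc - vc) ** 2) / (np.sum(uc ** 2) + np.sum(vc ** 2))

# Variance form: 1/2 * Var[u - v] / (Var[u] + Var[v])  (np.var divides by n by default)
ned2_var = 0.5 * (u - v).var() / (u.var() + v.var())

print(np.isclose(ned2_wolfram, ned2_var))  # True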


The intuitive meaning of this definition is not very clear. Any help on this will be appreciated.

Update:

I found the following intuitive explanation for the Wolfram.com definition here, and I am repeating it below:

Note that it is a DistanceFunction option for ImageDistance. Maybe that helps to show the context in which it is used.

The relation to SquaredEuclideanDistance is:

NormalizedSquaredEuclideanDistance[x, y] == (1/2) SquaredEuclideanDistance[x - Mean[x], y - Mean[y]]/ (Norm[x - Mean[x]]^2 + Norm[y - Mean[y]]^2)

So we see it is a "normalized" "squared Euclidean distance" between the "difference of each vector from its mean"...

What is the meaning of the 1/2 at the beginning of the formula?

The 1/2 is just there such that the answer is bounded between 0 and 1, rather than 0 and 2.

Charlie Parker
PTDS
  • 1
    the wolfram link doesn't work for me, can you explicitly give the definition they are thinking about? – Charlie Parker Nov 27 '20 at 18:54
  • what does var(u-v) mean? are they vectors, random variables, functions what are they? – Charlie Parker Nov 27 '20 at 19:02
  • what do you think of cosine similarity or R^2? – Charlie Parker Nov 27 '20 at 19:06
  • it seems this answer explains the wolfram equation better than wolfram: https://stackoverflow.com/questions/38161071/how-to-calculate-normalized-euclidean-distance-on-two-vectors/54170399 – Charlie Parker Nov 27 '20 at 19:44
  • is this useful to you? https://math.stackexchange.com/questions/488964/the-definition-of-nmse-normalized-mean-square-error or https://www.marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/ ? – Charlie Parker Nov 27 '20 at 20:37
  • what is wrong with explained variance? https://scikit-learn.org/stable/modules/generated/sklearn.metrics.explained_variance_score.html#sklearn.metrics.explained_variance_score – Charlie Parker Nov 27 '20 at 20:37
  • @ Charlie Parker: Please clarify the question(s). I'll try to write a detailed answer. – PTDS Nov 29 '20 at 05:21
  • crossposted: https://www.reddit.com/r/AskStatistics/comments/k6ncx8/definition_of_normalized_euclidean_distance/ https://www.quora.com/unanswered/How-can-normalized-Euclidean-distance-be-defined https://stats.stackexchange.com/questions/136232/definition-of-normalized-euclidean-distance – Charlie Parker Dec 04 '20 at 15:49
  • Can you confirm me what the equation for NED is? is it $$ NED[u,v] = 0.5 \frac{ Var[u-v] }{ Var[u] + Var[v] }$$? – Charlie Parker Dec 08 '20 at 23:15
  • what is wrong with this normalized MSE: $$ NMSE[u,v] = \frac{MSE(u, v) }{ Var[u- v]}$$ ? – Charlie Parker Jan 28 '21 at 22:06
  • this might be of interest to other readers: https://discuss.pytorch.org/t/how-does-one-compute-the-normalized-euclidean-distance-similarity-in-a-numerically-stable-way-in-a-vectorized-way-in-pytorch/110829 – Charlie Parker Feb 03 '21 at 18:22
  • $\mathrm{Var}[u - v] = \mathrm{Var}[u] + \mathrm{Var}[v] - 2\,\mathrm{Cov}(u, v)$, so $\mathrm{NED}^2(u, v) = 1/2 - \mathrm{Cov}(u, v)/(\mathrm{Var}[u] + \mathrm{Var}[v])$. – Hunaphu Feb 11 '21 at 23:21

4 Answers

8

The normalized squared Euclidean distance gives the squared distance between two vectors after their lengths have been scaled to have unit norm. This is helpful when the direction of the vector is meaningful but the magnitude is not. It is not related to the Mahalanobis distance.

Aaron
  • Thanks a lot, Aaron. Please search the string "normalized Euclidean distance" in the Wikipedia page http://en.wikipedia.org/wiki/Mahalanobis_distance and let me know if the definition given there is wrong. – PTDS Feb 04 '15 at 07:47
  • 1
    I think one desirable property of NED is that it should always lie between 0 and 1. The Wolfram.com definition has this property. The Wikipedia definition, squared and divided by N [also replace the SD s_i in the denominator by the RANGE of the i-th component of the vector], also has this property. If I am not mistaken, your definition does not have this property, right? – PTDS Feb 05 '15 at 20:56
  • @PTDS the string doesn't yield anything to me. Can you share what you had in mind? – Charlie Parker Nov 27 '20 at 18:54
  • @Charlie Parker: Please search for "standardized Euclidean distance" instead – PTDS Nov 28 '20 at 19:42
  • @PTDS I did. Thanks. When you say it ranges between 0-1 do you mean that it ranges in expectation? also is the standard deviation wrt x or y? `where si is the standard deviation of the xi and yi over the sample set.` i.e. that over xi AND yi is confusing me. I don't know what that the wiki article means. When I think of standard deviations I think of a single distribution and in general xi and yi might have different distributions (e.g. one is a set of predictions where our predictor function induces a new random variable F being compared to Y). – Charlie Parker Dec 01 '20 at 17:29
4

The weighted Minkowski distance of order $q$ between two real vectors $u, v \in \mathbb{R}^n$ is given by

$$d^{(q)} (u, v) = \left(\sum_{i=1}^n w_i \left|u_i - v_i\right|^q \right)^\frac{1}{q}$$

[See equation $3.1.7$, Clustering Methodology for Symbolic Data By Lynne Billard, Edwin Diday (2019)]

If we choose $w_i = \frac{1}{n}$ and $q = 2$, we have the so called "normalized Euclidean distance" between $u$ and $v$

$$d_{NE}^2(u, v) = \frac{1}{n} \sum_{i=1}^n \left(u_i - v_i \right)^2$$

Unfortunately, the above definition does not have particularly nice properties; for example, it is unbounded, so it does not always lie between $0$ and $1$.

Another definition, given at Wolfram.com, has one nice property: $d_W$ always lies between $0$ and $1$

NormalizedSquaredEuclideanDistance[u,v] is equivalent to 1/2*Norm[(u-Mean[u])-(v-Mean[v])]^2/(Norm[u-Mean[u]]^2+Norm[v-Mean[v]]^2)

For computational purposes, I simplified the definitions given above:

$$d_{NE}^2(u, v) = \mathrm{Var}(u-v) + (\bar{u} - \bar{v})^2$$

$$NED^2(u, v) = d_W ^2(u, v) = \frac{1}{2}\frac{\mathrm{Var}(u-v)}{\mathrm{Var}(u) + \mathrm{Var}(v)}$$ where $\mathrm{Var}(x) = \displaystyle \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2$ and $\bar{x} = \frac{\sum_{i=1}^n x_i}{n}$
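Both simplifications can be verified numerically; the following is a minimal NumPy sketch (my own check, using NumPy's default population variance, i.e. division by $n$):

import numpy as np

rng = np.random.default_rng(1)
n = 12
u, v = rng.normal(size=n), rng.normal(size=n)

# d_NE^2(u, v) = (1/n) sum (u_i - v_i)^2  =  Var(u - v) + (mean(u) - mean(v))^2
d_ne2_direct = np.mean((u - v) ** 2)
d_ne2_var = (u - v).var() + (u.mean() - v.mean()) ** 2
print(np.isclose(d_ne2_direct, d_ne2_var))  # True

# d_W^2(u, v) = 0.5 * Var(u - v) / (Var(u) + Var(v)) lies in [0, 1]
d_w2 = 0.5 * (u - v).var() / (u.var() + v.var())
print(0.0 <= d_w2 <= 1.0)  # True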

A few properties/special cases:

If (i) $||u||_2 = ||v||_2 = 1$, i.e., $\sum_{i=1}^n u_i^2 = \sum_{i=1}^n v_i^2 = 1$

and

(ii) $\bar{u} = \bar{v} = 0$, i.e., $\sum_{i=1}^n u_i = \sum_{i=1}^n v_i = 0$

then

(A) $\mathrm{Var}(u) = \mathrm{Var}(v) = \frac{1}{n}$, $\mathrm{Cov}(u, v) = \frac{1}{n}\sum_{i=1}^n u_i v_i$ and $\rho(u,v) = \sum_{i=1}^n u_i v_i = \cos \theta$

where $\theta$ is the angle between the vectors $u$ and $v$

(B) $$d_{NE}^2(u, v) = \frac{2}{n}(1 - \cos \theta)$$

(C) $$d_W^2 (u, v) = \frac{1}{2}(1 - \cos \theta)$$
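The special cases (A)-(C) can also be checked numerically with a minimal NumPy sketch (again my own illustration; it constructs zero-mean, unit-norm vectors so that conditions (i) and (ii) hold):

import numpy as np

rng = np.random.default_rng(2)
n = 10
u, v = rng.normal(size=n), rng.normal(size=n)
u = u - u.mean()
u = u / np.linalg.norm(u)   # now mean(u) = 0 and ||u||_2 = 1
v = v - v.mean()
v = v / np.linalg.norm(v)   # now mean(v) = 0 and ||v||_2 = 1

cos_theta = u @ v           # equals rho(u, v) under (i) and (ii)
print(np.isclose(u.var(), 1 / n), np.isclose(v.var(), 1 / n))        # (A)
print(np.isclose(np.mean((u - v) ** 2), (2 / n) * (1 - cos_theta)))  # (B)
d_w2 = 0.5 * (u - v).var() / (u.var() + v.var())
print(np.isclose(d_w2, 0.5 * (1 - cos_theta)))                       # (C)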

Discussion:

Suppose we define the following distance measure $d_E^2(u, v)$ between the vectors $u, v \in \mathbb{R^n}$

$$d_E^2(u, v) = \frac{\sum_{i=1}^n (u_i - v_i)^2}{\sum_{i=1}^n u_i^2 + \sum_{i=1}^n v_i^2}$$

The distance $d_E$ lies between $0$ and $\sqrt{2}$ (equivalently, $d_E^2$ lies between $0$ and $2$), as shown below.

The Wolfram.com definition is closely related to the above. Instead of $u$ and $v$, it considers the mean centered version of the above definition and adds a factor of $\frac{1}{2}$ so that the value lies between $0$ and $1$

Proof: $$\sum_{i=1}^n (u_i - v_i)^2 \geq 0 \implies \frac{2\sum_{i=1}^n u_i v_i}{\sum_{i=1}^n u_i^2 + \sum_{i=1}^n v_i^2} \leq 1$$

$$\sum_{i=1}^n (u_i + v_i)^2 \geq 0 \implies -1 \leq \frac{2\sum_{i=1}^n u_i v_i}{\sum_{i=1}^n u_i^2 + \sum_{i=1}^n v_i^2}$$

Combining the above two inequalities:

$$-1 \leq \frac{2\sum_{i=1}^n u_i v_i}{\sum_{i=1}^n u_i^2 + \sum_{i=1}^n v_i^2} \leq 1$$

Or, $$-1 \leq -\frac{2\sum_{i=1}^n u_i v_i}{\sum_{i=1}^n u_i^2 + \sum_{i=1}^n v_i^2} \leq 1$$

Or, $$0 \leq 1-\frac{2\sum_{i=1}^n u_i v_i}{\sum_{i=1}^n u_i^2 + \sum_{i=1}^n v_i^2} \leq 2$$

Or, $$0 \leq \frac{1}{2}\left( 1-\frac{2\sum_{i=1}^n u_i v_i}{\sum_{i=1}^n u_i^2 + \sum_{i=1}^n v_i^2} \right)\leq 1$$

Or, since $d_E^2(u, v) = 1-\frac{2\sum_{i=1}^n u_i v_i}{\sum_{i=1}^n u_i^2 + \sum_{i=1}^n v_i^2}$, $$0 \leq \frac{1}{2} d_E^2(u, v)\leq 1$$

Or, $$0 \leq d_E(u, v)\leq \sqrt{2}$$

How do we prove that $$0 \leq d_W ^2(u, v) = \frac{1}{2}\frac{\mathrm{Var}(u-v)}{\mathrm{Var}(u) + \mathrm{Var}(v)} \leq 1$$

Proof:

We are required to prove that (TPT)

$$0 \leq \frac{1}{2}\frac{\mathrm{Var}(u-v)}{\mathrm{Var}(u) + \mathrm{Var}(v)} \leq 1$$

i.e., TPT $$0 \leq \mathrm{Var}(u-v) \leq 2(\mathrm{Var}(u) + \mathrm{Var}(v))$$

Now $\mathrm{Var}(u-v) \geq 0$ since variance is always non-negative.

We need TPT $$\mathrm{Var}(u-v) \leq 2(\mathrm{Var}(u) + \mathrm{Var}(v))$$

i.e., TPT $$\mathrm{Var}(u-v) = \mathrm{Var}(u) + \mathrm{Var}(v) - 2 \mathrm{Cov}(u, v) \leq 2(\mathrm{Var}(u) + \mathrm{Var}(v))$$

i.e., TPT $$\mathrm{Var}(u) + \mathrm{Var}(v) + 2 \mathrm{Cov}(u, v) \geq 0$$

i.e., TPT $$\mathrm{Var}(u+v) \geq 0$$ which is always true since variance is always non-negative.
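For readers who prefer a numerical spot check of both bounds, here is a minimal NumPy sketch (my own, not a proof; it draws random pairs of vectors and verifies $0 \leq d_E^2 \leq 2$ and $0 \leq d_W^2 \leq 1$):

import numpy as np

rng = np.random.default_rng(3)
for _ in range(1000):
    n = int(rng.integers(2, 20))
    u, v = rng.normal(size=n), rng.normal(size=n)
    d_e2 = np.sum((u - v) ** 2) / (np.sum(u ** 2) + np.sum(v ** 2))  # should satisfy 0 <= d_E^2 <= 2
    d_w2 = 0.5 * (u - v).var() / (u.var() + v.var())                 # should satisfy 0 <= d_W^2 <= 1
    assert 0.0 <= d_e2 <= 2.0 and 0.0 <= d_w2 <= 1.0
print("bounds hold on all sampled pairs")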

Charlie Parker
PTDS
  • I think at the root of what is bothering me is that $d^2_w(X_n, Y_n)$ lacks motivation. It seems to come out of nowhere to me and doesn't measure anything intuitively obvious to me. For example, why isn't it preferred to some sort of symmetric version of proportion of explained variance something like $$ R^2_{special}(Y_n, F_n) = \frac{EVar[F_n, \bar y] + EVar[Y_n, \bar f] }{ Var[Y_n] + Var[Y_n] } $$. For me this $R^2_{special}(Y_n, F_n)$ seems really nice. It's intuitive (proportion of total variance explained for each vector in either direction). It is symmetric. Likely btw -1 to 1. etc. – Charlie Parker Dec 01 '20 at 17:53
  • Where $$ EVar[F_n, \bar y] = \frac{1}{n} \sum^n_{i=1} (f_i - \bar y)^2$$ is the explained variance. Note the above definition of $R^2_{special}$ is some sort of modified $R^2$ that tries to be normalized (e.g. between -1 and 1) and is crucially symmetric. These two properties are key to me. I don't know if it's possible to guarantee for sure that it's always bounded in an interval, e.g. [-1,1], but as long as it's sort of bounded most of the time I'm personally ok. Perfect guarantees are good, but I don't like it when cosines come into the picture. Angle measurements are obviously good metrics to me. – Charlie Parker Dec 01 '20 at 17:57
  • stupid SO doesn't let me correct but it should be $$ Var[F_n] $$ for one of the terms in the denominator. – Charlie Parker Dec 01 '20 at 18:00
  • Sorry, some typos that SO doesn't let me correct. I personally don't like cosine similarities. I don't know what the angle between two data sets means when the vectors are data sets (my usual application). I do like perfect guarantees but they are hard to come by (e.g. guaranteeing that it's for sure bounded is likely hard, and it's questionable whether it's worth the effort). In machine learning we rarely divide by the magnitude of the vector, so it's not a transformation I tend to like. In summary, variances seem much better as they are related to the distribution. – Charlie Parker Dec 01 '20 at 19:15
  • Please see the "discussion" part in the above answer. – PTDS Dec 02 '20 at 02:07
  • your clarifications are useful although my central question is still unaddressed. Why are we not using explained-variance-like quantities in the numerator, but instead squared-error-like metrics? Also, perhaps it's obvious to you, but I don't see why Wolfram's NED is guaranteed to be between 0 and 1, especially since these quantities can be random variables, so why isn't it a statement in expectation (or something like that)? – Charlie Parker Dec 03 '20 at 16:21
  • I wish there were better existing definitions! Unfortunately there are (probably) only three: two of them are mentioned in my answer and the third one is the Wikipedia definition. The strength of the two definitions in my answer is you can always compute the distance between $u$ and $v$ when you don’t have any extra information. – PTDS Dec 03 '20 at 18:49
  • If you have extra information, you can use Wikipedia definition and also can come up with some new definitions as you suggested. – PTDS Dec 03 '20 at 18:50
  • Also I am trying to show why $0 \leq d_E(u, v) \leq \sqrt{2}$ – PTDS Dec 03 '20 at 19:22
  • can you confirm me if this is the equation for NED? $$ NED[u,v] = 0.5 \frac{ Var[u-v] }{ Var[u] + Var[v] }$$ Is that correct? – Charlie Parker Dec 08 '20 at 23:16
  • 1
    Right, it is actually the square root of it. Please see the expression for $d_W^2 (u, v)$ – PTDS Dec 08 '20 at 23:32
  • One question, I saw you have proofs that $d_E$ is bounded $[0,1]$ but but I didn't see a proof for $NED$ (your $d_W$). Wolfram doesn't have one either. Can you show me or link me to a proof of it please? – Charlie Parker Jan 28 '21 at 17:45
  • 1
    Yes, a direct proof is definitely possible. I'll try to add one soon. – PTDS Jan 28 '21 at 18:29
  • thank you! let me know so I can thank you with a bounty :) (then we can delete these two comments to clean up the comment section) – Charlie Parker Jan 28 '21 at 18:54
  • 1
    @Charlie Parker I have added the proof. – PTDS Jan 28 '21 at 19:11
  • Also, is there any relation of $NED$ (your $d^2_w$) with the Mahalanobis distance $d^2_M(x,y) = (x - y)^\top S^{-1} (x -y) $ where $S = Cov(x,y)$ and x, y come from the same distribution? https://en.wikipedia.org/wiki/Mahalanobis_distance – Charlie Parker Jan 28 '21 at 19:49
  • btw it seems SO doesn't let me award the bounty until 23 hours pass, apologies! Probably should have chosen the reward existing answer...? anyway lets wait. – Charlie Parker Jan 28 '21 at 19:51
  • what is wrong with this normalized MSE: $$ NMSE[u,v] = \frac{MSE(u, v) }{ Var[u- v]}$$ ? – Charlie Parker Jan 28 '21 at 22:06
  • Please define $MSE(u, v)$. Is it equal to $d_{NE}(u, v)$ where $d_{NE}^2(u, v) = \frac{1}{n} \sum_{i=1}^n \left(u_i - v_i \right)^2$ ? – PTDS Feb 01 '21 at 19:18
  • MSE (mean squared error) is equal to $d^2_{NE}(u,v)$ (btw thanks for including the equation). – Charlie Parker Feb 01 '21 at 21:55
  • Since $d_{NE}^2(u, v) = \mathrm{Var}(u-v) + (\bar{u} - \bar{v})^2$, observe that (according to your definition) $NMSE[u, v] \geq 1$ However, if you use the other definition, then $d_W^2(u, v) \leq 1$ – PTDS Feb 01 '21 at 22:00
  • do you think it makes sense to define normalize euclidean **similarity** as $$NES(u,v) = 1 - NED(u,v)$$ ? – Charlie Parker Feb 02 '21 at 22:49
  • Probably yes, provided you use $d_W(u,v)$ as the definition of $NED(u,v)$ – PTDS Feb 02 '21 at 23:14
  • yes I suggest $$ NED^2(u,v) = d_W^2(u, v) = \frac{1}{2}\frac{\mathrm{Var}(u-v)}{\mathrm{Var}(u) + \mathrm{Var}(v)}$$. Note that I am not suggesting any normalization like $\| u \| = \| v \| = 1$, which I hope doesn't make a difference. Let me know if NED = $d_W$ is sufficient. It might be important to emphasize that we take the square root of this to get $d_W(u,v)$. – Charlie Parker Feb 03 '21 at 16:57
  • fyi, I wonder if it's important to add a small value to the denominator for numerical stability. I see that happens often in pytorch with a value of `eps=1e-8`. What do you think? – Charlie Parker Feb 03 '21 at 18:10
  • I just realized I also have the special case when the vectors are of size 1 (i.e. a single number). In that case the variance is not defined of course. I decided to replace the equation above with square differences and dividing by the square...as in $$ NED^2(a, b) = \frac{1}{2} \frac{(a-b)^2}{ a^2 + b^2 + \epsilon} $$ what do you think of this...is it sort of strange? Does it make sense? Code: `ned_2 = 0.5 * ((x1 - x2)**2 / (x1**2 + x2**2 + eps))` – Charlie Parker Feb 09 '21 at 23:44
  • 1
    It makes sense... 1. It is dimensionless 2. $0 \leq NED^2(a, b) \leq 1$ – PTDS Feb 10 '21 at 00:33
  • Perhaps to shed some light on what seemed a naive question: for cosine similarity it doesn't make sense, e.g. $$ CosineSim(a, b) = \frac{ a^\top b }{ \|a\| \|b\| } = \frac{a b}{|a| |b| } = \pm 1$$ which is weird, but NED doesn't have this issue... – Charlie Parker Feb 10 '21 at 20:16
  • when the MSE $d^2_E$ decreases, should NED $d^2_W$ always decrease? I wanted to use 1-NED as a proxy for regression accuracy. – Charlie Parker Mar 03 '21 at 20:31
  • Is it possible to maybe add an additional name to match the OPs notation? e.g. $d_w = NED$. – Charlie Parker Sep 27 '21 at 17:32
2

Here is one way of thinking about the Normalised Squared Euclidean Distance $NED^2$, defined as $$NED^2(u,v) = 0.5 \frac{ \text{Var}(u-v) }{ \text{Var}(u) + \text{Var}(v) }$$ for two vectors $u,v\in\mathbb{R}^k$.

This definition does not appear very much in the scientific literature. I can see at least two problems with this definition. First, it does not make sense in the case where the dimension $k=1$ because all the variances are zero in this case. Secondly, if both $u$ and $v$ are constant, i.e. $u_i=c$, $v_i=c'$ for all $i$, then the distance is undefined regardless of $k$. In fact, this second scenario covers the first as a special case.

One principle by which to handle these problems with the definition is to consider the underlying context. Although not well used in the literature, I suspect that $NED^2$ was originally defined in the context of image processing of the kind described in this question. Here, an image is regarded as a vector of pixel intensity values, so we may think of $u$ and $v$ as representing images we wish to compare, with $k$ being the number of pixels in each image. Frequently, in image processing, we are only interested in relative spatial variation in pixel intensities rather than the absolute values of the pixel intensities, which motivates the use of a distance measure which 'de-means' the pixel intensity vectors. Two images which are 'shifts' of each other, so that $u_i=v_i+c$ for all $i$, are essentially 'the same' for many purposes. So, roughly speaking, $NED^2$ quantifies the variation in the difference image $u-v$, normalised by the sum of the variation apparent in the two original images $u$,$v$.

With this in mind, let's go back to the problem with the definition of $NED^2$. If $u_i=c$, $v_i=c'$ for all $i$, then $NED^2$ is undefined. However, the images are just shifts of one another, so should be regarded as essentially the same. Therefore, in the context of image processing I suggest that $NED^2$ should be set to zero in all cases where it is apparently undefined.

How should you proceed if you are working in a different context or application area? I can see three possible outcomes:

  1. The same principles apply as for image processing, so you define $NED^2$ to be zero in all undefined cases.
  2. The context motivates an alternative definition of $NED^2$ in the undefined cases e.g. the one you have proposed in the above question.
  3. The problems with the definition motivate you to reject $NED^2$ as a useful distance measure, so you look for other measures instead.

Which of these three options applies will depend on your problem and your reasons for using $NED^2$.

Additional details: more formally, $NED^2$ is really a distance measure on the quotient vector space $\mathbb{R}^k/\mathbb{R}\mathbf{1}$ where $\mathbf{1}$ denotes the vector $(1,1,\cdots,1)$. This is just a consequence of $NED^2$ being invariant to adding multiples of $\mathbf{1}$ to either $u$ or $v$. On the quotient space $NED^2$ is defined everywhere except at $([\mathbf{0}],[\mathbf{0}])$, where $\mathbf{0}$ is the zero vector and $[\mathbf{0}]=\mathbb{R}\mathbf{1}$ is the equivalence class of the zero vector. In this setting it seems very logical to define $NED^2$ to be zero at this single undefined point.
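To make the suggested convention concrete, here is a minimal NumPy sketch (my own illustration, not a standard library routine) that returns zero in the otherwise-undefined case and checks the invariance to adding multiples of $\mathbf{1}$:

import numpy as np

def ned2(u, v):
    # NED^2 = 0.5 * Var(u - v) / (Var(u) + Var(v)), with the convention that
    # the value is 0 when both u and v are constant (denominator is zero).
    denom = u.var() + v.var()
    return 0.0 if denom == 0 else 0.5 * (u - v).var() / denom

rng = np.random.default_rng(4)
u, v = rng.normal(size=8), rng.normal(size=8)
print(np.isclose(ned2(u, v), ned2(u + 5.0, v - 3.0)))   # invariant to shifts by multiples of 1
print(ned2(np.full(8, 2.0), np.full(8, 7.0)))            # two constant 'images': defined to be 0.0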

S. Catterall
  • when the MSE decreases, should NED always decrease? I wanted to use 1-NED as a proxy for regression accuracy. – Charlie Parker Mar 03 '21 at 20:32
  • (btw one strong point for NED^2 that, say, R^2 doesn't have is that it is guaranteed to always be bounded by 0 and 1. I've run experiments where R^2 gives nonsensical results which are not interpretable at all, and interpretability is important to me right now) – Charlie Parker Mar 03 '21 at 21:09
0

I believe this is the correct implementation in PyTorch (it should be easy to translate to NumPy etc.):

import torch


def ned(x1, x2, dim=1, eps=1e-8):
    ned_2 = 0.5 * ((x1 - x2).var(dim=dim) / (x1.var(dim=dim) + x2.var(dim=dim) + eps))
    return ned_2 ** 0.5

def nes(x1, x2, dim=1, eps=1e-8):
    return 1 - ned(x1, x2, dim, eps)

dim = 1  # apply NED across the second dimension/feature dimension

k = 4  # number of examples
d = 8  # dimension of feature space
x1 = torch.randn(k, d)
x2 = x1 * 3
print(f'x1 = {x1.size()}')
ned_tensor = ned(x1, x2, dim=dim)
print(ned_tensor)
print(ned_tensor.size())
print(nes(x1, x2, dim=dim))

output:

x1 = torch.Size([4, 8])
tensor([0.4472, 0.4472, 0.4472, 0.4472])
torch.Size([4])
tensor([0.5528, 0.5528, 0.5528, 0.5528])

feel free to comment if you see anything wrong.




Update: Edge cases taken care of

def ned_torch(x1: torch.Tensor, x2: torch.Tensor, dim=1, eps=1e-8) -> torch.Tensor:
    """
    Normalized Euclidean distance in PyTorch.

    Cases:
        1. To compare two vectors directly, make sure each input is of shape [B], e.g. when using nes as a loss
            function. In this case each number is not a representation but just a number, and B is the entire
            vector along which x1 and x2 are compared.
        2. To compare two batches of 1D representations (e.g. scores), make sure the input is of shape [B, 1].
            In this case each number *is* the representation of the example, so a collection of reps of shape
            [B, 1] is mapped to an output of shape [B]. Note that usually the D dimension is reduced, since reps
            are not of size 1 (see case 3).
        3. Otherwise, specify the dimension. The common use case is [B, D] -> [B] for comparing two sets of
            activations of size D. When D=1 we again have [B, 1] -> [B] (handled by the special case below). If you
            meant x1, x2 of shape [D, 1] to be two vectors of size D to be compared, feed them with shape [D].

    https://discuss.pytorch.org/t/how-does-one-compute-the-normalized-euclidean-distance-similarity-in-a-numerically-stable-way-in-a-vectorized-way-in-pytorch/110829
    https://stats.stackexchange.com/questions/136232/definition-of-normalized-euclidean-distance/498753?noredirect=1#comment937825_498753
    """
    # to compute ned for two individual vectors, e.g. to compute a loss (NOT batches/collections of vectors)
    if len(x1.size()) == 1:
        # [K] -> [1]
        ned_2 = 0.5 * ((x1 - x2).var() / (x1.var() + x2.var() + eps))
    # if the input is a (row) vector, e.g. when comparing two batches of activations with D=1, like scores right before the softmax
    elif x1.size() == torch.Size([x1.size(0), 1]):  # note this special case is needed since var over dim=1 is nan (1 value has no variance).
        # [B, 1] -> [B]
        ned_2 = 0.5 * ((x1 - x2)**2 / (x1**2 + x2**2 + eps)).squeeze()  # Squeeze important to be consistent with .var, otherwise tensors of different sizes come out without the user expecting it
    # common case is if input is a batch
    else:
        # e.g. [B, D] -> [B]
        ned_2 = 0.5 * ((x1 - x2).var(dim=dim) / (x1.var(dim=dim) + x2.var(dim=dim) + eps))
    return ned_2 ** 0.5

def nes_torch(x1, x2, dim=1, eps=1e-8):
    return 1 - ned_torch(x1, x2, dim, eps)


repo: https://github.com/brando90/Normalized-Eucledian-Distance-and-Similarity

Charlie Parker