1

If we want to calculate the squared distance between 2 vectors, $x$ and $y$, we use the dot product:

$$||x-y||^2 = (x-y)(x-y)^T = xx^T - 2xy + yy^T$$

The question is, how to generalize this concept to more than one vector, i.e. calculating the Euclidean Distance Matrix between two sets of vectors, $X$ and $Y$.

In this article, the author states the method of vectorizing the calculation of the Euclidean Distance Matrix:

  def compute_distances_no_loops(self, X):
        dists = -2 * np.dot(X, self.X_train.T) + np.sum(self.X_train**2, axis=1) 
                + np.sum(X**2, axis=1)[:, np.newaxis]
        return dists

But I don't see where this came from. How could the above formula be derived?

  • From [here](https://stats.stackexchange.com/a/36158/3277) it can be extend into matrix notation. Let $X$ be a `n X p` dataset between rows of which you want to compute the `n X n` matrix of squared euclidean distances $D^2$. (1) $M= XX'$ (dot products between rows. (2) $h^2= diag(M)$. (3) Replicate this column as an outer product: $h^2=h^2o$, where $o$ is the row of $1$s of length `n`. (4) Last, $D^2= h^2+{h^2}'-2M$. – ttnphns Mar 13 '19 at 13:44
  • @ttnphns So there is no way to generalize this formula from the dot product formula stated in the question? – ElegantLogic Mar 13 '19 at 18:35
  • What do you mean saying "between two sets of vectors"? How do you define Euclidean distance between sets (bunches) of vectors? Can you show a picture of it? – ttnphns Mar 13 '19 at 19:22
  • @ttnphns We have an (n x d) matrix $X$ where every row is a point in d-dimensional space. The Euclidean distance matrix $D$ is the (n x n) matrix, where the elements of the $i$th row are the squared distance between the $i$th point and every other point. I was originally asking this question in relation with the pairwise similarity matrix calculated in t-SNE algorithm, as it uses the Euclidean distance matrix in its implementation. In that implementation, the author used the vectorized form for calculating the distance matrix, and I wanted to know where it came from--how it could be derived. – ElegantLogic Mar 14 '19 at 09:22
  • If you want to compute the matrix nXn between all n vectors pairwisely then my initial comment is the answer how to do it; it directly _is_ the implementation of your right hand side formula. – ttnphns Mar 14 '19 at 09:53
  • @ttnphns Yes, I realize that. But what I was looking for is a way to derive that formula from the definition of the dot product above. A way to generalize it. The verification is straightforward, but I wanted to know where it came from. – ElegantLogic Mar 14 '19 at 12:54
  • But if follows from what the dot product matrix is https://stats.stackexchange.com/a/22520/3277. – ttnphns Mar 14 '19 at 13:18
  • @ttnphns Yes I see. I just wish I could know the thought process that lead to that formula. But it is not clear from all I've seen so far, it seems as if it magically came by. – ElegantLogic Mar 14 '19 at 16:43

0 Answers0