
I have this algorithm to compute the RBF kernel and it seems to work just fine. But I would like to understand what kind of operations are involved, for example:

  • What are the trnorms vectors? What are they for?
  • What is the meaning of creating the matrices k1 and k2?
  • Is this algorithm any different from the sklearn implementation?
import numpy as np

def compute_RBF(mat1, mat2, sigma):

    # Squared norm of each row of mat1 and mat2, as column vectors
    trnorms1 = np.mat([(v * v.T)[0, 0] for v in mat1]).T
    trnorms2 = np.mat([(v * v.T)[0, 0] for v in mat2]).T

    # Tile the squared norms of mat1's rows across the columns (n x m)
    k1 = trnorms1 * np.mat(np.ones((mat2.shape[0], 1), dtype=np.float64)).T

    # Tile the squared norms of mat2's rows across the rows (n x m)
    k2 = np.mat(np.ones((mat1.shape[0], 1), dtype=np.float64)) * trnorms2.T

    k = k1 + k2

    # k[i, j] is now the squared Euclidean distance between row i and row j
    k -= 2 * np.mat(mat1 * mat2.T)

    k *= -1. / (2 * np.power(sigma, 2))

    return np.exp(k)
gung - Reinstate Monica
Lorenzo Norcini
    Voting leave open: Seems focused on the statistical meaning behind the code, i.e. not really a programming question. – Sean Easter Oct 07 '16 at 20:23

1 Answer


Say that mat1 is $n \times d$ and mat2 is $m \times d$.

Recall that the Gaussian RBF kernel is defined as $k(x, y) = \exp\left( - \frac{1}{2 \sigma^2} \lVert x - y \rVert^2 \right)$. But we can write $\lVert x - y \rVert^2$ as $(x - y)^T (x - y) = x^T x + y^T y - 2 x^T y$. The code uses this decomposition.
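A quick numerical check of that identity, on arbitrary toy vectors (the variable names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

# ||x - y||^2 expanded as x.x + y.y - 2 x.y
lhs = np.sum((x - y) ** 2)
rhs = x @ x + y @ y - 2 * (x @ y)
print(np.isclose(lhs, rhs))  # True
```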

First, the trnorms1 vector stores $x^T x$ for each input $x$ in mat1, and trnorms2 stores $y^T y$ for each $y$ in mat2.

Then, the k1 matrix is obtained by multiplying the $n \times 1$ column of $x^T x$ entries by a $1 \times m$ row of ones, giving an $n \times m$ matrix whose entry k1[i, j] is $x_i^T x_i$, constant along each row.

The next line does the same thing for the $y$ norms, giving an $n \times m$ matrix with k2[i, j] equal to $y_j^T y_j$, constant along each column.
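Those two rank-one products are just a roundabout way of broadcasting the norm vectors against each other. A sketch with plain ndarrays (toy shapes, hypothetical names) showing the ones-vector trick agrees with ordinary numpy broadcasting:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 4, 3, 2
X = rng.standard_normal((n, d))
Y = rng.standard_normal((m, d))

sq1 = np.sum(X ** 2, axis=1)  # (n,) squared norms x_i^T x_i
sq2 = np.sum(Y ** 2, axis=1)  # (m,) squared norms y_j^T y_j

# The ones-vector trick from the question's code ...
k1 = np.outer(sq1, np.ones(m))   # x norms repeated along each row
k2 = np.outer(np.ones(n), sq2)   # y norms repeated along each column
k_ones = k1 + k2

# ... is the same outer sum as broadcasting a column against a row
k_bcast = sq1[:, None] + sq2[None, :]
print(np.allclose(k_ones, k_bcast))  # True
```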

k is then their sum, so that k[i, j] is $x_i^T x_i + y_j^T y_j$. The next line then subtracts twice the product of the data matrices, so that k[i, j] becomes $x_i^T x_i + y_j^T y_j - 2 x_i^T y_j = \lVert x_i - y_j \rVert^2$.

Then, the code multiplies by $\frac{-1}{2 \sigma^2}$ and finally takes the elementwise $\exp$, getting out the Gaussian kernel.
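Putting the whole pipeline together with plain arrays instead of `np.mat`, here is a sketch (my own function name) of the same algorithm, cross-checked entry by entry against the kernel's definition:

```python
import numpy as np

def rbf_kernel_arrays(X, Y, sigma):
    # Same algorithm as the question's compute_RBF, with broadcasting:
    # squared distances, then scale by -1/(2 sigma^2), then exp
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))
Y = rng.standard_normal((4, 3))
sigma = 1.3

K = rbf_kernel_arrays(X, Y, sigma)

# Naive double loop straight from the definition k(x, y) = exp(-||x-y||^2 / (2 sigma^2))
K_naive = np.array([[np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))
                     for y in Y] for x in X])
print(np.allclose(K, K_naive))  # True
```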

If you dig into the scikit-learn implementation, it's exactly the same, except:

  • It's parameterized instead with $\gamma = \frac{1}{2 \sigma^2}$.
  • It's written in much better Python, avoiding the wasted memory and needlessly slow computations here.
  • It's broken up into helper functions.

But, algorithmically, it's doing the same basic operations.
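The reparameterization is a one-line change: with $\gamma = \frac{1}{2 \sigma^2}$, the two forms of the kernel agree exactly, which is what `sklearn.metrics.pairwise.rbf_kernel(X, Y, gamma=...)` computes. A numpy-only check:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 3))
Y = rng.standard_normal((4, 3))

sigma = 0.9
gamma = 1.0 / (2 * sigma ** 2)  # sklearn's gamma in terms of sigma

sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
K_sigma = np.exp(-sq / (2 * sigma ** 2))  # the question's parameterization
K_gamma = np.exp(-gamma * sq)             # the gamma parameterization
print(np.allclose(K_sigma, K_gamma))  # True
```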

Danica
  • 21,852
  • 1
  • 59
  • 115
  • If the matrices are of different sizes (one is $n \times d$ and one is $m \times d$), how are you able to do the $2x^Ty$ operation? – guy Jul 29 '19 at 02:13
  • @guy $x$ and $y$ here are each $d \times 1$. In matrix form, the code is using `mat1 * mat2.T`, which (assuming things are in numpy matrices, which you probably shouldn't use, and instead use Python 3.6+'s `mat1 @ mat2.T` matrix multiplication operator) is multiplying an $n \times d$ matrix by a $d \times m$ matrix, getting an $n \times m$ result. – Danica Jul 29 '19 at 13:48
  • I guess what I don't understand is where in the code you are computing the RBF kernel between two different-size matrices. Each matrix has the same number of columns but not the same number of rows, so how can you find the norm for the rows that don't exist? Maybe I am lost because the code is written in a roundabout way..... – guy Jul 29 '19 at 23:32
  • OP’s code is tbf pretty roundabout. Check out [`sklearn.metrics.pairwise.euclidean_distances`](https://github.com/scikit-learn/scikit-learn/blob/1495f69242646d239d89a5713982946b8ffcf9d9/sklearn/metrics/pairwise.py#L165) instead maybe; [`rbf_kernel`](https://github.com/scikit-learn/scikit-learn/blob/1495f69242646d239d89a5713982946b8ffcf9d9/sklearn/metrics/pairwise.py#L95) is very simple given that. – Danica Jul 30 '19 at 00:27