
I am learning about Support Vector Machines, and in particular, those with kernels for non-linear decision boundaries. I understand the concept of projecting the original data to a higher-dimensional space, such that a linear decision boundary can be inserted. But what I don't understand is how the kernel function actually does this mapping.

For example, consider the radial basis function as a kernel, $K(x, x') = \exp(-\lambda \| x - x' \|^2)$. What does this actually mean?

Is it saying that for every data point $x$, you find its squared distance to the point $x'$? And this distance corresponds to its value in the new higher-dimensional space? But what is $x'$ anyway? And also, this is only mapping to a one-dimensional space, because the value is just a single number... not a higher-dimensional space.

Of course, my understanding is totally wrong, but please could somebody explain where I am getting confused? Thanks!

Karnivaurus

2 Answers


When you solve the SVM optimization problem in its dual form, it turns out that the problem depends on the training data $\{x_i\}_{i=1}^n$ only through their inner products. That is, you only need $\{x_i^\top x_j\}_{i, j=1}^n$, i.e., the inner products of all pairs of points you have. So to train an SVM, you only need to give it the labels $Y=(y_1, \ldots, y_n)$ and a kernel matrix $K$ where $K_{ij} = x_i^\top x_j.$
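A minimal sketch of this point using scikit-learn (the toy data and parameters here are made up purely for illustration): `SVC` accepts `kernel='precomputed'`, so you can hand it exactly the matrix of inner products $K_{ij} = x_i^\top x_j$ instead of the raw features.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: two roughly separated blobs (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, size=(20, 2)),
               rng.normal(+1, 1, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

# The SVM never needs X itself, only the matrix of inner products.
K = X @ X.T                      # K[i, j] = x_i^T x_j

svm = SVC(kernel='precomputed')
svm.fit(K, y)

# At prediction time you again supply inner products, now between
# the test points and the training points.
X_test = np.array([[-1.5, -1.0], [1.5, 1.0]])
K_test = X_test @ X.T            # shape (n_test, n_train)
print(svm.predict(K_test))
```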

Now, to map each data point $x_i$ to a high-dimensional space, you apply a feature map $\phi$. The kernel matrix then becomes

$$K_{ij} = \langle \phi(x_i), \phi(x_j)\rangle$$

where $\langle \cdot, \cdot \rangle$ is just formal notation for an inner product in a general inner product space. The point is that as long as we can define an inner product in the high-dimensional space, we can train the SVM. We do not even need to compute $\phi(x)$ itself; we only need the inner product $\langle \phi(x_i), \phi(x_j)\rangle$. This is where we set

$$K_{ij} = k(x_i, x_j)$$

for some kernel $k$ of your choice. It is known (by the Moore–Aronszajn theorem) that if $k$ is positive definite, then it corresponds to some inner product space, i.e., there exists a corresponding feature map $\phi(\cdot)$ such that $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.
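For a concrete, finite-dimensional case of this correspondence, take the homogeneous polynomial kernel $k(x,y) = (x^\top y)^2$ on $\mathbb{R}^2$. Its feature map can be written explicitly as $\phi(x) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)$, and a quick numerical check (illustrative only) confirms that $\langle\phi(x),\phi(y)\rangle = k(x,y)$:

```python
import numpy as np

def k_poly(x, y):
    """Homogeneous polynomial kernel of degree 2: (x . y)^2."""
    return float(x @ y) ** 2

def phi(x):
    """Explicit feature map for k_poly on R^2."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(k_poly(x, y))        # (1*3 + 2*(-1))^2 = 1.0
print(phi(x) @ phi(y))     # same value, via the explicit feature map
```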

To answer your question: the kernel $k(x,y)$ itself does not specify a projection of $x$. It is the (usually implicit) feature map $\phi(\cdot)$ associated with $k$ that specifies the projection. As an example, the feature map $\phi$ of the RBF kernel $k(x,y) = \exp(-\gamma \|x-y\|_2^2)$ is infinite-dimensional.
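To make the "infinite-dimensional" statement concrete in one dimension, note that $k(x,y) = e^{-\gamma(x-y)^2} = e^{-\gamma x^2} e^{-\gamma y^2} \sum_{n=0}^{\infty} \frac{(2\gamma)^n}{n!} x^n y^n$, so one valid feature map has coordinates $\phi_n(x) = e^{-\gamma x^2}\sqrt{(2\gamma)^n/n!}\, x^n$ for $n = 0, 1, 2, \dots$ The sketch below (with made-up numbers, purely illustrative) truncates this series and shows the inner product converging to the kernel value:

```python
import numpy as np
from math import factorial

gamma = 0.5          # kernel bandwidth (made-up value)
x, y = 0.7, -0.3     # two 1-D points (made-up values)

def phi(t, n_terms=20):
    """First n_terms coordinates of the (infinite) RBF feature map in 1-D."""
    n = np.arange(n_terms)
    facts = np.array([factorial(int(k)) for k in n], dtype=float)
    return np.exp(-gamma * t**2) * np.sqrt((2 * gamma)**n / facts) * t**n

exact = np.exp(-gamma * (x - y)**2)   # the RBF kernel value
approx = phi(x) @ phi(y)              # truncated inner product of feature maps
print(exact, approx)                  # both ~0.6065
```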

wij

First, the radial basis function (RBF) kernel is given by $k\colon\mathcal{X}\times\mathcal{X}\to\Bbb{R}$, with $$ k(x,y)=\exp(-\gamma\lVert x-y \rVert^2), $$ where $\gamma$ is a positive parameter.

What is really useful in SVMs is the so-called "kernel trick". Briefly, you do not need to know the mapping from the original space to the high-dimensional space explicitly (sometimes it is not even possible to write it down). What you do need to know is how that mapping affects the inner products $x_i\cdot x_j$ that appear in the SVM formulation. So, if the mapping function is $\phi$, then the inner product $x_i\cdot x_j$ is replaced by an inner product of the form $\phi(x_i)\cdot\phi(x_j)$, which is given by the kernel function evaluated at those points, i.e., $$ \phi(x_i)\cdot\phi(x_j)=k(x_i,x_j). $$
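To see the trick in practice, here is an illustrative sketch (scikit-learn, made-up toy data): handing the SVM only the matrix of kernel values $k(x_i, x_j)$, with no $\phi$ anywhere, gives the same predictions as its built-in RBF kernel.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Toy data with a non-linear decision rule (made up for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)
gamma = 0.5

# Built-in RBF kernel.
clf_rbf = SVC(kernel='rbf', gamma=gamma).fit(X, y)

# Same model, but we only ever pass kernel evaluations k(x_i, x_j).
K = rbf_kernel(X, X, gamma=gamma)
clf_pre = SVC(kernel='precomputed').fit(K, y)

X_new = rng.normal(size=(5, 2))
K_new = rbf_kernel(X_new, X, gamma=gamma)   # k(test point, training point)
print(clf_rbf.predict(X_new))
print(clf_pre.predict(K_new))               # identical predictions
```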

nullgeppetto