
Is $f(x)=e^{x^Tx'}$ a suitable kernel to choose? If so, to what dimension does it transform the data?

  • 2
    Kernel for *what*? "Kernel" means [many different things](http://en.wikipedia.org/wiki/Kernel_%28mathematics%29#Mathematics) in mathematics and statistics. Even the [tag:data-transformation] tag doesn't narrow the scope much! – whuber Dec 07 '12 at 20:52
  • @whuber: Right, sorry that I wasn't clear enough. I meant [this one](http://en.wikipedia.org/wiki/Kernel_trick) where $k(x,x')$ is referred to as a kernel or a kernel function. – Gigili Dec 07 '12 at 21:00
  • 1
    What's the difference between $x^T$ and $x'$? – assumednormal Dec 07 '12 at 21:08
  • 1
    He means there's two different variables, $x$ and $x'$, and he's just taking their dot product. – Josephine Moeller Dec 07 '12 at 21:17
  • I think he means $K(x,y)=e^{x^Ty}$. Is that correct? – Mimshot Dec 07 '12 at 21:19
  • @Mimshot: That's right. I mean $k(x,x')$ is a function of $x$ and $x'$ instead of $x$ and $y$ in $k(x,y)$. It's actually the same as explained in that Wikipedia article in my earlier comment. – Gigili Dec 07 '12 at 21:23

1 Answer


If you are referring to kernels as used in the machine-learning literature, then yes, it is a valid kernel.

More generally, we can consider the family of Gaussian kernels, parametrized by $\sigma$:

$$K(x,x') = e^{x^Tx'/\sigma^2 }$$
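One way to see this empirically: a valid kernel must produce a positive semi-definite Gram matrix on any finite set of points. Here is a minimal NumPy sketch checking that property (the sample data and $\sigma$ are arbitrary, chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))      # 6 arbitrary sample points in R^3
sigma = 1.0

# Gram matrix of K(x, x') = exp(x^T x' / sigma^2) on the sample
G = np.exp(X @ X.T / sigma**2)

# A valid kernel's Gram matrix must be positive semi-definite,
# i.e. all eigenvalues >= 0 (up to numerical tolerance)
print(np.linalg.eigvalsh(G).min() >= -1e-10)  # True
```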

Using the power series expansion of the exponential function, we can rewrite $K$ as:

$$ K(x, x') = \sum_{n=0}^{\infty} \frac{(x^T x')^n}{\sigma^{2n}\,n!} $$

Recall that kernels are closed under summation, even infinite sums (with non-negative coefficients). $K$ is therefore an infinite sum of scaled polynomial kernels, and thus is itself a kernel. The polynomial kernel $(x\cdot x')^d$ can be shown to map $x$ to the monomials of degree $d$, so the Gaussian kernel maps $x$ to the monomials of every degree. The corresponding feature space is therefore infinite-dimensional, which answers the dimension part of the question.
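The series converges quickly, which makes it easy to check numerically. A minimal sketch, truncating the sum after 30 terms (inputs and $\sigma$ are arbitrary):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
x, xp = rng.normal(size=3), rng.normal(size=3)
sigma = 1.5

# Exact kernel value
k_exact = np.exp(x @ xp / sigma**2)

# Truncated power series: a sum of scaled polynomial kernels (x^T x')^n / (sigma^{2n} n!)
k_series = sum((x @ xp) ** n / (sigma ** (2 * n) * math.factorial(n))
               for n in range(30))

print(abs(k_exact - k_series))  # ~0: the series reproduces the kernel
```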

Example: In two dimensions, $x = (x_1,x_2), \; x' = (x_1', x_2')$, the second-order polynomial kernel $(x \cdot x')^2$ computes the inner product

$$( x_1^2, x_2^2, \sqrt{2}\, x_1 x_2 ) \cdot (x_1'^2, x_2'^2, \sqrt{2}\, x_1' x_2')$$

so it implicitly maps $(x_1, x_2)$ to the monomials of degree 2 (up to constant factors). The Gaussian kernel does the same thing, but for degree 1, degree 2, degree 3, and so on.
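Here is that identity verified directly, using the explicit feature map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$ (a minimal sketch with made-up inputs):

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D inputs; note the sqrt(2) on the cross term
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

x  = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])

lhs = (x @ xp) ** 2     # polynomial kernel, computed in the input space
rhs = phi(x) @ phi(xp)  # inner product in the explicit feature space

print(lhs, rhs)  # both equal 1.0 here: (1*3 + 2*(-1))^2 = 1
```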

Side Note

Often in the ML literature, the Gaussian kernel is defined as

$$K'(x,x') = \exp{\Big( -\frac{||x - x'||^2}{2\sigma^2} \Big) }$$

But this is actually the normalized version of the kernel above. The normalized kernel $K'$ corresponding to a kernel $K$ is defined as:

$$ K'(x,x') = \frac{K(x,x')}{\sqrt{ K(x,x) K(x',x') } } $$

If we use $K(x,x') = \exp{\Big( \frac{x^Tx'}{\sigma^2 } \Big) }$, we get:

$$ K'(x,x') = \frac{e^{x^Tx'/\sigma^2 }}{ \exp{ \Big(\frac{||x||^2}{2\sigma^2} \Big)} \exp{\Big(\frac{||x'||^2}{2\sigma^2}\Big) } } $$

$$ = \exp{ \Big( -\frac{||x' - x||^2}{2\sigma^2} \Big) }$$
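This identity is also easy to verify numerically. A minimal sketch (inputs and $\sigma$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x, xp = rng.normal(size=4), rng.normal(size=4)
sigma = 0.8

def K(a, b):
    # Unnormalized kernel exp(a^T b / sigma^2)
    return np.exp(a @ b / sigma**2)

# Normalized kernel K'(x, x') = K(x, x') / sqrt(K(x, x) K(x', x'))
k_norm = K(x, xp) / np.sqrt(K(x, x) * K(xp, xp))

# Standard Gaussian (RBF) kernel
k_rbf = np.exp(-np.sum((x - xp) ** 2) / (2 * sigma**2))

print(abs(k_norm - k_rbf))  # ~0: normalizing exp(x^T x' / sigma^2) yields the RBF kernel
```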

Cam.Davidson.Pilon