
Is $f(x)=e^{x^Tx'}$ a suitable kernel to choose? If so, to what dimension does it transform the data?

  • 2
    Kernel for *what*? "Kernel" means [many different things](http://en.wikipedia.org/wiki/Kernel_%28mathematics%29#Mathematics) in mathematics and statistics. Even the [tag:data-transformation] tag doesn't narrow the scope much! – whuber Dec 07 '12 at 20:52
  • @whuber: Right, sorry that I wasn't clear enough. I meant [this one](http://en.wikipedia.org/wiki/Kernel_trick) where $k(x,x')$ is referred to as a kernel or a kernel function. – Gigili Dec 07 '12 at 21:00
  • 1
    What's the difference between $x^T$ and $x'$? – assumednormal Dec 07 '12 at 21:08
  • 1
    He means there's two different variables, $x$ and $x'$, and he's just taking their dot product. – Josephine Moeller Dec 07 '12 at 21:17
  • I think he means $K(x,y)=e^{x^Ty}$. Is that correct? – Mimshot Dec 07 '12 at 21:19
  • @Mimshot: That's right. I mean $k(x,x')$ is a function of $x$ and $x'$ instead of $x$ and $y$ in $k(x,y)$. It's actually the same as explained in that Wikipedia article in my earlier comment. – Gigili Dec 07 '12 at 21:23

1 Answer


If you are referring to kernels as used in the machine-learning literature, then yes, it is a valid kernel.

More generally, we can consider the family of Gaussian kernels, parametrized by $\sigma$:

$$K(x,x') = e^{x^Tx'/\sigma^2 }$$
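One way to see this empirically: a valid kernel must produce a positive semi-definite Gram matrix on any finite set of points. Here is a minimal NumPy sketch checking that property (the sample data and $\sigma$ are arbitrary, chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))      # 6 arbitrary sample points in R^3
sigma = 1.0

# Gram matrix of K(x, x') = exp(x^T x' / sigma^2) on the sample
G = np.exp(X @ X.T / sigma**2)

# A valid kernel's Gram matrix must be positive semi-definite,
# i.e. all eigenvalues >= 0 (up to numerical tolerance)
print(np.linalg.eigvalsh(G).min() >= -1e-10)  # True
```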

Using the power series expansion of the exponential function, we can rewrite $K$ as:

$$ K(x, x') = \sum_{n=0}^{\infty} \frac{(x^T x')^n}{\sigma^{2n}\,n!} $$

Recall that kernels are closed under summation, even infinite sums (with non-negative coefficients). $K$ is therefore an infinite sum of scaled polynomial kernels, and thus is itself a kernel. The polynomial kernel $(x\cdot x')^d$ can be shown to map $x$ to the monomials of degree $d$, so the Gaussian kernel maps $x$ to the monomials of every degree. The corresponding feature space is therefore infinite-dimensional, which answers the dimension part of the question.
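The series converges quickly, which makes it easy to check numerically. A minimal sketch, truncating the sum after 30 terms (inputs and $\sigma$ are arbitrary):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
x, xp = rng.normal(size=3), rng.normal(size=3)
sigma = 1.5

# Exact kernel value
k_exact = np.exp(x @ xp / sigma**2)

# Truncated power series: a sum of scaled polynomial kernels (x^T x')^n / (sigma^{2n} n!)
k_series = sum((x @ xp) ** n / (sigma ** (2 * n) * math.factorial(n))
               for n in range(30))

print(abs(k_exact - k_series))  # ~0: the series reproduces the kernel
```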

Example: In two dimensions, $x = (x_1,x_2), \; x' = (x_1', x_2')$, the second-order polynomial kernel $(x \cdot x')^2$ computes the inner product

$$( x_1^2, x_2^2, \sqrt{2}\, x_1 x_2 ) \cdot (x_1'^2, x_2'^2, \sqrt{2}\, x_1' x_2')$$

so it implicitly maps $(x_1, x_2)$ to the monomials of degree 2 (up to constant factors). The Gaussian kernel does the same thing, but for degree 1, degree 2, degree 3, and so on.
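Here is that identity verified directly, using the explicit feature map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$ (a minimal sketch with made-up inputs):

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D inputs; note the sqrt(2) on the cross term
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

x  = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])

lhs = (x @ xp) ** 2     # polynomial kernel, computed in the input space
rhs = phi(x) @ phi(xp)  # inner product in the explicit feature space

print(lhs, rhs)  # both equal 1.0 here: (1*3 + 2*(-1))^2 = 1
```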

Side Note

Often in the ML literature, the Gaussian kernel is defined as

$$K'(x,x') = \exp{\Big( -\frac{||x - x'||^2}{2\sigma^2} \Big) }$$

But this is actually the normalized version of the kernel above. The normalized kernel $K'$ corresponding to a kernel $K$ is defined as:

$$ K'(x,x') = \frac{K(x,x')}{\sqrt{ K(x,x) K(x',x') } } $$

If we use $K(x,x') = \exp{\Big( \frac{x^Tx'}{\sigma^2 } \Big) }$, we get:

$$ K'(x,x') = \frac{e^{x^Tx'/\sigma^2 }}{ \exp{ \Big(\frac{||x||^2}{2\sigma^2} \Big)} \exp{\Big(\frac{||x'||^2}{2\sigma^2}\Big) } } $$

$$ = \exp{ \Big( -\frac{||x' - x||^2}{2\sigma^2} \Big) }$$
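This identity is also easy to verify numerically. A minimal sketch (inputs and $\sigma$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x, xp = rng.normal(size=4), rng.normal(size=4)
sigma = 0.8

def K(a, b):
    # Unnormalized kernel exp(a^T b / sigma^2)
    return np.exp(a @ b / sigma**2)

# Normalized kernel K'(x, x') = K(x, x') / sqrt(K(x, x) K(x', x'))
k_norm = K(x, xp) / np.sqrt(K(x, x) * K(xp, xp))

# Standard Gaussian (RBF) kernel
k_rbf = np.exp(-np.sum((x - xp) ** 2) / (2 * sigma**2))

print(abs(k_norm - k_rbf))  # ~0: normalizing exp(x^T x' / sigma^2) yields the RBF kernel
```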

Cam.Davidson.Pilon