
I understand kernels allow us to linearly separate non-linearly separable data in a higher-dimensional space.

Given a feature vector $\bar x = [x_1, x_2, \ldots, x_n]^T$, we can apply a transformation $\phi(\bar x)$ and then perform the usual regression $y = \bar w^T\phi(\bar x)$.
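For example (just to fix notation), for a scalar input $x$ and a quadratic map one could take $\phi(x) = [1, x, x^2]^T$, so that $y = \bar w^T\phi(x) = w_0 + w_1 x + w_2 x^2$ is an ordinary degree-2 polynomial fit.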

However, I do not understand the notation in the following question:

Given $N$ data points $(x,t)$ (both scalars), fit an $M$th degree polynomial using polynomial and Gaussian kernels, and study the goodness of fit.


To be more specific, what function $\phi$ do I use in the Polynomial and Gaussian kernels to obtain the transformed input vector?

Pranav
  • Can you be specific about what part of the question you do not understand? What have you tried & where are you stuck? – Sycorax Feb 22 '22 at 19:30
  • I do not understand how to explicitly transform the input data and perform regression – Pranav Feb 22 '22 at 19:33
  • Seems like the question wants you to 1) generate data from any degree M polynomial 2) perform polynomial regression on this data 3) perform kernel regression on this data. Is that right? Like, you're given an X, but not a t? – John Madden Feb 22 '22 at 20:16
  • I am given a set of $(x,t)$; both are scalars. When it says fit an $M$th degree polynomial, I'm guessing I need to create a vector $x_i \to [x_i^0,\ldots,x_i^M]$ for each point and perform a kernel transformation on these input vectors. What I do not get is how to use kernels ($\phi(\bar x)$) to transform these vectors. – Pranav Feb 22 '22 at 20:32
  • [The feature maps for the Gaussian kernel](https://stats.stackexchange.com/questions/69759/feature-map-for-the-gaussian-kernel) – Sycorax Feb 22 '22 at 21:00
  • I was confused whether we use the explicit representation as in $\phi(\bar x)$ or do it implicitly using the Gram matrix $K$, with $K(i,j) = \langle\phi(\bar x_j),\phi(\bar x_i)\rangle$. Then we can proceed to use the least-squares solution to compute the optimum $\bar \alpha$ estimate. – Pranav Feb 24 '22 at 05:52

1 Answer


I was confused about whether to use the explicit representation $\phi(\bar x)$ or to work implicitly through the Gram matrix $K$, where $K(i,j) = \langle\phi(\bar x_j),\phi(\bar x_i)\rangle$. Because the Gaussian kernel's $\phi$ maps the data into an infinite-dimensional space, we use the implicit representation, and can then use the least-squares solution to compute the optimum $\bar \alpha$ estimate.

$\bar \alpha = (K + \lambda I)^{-1}Y$, where $\lambda$ is the regularisation parameter and $Y$ is the vector of target values.

The estimate for point $i$ is $\hat Y(i) = \bar \alpha' k(\bar x_i)$, where $k(\bar x_i) = [K(\bar x_i,\bar x_1),\ldots,K(\bar x_i,\bar x_N)]^T$ is the vector of kernel evaluations between $\bar x_i$ and all $N$ training points.

For the polynomial kernel of order $d$: $K(i,j) = \langle\phi(\bar x_j),\phi(\bar x_i)\rangle = (\bar x_j'\bar x_i)^d$
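(As a standard textbook illustration of what $\phi$ this corresponds to, not something specific to this question: for a two-dimensional input and $d = 2$, $(\bar x_j'\bar x_i)^2 = (x_{j1}x_{i1} + x_{j2}x_{i2})^2 = \langle\phi(\bar x_j),\phi(\bar x_i)\rangle$ with the explicit map $\phi(\bar x) = [x_1^2,\ \sqrt{2}\,x_1x_2,\ x_2^2]^T$.)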

For the Gaussian kernel of variance $\sigma^2$: $K(i,j) = \langle\phi(\bar x_j),\phi(\bar x_i)\rangle = \exp\!\left(-\dfrac{\|\bar x_j-\bar x_i\|^2}{2\sigma^2}\right)$ (the normalising constant of a Gaussian density is not needed here, since it would only rescale $K$).
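As a concrete illustration, here is a minimal NumPy sketch of the whole procedure. The toy data set, the kernel degree $d$, the bandwidth $\sigma$, and the regularisation $\lambda$ are all illustrative assumptions and not part of the original question.

```python
import numpy as np

def poly_kernel(X1, X2, d=3):
    """Polynomial kernel of order d: K(i, j) = (x_i . x_j)^d (degree is an illustrative choice)."""
    return (X1 @ X2.T) ** d

def gaussian_kernel(X1, X2, sigma=0.5):
    """Gaussian kernel: K(i, j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2 * X1 @ X2.T)
    return np.exp(-sq_dists / (2 * sigma**2))

def fit_alpha(K, y, lam=1e-3):
    """Kernel ridge solution: alpha = (K + lambda I)^(-1) y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def predict(kernel, X_train, alpha, X_new, **kw):
    """Prediction: y_hat(x) = sum_j alpha_j K(x, x_j) = alpha' k(x)."""
    return kernel(X_new, X_train, **kw) @ alpha

# Toy data: N scalar (x, t) pairs from a noisy cubic (purely for demonstration).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)[:, None]                    # shape (N, 1)
t = 2 * x[:, 0]**3 + 0.05 * rng.standard_normal(30)    # targets

for name, kernel in [("polynomial", poly_kernel), ("gaussian", gaussian_kernel)]:
    K = kernel(x, x)                   # Gram matrix; no explicit phi(x) anywhere
    alpha = fit_alpha(K, t)
    t_hat = predict(kernel, x, alpha, x)
    print(name, "training RMSE:", np.sqrt(np.mean((t - t_hat)**2)))
```

Note that $\phi$ is never formed explicitly: only the Gram matrix and the kernel vector $k(\bar x)$ are needed, which is exactly what makes the infinite-dimensional Gaussian feature map usable in practice. Goodness of fit can then be compared across kernels (and across $d$, $\sigma$, $\lambda$) via the residual error on held-out points.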

Pranav