
I understand kernels allow us to linearly separate non-linearly separable data in a higher-dimensional space.

Given a feature vector $\bar x = [x_1, x_2, \ldots, x_n]^T$, we can apply a transformation $\phi(\bar x)$ and then perform the usual regression $y = \bar w^T\phi(\bar x)$.
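For example (just to fix notation), for a scalar input $x$ and a quadratic map one could take $\phi(x) = [1, x, x^2]^T$, so that $y = \bar w^T\phi(x) = w_0 + w_1 x + w_2 x^2$ is an ordinary degree-2 polynomial fit.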

However, I do not understand the notation in the following question:

Given $N$ data points $(x,t)$ (both scalars), fit an $M$th degree polynomial using polynomial and Gaussian kernels, and study the goodness of fit.


To be more specific, what function $\phi$ do I use in the Polynomial and Gaussian kernels to obtain the transformed input vector?

Pranav
  • Can you be specific about what part of the question you do not understand? What have you tried & where are you stuck? – Sycorax Feb 22 '22 at 19:30
  • I do not understand how to explicitly transform the input data and perform regression – Pranav Feb 22 '22 at 19:33
  • Seems like the question wants you to 1) generate data from any degree M polynomial 2) perform polynomial regression on this data 3) perform kernel regression on this data. Is that right? Like, you're given an X, but not a t? – John Madden Feb 22 '22 at 20:16
  • I am given a set of $(x,t)$; both are scalars. When it says fit an $M$th degree polynomial, I'm guessing I need to create a vector $x_i \to [x_i^0,\ldots,x_i^M]$ for each point and perform a kernel transformation on these input vectors. What I do not get is how to use kernels ($\phi(\bar x)$) to transform these vectors. – Pranav Feb 22 '22 at 20:32
  • [The feature maps for the Gaussian kernel](https://stats.stackexchange.com/questions/69759/feature-map-for-the-gaussian-kernel) – Sycorax Feb 22 '22 at 21:00
  • I was confused whether we use the explicit representation as in $\phi(\bar x)$ or do it implicitly using the Gram matrix $K$, with $K(i,j) = \langle\phi(\bar x_j),\phi(\bar x_i)\rangle$. Then we can proceed to use the least-squares solution to compute the optimum $\bar \alpha$ estimate. – Pranav Feb 24 '22 at 05:52

1 Answer


I was confused about whether to use the explicit representation $\phi(\bar x)$ or to work implicitly through the Gram matrix $K$, where $K(i,j) = \langle\phi(\bar x_j),\phi(\bar x_i)\rangle$. Because the Gaussian kernel's $\phi$ maps the data into an infinite-dimensional space, we use the implicit representation, and can then use the least-squares solution to compute the optimum $\bar \alpha$ estimate.

$\bar \alpha = (K + \lambda I)^{-1}Y$, where $\lambda$ is the regularisation parameter and $Y$ is the vector of target values.

The estimate for point $i$ is $\hat Y(i) = \bar \alpha' k(\bar x_i)$, where $k(\bar x_i) = [K(\bar x_i,\bar x_1),\ldots,K(\bar x_i,\bar x_N)]^T$ is the vector of kernel evaluations between $\bar x_i$ and all $N$ training points.

For the polynomial kernel of order $d$: $K(i,j) = \langle\phi(\bar x_j),\phi(\bar x_i)\rangle = (\bar x_j'\bar x_i)^d$
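(As a standard textbook illustration of what $\phi$ this corresponds to, not something specific to this question: for a two-dimensional input and $d = 2$, $(\bar x_j'\bar x_i)^2 = (x_{j1}x_{i1} + x_{j2}x_{i2})^2 = \langle\phi(\bar x_j),\phi(\bar x_i)\rangle$ with the explicit map $\phi(\bar x) = [x_1^2,\ \sqrt{2}\,x_1x_2,\ x_2^2]^T$.)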

For the Gaussian kernel of variance $\sigma^2$: $K(i,j) = \langle\phi(\bar x_j),\phi(\bar x_i)\rangle = \exp\!\left(-\dfrac{\|\bar x_j-\bar x_i\|^2}{2\sigma^2}\right)$ (the normalising constant of a Gaussian density is not needed here, since it would only rescale $K$).
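As a concrete illustration, here is a minimal NumPy sketch of the whole procedure. The toy data set, the kernel degree $d$, the bandwidth $\sigma$, and the regularisation $\lambda$ are all illustrative assumptions and not part of the original question.

```python
import numpy as np

def poly_kernel(X1, X2, d=3):
    """Polynomial kernel of order d: K(i, j) = (x_i . x_j)^d (degree is an illustrative choice)."""
    return (X1 @ X2.T) ** d

def gaussian_kernel(X1, X2, sigma=0.5):
    """Gaussian kernel: K(i, j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2 * X1 @ X2.T)
    return np.exp(-sq_dists / (2 * sigma**2))

def fit_alpha(K, y, lam=1e-3):
    """Kernel ridge solution: alpha = (K + lambda I)^(-1) y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def predict(kernel, X_train, alpha, X_new, **kw):
    """Prediction: y_hat(x) = sum_j alpha_j K(x, x_j) = alpha' k(x)."""
    return kernel(X_new, X_train, **kw) @ alpha

# Toy data: N scalar (x, t) pairs from a noisy cubic (purely for demonstration).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)[:, None]                    # shape (N, 1)
t = 2 * x[:, 0]**3 + 0.05 * rng.standard_normal(30)    # targets

for name, kernel in [("polynomial", poly_kernel), ("gaussian", gaussian_kernel)]:
    K = kernel(x, x)                   # Gram matrix; no explicit phi(x) anywhere
    alpha = fit_alpha(K, t)
    t_hat = predict(kernel, x, alpha, x)
    print(name, "training RMSE:", np.sqrt(np.mean((t - t_hat)**2)))
```

Note that $\phi$ is never formed explicitly: only the Gram matrix and the kernel vector $k(\bar x)$ are needed, which is exactly what makes the infinite-dimensional Gaussian feature map usable in practice. Goodness of fit can then be compared across kernels (and across $d$, $\sigma$, $\lambda$) via the residual error on held-out points.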

Pranav