
Given is a single data point $\vec{x}$ that contains the same feature twice, so $\vec{x} = (x, x)$, together with a label $y$. We have to add the regularization term $\lambda\|\vec{\beta}\|^2$, $\lambda > 0$, to the ordinary linear regression problem, solve it, and explain what relationship the components of the solution $\vec{\beta}^*$ satisfy. I would use the following equation to solve the problem by hand: $(X^TX)\vec{\beta} + \lambda\vec{\beta} = X^Ty$, which for our problem reads:

$\begin{bmatrix} x^2 & x^2\\ x^2 & x^2\end{bmatrix}\begin{bmatrix}\beta_1 \\ \beta_2 \end{bmatrix}+\begin{bmatrix}\lambda\beta_1 \\ \lambda\beta_2 \end{bmatrix} = \begin{bmatrix}xy \\ xy \end{bmatrix}$

Is this correct so far? I know that for ridge regression the closed-form solution always exists (the added regularization ensures invertibility), but I am not sure what I can conclude about the solution and its components in this particular case.
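To sanity-check the setup, here is a minimal numerical sketch (the values of $x$, $y$, and $\lambda$ below are purely illustrative, not part of the exercise); it just builds the system above with NumPy and solves it:

```python
import numpy as np

# Illustrative values, not from the exercise (any nonzero x works)
x, y, lam = 2.0, 3.0, 0.1

# Design matrix for the single data point with the duplicated feature
X = np.array([[x, x]])            # shape (1, 2)

A = X.T @ X + lam * np.eye(2)     # X^T X + lambda * I
b = X.T @ np.array([y])           # X^T y

beta = np.linalg.solve(A, b)
print(A)      # [[x^2 + lam, x^2], [x^2, x^2 + lam]]
print(b)      # [x*y, x*y]
print(beta)   # the two components come out equal
```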

  • Your approach is correct so far but you can still simplify the solution to \begin{align*} \beta&=(X^TX-\lambda I)^{-1}X^Ty\\ &= \pmatrix{x^2-\lambda & x^2\\ x^2 & x^2 -\lambda}^{-1}\pmatrix{x\\x}y\\ &= \pmatrix{(\frac{x(x^2-\lambda)}{\lambda^2-2\lambda x^2} -\frac{x^3}{\lambda^2-2\lambda x^2})y\\ (\frac{x(x^2-\lambda)}{\lambda^2-2\lambda x^2} -\frac{x^3}{\lambda^2-2\lambda x^2})y}\\ &=\pmatrix{-\frac{x}{\lambda-2x^2}y \\-\frac{x}{\lambda-2x^2}y } \end{align*} – chRrr Jul 31 '17 at 14:43
  • Now solve the system. What do you get? – Matthew Gunn Jul 31 '17 at 14:48
  • Thanks! I just assumed that for this exercise we don't know yet that the closed-form solution can be used for ridge even though $X^TX$ is not invertible :) Anyway, I see that obviously the $\beta$s are equal - but that seems trivial - so what exactly do they want to know if they ask for the relationship between the betas? –  Jul 31 '17 at 14:49
  • That's a good question. I don't know what to get from this toy example besides showing that the inverse indeed exists in this special case... maybe: suppose $\lambda \to 0$ and $y=\beta_1 x + \beta_2 x = x (\beta_1+\beta_2)$; then it can be immediately seen that the ridge estimator for $\beta_j$ is just $(\beta_1+\beta_2)/2$ for each of the "two" parameters. – chRrr Jul 31 '17 at 14:51
  • How did you get $(X^TX)\vec\beta - \lambda\vec\beta = X^Ty$? In particular, I'm wondering about the negative sign in front of $\lambda$. – Brent Kerby Jul 31 '17 at 14:52
  • Oops, Brent is right :) the sign is wrong. Unfortunately I cannot edit my former comment anymore. The correct solution should be $\beta_j = \frac{x}{\lambda + 2x^2}y$, $j=1,2$ (see the numerical check after these comments). – chRrr Jul 31 '17 at 14:54
  • Just by taking the derivative of $\|Y-X\beta\|^2+\lambda\|\beta\|^2$; corrected now. @MatthewGunn you mean solving for $y$? chRrr: yes, we have shown previously (with the help of Matthew) that there is no closed-form solution without regularization. –  Jul 31 '17 at 14:56
  • @TestGuest You are minimizing over $\beta_1, \beta_2$. Solve for $\boldsymbol{\beta}$. And the proper statement would be that there is no *unique* closed-form solution to this problem without regularization; indeed there are infinitely many solutions without regularization. – Matthew Gunn Jul 31 '17 at 14:58
  • ...it is already solved for $\beta$...by chRrr? I see, thanks! –  Jul 31 '17 at 14:59
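
As a sanity check of the thread above (a minimal numerical sketch with made-up values, not part of the original discussion): the corrected closed form $\beta_j = \frac{x}{\lambda + 2x^2}y$ agrees with a direct solve of $(X^TX + \lambda I)\vec{\beta} = X^Ty$, and the $\lambda \to 0$ averaging behaviour mentioned by chRrr shows up numerically as well:

```python
import numpy as np

def ridge_duplicated_feature(x, y, lam):
    """Solve (X^T X + lambda I) beta = X^T y for X = [[x, x]]."""
    X = np.array([[x, x]])
    A = X.T @ X + lam * np.eye(2)
    return np.linalg.solve(A, X.T @ np.array([y]))

x, y, lam = 2.0, 3.0, 0.5

beta = ridge_duplicated_feature(x, y, lam)
closed_form = x * y / (lam + 2 * x**2)       # beta_j = x y / (lambda + 2 x^2)
print(beta, closed_form)                      # both components equal the closed form

# lambda -> 0: if y = x*(b1 + b2), each estimate tends to (b1 + b2)/2
b1, b2 = 1.0, 5.0
print(ridge_duplicated_feature(x, x * (b1 + b2), 1e-10))  # approximately [3.0, 3.0]
```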

0 Answers