
The ridge residuals are defined as $\epsilon(\lambda)=y-X\beta^{ridge}(\lambda)$, for the model $y_i=x_i^T\beta+e_i$, where $e_i\sim N(0,\sigma^2)$, and $\beta$ is estimated by the ridge regression estimator, i.e. $\beta^{ridge}(\lambda)=(X^TX+\lambda I_p)^{-1}X^Ty$.

How do I show that $\operatorname{var}(\epsilon(\lambda))=A^TA\sigma^2$? I have already shown that $E[\epsilon(\lambda)]=AX\beta$, where $A=I_n-X(X^TX+\lambda I_p)^{-1}X^T$, but my linear algebra is a bit rusty and I'm not sure how to go about it.

Also, am I correct in saying that the ridge residuals will also be normally distributed?

gunes
user179028

2 Answers


The residual vector can be written as $$\epsilon(\lambda)=y-X\beta^{ridge}(\lambda)=y-X(X^TX+\lambda I)^{-1}X^Ty=Ay$$

By the linear-transformation property of covariance, we have $$\operatorname{cov}(Ay\mid X)=A\operatorname{cov}(y\mid X)A^T=A(\sigma^2 I)A^T=AA^T\sigma^2$$

Since $A$ is symmetric ($X(X^TX+\lambda I)^{-1}X^T$ is symmetric, and so is $I_n$ minus it), we have $AA^T=A^TA$, which is equivalent to your formula.
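As a quick numerical sanity check (a sketch with NumPy; the dimensions, seed, and $\lambda$ are arbitrary), we can verify that $A$ is symmetric and that the Monte Carlo covariance of $\epsilon(\lambda)=Ay$ matches $\sigma^2AA^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam, sigma = 20, 5, 2.0, 1.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

# A = I_n - X (X'X + lam I_p)^{-1} X'
A = np.eye(n) - X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
print(np.allclose(A, A.T))  # A is symmetric

# Monte Carlo: draw many y = X beta + e, compute residuals eps = A y
reps = 200_000
Y = X @ beta + sigma * rng.normal(size=(reps, n))  # one draw of y per row
E = Y @ A.T                                        # residuals for every draw
emp_cov = np.cov(E, rowvar=False)

# Entrywise agreement with the theoretical covariance sigma^2 A A^T
print(np.max(np.abs(emp_cov - sigma**2 * A @ A.T)))  # small MC error
```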

gunes

Most of your question was answered by @gunes already. Regarding this specific point:

Also, am I correct in saying that the ridge residuals will also be normally distributed?

No, that cannot be said in general. You are probably aware that residuals are often not normally distributed in linear regression. Ridge regression is equivalent to linear regression performed on your real data augmented with virtual data (see this great answer), and because of that it inherits most of its estimation properties from ordinary least squares.
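The virtual-data equivalence can be checked numerically: the ridge solution equals OLS on $X$ augmented with $\sqrt{\lambda}\,I_p$ rows and $y$ augmented with $p$ zeros. A minimal sketch (arbitrary sizes and seed):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 30, 4, 0.7
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Closed-form ridge estimate
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# OLS on the augmented ("virtual") data: sqrt(lam) I rows with zero targets
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
beta_ols, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print(np.max(np.abs(beta_ridge - beta_ols)))  # essentially zero
```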

Normality of residuals is an assumption (one that can often be tested). Downstream tasks may be robust to misspecification of that assumption.
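As the comments below note, *under the question's assumptions* (fixed $X$ and $\beta$, Gaussian errors), $\epsilon(\lambda)=Ay$ is a linear map of a Gaussian vector and hence Gaussian, with mean $AX\beta$ and covariance $\sigma^2AA^T$. A small simulation illustrating this for one residual coordinate (arbitrary sizes and seed):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam, sigma = 15, 4, 1.5, 1.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
A = np.eye(n) - X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

# Many draws of y under the Gaussian model; first residual coordinate each time
reps = 100_000
Y = X @ beta + sigma * rng.normal(size=(reps, n))
eps0 = (Y @ A.T)[:, 0]

# Standardize with the theoretical mean (A X beta)[0] and variance sigma^2 (A A^T)[0,0]
z = (eps0 - (A @ X @ beta)[0]) / np.sqrt(sigma**2 * (A @ A.T)[0, 0])
print(abs(z.mean()))           # near 0
print(abs(z.var() - 1))        # near 0
print(abs(np.mean(z**3)))      # skewness near 0, as for a Gaussian
```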

Firebug
    Practically you're correct, but given $X$, and assuming a fixed underlying $\beta$, isn't $Ay$ a linear transformation of a normal RV (since given $X$, $y$ is normal based on the assumptions)? – gunes Nov 08 '20 at 21:11
  • @gunes yes, based on assumptions, that's why it can't be stated as general – Firebug Nov 08 '20 at 21:14