A previous answer to a question asking for a derivation of ridge regression states at one point that, from the following equation:

$$(y_* - X_*\beta)'(y_* - X_*\beta) = (y - X\beta)'(y - X\beta) + \lambda\beta'\beta$$

It follows that

$$(X_*'X_*)\beta = X_*'y_*$$

The original author states that "From the form of the left hand expression it is immediate that the Normal equations are...". I do not understand why this follows, and would like to know more on the matter.

1 Answer

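This is by direct analogy with the normal equations of linear regression, which come from setting the gradient of the sum of squares to zero:

$$\frac{\partial}{\partial\beta}(y - X\beta)'(y - X\beta) = 0 \Rightarrow X'X\beta = X'y$$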
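
A short sketch of where the normal equations come from, under the standard assumption that $\beta$ is chosen to minimize the sum of squares: expand the quadratic, differentiate with respect to $\beta$, and set the gradient to zero.

$$\begin{aligned}
(y - X\beta)'(y - X\beta) &= y'y - 2\beta'X'y + \beta'X'X\beta \\
\frac{\partial}{\partial\beta}\left[y'y - 2\beta'X'y + \beta'X'X\beta\right] &= -2X'y + 2X'X\beta \\
-2X'y + 2X'X\beta = 0 \;&\Longrightarrow\; X'X\beta = X'y
\end{aligned}$$

Nothing in these steps depends on what $X$ and $y$ contain, which is why the same result carries over to any other design matrix and response vector.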

The poster is taking the above implication as known. Augmenting $X$ and $y$ with the fake ridge observations gives $X_*$ and $y_*$, and minimizing the augmented sum of squares means solving

$$\frac{\partial}{\partial\beta}(y_* - X_*\beta)'(y_* - X_*\beta) = 0$$

which, by analogy with the above, must have solutions satisfying

$$X_*'X_*\beta = X_*'y_*$$
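
The analogy can be checked numerically. The sketch below assumes the usual augmentation, $X_* = \begin{pmatrix} X \\ \sqrt{\lambda}\, I \end{pmatrix}$ and $y_* = \begin{pmatrix} y \\ 0 \end{pmatrix}$, which is what the identity in the question requires. The data, the sizes `n` and `p`, and the penalty `lam` are made up for illustration. It compares the solution of the augmented normal equations with the familiar closed form $(X'X + \lambda I)^{-1}X'y$.

```python
import numpy as np

# Hypothetical problem size and penalty, chosen only for illustration
rng = np.random.default_rng(0)
n, p, lam = 50, 3, 2.5

X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Append p "fake" observations: rows sqrt(lam) * I with responses equal to 0
X_star = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_star = np.concatenate([y, np.zeros(p)])

# Ordinary normal equations applied to the augmented data
beta_aug = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

# Closed-form ridge solution on the original data
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Plain least squares on the augmented data, without forming X_*' X_*
beta_lstsq, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

print(np.allclose(beta_aug, beta_ridge))
print(np.allclose(beta_lstsq, beta_ridge))
```

The agreement follows from $X_*'X_* = X'X + \lambda I$ and $X_*'y_* = X'y$, so the augmented normal equations are the ridge equations written in different notation.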

Matthew Drury
  • Let me gauge if I understand. So given a user-selected $\lambda$, minimization is equivalent to finding a $\beta$ such that the right-hand side (of my equation) is 0. Then the third equation follows from the corollary (the first equation?). – Aleksey Bilogur Oct 29 '17 at 20:12
  • Yup. That seems correct to me. – Matthew Drury Oct 29 '17 at 20:16