
There are many similar questions on this site related to this, but none were exactly to the point I wanted to ask.

So the question relates to ridge regression and this link, where there is a statement (in bold) that:

In a least squares fit, the coefficient estimate given by $\hat\beta = (X^TX)^{-1}X^Ty$ has no unique solution when $X^TX$ is not invertible, but the inclusion of $\lambda$ makes the problem non-singular even if $X^TX$ is not invertible.

This was the original motivation for ridge regression (Hoerl and Kennard, 1970).

i.e.

$\hat\beta^{\text{ridge}} = (X^TX + \lambda I)^{-1}X^Ty$

I am unable to exactly convince myself why this would be the case. Specifically, for the case when $p$ (predictors) $> n$ (instances), how will I be able to obtain a unique solution with this?

GeneX

1 Answer


A full-rank matrix is invertible, and adding $\lambda I$ (with $\lambda > 0$) to $X^TX$ makes it full rank.

Here is an example. Suppose $X$ has two identical columns:

> x=cbind(c(1,1,1,1),c(1,1,1,1),c(1,2,3,4))
> x
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    2
[3,]    1    1    3
[4,]    1    1    4
> t(x) %*% x
     [,1] [,2] [,3]
[1,]    4    4   10
[2,]    4    4   10
[3,]   10   10   30
> Matrix::rankMatrix(t(x) %*% x)
[1] 2   
> Matrix::rankMatrix(t(x) %*% x + 1e6*diag(3))
[1] 3

$X^TX$ is a symmetric matrix, so it is diagonalizable with real eigenvalues. Since it is $p \times p$, let those eigenvalues be $\lambda_1, \ldots, \lambda_p$.

If we choose any of the $\lambda_i$ above, $(X^TX - \lambda_i I)$ is singular.

However, if $\lambda \ne \lambda_1, \ldots, \lambda_p$, then the eigenvalues of $(X^TX - \lambda I)$ are $\lambda_1 - \lambda, \ldots, \lambda_p - \lambda$, none of which equal $0$. Hence $(X^TX - \lambda I)$ is non-singular. In particular, $X^TX$ is positive semi-definite, so all $\lambda_i \ge 0$; adding $\lambda I$ with any $\lambda > 0$ therefore gives eigenvalues $\lambda_i + \lambda > 0$, and $X^TX + \lambda I$ is always invertible.
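The eigenvalue shift can be checked numerically. Here is a minimal sketch in Python/NumPy (the translation from the R example above is my own), reusing the same rank-deficient $X$ with two identical columns:

```python
import numpy as np

# Same X as the R example: two identical columns, so X'X has rank 2.
X = np.array([[1., 1., 1.],
              [1., 1., 2.],
              [1., 1., 3.],
              [1., 1., 4.]])
G = X.T @ X                      # Gram matrix X'X (singular)

lam = 0.5
eig_G = np.sort(np.linalg.eigvalsh(G))
eig_ridge = np.sort(np.linalg.eigvalsh(G + lam * np.eye(3)))

# Every eigenvalue is shifted by exactly lam, so the zero eigenvalue
# moves to lam > 0 and the matrix becomes invertible.
print(np.allclose(eig_ridge, eig_G + lam))   # True
print(eig_G[0])                              # ~0: G is singular
```

Because $G$ and $G + \lambda I$ share the same eigenvectors, the spectrum shifts rigidly by $\lambda$, which is exactly the argument above.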

More generally, almost any perturbation of a singular matrix yields a non-singular one.
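The same reasoning answers the $p > n$ part of the question: when $p > n$, $X^TX$ has rank at most $n < p$ and is necessarily singular, yet $X^TX + \lambda I$ is full rank, so the ridge estimate is unique. A small Python/NumPy sketch (the dimensions and random data are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 3, 5, 1.0            # more predictors than observations
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

G = X.T @ X                      # p x p, but rank at most n, so singular
print(np.linalg.matrix_rank(G))              # 3, not 5

# Ridge: G + lam*I has full rank, so this linear solve has a unique answer.
beta_ridge = np.linalg.solve(G + lam * np.eye(p), X.T @ y)
print(np.linalg.matrix_rank(G + lam * np.eye(p)))  # 5
```

Without the $\lambda I$ term, `np.linalg.solve` would fail on the singular $G$; infinitely many coefficient vectors fit the data equally well.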

Haitao Du