I was reading through an answer to another question that discussed inverting the $XX'$ matrix in ridge regression. It stated that the matrix $XX'$ ($= VDV'$, as it is positive semi-definite) cannot have all of its eigenvalues positive when $p \gg n$.

I'm not entirely sure which matrix dimensions are being referred to by $p \gg n$, and I am also struggling to see how the shape of a regression matrix affects the sign of the eigenvalues. Can anyone clarify these points?

Sean

2 Answers

The symbol $n$ denotes the number of observations (cases) and $p$ denotes the number of features (independent variables).

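Knowing whether $p > n$ or $p \le n$ is important because of the invertible matrix theorem. With $X$ arranged as $p \times n$ (variables in rows), $XX^\prime$ is a $p \times p$ matrix whose rank is at most $\min(p, n)$. When $p > n$, the product $XX^\prime$ therefore cannot be full rank, i.e. it is singular, and a singular matrix has at least one eigenvalue equal to 0.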

Sycorax

Case 1. If your $p \times n$ design (data) matrix $X$ has more observations $n$ than variables $p$ (a landscape shape), then $XX'$ will be invertible unless some variables are linearly dependent. Since $XX'$ is positive semi-definite, invertibility means that all of its eigenvalues are positive.

If you do have linearly dependent variables, then $XX'$ has rank less than $p$, i.e. $\operatorname{rank}(XX') < p$, so at least one eigenvalue is zero, i.e. not positive.

Case 2. If there are more variables than observations, i.e. $p > n$ (a portrait shape), then $\operatorname{rank}(XX') \le n < p$, so again at least one eigenvalue is not positive: it is zero.
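The two cases can be checked numerically. This is only a minimal sketch: it assumes NumPy, random Gaussian data, and the $p \times n$ layout used above. The helper name `gram_eigenvalues`, the sizes, and the tolerance are made up for illustration.

```python
import numpy as np

def gram_eigenvalues(p, n, seed=0):
    """Eigenvalues of XX' for a random p x n matrix X (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((p, n))   # rows = variables, columns = observations
    G = X @ X.T                       # p x p, symmetric positive semi-definite
    return np.linalg.eigvalsh(G)      # real eigenvalues in ascending order

# Case 1: more observations than variables (p < n)
ev_landscape = gram_eigenvalues(p=5, n=50)
# Case 2: more variables than observations (p > n)
ev_portrait = gram_eigenvalues(p=50, n=5)

tol = 1e-8  # numerical threshold for "zero" (an assumption; it depends on scale)
n_positive = int((ev_portrait > tol).sum())

print("Case 1: smallest eigenvalue:", ev_landscape.min())
print("Case 2: eigenvalues above tol:", n_positive, "of", ev_portrait.size)
```

In Case 2 the count of eigenvalues above the tolerance should not exceed $n$, which is the rank bound stated above; the remaining ones are zero up to floating-point round-off.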

Conceptually, regularization tries to bring you from Case 2 to Case 1 by imposing a constraint that effectively "reduces" $p$ to $\tilde p$. It still holds that $p > n$, but now $\tilde p < n$. In this sense, again conceptually, regularization changes the effective shape of your $X$ matrix from portrait (Case 2) to landscape (Case 1). As we saw, the shape determines whether all eigenvalues are positive or not.
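To make this picture a bit more concrete, here is a rough sketch of what a ridge penalty does to the eigenvalues. It assumes the standard ridge form, in which $\lambda I$ is added to $XX'$ before inverting, and it uses the usual effective degrees of freedom $\sum_i d_i/(d_i+\lambda)$ as one possible reading of $\tilde p$. That identification is an assumption made for illustration, and the value of `lam` and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 50, 5                      # Case 2: more variables than observations
X = rng.standard_normal((p, n))   # p x n layout, variables in rows
lam = 1.0                         # ridge penalty (illustrative value)

G = X @ X.T                                   # p x p, rank at most n
d = np.clip(np.linalg.eigvalsh(G), 0.0, None) # eigenvalues, round-off negatives set to 0
d_ridge = np.linalg.eigvalsh(G + lam * np.eye(p))  # every eigenvalue is shifted up by lam

# Effective degrees of freedom of the ridge fit: sum of d_i / (d_i + lam)
p_eff = float(np.sum(d / (d + lam)))

print("smallest eigenvalue of XX':        ", d.min())
print("smallest eigenvalue of XX' + lam*I:", d_ridge.min())
print("effective p:", p_eff, "vs n =", n)
```

Adding $\lambda I$ moves every eigenvalue from $d_i$ to $d_i + \lambda$, so the zero eigenvalues become $\lambda > 0$ and the penalized matrix is invertible even though $p > n$.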

Aksakal