Why is the MLE of a high-dimensional multivariate Gaussian covariance matrix likely to be ill-conditioned?

Question

In a book I'm reading (Probabilistic Machine Learning: An Introduction), the author suggests that in high dimensions the MLE of the covariance matrix of a multivariate Gaussian is often poorly conditioned.

I'm trying to understand: is there a mathematical explanation for why the MLE of a high-dimensional multivariate Gaussian covariance matrix is likely to be ill-conditioned? Is this even the case?

I couldn't find any evidence for this online, other than people encountering ill-conditioned matrices while fitting a multivariate Gaussian.

The simplest explanation is that the number of parameters needed to specify the (full) covariance of a Gaussian is $p(p+1) / 2$, i.e. order $p^2$, where $p$ is the dimension; so if the number of observations $n$ is of the same order as (or smaller than) $p$, you end up with a close to singular (or singular) empirical covariance matrix. — dr.ivanova, Jan 12 '22 at 13:27
@dr.ivanova This looks like an answer rather than a comment. — Christian Hennig, Jan 12 '22 at 14:33
@dr.ivanova The author said something similar in the book but I still don't see it! Why does the fact that there are many parameters mean the matrix will be close to singular? — user346500, Jan 12 '22 at 16:54

score 0 · Answer 1 · answered Jan 17 '22 at 07:46

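A covariance matrix is ill-conditioned when it is singular or near-singular, that is, when its condition number (the ratio of its largest to its smallest eigenvalue) is very large or infinite.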

Suppose you have data $X\in\mathbb{R}^{n\times p}$ which you wish to model as a multivariate Gaussian in $p$ dimensions; here $n$ is the number of observations.

A full covariance matrix has $p(p+1)/2$ free parameters, so if the number of observations $n$ is of the same order as (or smaller than) $p$, you end up with a close to singular (or singular) empirical covariance matrix.
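The near-singular case can be checked numerically. The following is a minimal sketch, not taken from the book: it assumes zero-mean data with identity true covariance, and the helper names `mle_covariance` and `condition_numbers` are made up for this illustration. It computes the condition number of the MLE for a fixed $p$ as $n$ shrinks towards $p$.

```python
import numpy as np


def mle_covariance(X):
    """MLE of the covariance for zero-mean data X of shape (n, p)."""
    n = X.shape[0]
    return X.T @ X / n


def condition_numbers(p, sample_sizes, seed=0):
    """Condition number of the MLE covariance for each sample size n.

    Assumption for this sketch: the data are i.i.d. standard normal,
    so the true covariance is the identity (condition number 1).
    """
    rng = np.random.default_rng(seed)
    result = {}
    for n in sample_sizes:
        X = rng.standard_normal((n, p))
        result[n] = np.linalg.cond(mle_covariance(X))
    return result


# Fixed dimension p = 50, with n shrinking towards p.
for n, cond in condition_numbers(p=50, sample_sizes=(5000, 500, 100, 55)).items():
    print(f"n = {n:5d}, p = 50, condition number = {cond:.1f}")
```

Even though the true covariance here is perfectly conditioned, the estimate should get much worse as $n$ approaches $p$, because its smallest eigenvalues are pulled towards zero.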

To understand why, suppose without loss of generality that $X$ has mean 0. Then the empirical covariance matrix is $\hat{\Sigma}=\frac{1}{n}X^TX\in\mathbb{R}^{p\times p}$. Its rank is at most $\min(n, p)$, because $\operatorname{rank}(\hat{\Sigma})=\operatorname{rank}(X^TX)=\operatorname{rank}(X)\leq \min(n, p)$. Hence if $n<p$, the empirical covariance matrix is singular. When $n$ is only slightly larger than $p$, it is typically full rank, but its smallest eigenvalues are close to zero, so it is near-singular.
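The rank argument is easy to verify numerically. This is a small sketch under the same assumptions (zero-mean Gaussian data, observations stored as the rows of `X`, so the $p\times p$ matrix is computed as `X.T @ X / n`); the function name `rank_and_zero_eigenvalues` and the tolerance `tol` are choices made for this illustration.

```python
import numpy as np


def rank_and_zero_eigenvalues(n, p, seed=0, tol=1e-10):
    """Return the rank of the MLE covariance and its number of ~zero eigenvalues."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))          # n observations, p dimensions
    sigma_hat = X.T @ X / n                  # p x p empirical covariance (zero mean assumed)
    rank = np.linalg.matrix_rank(sigma_hat)  # numerical rank via SVD
    eigvals = np.linalg.eigvalsh(sigma_hat)  # eigenvalues of a symmetric matrix
    n_zero = int(np.sum(np.abs(eigvals) < tol))
    return rank, n_zero


rank, n_zero = rank_and_zero_eigenvalues(n=20, p=50)
print(f"rank = {rank}, eigenvalues below tolerance = {n_zero}")
```

With $n=20$ and $p=50$ the rank cannot exceed 20, so at least 30 eigenvalues are zero up to floating-point error and the matrix cannot be inverted.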

I found this related question with a very detailed answer which you will find helpful.
