
I'd like to know whether there is a concentration inequality for the sample covariance matrix that doesn't assume knowledge of the true mean.


Background.

Given a probability distribution $\mu$ on $\mathbb R^d$, the covariance matrix of $\mu$ is defined as follows: $$\Sigma := \mathbb E [(x - \bar \mu)(x -\bar \mu)^\top] $$ where $x \sim \mu$ and $\bar \mu = \mathbb E [x]$.

If $X = (x_1, \dots, x_m)$ is an i.i.d. sample drawn from $\mu$, then we can define two estimators: \begin{align*} & \hat \Sigma_1 := \frac1m \sum_{i=1}^m (x_i - \bar \mu)(x_i - \bar \mu)^\top, \text{ where } \bar \mu = \mathbb E_{x \sim \mu} [x], \\ & \hat \Sigma_2 := \frac1{m-1} \sum_{i=1}^m (x_i - \bar x)(x_i - \bar x)^\top, \text{ where } \bar x = \frac1m (x_1 + \cdots + x_m). \end{align*} Both are unbiased: $\mathbb E_X \hat \Sigma_1 = \mathbb E_X \hat \Sigma_2 = \Sigma$.
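As a quick numerical sanity check (an illustration, not part of the question), here is a minimal NumPy sketch of both estimators; the Gaussian choice of $\mu$ and all parameters are arbitrary. Note that $\hat \Sigma_2$ is exactly what `np.cov` computes with its default `ddof=1`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice of mu: a Gaussian with known mean and covariance.
d, m = 3, 10_000
true_mean = np.zeros(d)
true_cov = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])
X = rng.multivariate_normal(true_mean, true_cov, size=m)  # rows are x_i

# Sigma_1: centers at the true mean, divides by m.
Y = X - true_mean
Sigma1 = Y.T @ Y / m

# Sigma_2: centers at the sample mean, divides by m - 1 (Bessel's correction).
Z = X - X.mean(axis=0)
Sigma2 = Z.T @ Z / (m - 1)

# np.cov with rowvar=False and its default ddof=1 computes exactly Sigma_2.
assert np.allclose(Sigma2, np.cov(X, rowvar=False))

# Spectral-norm errors of the two estimators.
print(np.linalg.norm(true_cov - Sigma1, 2))
print(np.linalg.norm(true_cov - Sigma2, 2))
```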

The second estimator $\hat \Sigma_2$ is of interest because $\bar \mu$ is often not known in practice.


Question.

I'm interested in the concentration of $\hat \Sigma_2$ to $\Sigma$ as $m \rightarrow \infty$. More precisely, given a number $t > 0$, I'd like to know whether there exist a constant $A>0$ and a rate $\alpha \in (0,1)$, both depending on $\mu$ and $t$, such that $$\text{Prob}(\| \Sigma - \hat \Sigma_2 \| \ge t) \le A \cdot \alpha^m,$$ where $\|\cdot \|$ is the spectral norm, also known as the 2-norm. (A bound in the Frobenius norm is also fine, since for any $d \times d$ matrix $M$, $\|M\| \le \|M\|_F \le \sqrt{d}\, \|M\|$.)
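For intuition (not a proof of anything), the tail probability above can be estimated by Monte Carlo; here is a hedged sketch where the Gaussian $\mu$ and the values of $d$, $m$, $t$, and the number of trials are all arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative parameters, not from the question.
d, m, t, n_trials = 3, 200, 0.5, 2_000
true_cov = np.diag([2.0, 1.0, 1.5])

def spectral_error(rng):
    """One trial: draw m samples, return ||Sigma - Sigma_2|| in spectral norm."""
    X = rng.multivariate_normal(np.zeros(d), true_cov, size=m)
    Sigma2 = np.cov(X, rowvar=False)  # centers at the sample mean, divides by m - 1
    return np.linalg.norm(true_cov - Sigma2, 2)

errors = np.array([spectral_error(rng) for _ in range(n_trials)])
print("empirical Prob(||Sigma - Sigma_2|| >= t):", (errors >= t).mean())
```

Rerunning with larger $m$ shows the empirical tail probability dropping, which is exactly the decay the question asks to quantify.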

In the case of the difference $\|\Sigma - \hat \Sigma_1\|$, such a bound can be obtained from the matrix Bernstein inequality. However, I'm less sure about $\|\Sigma - \hat \Sigma_2\|$. One idea is to use the fact that $$\hat \Sigma_1 - \hat \Sigma_2 = \frac1{m(m-1)} \sum_{i\neq j} (x_i-\bar\mu) (x_j-\bar\mu)^\top,$$ which follows by expanding $\bar x$ in the definition of $\hat \Sigma_2$: \begin{align*} \hat \Sigma_2 =& \frac1m \sum_i x_i x_i^\top - \frac1{m(m-1)} \sum_{i\neq j} x_i x_j^\top \\ =& \frac1m \sum_i (x_i-\bar\mu) (x_i-\bar\mu)^\top - \frac1{m(m-1)} \sum_{i\neq j} (x_i-\bar\mu) (x_j-\bar\mu)^\top \\ =& \hat \Sigma_1 - \frac1{m(m-1)} \sum_{i\neq j} (x_i-\bar\mu) (x_j-\bar\mu)^\top. \end{align*} But I'm not sure how to control the sum of the terms $(x_i-\bar\mu) (x_j-\bar\mu)^\top$, which are not independent across pairs $(i, j)$.
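The displayed identity is easy to verify numerically. A minimal sketch (again with an arbitrary Gaussian $\mu$), using $\sum_{i\neq j} y_i y_j^\top = \big(\sum_i y_i\big)\big(\sum_j y_j\big)^\top - \sum_i y_i y_i^\top$ with $y_i = x_i - \bar\mu$ to avoid a double loop:

```python
import numpy as np

rng = np.random.default_rng(2)

d, m = 3, 500
true_mean = np.ones(d)
X = rng.multivariate_normal(true_mean, np.eye(d), size=m)

Y = X - true_mean                 # rows are y_i = x_i - mu_bar
Sigma1 = Y.T @ Y / m              # centers at the true mean
Sigma2 = np.cov(X, rowvar=False)  # centers at the sample mean, divides by m - 1

# sum_{i != j} y_i y_j^T = (sum_i y_i)(sum_j y_j)^T - sum_i y_i y_i^T
s = Y.sum(axis=0)
cross = (np.outer(s, s) - Y.T @ Y) / (m * (m - 1))

# Check: Sigma_1 - Sigma_2 equals the cross-term sum.
assert np.allclose(Sigma1 - Sigma2, cross)
```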

This should be a fairly standard question with a standard answer, but I couldn't find one. The only answer to a similar question did not address my question: it covered the case of $\hat \Sigma_1$, not $\hat \Sigma_2$.

I found a weak(?) answer to this question (see the answer here), showing that $\| \Sigma - \hat \Sigma_2 \| \rightarrow 0$ quadratically in $m$, assuming that $\mu$ has compact support. I suspect a sharper bound exists, so if anyone knows of one, please tell me.

Uzu Lim
  • The covariance matrix can be defined without referring to the mean. See https://stats.stackexchange.com/a/18200/919 for a description of one way. (You will easily see how this visual explanation translates to a mathematical formula.) – whuber Apr 12 '21 at 15:09
  • The expression $\hat \Sigma_1$ uses the true mean and the expression $\hat \Sigma_2$ uses the sample mean. I'm trying to obtain a concentration inequality for $\hat \Sigma_2$. – Uzu Lim Apr 12 '21 at 15:18
  • The point is that there are algebraically equivalent expressions for $\hat\Sigma_2$ that do not use the sample mean at all. – whuber Apr 12 '21 at 16:11
  • I see. Currently I cannot see how that would help me prove the desired concentration inequality; could you tell me how? One expression I know for $\hat \Sigma_2$ is obtained by expanding the sample mean part, which gives $\hat \Sigma_2 = \frac1m \sum_i x_i x_i^\top - \frac1{m(m-1)} \sum_{i \neq j} x_i x_j^\top$. I wonder if this is one of the expressions you consider relevant. – Uzu Lim Apr 12 '21 at 16:24
