3

Let $X$ be an $n \times p$ matrix ($n \geq p$, like a conventional data matrix), with each column $j$ filled by iid draws from a variable $\mathcal{X}_j$. I would like to show that, in sloppy notation, $(X^TX)^{-1} \rightarrow 0$ as $n \rightarrow \infty$.

Edit 2016/06/15: I will expand the question to show where I am stuck. First, it is known that the maximum likelihood estimator satisfies $\hat{\beta} \sim N(\beta,~\sigma^2 \cdot (X^T X)^{-1})$. Second, $\hat{\beta}$ is consistent, meaning that $\hat{\beta} \xrightarrow{p} \beta$ as $n \to \infty$. As this question (Why don't asymptotically consistent estimators have zero variance at infinity?) suggests, this doesn't generally imply $\sigma^2 \cdot (X^T X)^{-1} \to 0$. But does it hold in this case?


Edit 2016/06/15: An alternative was to show that adding new data $X_{new}$ (again drawn from $\mathcal{X}$) to the existing data, resulting in $X^{* T} = (X^T,~ X_{new}^T)$, decreases $(X^{* T} X^*)^{-1}$. This statement is weaker and is no longer sufficient.
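For concreteness, here is a minimal numerical sketch of what I mean (the standard-normal columns are just an arbitrary choice of $\mathcal{X}_j$); empirically, the entries of $(X^TX)^{-1}$ shrink as $n$ grows:

```python
import numpy as np

# Minimal sketch of the question's setup: p iid standard-normal columns
# (the distribution is an arbitrary choice), with n growing.
rng = np.random.default_rng(0)
p = 3
for n in [10, 100, 1000, 10000]:
    X = rng.standard_normal((n, p))   # column j: iid draws from X_j
    inv = np.linalg.inv(X.T @ X)
    print(n, np.abs(inv).max())       # largest entry shrinks roughly like 1/n
```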

Qaswed
  • Why do you expect that the covariance will die for asymptotic $n$? Is there some implicit assumption here, or have I missed something? Please clarify. – Repmat Apr 21 '16 at 11:32
  • Thank you @Repmat for having a look at the question. Intuitively, I would say: the more data, the more precise the estimator. I'll update the question with my current approach. – Qaswed Apr 21 '16 at 12:20

3 Answers

2

Adding to the other answers: you cannot in general show that $(X^T X)^{-1}$ goes to zero as $n \rightarrow \infty$. You would need more assumptions, and you have not specified them. As a simple example, let the model be a one-way ANOVA comparing $p$ groups, coded as dummy variables ($p$ dummies without an explicit intercept). Let the number of observations in group $i$ be $n_i$, with $n_1+n_2+\dotsb+n_p = n$. Then the design matrix $X$ becomes $$ X=\begin{bmatrix} 1 & 0 & 0 &\dots & 0 \\ 1 & 0 & 0 &\dots & 0 \\ \vdots \\ 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots \\ 0 & 1 & 0 & \dots & 0 \\ \vdots \\ 0 & 0 & 0 & \dots & 1 \\ \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix} $$ with $n_1$ rows in the first block, and so on. Then $X^T X$ is a diagonal matrix with the $n_i$'s along the diagonal, and its inverse is diagonal with $1/n_i$ along the diagonal. If you can only ever get five observations from the first group, while the other group sizes $n_2, n_3, \dotsc, n_p$ all increase to infinity with $n$, then the limit of $(X^T X)^{-1}$ is the diagonal matrix $$ \begin{bmatrix} 1/5 & 0 & 0 &\dots & 0 \\ 0 & 0 & 0 &\dots & 0 \\ 0 & 0 & 0& \dots & 0 \\ \vdots \\ 0 & 0 & 0 & \dots & 0 \end{bmatrix} $$ which is not the zero matrix.
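A quick numerical check of this counterexample (a sketch; $p=4$ groups and the growing group sizes are arbitrary choices):

```python
import numpy as np

# Sketch of the counterexample above: group 1 is frozen at 5 observations
# while the other p-1 group sizes grow (p = 4 is an arbitrary choice).
p = 4
for n_other in [10, 100, 1000]:
    sizes = [5] + [n_other] * (p - 1)
    # dummy coding: each row has a single 1 in its group's column
    X = np.vstack([np.repeat(np.eye(p)[[g]], s, axis=0)
                   for g, s in enumerate(sizes)])
    print(np.round(np.diag(np.linalg.inv(X.T @ X)), 4))
    # -> first entry stays at 0.2 = 1/5, the rest go to 0
```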

So, in general, we can assume the model $y_i = x_i^T \beta + \epsilon_i$, where the disturbances $\epsilon_1, \dotsc,\epsilon_n$ are iid random variables from some distribution with zero mean and common variance $\sigma^2$. In matrix form we can write this model as $Y= X\beta+\epsilon$, and we can ask about the estimate of some contrast of the parameter vector $\beta$, say $c^T \beta$, defined by the contrast vector $c$. In our ANOVA example, the mean of group $i$ is given by the contrast $c^T \beta$ with $c=e_i$, where $e_i$ is the unit vector with a one in position $i$. So the mean of the first group is the contrast $e_1^T \beta$. In that example, the variance of the (least squares) estimate $c^T \hat{\beta}$ of the contrast $c^T\beta$ will go to zero with $n$ for some contrast vectors, and not for others.

So, in general, we can ask for ways of characterizing those contrast vectors $c$ for which the limiting variance is zero, where the variance of the estimated contrast is $$ \text{Var}(c^T \hat{\beta})=\sigma^2 c^T (X^T X)^{-1} c, $$ or for conditions guaranteeing that the limiting variance is zero for all contrast vectors $c$ (which corresponds to the original question asked here). One such condition could be that the rows $x_i$ of the design matrix $X$ are obtained as an iid sample from some common distribution (with some necessary conditions on that common distribution; no component can have zero variance, for instance).
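To make the distinction concrete, here is a small sketch using the diagonal $X^TX$ from the ANOVA example (taking $\sigma^2 = 1$ for illustration): the variance of the contrast $e_1^T\hat{\beta}$ never drops below $1/5$, while the variance of $e_2^T\hat{\beta}$ vanishes.

```python
import numpy as np

# Sketch: Var(c^T beta_hat) = sigma^2 * c^T (X^T X)^{-1} c in the ANOVA
# example. With X^T X = diag(n_i), this is sigma^2 * sum_i c_i^2 / n_i,
# so it vanishes exactly when c puts no weight on the frozen group.
sigma2, p = 1.0, 4
for n_other in [10, 100, 1000]:
    n_i = np.array([5.0] + [float(n_other)] * (p - 1))
    XtX_inv = np.diag(1.0 / n_i)
    e1, e2 = np.eye(p)[0], np.eye(p)[1]
    print(sigma2 * e1 @ XtX_inv @ e1,   # contrast on the frozen group: stays 0.2
          sigma2 * e2 @ XtX_inv @ e2)   # contrast on a growing group: -> 0
```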

There is a paper dedicated to giving such conditions in much detail: Chien-Fu Wu, "Characterizing the consistent directions of least squares estimates", The Annals of Statistics, 1980, Vol. 8, No. 4, 789–801. http://projecteuclid.org/euclid.aos/1176345071

kjetil b halvorsen
  • Thank you @kjetil b halvorsen for the link to the article. I've updated the question and clarified that each column in $X$ is a draw from a random variable (so I am not thinking about dummy variables). – Qaswed Jun 15 '16 at 15:36
0

I don't see the link between the two elements of your question.

Let's deal with the first part.

Assume $\frac{X'X}{n}$ is an estimator of the $p \times p$ covariance matrix of the regressors, and that this estimator is consistent. Then you are ensured that $\frac{X'X}{n} \to C_x$ as the number of observations $n$ grows to $\infty$, where $C_x$ denotes the covariance matrix of the regressors.

Consistency implies that, for all $j$, $k$, we have $n^{-1}\sum\limits_{i=1}^{n} x_{ij} x_{ik} \to C_x(j,k)$ as $n\to \infty$. Assume further that $0 < C_x(j,k)<\infty$. Then $\sum\limits_{i=1}^{n} x_{ij} x_{ik} \to \infty$ as $n\to \infty$, and therefore $\big(\sum\limits_{i=1}^{n} x_{ij} x_{ik}\big)^{-1} \to 0$.
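A numerical sketch of this argument (the standard-normal regressors, for which $C_x = I$, are my own choice):

```python
import numpy as np

# Sketch of the argument: with iid rows, X'X / n settles down to a fixed
# matrix C_x, so X'X itself blows up and its inverse shrinks to zero.
# (Standard-normal regressors, so that C_x = I, are an arbitrary choice.)
rng = np.random.default_rng(1)
p = 3
for n in [100, 1000, 10000]:
    X = rng.standard_normal((n, p))
    XtX = X.T @ X
    print(n,
          np.linalg.norm(XtX / n - np.eye(p)),  # -> 0, i.e. X'X/n -> C_x = I
          np.linalg.norm(np.linalg.inv(XtX)))   # -> 0
```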

It is not a matter of your variance-covariance matrix decreasing per se; it is just that you need to control for the number of observations to ensure consistency.

dv_bn
  • If the estimator of the covariance matrix is consistent, the matrix to which it converges is the covariance matrix itself. – dv_bn Apr 21 '16 at 14:40
  • Usually the argument is reversed: you need to prove that the limit holds, and then the consistency follows. Also, it is not evident that if the elements of the matrix $A=(a_{ij})$ satisfy $a_{ij}^{-1}\to 0$, it follows that $A^{-1}\to 0$. For one, $A$ might not be invertible at all. – mpiktas Apr 25 '16 at 06:49
  • @dv_bn, I can follow your argument except for the step from saying $n^{-1} \sum_{i=1}^n x_{ij}x_{ik} \to C_x(j,k)$ as $n\to \infty$ to concluding that $\sum_{i=1}^n x_{ij}x_{ik} \to \infty$ as $n\to \infty$. Are you saying "if something ($\sum x$) divided by something infinite ($n$) is finite ($C_x$), then this something has to be infinite"? Is there a reference for this conclusion? (I remember that taking limits separately is shaky: $\lim ((1/n)\cdot n)$ is not $\lim(0 \cdot n) = 0$.) – Qaswed Jun 15 '16 at 15:56
  • @Qaswed, my answer is not very clear, I must admit. One way to see my point is to consider that for a ratio to converge (converging means converging to something finite, otherwise it diverges), both the numerator and the denominator must converge separately, or diverge separately but at the same rate. So if $n$ goes to infinity, the denominator diverges; for the ratio to converge, the numerator must diverge at the same rate. Does that make more sense? You can read up on the properties of the limit operator here, e.g. http://tutorial.math.lamar.edu/Classes/CalcI/LimitsProperties.aspx – dv_bn Jun 15 '16 at 16:21
-1

Edit 2018/08/02: not an answer, but an insight.

The insight into my question uses block matrices. If new data $X_{new}$ are available, they can "join" the old data $X$ in a block matrix $X^* = \begin{bmatrix}X \\ X_{new}\end{bmatrix}$.

The rules for multiplication of block matrices yield $\begin{bmatrix}X' & X_{new}'\end{bmatrix} \begin{bmatrix}X \\ X_{new}\end{bmatrix} = X'X + X_{new}' X_{new}$. (See e.g. David A. Harville, "Matrix Algebra From a Statistician's Perspective" (1997), Section 2.2.)

With a bit of work (not shown here), it can be shown that $(X^{*\prime} X^*)^{-1}$ is smaller than $(X'X)^{-1}$ in the Loewner order. This means that $(X'X)^{-1}$ decreases (but not necessarily to $0$) as $n$ increases.
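Here is a small numerical sketch of both the block identity and the Loewner ordering (the Gaussian data and the sample sizes 50 and 20 are arbitrary choices): the eigenvalues of $(X'X)^{-1} - (X^{*\prime}X^*)^{-1}$ should all be nonnegative.

```python
import numpy as np

# Sketch: check the Loewner ordering numerically. The claim is that
# (X'X)^{-1} - (X*'X*)^{-1} is positive semidefinite.
rng = np.random.default_rng(2)
p = 3
X = rng.standard_normal((50, p))
X_new = rng.standard_normal((20, p))
X_star = np.vstack([X, X_new])

# block-multiplication identity (Harville, Section 2.2):
assert np.allclose(X_star.T @ X_star, X.T @ X + X_new.T @ X_new)

old_inv = np.linalg.inv(X.T @ X)
new_inv = np.linalg.inv(X_star.T @ X_star)
print(np.linalg.eigvalsh(old_inv - new_inv))  # all >= 0 (up to rounding)
```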

Qaswed
  • This answer is incomplete because it does not use the iid assumption. Without that assumption, the conclusion is false. – whuber Apr 21 '16 at 14:16
  • @whuber I realized that I only showed a decrease for $X^*$, but not a decrease to $0$. Do you think that using the iid assumption would lead to a decrease to $0$? – Qaswed Aug 02 '18 at 08:09
  • Yes, assuming $X_n$ is not always zero. – whuber Aug 02 '18 at 11:56