
I want to know whether we can approximate the covariance matrix of a random vector by making use of a probability limit.

Define the linear regression model in matrix form as $$ \mathbf{Y} = \mathbf{X} \beta + \varepsilon, $$ where the errors have conditional variance $\text{Var}[\varepsilon|\mathbf{X}] = \sigma^2 \mathbf{I}$.

I am interested in approximating $E[\text{Cov}[A|\mathbf{X}]]$ defined by

$$ E[\text{Cov}[\hat \beta|\mathbf{X}]] = E\bigg[\frac{\sigma^2}{n} \bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1}\bigg] = \frac{\sigma^2}{n} E\bigg[\bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1}\bigg]. $$

The probability limit of $\mathbf{X}^T\mathbf{X}/n$ is $$ \text{plim}_{n\to \infty} \bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg) = Q, $$ where $Q$ is a constant positive definite matrix (see Econometric Analysis by William Greene, eq. 4-19). So the probability limit of the inverse $(\mathbf{X}^T\mathbf{X}/n)^{-1}$ is $$ \text{plim}_{n\to \infty} \bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1} = Q^{-1}. $$
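For intuition (a sketch, assuming the rows $\mathbf{x}_i$ of $\mathbf{X}$ are i.i.d. with $E[\mathbf{x}_i^T\mathbf{x}_i] = Q$, which is one common setting in which Greene's assumption holds), the two limits follow from the weak law of large numbers and the continuous mapping theorem:

$$ \frac{\mathbf{X}^T\mathbf{X}}{n} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i^T\mathbf{x}_i \xrightarrow{p} E[\mathbf{x}_i^T\mathbf{x}_i] = Q, \qquad \bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1} \xrightarrow{p} Q^{-1}, $$

the second statement holding because matrix inversion is continuous at the positive definite matrix $Q$.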

For large $n$, I am interested in approximating $E[\text{Cov}[\hat \beta|\mathbf{X}]]$ by using the probability limit, that is, saying something like $$ E[\text{Cov}[\hat \beta|\mathbf{X}]] \approx \frac{\sigma^2}{n} Q^{-1}, \quad \quad \text{or} \quad \quad E[\text{Cov}[\hat \beta|\mathbf{X}]] \sim \frac{\sigma^2}{n} Q^{-1}. $$ I have various questions regarding the validity of doing this.

What kind of error are we making if we can do this? Is there a way to account for the error? Is this a situation where we have an approximation that 'holds with high probability'? If we can indeed make this approximation, how do we rigorously state it mathematically (precisely what does $\approx$ or $\sim$ signify)?

sonicboom
  • What is $A$? It appears to have the same covariance matrix as $\hat\beta$, under the usual assumptions. – Alecos Papadopoulos Dec 04 '20 at 13:38
  • $A$ is actually the estimated linear regression coefficients $\hat \beta$ (see [here](https://en.wikipedia.org/wiki/Ordinary_least_squares#Finite_sample_properties)). – sonicboom Dec 04 '20 at 14:16

1 Answer


In "standard linear regression" with strict exogeneity, $E(\varepsilon \mid \mathbf X) = 0$, the OP wants to approximate (pursuing a theoretical result) the unconditional variance of $\hat \beta$ by using the probability limit of the the moment matrix.

By the Law of Total Variance and the fact that $E(\hat \beta \mid \mathbf X) = \beta$, we have that the unconditional variance is

$${\rm V}(\hat \beta) = \sigma^2 \cdot E\Big[(\mathbf X' \mathbf X)^{-1}\Big] = \frac{\sigma^2 }{n}\cdot E\Big[(n^{-1}\mathbf X' \mathbf X)^{-1}\Big]$$
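In more detail, since $E(\hat \beta \mid \mathbf X) = \beta$ is a constant, the total-variance decomposition reduces to

$${\rm V}(\hat \beta) = E\big[{\rm V}(\hat \beta \mid \mathbf X)\big] + {\rm V}\big[E(\hat \beta \mid \mathbf X)\big] = E\big[\sigma^2 (\mathbf X' \mathbf X)^{-1}\big] + \mathbf 0 = \sigma^2\, E\big[(\mathbf X' \mathbf X)^{-1}\big].$$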

We approximate this by

$${\rm V}(\hat \beta) \approx \frac{\sigma^2}{n} \cdot Q^{-1},$$

where

$$Q = {\rm plim}\left(n^{-1}\mathbf X' \mathbf X\right) = E(\mathbf x' \mathbf x)$$

Here $\mathbf x$ is the typical row vector of $\mathbf X$; it is used because in the limit the matrix $\mathbf X$ has infinite row dimension, so it would be inappropriate for $\mathbf X$ itself to appear as the result of a limiting expression.

In words, instead of the expected value of the inverse, we use the inverse of the expected value.

The approximation error is

$$\delta(n) =(\sigma^2 /n) \cdot \Big[E[(n^{-1}\mathbf X'\mathbf X)^{-1}] - [E(\mathbf x'\mathbf x)]^{-1}\Big].$$

We have that $(n^{-1}\mathbf X'\mathbf X)^{-1} \longrightarrow_p [E(\mathbf x'\mathbf x)]^{-1}$, and, provided $(n^{-1}\mathbf X'\mathbf X)^{-1}$ is uniformly integrable (so that convergence in probability carries over to the expectation),

$$E[(n^{-1}\mathbf X'\mathbf X)^{-1}] - [E(\mathbf x'\mathbf x)]^{-1} \longrightarrow [E(\mathbf x'\mathbf x)]^{-1} - [E(\mathbf x'\mathbf x)]^{-1} = 0, $$

so this expression is $o(1)$. Also, $\sigma^2/n = O(1/n)$. Therefore,

$$\delta(n) = O(1/n)\cdot o(1) = o(1/n).$$

So the approximation error goes to zero faster than $1/n$ does.
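A minimal Monte Carlo sketch of the gap $E[(n^{-1}\mathbf X'\mathbf X)^{-1}] - Q^{-1}$ (not part of the original derivation; it assumes i.i.d. standard normal rows, so that $Q = E(\mathbf x'\mathbf x) = \mathbf I_k$, and the dimension $k$, the sample sizes and the replication count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3          # number of regressors (illustrative choice)
reps = 5000    # Monte Carlo replications (illustrative choice)
Q_inv = np.eye(k)   # Q^{-1} = I_k under the assumed standard normal design

for n in (20, 50, 100, 500):
    # Monte Carlo estimate of E[(X'X / n)^{-1}]
    acc = np.zeros((k, k))
    for _ in range(reps):
        X = rng.standard_normal((n, k))
        acc += np.linalg.inv(X.T @ X / n)
    mean_inv = acc / reps
    # spectral-norm size of the gap between E[(X'X/n)^{-1}] and Q^{-1}
    gap = np.linalg.norm(mean_inv - Q_inv, 2)
    print(f"n = {n:4d}:  gap to Q^-1 is approximately {gap:.4f}")
```

The printed gap shrinks as $n$ grows, in line with the $o(1)$ statement above; the overall approximation error $\delta(n)$ then carries the additional $\sigma^2/n$ factor.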

UPDATE
Can we improve on the $o_p(1)$ (and, in expectation, $o(1)$) rate of convergence of

$$E[(n^{-1}\mathbf X'\mathbf X)^{-1}] - [E(\mathbf x'\mathbf x)]^{-1}\;?$$

Apparently, the OP needs that. Let's see.

The OP mentioned a remark in Bruce Hansen's Econometrics textbook about the OLS estimator having a faster convergence rate than $o_p(1)$. Hansen derives this after obtaining the scaling rate needed for the asymptotic distribution: since the latter is $O_p(n^{-1/2})$, multiplying $\hat \beta_n - \beta$ by something that grows faster than unity ($n^0$) but slower than $n^{1/2}$ will not stop it from converging to zero.

To fix ideas, we are examining the rate of convergence of

$$E(h_n) - c,\;\;\; c\; {\rm =\;constant}, \;\;\; h_n = O_p(1), \;h_n - c \to_p 0.$$

Now, to apply the Hansen approach, we would need to be able to say something about the distribution (if it exists) of $$n^{\delta} (h_n - c).$$

If we can prove that, for some $\delta > 0$, the above converges in distribution, then we can apply the logic of Hansen and argue that there exists $\gamma$, $0<\gamma < \delta$, for which

$$n^{\gamma}(h_n - c) \to_p 0$$

and so

$$(h_n - c) = o_p(1/n^{\gamma}) \implies E(h_n -c) = o(1/n^{\gamma}).$$
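The step from convergence in distribution to the $o_p$ statement is the usual Slutsky-type argument: if $n^{\delta}(h_n - c) \to_d Z$ for some random variable $Z$, then for any $0<\gamma<\delta$,

$$n^{\gamma}(h_n - c) = n^{\gamma - \delta}\cdot n^{\delta}(h_n - c) \to_p 0,$$

because the deterministic factor $n^{\gamma-\delta} \to 0$ while $n^{\delta}(h_n - c) = O_p(1)$. As with the $o(1)$ result above, passing from the $o_p$ statement to the statement about the expectation relies on a uniform-integrability type condition.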

Alecos Papadopoulos
  • This doesn't correspond to what I asked though. I am asking about the effect on $\text{Cov}[A|\mathbf{X}] = \text{Cov}[\hat \beta|\mathbf{X}]$ of replacing $(\mathbf{X}^T\mathbf{X}/n)^{-1}$ with its probability limit $Q^{-1}$. My question asks: is such an approximation valid? What level of error is incurred, and can we account for the error? – sonicboom Dec 04 '20 at 16:56
  • I am only interested in the standard regression case when $E[\varepsilon|\mathbf{X}] = 0$. Your post doesn't mention anything about the probability limit $Q^{-1}$, which is the key point of my question. – sonicboom Dec 04 '20 at 19:23
  • When we make the approximation we incur an error $\delta(n)$ because in general $$ \delta(n) = \text{Cov}[A|\mathbf{X}] - \dfrac{\sigma^2}{n}Q^{-1} = \dfrac{\sigma^2}{n}\bigg(\dfrac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1} - \dfrac{\sigma^2}{n}Q^{-1} = \dfrac{\sigma^2}{n}\bigg(\bigg(\dfrac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1} - Q^{-1}\bigg) \neq 0. $$ In general, the error $\delta(n) = 0$ only when we take the probability limit of $(\mathbf{X}^T\mathbf{X}/n)^{-1}$. So I want to know if we can say something about how small $\delta$ is when $n$ is large. – sonicboom Dec 04 '20 at 19:23
  • You have it the wrong way around, I'm not trying to approximate the probability limit. I am talking about using the probability limit $Q^{-1}$ to approximate the finite sample variable $(\mathbf{X}^T\mathbf{X}/n)^{-1}$ when $n$ is large. These objects are not equal so an error $\delta$ is made. – sonicboom Dec 04 '20 at 19:50
  • I am interested in establishing a theoretical bound and I need to find an explicit rate of decay with respect to $n$. If I use the probability limit approximation I get $\text{Cov}[A|\mathbf{X}] \approx \frac{\sigma^2}{n}Q^{-1}$ which gives me an explicit rate for $n$. So that's why I want to do it. But I am wondering how big an error is incurred when I do this..in particular if the error is greater than or equal to $O(1/n)$ the error would be bigger than the approximation so the approximation would be useless. – sonicboom Dec 04 '20 at 20:03
  • Sorry, I was missing an expectation in the original post. I want an $n$-explicit rate of decay for $E[\text{Cov}[\hat\beta|\mathbf{X}]]$. If I make the probability limit approximation I get $E[\text{Cov}[\hat\beta|\mathbf{X}]]=\frac{\sigma^2}{n}Q^{-1}+\delta(n)$ which is $n$-explicit in the first term but features an approximation error term $\delta(n)$. As you stated, we incur an $o_p(1)$ error when we make the probability limit approximation; how does that error feed into $\delta(n)$? If it makes $\delta(n)=O(1/n)$ or bigger the approximation is no good as the error is at least as big as it. – sonicboom Dec 04 '20 at 20:42
  • I think we are on the same page now. I would write it as $$ \begin{align} \delta(n) &= \frac{\sigma^2}{n}\bigg(E\bigg[\bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1}\bigg] - Q^{-1}\bigg), \end{align} $$ where $Q^{-1} = \text{plim}_{n\to \infty}(\mathbf{X}^T\mathbf{X}/n)^{-1}$ which is a constant matrix. What are the little $\mathbf{x}$'s in your expression, is your $Q^{-1}$ the same as mine? – sonicboom Dec 05 '20 at 10:06
  • So it seems we have $$ \bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1} - Q^{-1} = o_p(1), $$ and then $$ E\bigg[\bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1}\bigg] - Q^{-1} = E[o_p(1)]. $$ So if we could find the dependence of the $E[o_p(1)]$ term on $n$, we would find the magnitude of the error $\delta(n)$. Although I'm not entirely sure the notation $E[o_p(1)]$ even makes sense. – sonicboom Dec 05 '20 at 10:12
  • I think this corresponds to finding an explicit $C(n)$ for the following expression: $$ E\bigg[\bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1}\bigg] = Q^{-1} + C(n). $$ If we had that $C(n)$ we could say $\delta(n) = \frac{\sigma^2}{n}C(n)$, so we would be done. – sonicboom Dec 05 '20 at 10:18
  • Thanks, this is very helpful and I appreciate the help a lot. It seems I can use the approximation safely for large $n$! – sonicboom Dec 05 '20 at 13:19
  • I think there might still be a slight issue. $E[(n^{-1}\mathbf X'\mathbf X)^{-1}]- E[(E(\mathbf x'\mathbf x))^{-1}]$ is non-random, so it can't converge in probability. But I know that if random variables are uniformly integrable (UI), then convergence in probability implies convergence in expectation. So if we assume $(n^{-1}\mathbf X'\mathbf X)^{-1}$ is UI we could say $$\lim_{n \to \infty}E[(n^{-1}\mathbf X'\mathbf X)^{-1}]-(E[\mathbf x'\mathbf x])^{-1}=E[(E(\mathbf x'\mathbf x))^{-1}] -(E[\mathbf x'\mathbf x])^{-1}=0, $$ which gives us an error of $\delta(n) = o(1/n)$ as you show. Do you agree? – sonicboom Dec 05 '20 at 13:36
  • Hi Alecos, regarding this question, it now appears that the $o(1)$ convergence leads to problems for me. Do you think there is any possibility that the convergence could be shown to be faster, for example: $E[(n^{-1}\mathbf X'\mathbf X)^{-1}] - [E(\mathbf x'\mathbf x)]^{-1} = o\bigg(\dfrac{1}{n}\bigg)$, or is $o(1)$ really a sharp indication of the true convergence? – sonicboom Dec 10 '20 at 16:17
  • @sonicboom But since I showed that $\delta(n) = o(1/n)$, doesn't this cover your needs? – Alecos Papadopoulos Dec 10 '20 at 16:36
  • The $o(1)$ rate for the probability limit approximation leads to $\delta(n) = o(1/n)$. It turns out that I need something faster such as $\delta(n) = o(1/n^2)$. Even $\delta(n) = o(1/n^{3/2})$ would be enough. Those rates for $\delta(n)$ arise if we have $o(1/n)$ or $o(1/\sqrt{n})$, respectively, for the probability limit approximation. – sonicboom Dec 10 '20 at 16:42
  • I just found something very interesting. In these [notes](https://www.ssc.wisc.edu/~bhansen/econometrics/Econometrics2013.pdf) an $o_p(1)$ rate of convergence (RoC) is mentioned in (6.6) and then in (6.13) it says that this can be strengthened to $O_p(n^{-1/2})$. I think if that $O_p(n^{-1/2})$ rate for the convergence in probability can be converted into a corresponding $O(n^{-1/2})$ for the convergence in expectation, everything will work out. But I'm not familiar with how a RoC in probability translates into a RoC in expectation. – sonicboom Dec 10 '20 at 17:03
  • @sonicboom That's Bruce Hansen's econometrics textbook. Let me have a look. – Alecos Papadopoulos Dec 10 '20 at 17:21
  • I have updated my answer to show what we need to apply Hansen's approach in your case. I suggest you delete all these comments; the essence has by now been incorporated in my post. I am deleting my comments. – Alecos Papadopoulos Dec 10 '20 at 18:45
  • Very good, I see now how the convergence rate in probability can be translated into a convergence rate in expectation. So to get a faster RoC in expectation, it ultimately comes down to proving that, for some $\delta > 0$, $n^\delta (h_n - c)$ converges in distribution. Or alternatively making a reasonably justified assumption that it does. – sonicboom Dec 10 '20 at 18:55
  • I think there might be some room to maneuver here. The fact that $\text{plim}_{n\to \infty} (n^{-1}\mathbf{X}'\mathbf{X}) = Q$ comes from an assumption; see (4.19) in Econometric Analysis by Greene, 8th Edition. So ultimately it seems we need to make a slightly stronger assumption and say that $\text{plim}_{n\to \infty} (n^{-1+\delta}\mathbf{X}'\mathbf{X}) = Q$ for an appropriate $\delta$ such that we then get that $n^\delta(h_n - c)$ converges in distribution. – sonicboom Dec 10 '20 at 19:09
  • So we replace the usual probability limit assumption (4.19 in Greene's book) with a (slightly) stronger one. How reasonable does this stronger assumption seem to you? – sonicboom Dec 10 '20 at 19:09
  • @sonicboom No, that definitely won't work. But if you write the $X'X$ matrix explicitly, it is composed of sample means. Moreover, if, as I guess, the first column of $X$ is a constant, then you can write $X$ and $X'X$ in blocks and apply block-matrix inversion to obtain an explicit expression for the inverse. You will find that it includes sample means, which multiplied by $\sqrt{n}$ should lead to a distribution. This means that here too you end up having the Hansen result, namely you have room to improve the rate of convergence from $o_p(1)$ up to $o_p(1/n^{\delta}),\; \delta <1/2.$ – Alecos Papadopoulos Dec 10 '20 at 21:40
  • Ok if it could be improved even that much it would be enough to resolve the problem I ran into. I'm going to try your suggestion and I'll start a new post if I run into any issues or want to check some things as the rate of convergence in probability of $(\mathbf{X}'\mathbf{X}/n)^{-1}$ seems like an interesting topic in its own right. Thanks again. – sonicboom Dec 11 '20 at 14:52
  • Hi Alecos based on your suggestion I attempted to show a faster rate of convergence for the case of simple linear regression. My workings can be found [here](https://stats.stackexchange.com/questions/500431/rate-of-convergence-of-hat-q-xx-1-bigg-dfrac-mathbfxt-mathbfx). I got a rate of $o_p(n^{-1/2})$, does what I have done look valid? – sonicboom Dec 11 '20 at 16:57
  • That was a typo, I meant $O_p(n^{-1/2})$. – sonicboom Dec 11 '20 at 17:05