
I know how to show that OLS only requires orthogonality between the regressor and the error term for consistency, so the title is perhaps a misnomer (I couldn't think of a better one).

But consider the following regression: $$y_i=\alpha + x_i\beta + u_i$$

where $x_i$ and $u_i$ are orthogonal. Now consider this demeaned version of it:

$$\tilde y_i:=y_i-\bar y = (x_i-\bar x)\beta+ u_i-\bar u$$

where $\bar x = \frac 1 N \sum_{i=1}^N x_i$, $\bar u = \frac 1 N \sum_{i=1}^N u_i$, and similarly for $\bar y$, and $x$ is a random variable. Here are two facts that seem paradoxical to me:

  1. The OLS estimator of $\beta$ in the first equation is numerically identical to the one in the second. Therefore both estimators are consistent, since the first one is consistent by orthogonality of the regressor and the error term.

  2. In the second equation, the regressor $\tilde x_i=x_i-\bar x$ is not orthogonal to the error term $\tilde u_i=u_i-\bar u$: $E(\tilde x_i \tilde u_i)\neq 0$. This can be shown by expanding $E\left[\left(x_i-\frac 1 N\sum_j x_j\right)\left(u_i - \frac 1 N \sum_j u_j\right)\right]$.

I know that (orthogonality $\implies$ consistency) does not mean that (consistency $\implies$ orthogonality). But how do we show that OLS in the second equation is indeed consistent, given that the regressor and the error term are not orthogonal?
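For concreteness, here is a minimal simulation sketch of the two facts (the normal distributions, parameter values and seed are arbitrary choices for illustration only): the slope estimate from the demeaned regression coincides with the one from the original regression, and both approach $\beta$ as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 0.5

for n in (100, 10_000, 1_000_000):
    x = rng.normal(1.0, 1.0, n)              # regressor, E(x) = 1
    u = rng.normal(0.0, 1.0, n)              # error, independent of x, E(u) = 0
    y = alpha + beta * x + u

    # OLS slope in the original regression (with an intercept)
    beta_raw = np.cov(x, y, bias=True)[0, 1] / np.var(x)

    # OLS slope in the demeaned regression (no intercept)
    x_c, y_c = x - x.mean(), y - y.mean()
    beta_dem = (x_c @ y_c) / (x_c @ x_c)

    print(n, beta_raw, beta_dem)             # equal to floating-point precision; both approach beta = 0.5
```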

user56834
  • How do $\bar{x}$ and $\bar{u}$ make the regressors and error term (or residual?) in the second equation non-orthogonal? And what are they: population averages or sample averages? Is $x$ a random variable or fixed? – Sextus Empiricus Jun 02 '18 at 13:34
  • @MartijnWeterings I edited the question. – user56834 Jun 02 '18 at 14:25
  • Note that you are subtracting off $\bar{u}$, which, if you have used OLS to estimate the parameter $\beta$, is equal to 0 by construction. If you change to subtracting off population means, it's 0 by assumption. If you haven't used OLS (or something else) to estimate the parameter $\beta$, you don't have any observed residuals, so you can't subtract $\bar{u}$ from $u$. It seems to me that the regressor and the error term are still orthogonal by construction. – jbowman Jun 02 '18 at 14:40
  • @jbowman the error term is not observed, neither is its average. The sample average of the error term cannot be observed, but that doesn’t mean it doesn’t theoretically exist. $u$ is the error term, NOT the residual. – user56834 Jun 02 '18 at 14:42
  • If those are the true errors, then how do you lose orthogonality? They are orthogonal to the $x$ by assumption. Consistency is something that happens as the sample size goes to infinity, and $\bar{u} \rightarrow 0$ in that case and we are back to the situation where you are centering the regressors, which does not cause a loss of orthogonality. – jbowman Jun 02 '18 at 14:48
  • @jbowman I think I realize now why consistency holds: it is because of the continuous mapping theorem for probability limits. I think I am right that orthogonality no longer holds, though. Just expand that expectation, and you’ll see. – user56834 Jun 02 '18 at 14:53
  • If x and u are uncorrelated, then any function g(x) is also uncorrelated with u. The premise of the question is wrong: subtracting the mean of u is just subtracting off 0 (since E(u) = 0 by construction if the model has an intercept). From x you are just subtracting an unknown but fixed number, which does not in any way break the orthogonality assumption. – Repmat Jun 02 '18 at 17:31
  • @repmat, that is just wrong. The fact that $E(x_iu_j)\neq E(x_i)E(u_j)$ matters here. Just expand the sum, and you'll get a whole bunch of terms like that. And it doesn't follow from the LLN that they go to zero on average, because they're not random variables. Though maybe there's another reason why they do go to zero. – user56834 Jun 02 '18 at 17:46
  • Maybe you could show a simulation to demonstrate what you mean. – Sextus Empiricus Jun 02 '18 at 18:02
  • @programmer2134 no you are wrong, the sum of u is 0 by definition. Just think about the concept. Why should any of the asymptotics change just because you subtract a fixed number from x? – Repmat Jun 02 '18 at 18:49
  • @Repmat The sum of the OLS residuals $\hat u_i$ is equal to zero, but only if we include a constant term in the regression. The sum of the error terms $u_i$ is certainly not equal to zero. – Alecos Papadopoulos Jun 02 '18 at 19:26
  • You are right, what I meant was that the mean is zero. It is such an innocent assumption, I suppose it often gets overlooked. – Repmat Jun 02 '18 at 21:01
  • Why do you think $E(x_iu_j) \neq E(x_i)E(u_j)$? The $u_j$ are typically assumed i.i.d. and independent of $x$, which guarantees that the two are equal. – jbowman Jun 02 '18 at 23:40
  • (Based on the comments,) this question might benefit from explicitly spelling out the definition of $\bar{u}$ which I assume to be $\frac{1}{N}\sum_{i=1}^N u_i$, not $E(u_i)$, not $\frac{1}{N} \sum_{i=1}^N \hat{u}_i$, as well as the definitions of $\tilde{x},\tilde{u}$ – Juho Kokkala Jun 04 '18 at 17:06
  • You can't center errors. You center residuals. – AdamO Jun 04 '18 at 19:52
  • @AdamO, what stops me from centering errors? – user56834 Jun 05 '18 at 05:05
  • @Programmer2134 You don't observe errors. You observe residuals. You can center those *only after* estimating the $\hat{\beta}$ least-squares estimator. – AdamO Jun 05 '18 at 16:14

1 Answer


This question brings to the surface the fact that it is usually not stressed enough how important it is to accompany the "orthogonality" assumption with the assumption $E(u)=0$ in order to get consistency. The latter is assumed, but it is rarely pointed out that if the assumption is only that "the regressors are orthogonal to the error term", then consistency hinges on this additional assumption as well.

Consider the simple regression model as the OP does,

$$y_i = b_0 +b_1x_i +u_i$$

without making any assumptions apart from the regularity ones (expected values exist, matrices converge, the sample is ergodic-stationary). Then, at the limit, one gets for the OLS estimator of the coefficient vector $\mathbf b = (b_0,b_1)'$,

$$\text{plim}\, \hat {\mathbf b} -\mathbf b=[\text{Var}(x)]^{-1}\cdot \begin{bmatrix} E(x^2) & -E(x)\\ -E(x) & 1 \end{bmatrix} \cdot \begin{bmatrix} E(u) \\ E(xu) \end{bmatrix}$$

Carrying out the multiplications we get

$$\text{plim} \hat b_0 -b_0 = [\text{Var}(x)]^{-1} \cdot [E(x^2)E(u) - E(x)E(xu)]$$

$$\text{plim} \hat b_1 -b_1 = [\text{Var}(x)]^{-1} \cdot [-E(x)E(u) + E(xu)]$$

Now we can see what alternative assumptions can lead us to consistency for $b_1$.

Alternative A. A1: $x$ and $u$ are orthogonal (so $E(xu) = 0$) and A2: $E(u)=0$.

Alternative B. $x$ and $u$ are uncorrelated, so $\text{Cov}(x,u) = E(xu)-E(x)E(u) =0$, without making assumptions about $E(u)$.

Alternative C. C1: $x$ and $u$ are orthogonal (so $E(xu) = 0$) and C2: $E(x)=0$.

(Note that under $B$ or under $C$ we do not have consistency for $\hat b_0$; this is why the usual assumption made is $A$.)
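Here is a small Monte Carlo sketch of Alternative B (my own toy specification: normal draws, a fixed seed, and $E(u)=0.7$, none of which comes from the question): $u$ is drawn independently of $x$, so they are uncorrelated, but $E(u)\neq 0$. According to the formulas above, the slope bias should vanish while the intercept bias should converge to $E(u)$.

```python
import numpy as np

rng = np.random.default_rng(1)
b0, b1 = 2.0, 0.5

for n in (1_000, 100_000, 1_000_000):
    x = rng.normal(1.0, 1.0, n)              # E(x) = 1, Var(x) = 1
    u = rng.normal(0.7, 1.0, n)              # independent of x, but E(u) = 0.7 (A2 violated, B holds)
    y = b0 + b1 * x + u

    X = np.column_stack([np.ones(n), x])
    b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

    # slope bias -> 0, intercept bias -> E(u) = 0.7
    print(n, b1_hat - b1, b0_hat - b0)
```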

Let's move to the centered regression. Something the OP neglected is that, in the limit, the sample means become expected values, i.e. constants. Then (using the tilde now to denote variables centered at their population means)

$$E[(x_i-\bar x)(u_i-\bar u)] \to E[(x_i-E(x))(u_i-E(u))] = E(\tilde x \tilde u) = E(xu) - E(x)E(u)$$

and we do not have any cross-observation products. Here we have

$$\text{plim} \hat b_{1,centered} -b_1 = [\text{Var}(x)]^{-1} \cdot E(\tilde x \tilde u)$$

$$\implies \text{plim} \hat b_{1,centered} -b_1 = [\text{Var}(x)]^{-1} \cdot [ E(xu)-E(x)E(u)]$$

which is the exact same result as in the uncentered regression.

So in the centered regression consistency hinges on

$$E(\tilde x \tilde u) = 0 \implies E(xu) - E(x)E(u) = 0$$

This can be seen to hold irrespective of whether we assume $A$, or $B$, or $C$ for the uncentered regression. So if we have consistency in the uncentered regression for the $b_1$ coefficient, we will have it in the centered regression too.
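As a quick numerical check of the limit statement used above (with a toy data-generating process of my own choosing, where $u$ is deliberately correlated with $x$ so that the limit is non-zero), the sample moment $\frac 1 N\sum_i(x_i-\bar x)(u_i-\bar u)$ settles on $E(xu)-E(x)E(u)$, i.e. the cross-observation products indeed wash out:

```python
import numpy as np

rng = np.random.default_rng(2)

for n in (100, 10_000, 1_000_000):
    x = rng.normal(1.0, 1.0, n)
    u = 0.3 * x + rng.normal(0.0, 1.0, n)    # Cov(x, u) = 0.3 * Var(x) = 0.3 by construction

    sample_moment = np.mean((x - x.mean()) * (u - u.mean()))
    print(n, sample_moment, 0.3)             # sample moment -> E(xu) - E(x)E(u) = 0.3
```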

PS: When stating the regression model in matrix form, it is customary to state the "orthogonality condition" as $E(\mathbf X'\mathbf u)=0$. Then, in most cases, the author says that "the regressor matrix includes a constant". But then the assumption $E(\mathbf u) = 0$ is automatically included in the stated orthogonality condition, since the first row of $\mathbf X'$ is a series of ones.
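A tiny sketch of that last remark (again with arbitrary distributions, just for illustration): when the regressor matrix contains a column of ones, the first entry of the sample analogue of $E(\mathbf X'\mathbf u)$ is simply the sample mean of the errors, so requiring $E(\mathbf X'\mathbf u)=0$ already requires $E(u)=0$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(1.0, 1.0, n)
u = rng.normal(0.4, 1.0, n)                  # deliberately E(u) = 0.4

X = np.column_stack([np.ones(n), x])         # regressor matrix including a constant
cross = X.T @ u / n                          # sample analogue of E(X'u)

print(cross[0], u.mean())                    # first component equals the sample mean of u
```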

Alecos Papadopoulos
  • I don't get why this answer got a negative score (I would wish to see an explanation). The answer is very good (maybe it became too cluttered for some people's tastes because of the thorough explanations). The fact that we also need E(u)=0 or E(u|x)=0 for consistency (aside from orthogonality as defined by E(xu)=0) is very important. – Sextus Empiricus Jun 03 '18 at 10:57
  • *A simple overview:* The OP discusses strict exogeneity by using an example where one form of OLS is changed into another form (with the same solution). But either (1) the first case does not have $E(u) = 0$ and is not necessarily consistent, or (2) the first form does have $E(u) =0$ (and is consistent) and then the second form shares the same necessary conditions.......... thus the example given is not a case of one form having strict exogeneity and the other form having endogenous regressors; instead, they are both the same. – Sextus Empiricus Jun 03 '18 at 10:58
  • Alecos' answer addresses the main point of confusion for most intrepid regression modeling students. The book by Seber and Lee is based around elegant and almost stupid-easy proofs that are made possible, as Martijn alludes to, by simply assuming (and not stating) all covariates are centered. – AdamO Jun 05 '18 at 16:14