
The OLS estimator $b$ is equal to $(X^TX)^{-1}X^Ty$ for the linear regression model. If we assume that $E(\epsilon|X)=0$, then it is easy to prove that $b$ is unbiased: take the conditional expectation of $b$, substitute in the expression for $y$, and simplify.
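Explicitly, substituting $y = X\beta + \epsilon$ and using $E(\epsilon|X)=0$:

$$E(b \mid X) = E\big((X^TX)^{-1}X^T(X\beta + \epsilon) \,\big|\, X\big) = \beta + (X^TX)^{-1}X^T\,E(\epsilon \mid X) = \beta,$$

so $E(b|X)=\beta$, and by iterated expectations $E(b)=\beta$.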

But how do we prove it if we only know that $E(X^T\epsilon)=0$? $E(\epsilon|X)=0$ implies $E(X^T\epsilon)=0$, but not the other way around.

EDIT: Just to be sure, can I get a confirmation that even if the $u_i$'s are i.i.d., $E(x_iu_i)=0$ does not imply unbiasedness?

So, just to be absolutely clear: if the $u_i$ are i.i.d. and we know that $E(x_iu_i)=0$ holds, but we don't know whether $E(u_i|x_i)=0$ holds, then OLS may be biased?

user56834

2 Answers


For this question we can make use of a simple decomposition of the OLS estimator:

$$\begin{aligned} \hat{\boldsymbol{\beta}} = (\mathbf{X}^\text{T} \mathbf{X})^{-1} \mathbf{X}^\text{T} \mathbf{Y} &= (\mathbf{X}^\text{T} \mathbf{X})^{-1} \mathbf{X}^\text{T} (\mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}) \\[6pt] &= \boldsymbol{\beta} + (\mathbf{X}^\text{T} \mathbf{X})^{-1} \mathbf{X}^\text{T} \boldsymbol{\epsilon}. \end{aligned}$$

This useful decomposition follows directly from the form of the OLS estimator and the underlying regression equation, so it is not dependent on any assumptions about the behaviour of the error terms. From this decomposition, the conditional bias (taking the regressors as fixed) is:

$$\text{Bias}(\hat{\boldsymbol{\beta}}|\mathbf{x}) = \mathbb{E}(\hat{\boldsymbol{\beta}} | \mathbf{x}) - \boldsymbol{\beta} = (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \mathbb{E}(\boldsymbol{\epsilon}| \mathbf{x}).$$

The unconditional (marginal) bias (taking the regressors as random variables) is:

$$\text{Bias}(\hat{\boldsymbol{\beta}}) = \mathbb{E}(\hat{\boldsymbol{\beta}}) - \boldsymbol{\beta} = \mathbb{E}((\mathbf{X}^\text{T} \mathbf{X})^{-1} \mathbf{X}^\text{T} \boldsymbol{\epsilon}).$$

In both cases, the condition $\mathbb{E}(\boldsymbol{\epsilon}| \mathbf{x}) = \mathbf{0}$ is sufficient for unbiasedness, but in the latter case the weaker condition $\mathbb{E}((\mathbf{X}^\text{T} \mathbf{X})^{-1} \mathbf{X}^\text{T} \boldsymbol{\epsilon}) = \mathbf{0}$ is also sufficient. The condition $\mathbb{E}( \mathbf{X}^\text{T} \boldsymbol{\epsilon}) = \mathbf{0}$ is not sufficient for unbiasedness in either case.
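As a quick numerical illustration of the decomposition, here is a minimal numpy sketch with an arbitrary fixed design and i.i.d. normal errors, so that $\mathbb{E}(\boldsymbol{\epsilon}|\mathbf{x}) = \mathbf{0}$ holds by construction (all dimensions and parameter values are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, n_sims = 200, 2, 5000
beta = np.array([1.0, -0.5])

# Fixed design; errors are drawn independently of X below,
# so E(eps | X) = 0 holds by construction.
X = rng.normal(size=(n, p))
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)  # (X'X)^{-1} X'

estimates = np.empty((n_sims, p))
for s in range(n_sims):
    eps = rng.normal(size=n)
    y = X @ beta + eps
    b = XtX_inv_Xt @ y
    # The algebraic decomposition: b = beta + (X'X)^{-1} X' eps,
    # holding up to floating-point error regardless of error assumptions.
    assert np.allclose(b, beta + XtX_inv_Xt @ eps)
    estimates[s] = b

# With E(eps | X) = 0, the simulated bias is ~0 up to Monte Carlo error.
print("simulated bias:", estimates.mean(axis=0) - beta)
```

The `assert` checks the decomposition itself, which holds with no assumptions on the errors; only the final bias computation relies on $\mathbb{E}(\boldsymbol{\epsilon}|\mathbf{x}) = \mathbf{0}$.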

Ben

You can't, because the statement is not true under the weaker assumption.

Consider, for example, the autoregressive model \begin{equation*} y_{t}=\beta y_{t-1}+\epsilon _{t}, \end{equation*} in which strict exogeneity $E(\epsilon|X)=0$ is violated even under the assumption $E(\epsilon_{t}y_{t-1})=0$:

we have \begin{equation*} E(\epsilon_ty_{t})=E(\epsilon_t(\beta y_{t-1}+\epsilon _{t}))=E(\epsilon_{t}^{2})\neq 0. \end{equation*} But since $y_{t+1}=\beta y_{t}+\epsilon_{t+1}$, $y_t$ is also a regressor for $y_{t+1}$, and hence it is impossible in this model for the error term to be uncorrelated with future regressors.

Now, it is also well known that OLS is biased for the coefficient of an AR(1) model; see Why is OLS estimator of AR(1) coefficient biased?
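A small Monte Carlo sketch makes this bias visible (assuming Gaussian innovations and the illustrative values $\beta = 0.9$, $T = 50$; nothing hinges on these particular choices):

```python
import numpy as np

rng = np.random.default_rng(1)

beta, T, n_sims = 0.9, 50, 20000
estimates = np.empty(n_sims)

for s in range(n_sims):
    eps = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = beta * y[t - 1] + eps[t]
    # OLS of y_t on y_{t-1} without intercept:
    # sum(y_{t-1} * y_t) / sum(y_{t-1}^2)
    ylag, ycur = y[:-1], y[1:]
    estimates[s] = (ylag @ ycur) / (ylag @ ylag)

# eps_t is independent of the past, so E(eps_t * y_{t-1}) = 0 holds
# by construction, yet the estimator is biased in finite samples.
print("mean OLS estimate:", estimates.mean(), "  true beta:", beta)
```

With these settings the mean estimate comes out noticeably below $0.9$, even though $E(\epsilon_t y_{t-1}) = 0$ holds by construction.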

Christoph Hanck
  • So this problem would go away if we instead required that $E(x_i\epsilon_j)=0$ for all $i$ and $j$? – user56834 Jan 18 '18 at 10:15
  • What if we require that $E(x^T\epsilon)=0$? – user56834 Jan 18 '18 at 10:17
  • In my example, the condition from the first comment cannot hold, as I demonstrate. The second seems to be the same as what you already state in your question. – Christoph Hanck Jan 18 '18 at 10:40
  • @Programmer2134 You are thinking about identification, which renders OLS consistent, but not necessarily unbiased in finite samples (you should take a look here: https://stats.stackexchange.com/questions/303403/the-role-of-invoking-unconfoundedness-in-the-rubin-causal-model-treatment-estima/303414#303414). OLS is still consistent: as $n$ grows, the estimator converges to the parameter. The bias we are talking about is a finite-sample property of the estimator, in the sense that the expected value of the estimator is not the parameter. – Carlos Cinelli Jan 29 '18 at 17:27