
Obviously, error terms can be correlated with explanatory variables. E.g. see this.

However, I came up with an argument why they cannot be, so I am wondering where the mistake in my argument is.

Assume we have a "true" linear regression model $$y=\beta_0+\beta_1x_1+\beta_2x_2+\epsilon$$

where $E(\epsilon|x_1,x_2)=0$.

But suppose that instead we run the regression $$y=\beta_0+\beta_1x_1+\eta$$ where $$\eta = \beta_2x_2 +\epsilon$$

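Then $E(\eta|x_1)=E(\beta_2x_2 +\epsilon|x_1)=\beta_2\cdot E(x_2|x_1)$, where the $\epsilon$ term drops out because, by the law of iterated expectations, $E(\epsilon|x_1)=E\left[E(\epsilon|x_1,x_2)\,|\,x_1\right]=0$. Now, if we assume that the relation between $x_1$ and $x_2$ is purely linear, then $E(x_2|x_1)=\gamma x_1$, so that $$E(\eta|x_1)=\beta_2\gamma x_1$$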
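A quick numerical check of this step, as a minimal Python/NumPy sketch. The parameter values ($\beta_2=3$, $\gamma=0.5$) and the normal distributions are arbitrary assumptions chosen only for illustration; $x_2$ is generated as $\gamma x_1$ plus independent noise so that $E(x_2|x_1)=\gamma x_1$ holds by construction. A least-squares line through the pairs $(x_1,\eta)$ should then have slope near $\beta_2\gamma$ and intercept near zero.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
beta2, gamma = 3.0, 0.5

rng = np.random.default_rng(0)
n = 200_000

x1 = rng.normal(size=n)
# x2 is linear in x1 plus independent noise, so E(x2 | x1) = gamma * x1
x2 = gamma * x1 + rng.normal(size=n)
# eps is independent of (x1, x2), so E(eps | x1, x2) = 0
eps = rng.normal(size=n)

# Composite error of the short regression
eta = beta2 * x2 + eps

# Least-squares line eta = a + b * x1 (polyfit returns [slope, intercept])
b, a = np.polyfit(x1, eta, deg=1)
print(f"slope:     {b:.3f} (theory: beta2 * gamma = {beta2 * gamma})")
print(f"intercept: {a:.3f} (theory: 0)")
```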

But if this is true, then why can't we simply state the model

$$y=\beta_0+(\beta_1+\beta_2\cdot \gamma)x_1+\zeta,$$ with $\zeta = \beta_2(x_2-\gamma x_1)+\epsilon$, so that $E(y|x_1)=\beta_0+(\beta_1+\beta_2\cdot \gamma)x_1$ and $$E(\zeta|x_1)=0?$$

In this adjusted model, the error term is uncorrelated with $x_1$, even though we have omitted a variable. So if we run this regression, we get an unbiased estimator of $\beta_1+\beta_2\cdot \gamma$.
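The unbiasedness claim can be checked by Monte Carlo. This is a sketch under assumed values ($\beta_0=1$, $\beta_1=2$, $\beta_2=3$, $\gamma=0.5$, standard normal noise, $n=100$ observations per sample, all hypothetical): it repeatedly draws samples from the true model, regresses $y$ on an intercept and $x_1$ only, and averages the slope estimates. The average should land near $\beta_1+\beta_2\gamma=3.5$, not near $\beta_1=2$.

```python
import numpy as np

# Hypothetical parameter values (illustration only)
beta0, beta1, beta2, gamma = 1.0, 2.0, 3.0, 0.5

def short_regression_slope(rng, n=100):
    """Simulate one sample from the true model; return the OLS slope of y on x1 alone."""
    x1 = rng.normal(size=n)
    x2 = gamma * x1 + rng.normal(size=n)   # E(x2 | x1) = gamma * x1
    eps = rng.normal(size=n)               # E(eps | x1, x2) = 0
    y = beta0 + beta1 * x1 + beta2 * x2 + eps
    X = np.column_stack([np.ones(n), x1])  # intercept and x1; x2 is omitted
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(1)
slopes = np.array([short_regression_slope(rng) for _ in range(5_000)])
print(f"mean OLS slope over replications: {slopes.mean():.3f}")
print(f"beta1 + beta2 * gamma:            {beta1 + beta2 * gamma}")
print(f"beta1:                            {beta1}")
```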

What is wrong with this argument? Is the only reason this doesn't work that we cannot measure $\gamma$ and $\beta_2$?

user56834
  • Why do you have to assume any (or a linear one specifically) association between $x_1$ and $x_2$? – IWS Nov 17 '17 at 09:28
  • Because I think that if there is no correlation between the two, then omitting $x_2$ is not problematic in the first place, so the whole question would be irrelevant. – user56834 Nov 17 '17 at 09:37

1 Answer


You seem to have rediscovered linear projection coefficients.

You are right that $\zeta$ has zero conditional mean given $x_1$, and hence OLS consistently estimates the linear projection coefficient of $y$ onto $x_1$. In terms of your true model, that coefficient is $\beta_1+\beta_2\gamma$. However, the parameter you are interested in is $\beta_1$. So unless $x_2$ is irrelevant for explaining $y$ (i.e., $\beta_2=0$) or $x_1$ and $x_2$ are unrelated ($\gamma=0$), OLS won't estimate the parameter you seek to estimate.
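To make the distinction concrete, here is a minimal simulation sketch in Python/NumPy, with hypothetical values $\beta_0=1$, $\beta_1=2$, $\beta_2=3$ and standard normal noise. It compares the short regression of $y$ on $x_1$ with the long regression that also includes $x_2$, once with $\gamma=0.5$ and once with $\gamma=0$. With $\gamma=0.5$ the short-regression slope should be near $\beta_1+\beta_2\gamma=3.5$ while the long regression recovers $\beta_1=2$; with $\gamma=0$ both should be near $2$.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares (X should include a column of ones)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simulate(gamma, beta0=1.0, beta1=2.0, beta2=3.0, n=200_000, seed=0):
    """Draw one large sample from the true model (hypothetical parameter values)
    and return (short-regression slope, long-regression coefficient on x1)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = gamma * x1 + rng.normal(size=n)          # E(x2 | x1) = gamma * x1
    y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    ones = np.ones(n)
    short = ols(np.column_stack([ones, x1]), y)   # omits x2
    full = ols(np.column_stack([ones, x1, x2]), y)  # includes x2
    return short[1], full[1]

for gamma in (0.5, 0.0):
    short_slope, full_slope = simulate(gamma)
    print(f"gamma={gamma}: short-regression slope {short_slope:.3f}, "
          f"long-regression coefficient on x1 {full_slope:.3f}")
```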

Christoph Hanck