
I've been learning about endogeneity, but after looking around online I've become more and more confused about what the definition is.

Most pages say that in a model $y=X\beta+\epsilon$ the definition of endogeneity is $E[X'\epsilon] \neq 0$. But a lot of these same pages say that endogeneity is when $X$ is correlated with the error, or in other words, (if I am understanding this correctly) $Cov(X',\epsilon) \neq 0$. But these two things are not the same in general, right?

So in total I'd like to know what the definition of endogeneity is. Am I just confused? Is the definition of "correlated" different than what I think it is?

kjetil b halvorsen
user35734
  • Since you don't actually tell us what you think "correlated" means, we may have difficulties answering your question. But here's a hint about the situation: what is the value of $E[\epsilon]$? What role does this quantity play in the formula for the covariance? When you account for that, what does the formula reduce to? – whuber Feb 20 '17 at 23:53
  • I thought that to say $X$ and $\epsilon$ are correlated is to say that $Cov(X', \epsilon) \neq 0$. Is that correct? As for your point, I suppose the covariance that I wrote reduces to the definition I've seen if $E[\epsilon]=0$. Is it the case that if $\beta$ is properly fitted then that is true? While I know it is true for OLS, I don't see why that has to be true in general. – user35734 Feb 21 '17 at 00:22
  • The distribution of $\epsilon$ has nothing whatsoever to do with how the model is fitted. The *only* things you know about $\epsilon$ are what you assume about it. Your question is about the *model*, not about data or OLS. – whuber Feb 21 '17 at 18:05
  • Ok. So it's accurate to say that in arbitrary linear models, the covariance definition and the expected-value-of-a-product definition of endogeneity are different, correct? So which one is actually the true definition? – user35734 Feb 21 '17 at 23:28
  • They are mathematically equivalent given that the expectation of $\epsilon$ is zero. The proof, which is elementary (and almost trivial), uses standard formulas for the covariance. – whuber Feb 21 '17 at 23:46
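For reference, here is a sketch of that elementary argument, written for a single observation with $X$ a vector of regressors and $\epsilon$ a scalar error (expectations taken elementwise). Expanding the covariance gives

$$
\begin{aligned}
Cov(X, \epsilon) &= E\big[(X - E[X])(\epsilon - E[\epsilon])\big] \\
&= E[X\epsilon] - E[X]\,E[\epsilon] - E[X]\,E[\epsilon] + E[X]\,E[\epsilon] \\
&= E[X\epsilon] - E[X]\,E[\epsilon].
\end{aligned}
$$

So under $E[\epsilon] = 0$ we get $Cov(X, \epsilon) = E[X\epsilon]$, and the conditions $E[X\epsilon] \neq 0$ and $Cov(X, \epsilon) \neq 0$ coincide. Without that assumption the two differ by $E[X]\,E[\epsilon]$.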

2 Answers


You are correct in noting that, if $E \epsilon \neq 0$, $$ E[X \epsilon] \neq Cov(X, \epsilon) = E[X(\epsilon - E\epsilon)], $$ since the two differ by $E[X]\,E[\epsilon]$, which is nonzero whenever $X$ contains a constant. However, assuming $E \epsilon = 0$ is usually without loss of generality. In particular, if $X$ contains a constant and if the coefficient on the constant carries no "structural" interpretation, then we can always redefine this coefficient to make sure that $E \epsilon = 0$.
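A quick simulation makes the gap concrete. The setup is a hypothetical one (not from the answer), assuming NumPy: $X$ and $\epsilon$ are drawn independently, so they are uncorrelated, yet $E[X\epsilon]$ is far from zero because both have nonzero means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical data-generating process: a regressor with nonzero mean and an
# error with nonzero mean, drawn independently of each other.
x = rng.normal(loc=3.0, scale=1.0, size=n)    # E[X] = 3
eps = rng.normal(loc=2.0, scale=1.0, size=n)  # E[eps] = 2

e_x_eps = np.mean(x * eps)                                 # sample E[X eps]
cov_x_eps = np.mean((x - x.mean()) * (eps - eps.mean()))   # sample Cov(X, eps)

print(f"E[X eps]    ~ {e_x_eps:.3f}")    # close to E[X]E[eps] = 6
print(f"Cov(X, eps) ~ {cov_x_eps:.3f}")  # close to 0
# In the sample, the difference equals mean(x) * mean(eps) exactly (up to rounding).
print(f"difference  ~ {e_x_eps - cov_x_eps:.3f}, mean(x)*mean(eps) ~ {x.mean() * eps.mean():.3f}")
```

Here "uncorrelated" and "orthogonal" clearly disagree, which is exactly the situation the $E\epsilon = 0$ normalization rules out.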

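To see why the constant can absorb $E\epsilon$, write the model for a single observation as $Y = X'\beta + \epsilon$ with $X = (1, W')'$ and $\beta = (\beta_0, \beta_1')'$, so that $Y = \beta_0 + W'\beta_1 + \epsilon$. Solving for $\epsilon$ and taking expectations gives

$$ E[\epsilon] = E[Y - W'\beta_1] - \beta_0. $$

This shows that choosing $\beta_0 = E[Y - W'\beta_1]$ guarantees $E \epsilon = 0$.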
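A minimal numerical sketch of this re-centering, assuming NumPy and a hypothetical model with a scalar $W$, $\beta_0 = 1$, $\beta_1 = 0.5$ and an error with mean 2. The intercept is replaced by the sample analogue of $E[Y - W\beta_1]$ while $\beta_1$ is held fixed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical structural model Y = beta0 + beta1 * W + eps with E[eps] = 2,
# i.e. the error does not have mean zero under the original intercept.
beta0, beta1 = 1.0, 0.5
w = rng.normal(size=n)
eps = rng.normal(loc=2.0, size=n)
y = beta0 + beta1 * w + eps

# Redefine the intercept as the sample mean of Y - W * beta1, keeping beta1 fixed.
beta0_new = np.mean(y - beta1 * w)
eps_new = y - beta0_new - beta1 * w  # error implied by the new intercept

print(beta0_new)        # close to beta0 + E[eps] = 3
print(eps_new.mean())   # zero up to floating-point error
# Shifting eps by a constant leaves its covariance with W unchanged.
print(np.cov(w, eps)[0, 1], np.cov(w, eps_new)[0, 1])
```

Only the intercept changes; the slope and the covariance between $W$ and the error are untouched, which is why the normalization is harmless when $\beta_0$ has no structural meaning.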

Andreas Dzemski

In my opinion, your confusion is not so strange. I recently spent some effort on this same question.

The definition of exogeneity/endogeneity in econometrics is frequently ambiguous, and for this reason the treatment of causality is ambiguous as well. See: Regression and causality in econometrics

Note that, in my opinion, endogenous/exogenous are concepts that should have only a causal meaning. This point is a matter of debate, but that is my view. See this related topic: Structural equation and causal model in economics

Another goal in econometrics is forecasting, but in that setting the endogeneity problem does not play an important role. See: Endogeneity in forecasting

Basically, the most important point is that the exogeneity condition must refer to the structural error ($u$). Statistically speaking, the most frequent definition is conditional mean independence, $E[u|X]=0$, which is stronger than orthogonality, $E[uX]=0$ (by the law of iterated expectations, $E[u|X]=0$ implies $E[u]=0$ and $E[uX]=E[X\,E[u|X]]=0$, but not conversely). Note that $E[u]=0$ holds by assumption, not by construction. So, given $E[u]=0$, orthogonality and uncorrelatedness between the (structural) error and the covariates/regressors are the same thing.
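To see that $E[u|X]=0$ is strictly stronger than $E[uX]=0$, here is a standard textbook-style counterexample sketched with NumPy (the choice $u = X^2 - 1$ with $X \sim N(0,1)$ is an illustrative assumption, not taken from the answer). The error has mean zero and is uncorrelated with $X$, yet its conditional mean clearly depends on $X$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Illustrative example: u = X**2 - 1 with X ~ N(0, 1).
# Then E[u] = 0 and E[uX] = E[X**3] - E[X] = 0 (orthogonal/uncorrelated),
# but E[u | X] = X**2 - 1 is not zero, so mean independence fails.
x = rng.normal(size=n)
u = x**2 - 1

print(np.mean(u))        # close to 0
print(np.mean(u * x))    # close to 0
# Conditional means of u in two regions of X are far from zero and differ.
print(np.mean(u[np.abs(x) < 0.5]))   # negative (around -0.92)
print(np.mean(u[np.abs(x) > 2.0]))   # positive and large (around 4.7)
```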

By contrast, in a regression (where $u$ now denotes the regression error, i.e. the residual), orthogonality/uncorrelatedness holds by construction, not by assumption; in general $E[u|X]=0$ does not hold, but this is not very important.
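A small simulation can contrast the two kinds of error. This is a sketch under hypothetical assumptions (NumPy, a structural model $y = 1 + 2x + u$ in which $x$ and $u$ share a common component $z$, so $x$ is endogenous). OLS residuals are orthogonal to the regressors by construction, while the structural error is not, and the OLS slope does not recover the structural coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical structural model y = 1 + 2 x + u, where x is endogenous:
# x and the structural error u share the component z, so Cov(x, u) = 1 != 0.
z = rng.normal(size=n)
x = rng.normal(size=n) + z
u = rng.normal(size=n) + z
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])        # regressors with a constant
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
resid = y - X @ b                           # regression error (residual)

print(b)                  # slope near 2 + Cov(x, u) / Var(x) = 2.5, not 2
print(X.T @ resid / n)    # ~ [0, 0]: orthogonality holds by construction
print(np.mean(x * u))     # ~ 1: the structural error is correlated with x
```

The residual satisfies the orthogonality condition no matter what, so checking it says nothing about endogeneity; the relevant condition is about the structural $u$.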

In short, the interpretation of $u$ is crucial; most of the confusion comes from this point.

Basically, the so-called "error term" that you find in some econometrics presentations must be interpreted as the true/structural error. The further distinctions among orthogonality, correlation, conditional mean independence, and full independence can only produce confusion if the distinction between the two types of error above is not clear.

markowitz