
Say the population regression function is: $$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i $$

(In the econometrics context.) While I can't just assume that $E[\varepsilon_i | X_i] = 0$, can I not say that $Cov(\varepsilon_i, X_i) = 0$ simply as an algebraic consequence of OLS?

Of course, OLS only says that $\widehat{Cov}(\varepsilon_i,X_i) = 0$, but assuming I'm running OLS on the entire population dataset, this "sample" covariance just is the population covariance, right?

What am I missing here? Is it that we assume there's a data generating process, so I can't just define $\varepsilon_i$ as the residual from OLS on the population dataset? Or is the error term in the population regression function not an OLS residual at all, but some structural error term of the model?

But if I run OLS on the entire population dataset, it is the residual.
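
To make the mechanical point concrete, here is a minimal sketch (my own illustrative code; the data-generating choices are arbitrary) showing that, whatever the true relationship is, the OLS residuals have zero sample covariance with $X$ by construction:

```python
import numpy as np

# Treat these n draws as the entire "population" (purely illustrative data).
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = np.sin(x) + rng.normal(size=n)  # deliberately non-linear in x

# OLS coefficients via the usual moment formulas.
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Zero up to floating-point error, by construction of OLS.
print(np.cov(x, resid, bias=True)[0, 1])
```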

FWL
  • The assumption regarding the errors $\varepsilon$, not the residuals $\hat\varepsilon$, is just that: an assumption. You may or may not make it. It's either true or not. – Aksakal Oct 15 '21 at 20:40
  • Right, but aren't the errors $\varepsilon$ defined as the residuals from OLS run on the population dataset (as opposed to a sample)? Then they should inherit all the properties of OLS, including $Cov(\varepsilon_i,X_i) = 0$, without having to claim it as an assumption. – FWL Oct 15 '21 at 21:08
  • If you have the population then don’t assume or guess but measure and calculate. This way you’ll establish whether the covariance is zero or not. – Aksakal Oct 15 '21 at 21:36
  • I agree with @Aksakal: you do not need, and do not use, OLS when you have the entire population. – Sergio Oct 15 '21 at 21:37
  • How would one go around finding the $\beta$'s then? – FWL Oct 15 '21 at 21:54
  • I'm starting to think: the $\beta$'s aren't the product of some statistical procedure; they're just some "true" parameters in the population. They exist out there, and we can never know what they are. Running OLS on the entire population dataset is, at the end of the day, just giving us another estimator for the $\beta$'s in the PRF. – FWL Oct 15 '21 at 21:56
  • Reading here can help: https://stats.stackexchange.com/questions/493211/under-which-assumptions-a-regression-can-be-interpreted-causally/493905#493905 – markowitz Oct 15 '21 at 22:51
  • If you have a population AND know that the true model is linear, then you can compute the coefficients using OLS. What you don't need is the variance-covariance matrix of the coefficients, because there is only one value of them that you obtain. They are not estimated coefficients anymore, and the residuals are errors. So you can directly measure the covariance of the errors with the data, etc. – Aksakal Oct 15 '21 at 23:14
  • The more relevant point, it seems to me, is that the $\beta_1$ identified by this line of reasoning is not necessarily a ("structural") coefficient of interest. See, e.g., https://stats.stackexchange.com/questions/314216/whats-wrong-with-this-argument-that-the-error-term-cannot-be-correlated-with-ex/314279#314279 – Christoph Hanck Oct 16 '21 at 16:08

2 Answers


The assumption that $Cov(X,\epsilon)=0$ is not even needed. If you replace it with the more intuitive and practically relevant assumption that $E(Y|X=x) = \beta_0+\beta_1 x$, then the covariance condition is automatically true. So instead of worrying about the covariance assumption, you can instead worry about whether the conditional mean function is truly linear.
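
To spell out why (a standard iterated-expectations step, added for completeness): if $E(Y|X=x) = \beta_0+\beta_1 x$ and we define $\epsilon = Y - \beta_0 - \beta_1 X$, then $E(\epsilon|X)=0$, so $$E[\epsilon]=E\big[E(\epsilon|X)\big]=0, \qquad E[X\epsilon]=E\big[X\,E(\epsilon|X)\big]=0,$$ and hence $Cov(X,\epsilon)=E[X\epsilon]-E[X]E[\epsilon]=0$.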

BigBendRegion

The assumption $\mathrm{cov}[X_i,\epsilon_i]=0$, like the assumption $E[\epsilon_i]=0$, is needed to identify the parameters targeted by OLS with the structural parameters in the model.

Let's separate the two notationally to make them easier to compare. Suppose $X$ and $\epsilon$ are given and $Y$ is generated by the process $Y\gets \gamma_0+\gamma_1 X+\epsilon$. If $E[\epsilon]=0$ and $\mathrm{cov}[X,\epsilon]=0$, then the OLS $\hat\beta_0$ estimates $\gamma_0$ and the OLS $\hat\beta_1$ estimates $\gamma_1$.

On the other hand, if $\epsilon= 1-X/10$, then $$Y=\gamma_0+\gamma_1X+\epsilon= (\gamma_0+1) + (\gamma_1-1/10)X$$ and the OLS estimates will be consistent (and unbiased conditional on $X$) for $\beta_0=\gamma_0+1$ and $\beta_1=\gamma_1-1/10$.
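
A quick numerical check (a sketch with arbitrary parameter values, added for illustration):

```python
import numpy as np

# Illustrative structural parameters; eps is correlated with X by construction.
gamma0, gamma1 = 2.0, 3.0
rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
eps = 1 - x / 10
y = gamma0 + gamma1 * x + eps

# OLS via the moment formulas recovers gamma0 + 1 = 3.0 and
# gamma1 - 1/10 = 2.9, not the structural (2.0, 3.0).
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)
```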

The constraints sit on the border between assumptions and specification choices. In scenarios where we think of $\epsilon$ merely as the difference between $Y$ and $\hat Y$, $E[\epsilon]=0$ and $\mathrm{cov}[X,\epsilon]=0$ are choices we make to identify $\beta_0$ and $\beta_1$. In the (less common) scenarios where $\epsilon$ is genuinely some sort of measurement error in $Y$ and there is a real reason to believe the relationship is linear, the constraints on $\epsilon$ are genuinely falsifiable assumptions about the measurement-error process.

Thomas Lumley