Is the equation "$Y=\mathbb{E}[Y|X] + error$" an identity?

Question

Can I always use this equation to regress $Y$ on $X$, if I know the distribution of $Y$ to get an expression for the expectation term?

It depends on how you define the error. If $E[error]=0$, then yes. $E[Y] = E[E[Y|X]]$ is [the fundamental property](http://www.math.uah.edu/stat/expect/Conditional.html) — rightskewed, Oct 09 '15 at 04:38
The thing you quote is a *model*. It may be an identity in some circumstances (trivially, if the model holds, or if you define "error" so as to make it true, which makes sense in many - but not all - situations), but you haven't identified circumstances sufficiently. — Glen_b, Oct 09 '15 at 23:34
What I actually want to know is whether, if I know *f(Y|X)*, I can use the above equation to form the regression equation through the conditional expectation (which can be either linear or non-linear). If yes, is there any restriction on the distribution of the error terms? — FreeSid91, Oct 10 '15 at 00:53

score 5 · Answer 1 · answered Oct 09 '15 at 07:10

Yes, it is an identity - just define $error:=Y-E[Y|X]$.
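To spell out what this definition buys you, here is a short derivation, assuming only that $E|Y|<\infty$ so that the conditional expectation exists. By construction,

$$E[\text{error} \mid X] = E[Y \mid X] - E\big[E[Y \mid X] \mid X\big] = E[Y \mid X] - E[Y \mid X] = 0,$$

and so, by the law of iterated expectations, for any function $g$ with $E|g(X)\,\text{error}|<\infty$,

$$E[\text{error}] = E\big[E[\text{error} \mid X]\big] = 0, \qquad E[g(X)\,\text{error}] = E\big[g(X)\,E[\text{error} \mid X]\big] = 0.$$

The error is therefore mean-zero and uncorrelated with every function of $X$, but nothing here implies that it is normal, homoskedastic, or independent of $X$.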

Whether or not that identity is helpful in the regression you mention is another matter.

For one thing, the conditional expectation may be a nonlinear function of $X$ (may for example include squares of $X$), so that a standard linear regression may not estimate this function consistently (nonparametric approaches may be preferable).
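As a minimal sketch of this point, the simulation below uses a made-up data-generating process (the choice $E[Y|X]=X^2$, the standard normal $X$, and the sample size are all assumptions for illustration). A straight-line fit misses the conditional expectation entirely, while a fit that allows a squared term recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: E[Y|X] = X^2 by construction.
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)

# Straight-line fit; np.polyfit returns coefficients from highest degree down.
slope, intercept = np.polyfit(x, y, deg=1)

# Quadratic fit, which nests the true conditional expectation.
a2, a1, a0 = np.polyfit(x, y, deg=2)

# The error defined as Y - E[Y|X] is mean-zero and uncorrelated with X.
error = y - x**2
error_mean = error.mean()
error_x_corr = np.corrcoef(error, x)[0, 1]

print(f"linear fit:    slope={slope:.3f}, intercept={intercept:.3f}")
print(f"quadratic fit: a2={a2:.3f}, a1={a1:.3f}, a0={a0:.3f}")
print(f"error mean={error_mean:.4f}, corr(error, x)={error_x_corr:.4f}")
```

In this setup the linear slope comes out close to zero even though $Y$ depends strongly on $X$, because the best linear approximation to $X^2$ under a symmetric $X$ is flat.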

Another problem is whether the conditional expectation, even if it were linear, corresponds to the object you are interested in estimating.

For example, a regression of earnings on years of schooling can estimate the expected earnings of an employee with a given level of education. But the additional earnings associated with an additional year of schooling will generally not be the same thing as the causal effect of such additional schooling, unless you manage to control for all potential confounders (experience, ability, and many more) in your regression (or have another identification strategy, like a convincing instrumental variable).
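The gap between the conditional expectation and the causal effect can be illustrated with a small simulation. Everything in it is hypothetical: the variable names, the single confounder `ability`, and the coefficients (a causal effect of schooling equal to 1.0) are assumptions chosen for the sketch, not estimates from real data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical model: ability raises both schooling and earnings.
ability = rng.normal(size=n)
schooling = ability + rng.normal(size=n)
earnings = 1.0 * schooling + 2.0 * ability + rng.normal(size=n)  # causal effect = 1.0

# Regression of earnings on schooling alone (ability omitted).
X_short = np.column_stack([np.ones(n), schooling])
beta_short, *_ = np.linalg.lstsq(X_short, earnings, rcond=None)

# Regression that also controls for the confounder.
X_long = np.column_stack([np.ones(n), schooling, ability])
beta_long, *_ = np.linalg.lstsq(X_long, earnings, rcond=None)

print(f"schooling coefficient without control: {beta_short[1]:.3f}")
print(f"schooling coefficient with control:    {beta_long[1]:.3f}")
```

Here the short regression estimates the slope of $E[\text{earnings}\mid\text{schooling}]$ correctly (about 2 in this design), so it is a perfectly good predictor, yet it is twice the causal effect. Only the regression that includes the confounder recovers a coefficient near 1.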

If the error terms are assumed to be independently and normally distributed, then the nonlinear least squares estimator coincides with the maximum likelihood estimator, which is consistent. This raises another doubt: is assuming normal error terms harmless when the conditional PDF of Y is not normal (say, exponential)? What I actually want to know is whether, if I know f(Y|X), I can use the above equation to form the regression equation through the conditional expectation. If yes, is there any restriction on the assumptions regarding the error terms? — FreeSid91, Oct 10 '15 at 00:36
