
We have a simple linear regression model. Our assumptions are:

$Y_i =\beta_0+\beta_1X_i+ \varepsilon_i $, $i=1, \cdots, n$

$\varepsilon_i \sim N(0, \sigma^2)$

$Var(\varepsilon_i|X_i=x)=\sigma^2$

$\varepsilon_1, \cdots, \varepsilon_n$ are mutually independent.


Are these hypotheses enough to claim that $\varepsilon_i|X_i=x \sim N(0, \sigma^2)$?

DGRasines
  • In $\operatorname{Var}(\varepsilon_i|X=x)=\sigma^2$, $X$ does not have a subscript. Is that intentional? Also, in the very last formula $X$ does not have a subscript. Again, is that intentional? Are $X$ and $x$ vectors in both cases? – Richard Hardy Sep 09 '15 at 17:53
  • @RichardHardy They are not vectors; the OP says "simple" linear regression, meaning only one explanatory variable besides the constant term. – Alecos Papadopoulos Sep 09 '15 at 18:03
  • @RichardHardy No, it was a mistake. – DGRasines Sep 09 '15 at 18:29
  • @AlecosPapadopoulos, I did notice that was a simple regression but then $X$ means a column vector and $X_i$ means one element of a vector. As the author noted, writing $X$ rather than $X_i$ was a mistake. – Richard Hardy Sep 09 '15 at 18:34
  • Are you asking whether the error term is _independent_ of $X_i$? Because I just realized that the conditional distribution you ask about is identical to the marginal distribution (not only normal, but also having the same variance). – Alecos Papadopoulos Sep 09 '15 at 20:26
  • @AlecosPapadopoulos Yes – DGRasines Sep 09 '15 at 20:32
  • Your title and your body questions don't seem to be asking the same thing. – Glen_b Sep 09 '15 at 23:28
  • @Glen_b I'm a bit confused. Isn't it true that $X$ and $Y$ are independent $\iff$ $Y|X=x$ follows the same distribution as $Y$ for any $x$? – DGRasines Sep 10 '15 at 11:54
  • I was reading the title as asking if they were independent and the body as asking what conditions were sufficient for the errors to be normally distributed with constant mean and variance... – Glen_b Sep 10 '15 at 15:42
  • I hope this is not too dumb a question, but when we say "the residuals are $N(0, \sigma^2)$", we have $k$ residuals $e_1,\ldots,e_k$ and each residual is $N(0, \sigma^2)$? Then we have $k$ independent observations at $x_1,\ldots,x_k$ with values $y_1,\ldots,y_k$, and for _each_ $x_j$ the error $e_j := y_j - \hat y_j$ is normally distributed (where $\hat y_j$ is the linear estimator)? I mean, so that each error $e_j$ associated with $x_j$ is $N(0, \sigma^2)$? I am just having trouble viewing the $j$-th error as a random variable, since the ordering seems arbitrary. – MSIS Jan 10 '20 at 00:28

2 Answers


No. Here's an interesting counterexample.

Define a density function

$$g(x) = \frac{2}{\sqrt{2\pi}}\exp(-x^2/2)I(-t \le x \le 0 \text{ or } t \le x)$$

for $t = \sqrt{2\log(2)} \approx 1.17741$. ($I$ is the indicator function.)

The plot of $g$ is shown here in blue. If we define $h(x) = g(-x)$, its plot appears in red.

[Figure: the densities $g$ (blue) and $h$ (red)]

Direct calculation shows that any variable $Y$ with density $g$ has zero mean and unit variance. By construction, an equal mixture of $Y$ with $-Y$ (whose PDF is $h$) has a density function proportional to $\exp(-x^2/2)$: that is, it is standard Normal (with zero mean and unit variance).
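
To spell out that direct calculation (a sketch, using the antiderivatives $\int x\,\varphi(x)\,dx = -\varphi(x)$ and $\int x^2\varphi(x)\,dx = \Phi(x) - x\,\varphi(x)$, where $\varphi$ and $\Phi$ are the standard normal density and CDF):

$$E[Y] = 2\int_{-t}^{0} x\,\varphi(x)\,dx + 2\int_{t}^{\infty} x\,\varphi(x)\,dx = 2\bigl[\varphi(t)-\varphi(0)\bigr] + 2\varphi(t) = 2\bigl[2\varphi(t)-\varphi(0)\bigr],$$

which vanishes exactly when $e^{-t^2/2} = 1/2$, that is, when $t = \sqrt{2\log 2}$; and

$$E[Y^2] = 2\Bigl[\tfrac12 - \Phi(-t) - t\varphi(t)\Bigr] + 2\Bigl[\Phi(-t) + t\varphi(t)\Bigr] = 1,$$

so $Y$ has zero mean and unit variance, as claimed.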

Let $X_i$ have a Bernoulli$(1/2)$ distribution. Suppose $\varepsilon_i|X_i=0$ has density $g$ and $\varepsilon_i|X_i=1$ has density $h$, with all the $(X_i, \varepsilon_i)$ independent. The assumption about $Y_i$ is irrelevant (or true by definition of $Y_i$), and all the other assumptions hold by construction, yet none of the conditional distributions $\varepsilon_i | X_i = x$ are Normal for any value of $x$.

[Figures: jittered scatterplot of the simulated data with the least-squares fit, and histograms of the conditional and combined residuals]

These plots show a dataset of $300$ samples from a bivariate distribution where $E[Y|X]=5 + X.$ The $x$ values in the scatterplot at the left have been horizontally jittered (displaced randomly) to resolve overlaps. The dotted red line is the least squares fit to these data. The three histograms show the conditional residuals--which are expected to follow $g$ and $h$ closely--and then the combined residuals, which are expected to be approximately Normal.
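
For readers who want to reproduce the flavour of these plots, here is a minimal simulation sketch (not the code that produced the figures above; the intercept and slope are taken from the $E[Y|X]=5+X$ description, and `sample_g` is an illustrative rejection sampler):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t = np.sqrt(2 * np.log(2))  # threshold that makes g have zero mean

def sample_g(size):
    """Rejection sampler for g: standard normal restricted to [-t, 0] or [t, inf)."""
    out = np.empty(0)
    while out.size < size:
        z = rng.standard_normal(size)
        keep = ((z >= -t) & (z <= 0)) | (z >= t)
        out = np.concatenate([out, z[keep]])
    return out[:size]

n = 300
beta0, beta1 = 5.0, 1.0
X = rng.integers(0, 2, size=n)                     # Bernoulli(1/2) regressor
eps = np.where(X == 0, sample_g(n), -sample_g(n))  # h(x) = g(-x), so a draw from h is minus a draw from g
Y = beta0 + beta1 * X + eps

# The pooled errors should look standard Normal; the conditional ones should not
# (their sample skewness is clearly nonzero, roughly +1 under g and -1 under h).
print("pooled  mean/var/skew:", eps.mean(), eps.var(), stats.skew(eps))
print("X = 0   mean/var/skew:", eps[X == 0].mean(), eps[X == 0].var(), stats.skew(eps[X == 0]))
print("X = 1   mean/var/skew:", eps[X == 1].mean(), eps[X == 1].var(), stats.skew(eps[X == 1]))
```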

whuber

The assumption that the conditional variance is equal to the unconditional variance, together with the assumption that $E(\varepsilon_i)=0$, does imply zero conditional mean, namely

$$\{{\rm Var}(\varepsilon_i \mid X_i) = {\rm Var}(\varepsilon_i)\} \;\text {and}\;\{E(\varepsilon_i)=0\}\implies E(\varepsilon_i \mid X_i)=0 \tag{1}$$

The two assumptions imply that

$$E(\varepsilon_i^2 \mid X_i) - [E(\varepsilon_i \mid X_i)]^2 = E(\varepsilon_i^2)$$
$$\implies E(\varepsilon_i^2 \mid X_i) - E(\varepsilon_i^2) = [E(\varepsilon_i \mid X_i)]^2$$

Arguing by contradiction, assume that $E(\varepsilon_i \mid X_i)\neq 0$, which implies $[E(\varepsilon_i \mid X_i)]^2 > 0$.

This in turn implies that $E(\varepsilon_i^2 \mid X_i) > E(\varepsilon_i^2)$. By the law of iterated expectations we have $E(\varepsilon_i^2) = E\big[ E(\varepsilon_i^2 \mid X_i)\big]$. For clarity set $Z \equiv E(\varepsilon_i^2 \mid X_i)$. Then we have that

$$E(\varepsilon_i \mid X_i)\neq 0 \implies Z > E(Z)$$

But this cannot be: a random variable cannot be strictly greater than its own expected value, since taking expectations on both sides would give $E(Z) > E(Z)$. So $(1)$ must hold.

Note that the converse is not necessarily true.
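
For instance (an illustrative example, not from the original answer), take $X_i \in \{1,2\}$ with equal probability and $\varepsilon_i \mid X_i \sim N(0, X_i^2)$. Then

$$E(\varepsilon_i \mid X_i) = 0 \;\text{ and }\; E(\varepsilon_i) = 0, \qquad \text{yet} \qquad \operatorname{Var}(\varepsilon_i \mid X_i) = X_i^2 \in \{1, 4\} \neq \tfrac{5}{2} = E(X_i^2) = \operatorname{Var}(\varepsilon_i).$$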

As for providing an example showing that, even if the above results hold, and even under the marginal normality assumption, the conditional distribution is not necessarily identical to the marginal one (which would establish independence): whuber beat me to it.

Alecos Papadopoulos
  • Let us add an additional assumption that the true model is indeed linear and $\operatorname{E}(\varepsilon_i|X_i)=0$. We then have 1. unconditional normality, 2. conditional mean zero, and 3. conditional constant variance. I still do not see whether (and if so, how) that yields conditional *normality*. (The sequence would work fine the other way around: conditional normality + conditional mean zero + conditional constant variance $\Rightarrow$ unconditional normality + mean zero + constant variance.) – Richard Hardy Sep 09 '15 at 18:37
  • @RichardHardy Where in my answer do I write that conditional normality follows from the assumptions that you state? I explicitly write "conditional normality follows from _nowhere_". – Alecos Papadopoulos Sep 09 '15 at 18:49
  • True. But you "blame" the fact that the true model may be nonlinear. If I may add this extra assumption about the true model and reiterate the question, would your answer change? – Richard Hardy Sep 09 '15 at 18:54
  • @RichardHardy Where do I do that? The linearity/non-linearity has to do with mean-independence, not with conditional normality. – Alecos Papadopoulos Sep 09 '15 at 18:56
  • @RichardHardy To answer the enhanced set of assumptions, no, conditional normality still doesn't follow. – Alecos Papadopoulos Sep 09 '15 at 19:11
  • Thanks, I thought so, too. However, I was unable to come up with a *simple* counterexample where all the conditions (plus the one stating that the true model is linear) are satisfied but the conditional distribution is *nonnormal*. A counterexample is always a nice way to disprove a hypothesis, so I thought it would be nice to come up with one. Anyway, yours is a nice answer. – Richard Hardy Sep 09 '15 at 20:00
  • @RichardHardy I will post a second answer with a counter example. – Alecos Papadopoulos Sep 09 '15 at 20:02
  • @RichardHardy I had to totally change my answer because I had missed something obvious in the OP's post, so I'm afraid at least part of the discussion in our comments became disconnected. As for the counterexample, see whuber's answer. – Alecos Papadopoulos Sep 10 '15 at 01:41
  • Hi @AlecosPapadopoulos, could you have a look at this question, which is closely related? Thanks: https://stats.stackexchange.com/questions/366220/predictor-and-error-are-independent – Xavier Bourret Sicotte Sep 11 '18 at 08:03
  • @XavierBourretSicotte I posted an answer there. – Alecos Papadopoulos Sep 11 '18 at 11:57
  • @AlecosPapadopoulos and Richard Hardy: It seems to me that there is room for ambiguity in this post. The question speaks explicitly about a "regression" and its "error". The answer says nothing explicit about this, but in the comments you rightly underscore that the so-called "exogeneity condition" refers to the "true model". Now, it is probably obvious to you that the true model is not a regression. However, in my experience the ambiguities surrounding concepts like true model, regression, sample, population, and the related "error" are very pernicious. – markowitz May 01 '20 at 08:11
  • Not infrequently these ambiguities also appear in authoritative econometrics textbooks, and for some years now I have spent time trying to clarify them. I recently answered (here https://stats.stackexchange.com/questions/455373/does-homoscedasticity-imply-that-the-regressor-variables-and-the-errors-are-unco/462885#462885 ) a question strongly related to the one above. I am quite confident about what I wrote; however, even though I have references for the main points, I worked it out myself. You are expert users, and I would like your opinion on my answer. – markowitz May 01 '20 at 08:12