In simple linear regression, how does the derivation of the variance of the residuals support its 'Constant Variance' Assumption?

Question

In simple linear regression: $$Residuals = Y - \hat{Y}$$

We can derive that:
$$Var(Residuals) = Var(Y - \hat{Y}) = (I-H)\cdot\sigma^2$$ ($\sigma^2$ is the variance of the errors, i.e. the conditional variance of $Y$ given $X$) (See derivation of Var(residuals))

So my question is:

Since $h_{ii}$ (the $i$th diagonal element of $H$) is different for each $i$, how can $(I-H)\cdot\sigma^2$, which is the variance of the residuals, be constant (as one of the assumptions of LR requires)?

Another thing that bothers me is:

We know that the MSE (the sum of squared residuals divided by $n-p$) is an unbiased estimate of the variance of the errors. So what is the relationship between the MSE, the variance of the errors, and the variance of the residuals? Which of these does the LR assumption refer to?

There's no assumption of constant variance of *residuals* in regression. There is a constant variance assumption however. (BTW can you please be consistent with residuals vs "residues" in your Q? They should all be residuals) — Glen_b, Jul 16 '17 at 06:50
Thanks a lot for the comment! My confusion is that: one of the assumptions of LR is homoscedasticity - (constant variance) of the ERRORS, and in order to check that, we usually generate residual plots, which are used to check whether the variance of the RESIDUALS is a constant; and given the derivation in the post, the variance of the RESIDUALS doesn't have to be a constant, so how can we use the residual plots to test the homoscedasticity property of the ERRORS? Thanks! And I will make the terms consistent in the post. — user2027016, Jul 16 '17 at 07:21
you're getting to the nub of an important question. Have you ever wondered why many stats packages by default offer plots not of raw residuals but residuals divided by $s\sqrt{1-h_{ii}}$? e.g. take a look at what R's `plot.lm` does — Glen_b, Jul 16 '17 at 07:29
@Glen_b, this is very insightful! It definitely helps a lot to clear my confusions, thank you sooo much! — user2027016, Jul 16 '17 at 07:54

Glen_b · Answer 1 · 2017-07-16T23:01:55.673

The assumption is that the errors have constant variance. The corresponding residuals don't have constant variance, because points with larger values on the diagonal of the hat matrix ($H$) pull the fit more toward themselves than points with smaller values do. That is, they have greater influence on their own fitted value: $\dfrac{\partial \hat{y}_i}{\partial y_i}=h_{ii}$, the $i$th diagonal element of $H$. The fitted line therefore passes closer to high-leverage points, so their residuals vary less than those of low-leverage points.

As you say, the variance-covariance matrix of the residuals is given by $\sigma^2(I-H)$, which is normally estimated by $s^2(I-H)$, and so the standard error of the $i$th residual is $s\sqrt{1-h_{ii}}$.
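
For reference, here is a short sketch of where $\sigma^2(I-H)$ comes from and how it ties in with the MSE. It assumes the usual setup, which the question does not spell out: $Y = X\beta + \varepsilon$ with $Var(\varepsilon) = \sigma^2 I$, $H = X(X^TX)^{-1}X^T$, $n$ observations, and $p$ columns in $X$. Writing $e$ for the residual vector:

$$e = Y - \hat{Y} = (I-H)Y = (I-H)\varepsilon \quad \text{since } (I-H)X = 0$$

$$Var(e) = (I-H)\,\sigma^2 I\,(I-H)^T = \sigma^2(I-H) \quad \text{since } I-H \text{ is symmetric and idempotent}$$

$$E\left[\sum_i e_i^2\right] = \sigma^2\,\mathrm{tr}(I-H) = \sigma^2(n-p)$$

So the individual residual variances $\sigma^2(1-h_{ii})$ differ from point to point, but they sum to $\sigma^2(n-p)$, which is why $s^2 = \sum_i e_i^2/(n-p)$ (the MSE) is unbiased for the error variance $\sigma^2$.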

As a result it's common to standardize residuals, scaling them to have constant variance: $\frac{y_i-\hat{y}_i}{s\sqrt{1-h_{ii}}}$.
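
A small simulation can make this concrete. The following is a minimal sketch rather than anything from the original post: the sample size, true coefficients, error standard deviation, and the single high-leverage $x$ value are all made-up illustrative choices. It checks that, with constant-variance errors, the raw residuals have variance close to $\sigma^2(1-h_{ii})$, while the residuals scaled by $s\sqrt{1-h_{ii}}$ have roughly the same variance at every point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 20 points, the last one moved far out to give it high leverage
n, sigma = 20, 2.0
x = np.linspace(0.0, 10.0, n)
x[-1] = 25.0
X = np.column_stack([np.ones(n), x])  # intercept + slope
p = X.shape[1]

# Hat matrix H = X (X'X)^{-1} X' and its diagonal (the leverages h_ii)
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

# Simulate many data sets whose ERRORS have constant variance sigma^2
n_sims = 20000
eps = rng.normal(0.0, sigma, size=(n_sims, n))
y = 1.0 + 0.5 * x + eps  # illustrative true line; one data set per row

# Residuals e = (I - H) y for every data set (I - H is symmetric)
resid = y @ (np.eye(n) - H)

# MSE per data set: s^2 = sum of squared residuals / (n - p)
s2 = (resid ** 2).sum(axis=1) / (n - p)

# Variance of each residual across simulations vs. the theoretical sigma^2 (1 - h_ii)
emp_var = resid.var(axis=0)
theo_var = sigma ** 2 * (1.0 - h)

# Residuals scaled by s * sqrt(1 - h_ii)
std_resid = resid / np.sqrt(s2[:, None] * (1.0 - h))

print("leverages h_ii:          ", np.round(h, 3))
print("empirical Var(e_i):      ", np.round(emp_var, 2))
print("sigma^2 (1 - h_ii):      ", np.round(theo_var, 2))
print("mean of s^2 over sims:   ", round(float(s2.mean()), 3))
print("Var of scaled residuals: ", np.round(std_resid.var(axis=0), 2))
```

In the printed output, the raw residual at the high-leverage point should show a clearly smaller variance than the others, even though every error was drawn with the same $\sigma$. That is the reason for plotting the scaled residuals when checking the constant-variance assumption on the errors.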
