help to understand how residual standard deviation can differ at different points on X

Question

I have read in more than one place that the residual standard deviation can differ at different points on X. I cannot understand this statement. I came across it while learning the very basics, so to me the standard deviation of the residuals is a single number calculated across all samples, and I do not see how that single number can change depending on $x_i$. I guess I must be missing some intermediate concept or step. I would be very grateful if someone could explain this to me in a very basic way.

My question was not clear, so I will try to improve it: what is not clear to me is whether there is such a thing as the standard deviation of a specific point/sample. That is, if we have x <- 1:5, y <- c(10,6,12,15,2) and y ~ x, is there a formal concept of the standard deviation of the residual of a single sample, and a formula to calculate it? Can I calculate the standard deviation of the residual corresponding to x=1, y=10, or of that corresponding to x=2, y=6, and if so, how?

Your question concerns the distinction between *errors* and *residuals*. It can be informative to examine an extreme case. Consider fitting a line to the data $(0,\epsilon_1), (0,\epsilon_2), (100,\epsilon_3)$, where the $\epsilon_i$ are identically distributed *errors* and thereby have the *same* variance. The fit will pass right through that last point and split the first two points. Thus, the *residuals* $(\epsilon_1-\epsilon_2)/2$, $(\epsilon_2-\epsilon_1)/2$, and $0$ will vary for the first two points but will *never* vary for the last point. – whuber, Feb 02 '17 at 21:47
You are talking about residuals; I have no problem with them being different, which is what I would expect. My doubt is about how different observations within the same sample can have different standard deviations, which is what my course literally says. – , Feb 10 '17 at 12:42
Data will be whatever they will be, so it's pointless to opine about what their values are. But the *distributions* of the residuals must differ even though the distributions of the errors are assumed not to. In the example I gave, the residual at $x=100$ is *invariably* zero while the residuals at $x=0$ clearly vary. Thus their distributions must differ with different $x$. — whuber, Feb 10 '17 at 13:48
My previous comment was wrong: I had moved on to something else and, rereading in a hurry, forgot that I had actually asked about residual variance, not the variance of observations. Sorry about that. Thanks for your comments and explanations. – , Feb 10 '17 at 14:02
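
whuber's three-point example from the comments can be checked numerically. The sketch below is only an illustration: the standard normal errors, the seed, and the helper names `fit_line` and `residuals` are assumptions, not part of the thread. It draws fresh errors several times and refits the line each time.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(x, y):
    """Observed minus fitted values for the least-squares line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

random.seed(0)          # arbitrary seed, only for reproducibility
x = [0, 0, 100]         # the design from the comment
draws = []
for _ in range(5):
    # Errors with identical distributions (standard normal is an assumption).
    errors = [random.gauss(0, 1) for _ in x]
    draws.append((errors, residuals(x, errors)))

for errors, res in draws:
    print([round(r, 3) for r in res])
```

In every draw the third residual is zero up to floating-point rounding, while the first two change from draw to draw and are equal and opposite. The errors all share one distribution, but the residuals do not.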

score 3 · Accepted Answer · edited Apr 13 '17 at 12:44

EDIT :

My answer was indeed wrong (thank you @Glen_b). I cannot delete it because it has been accepted.

All I can do is to redirect you to this answer written by Alecos Papadopoulos.

In short, if you have a simple model $y_i = \beta_0 + \beta_1 x_i + u_i$, then your $i$th residual is $\hat{u}_i = y_i - \hat{y}_i = (\beta_0 - \hat{\beta}_0) + (\beta_1 - \hat{\beta}_1) x_i + u_i$.

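If $\operatorname{Var}(u_i) = \sigma^2$, then $\operatorname{Var}(\hat{u}_i) = \sigma^2\left(1-\frac{1}{n} - \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j-\bar{x})^2}\right)$. The factor in parentheses depends on $x_i$, so the residuals have different variances even though the errors all have the same variance.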

All the calculations are explained in the linked answer, and I could not do any better.
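
To make this concrete for the data in the question (x = 1..5, y = 10, 6, 12, 15, 2), here is a minimal sketch of the per-point calculation. It takes the standard deviation of the $i$th residual to be $\sigma$ times the square root of the factor $1 - 1/n - (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$. Since $\sigma$ is unknown, the sketch plugs in the usual residual standard error with $n - 2$ degrees of freedom, which is an assumption about how one would estimate it. The variable names are mine.

```python
import math

x = [1, 2, 3, 4, 5]
y = [10, 6, 12, 15, 2]
n = len(x)

# Least-squares fit of y = b0 + b1 * x.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# sigma is unknown in practice; estimate it by the residual standard error.
s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))

# Factor 1 - 1/n - (x_i - x_bar)^2 / Sxx, one value per data point.
shrink = [1 - 1 / n - (xi - x_bar) ** 2 / sxx for xi in x]

# Estimated standard deviation of each individual residual.
resid_sd = [s * math.sqrt(f) for f in shrink]

for xi, yi, e, sd in zip(x, y, resid, resid_sd):
    print(f"x={xi} y={yi} residual={e:.2f} estimated sd={sd:.2f}")
```

For this x the factors are 0.4, 0.7, 0.8, 0.7, 0.4, so the residual at the middle point x=3 has the largest standard deviation and the residuals at the extremes x=1 and x=5 have the smallest. The factors depend only on x, not on y.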

answered Feb 02 '17 at 17:07 by LouisBBBB · edited Apr 13 '17 at 12:44 by Community

Thanks for your answer. I see now that my question was not clear, so I will try again. If I understand correctly, you use the term deviation as a synonym for residual. What is not clear to me is whether there is such a thing as the standard deviation of a specific point/sample, i.e. if we have x – Feb 02 '17 at 17:42
I think my edit answers your question now. – LouisBBBB Feb 03 '17 at 16:17

There's a very important issue that @whuber touches on that you seem not to be dealing with at all -- indeed some of what you say implies that you don't realize it happens -- but I think it's directly relevant here. If you use a linear regression on a random variable that is ***homoscedastic*** (errors have constant variance), the residuals will actually be ***heteroscedastic*** (the residuals will not have constant variance). A number of answers on site show calculations for the variance of residuals. – Glen_b Feb 05 '17 at 23:52
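
Glen_b's point, that homoscedastic errors produce heteroscedastic residuals, can also be seen by simulation. The sketch below is a rough check under stated assumptions: normal errors with $\sigma = 1$, arbitrary true coefficients, the design x = 1..5 from the question, and 20000 replications. None of these values come from the thread.

```python
import random

random.seed(1)               # arbitrary seed, only for reproducibility
x = [1, 2, 3, 4, 5]
n = len(x)
sigma = 1.0                  # assumed error SD, identical at every x
beta0, beta1 = 2.0, 0.5      # arbitrary illustrative coefficients
reps = 20000

x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Accumulate squared residuals separately for each position in x.
sum_sq = [0.0] * n
for _ in range(reps):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    for i, (xi, yi) in enumerate(zip(x, y)):
        sum_sq[i] += (yi - (b0 + b1 * xi)) ** 2

# Residuals have mean zero, so the mean square estimates their variance.
empirical = [total / reps for total in sum_sq]
theoretical = [sigma ** 2 * (1 - 1 / n - (xi - x_bar) ** 2 / sxx) for xi in x]

for xi, emp, theo in zip(x, empirical, theoretical):
    print(f"x={xi} simulated var={emp:.3f} formula var={theo:.3f}")
```

Every error is drawn with the same variance, yet the simulated residual variances should come out close to 0.4, 0.7, 0.8, 0.7, 0.4 times $\sigma^2$: smaller at the ends of the x range than in the middle.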

score 0 · Answer 2 · 2017-02-09T05:49:25.367

I may have found a sort of answer. In some cases (the analysis of residuals, though this terminology might be imprecise) the standard deviation is calculated with one sample/data point excluded. A standard deviation calculated this way is still computed over all the other samples, but it is uniquely identified by the sample/data point that was left out of the computation. (A similar thing seems to happen for standard errors calculated for similar purposes.)
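
One possible reading of this is the leave-one-out residual standard error used in regression diagnostics (for example in externally studentized residuals). That reading is an assumption; the answer does not name the quantity. Under that assumption, a minimal sketch for the data in the question refits the line with each point removed in turn. The helper name `residual_se` is mine.

```python
import math

def residual_se(x, y):
    """Residual standard error of a simple linear fit (n - 2 degrees of freedom)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(rss / (n - 2))

x = [1, 2, 3, 4, 5]
y = [10, 6, 12, 15, 2]

# One value per data point: the fit is redone without that point.
loo_se = []
for i in range(len(x)):
    x_rest = x[:i] + x[i + 1:]
    y_rest = y[:i] + y[i + 1:]
    loo_se.append(residual_se(x_rest, y_rest))

for xi, yi, se in zip(x, y, loo_se):
    print(f"left out x={xi} y={yi}: residual standard error of the rest = {se:.3f}")
```

This gives one number per data point, but it is a different quantity from the standard deviation of that point's own residual: each value summarizes the scatter of the remaining points around a line fitted without the excluded one.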
