Standardized residuals in R's lm output

Question

If I plot the diagnostic plots to an R regression, a couple of them have "Standardized Residuals" as their y-axis such as in this plot:

[Figure: R regression diagnostic plot with "Standardized residuals" on the y-axis]

What are the residuals standardized over? That is, let us assume that in my model, there are 100 predicted values; hence 100 residuals.

Is the standardized residual defined as $(e_i - \bar e)/s_e$, that is, (realized residual - mean of all 100 realized residuals)/(standard deviation of all 100 realized residuals)?
Or, since each residual $e_i$ is itself a realized value from a distribution of possible realizations for that single residual, is $e_i$ normalized by its own mean $\bar e_i$ and variance $\text{Var}(e_i)$ (as opposed to the mean and variance across all values 1 to 100, as described above)?

I tried finding documentation clarifying this distinction but could not find any that was beyond doubt.

Glen_b · Accepted Answer · 2018-01-23T09:41:58.340

7

If you look at the code for plot.lm (by typing stats:::plot.lm), you see these snippets in there (the comments are mine; they're not in the original):

r <- residuals(x)                                # <---  r contains residuals

...

if (any(show[2L:6L])) {
    s <- if (inherits(x, "rlm")) 
        x$s
    else if (isGlm) 
        sqrt(summary(x)$dispersion)   
    else sqrt(deviance(x)/df.residual(x))        #<---- value of s
    hii <- lm.influence(x, do.coef = FALSE)$hat  #<---- value of hii

...

    r.w <- if (is.null(w)) 
        r                                        #<-- r.w  for unweighted regression
    else sqrt(w) * r
    rs <- dropInf(r.w/(s * sqrt(1 - hii)), hii)  # <-- std. residual in plots

So - if you don't use weights - the code clearly defines its standardized residuals to be the internally studentized residuals defined here:

http://en.wikipedia.org/wiki/Studentized_residual#How_to_studentize

which is to say:

$${\widehat{\varepsilon}_i\over \widehat{\sigma} \sqrt{1-h_{ii}\ }}$$

(where $\widehat{\sigma}^2={1 \over n-m}\sum_{j=1}^n \widehat{\varepsilon}_j^{\,2}$, and $m$ is the column dimension of $X$).
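
As a check on the formula above, here is a minimal sketch that computes the internally studentized residuals by hand and compares them with R's `rstandard()`. The built-in `cars` data set and the model `dist ~ speed` are assumptions chosen only for illustration; any unweighted `lm` fit should behave the same way.

```r
# Illustrative fit; the data set and formula are assumptions, not from the question
fit <- lm(dist ~ speed, data = cars)

e <- residuals(fit)                           # raw residuals
h <- hatvalues(fit)                           # leverages h_ii (diagonal of the hat matrix)
s <- sqrt(deviance(fit) / df.residual(fit))   # sigma-hat, as in the plot.lm snippet

# Internally studentized residuals: e_i / (sigma-hat * sqrt(1 - h_ii))
rs_manual <- e / (s * sqrt(1 - h))

# Compare with R's built-in version; this should report TRUE
all.equal(rs_manual, rstandard(fit))
```

`rstandard()` is the documented accessor for these values, so it is usually simpler to call it than to dig the numbers out of `plot.lm`.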

edited Jan 23 '18 at 09:41

answered Mar 17 '13 at 22:56


Thanks for the explanation. Some clarification because I am so surprised: so that means that standardized residuals are actually _just_ studentized residuals? Not $(e_i - \bar e)/s_e$? It is surprising to me because of the confusion incurred by naming it "standardized" instead of "studentized" - or maybe this is common practice? – dval Mar 20 '13 at 01:16

The term 'standardized residual' is not a standardized term. Different people use it to mean somewhat different things. (The meaning used in plot.lm would be, easily, the most common one in regression packages, though.) Note that in regression with an intercept, $\bar{e}$ is 0. – Glen_b Mar 20 '13 at 01:43
haha, the irony! thank you so much. so just to confirm, the standardized residual shown in the plot is just a studentized residual - as the code shows? – dval Mar 20 '13 at 02:30

Internally studentized residual as defined in the formula I gave, yes. If you're unconvinced, you could compute an internally studentized residual from the available regression information directly and compare. – Glen_b Mar 20 '13 at 03:48
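
The two points raised in these comments can be sketched in a few lines: that the mean residual is zero when the model has an intercept, and that the naive z-score $(e_i - \bar e)/s_e$ is not what the plot shows. The `mtcars` data set and the model `mpg ~ wt` are assumptions used only for illustration.

```r
# Illustrative fit; the data set and formula are assumptions
fit <- lm(mpg ~ wt, data = mtcars)
e <- residuals(fit)

# With an intercept in the model, the residuals average to zero
# (up to floating-point error)
mean(e)

# Naive z-score: one common mean and one common SD for all residuals
z_naive <- (e - mean(e)) / sd(e)

# Internally studentized residuals, as used on the plot.lm y-axis
rs <- rstandard(fit)

# The two differ: sd(e) divides by n - 1 rather than n - m,
# and it ignores the per-observation leverage term sqrt(1 - h_ii)
max(abs(z_naive - rs))
```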

score 2 · Answer 2 · answered Mar 17 '13 at 20:54

Standardized (or studentized) residuals are the residuals divided by their estimated standard deviations. The standard deviation of a residual in a regression model can vary a great deal from point to point, so it often makes sense to standardize each residual by its own standard deviation to make comparisons more meaningful.
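
To illustrate how much the residual standard deviation can vary from point to point, the sketch below estimates it for each observation as $\widehat{\sigma}\sqrt{1-h_{ii}}$. The `cars` data set and the model `dist ~ speed` are assumptions chosen for illustration only.

```r
# Illustrative fit; the data set and formula are assumptions
fit <- lm(dist ~ speed, data = cars)

h <- hatvalues(fit)                           # leverage of each observation
s <- sqrt(deviance(fit) / df.residual(fit))   # overall residual standard error

# Estimated standard deviation of each individual residual
sd_resid <- s * sqrt(1 - h)

# Observations with higher leverage have a smaller residual SD
range(sd_resid)
head(cbind(speed = cars$speed, leverage = h, sd_resid = sd_resid))
```

Dividing each raw residual by its own `sd_resid` gives the same values as `rstandard(fit)`.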
