When I produce the diagnostic plots for a regression in R, a couple of them have "Standardized residuals" on the y-axis, as in this plot:

[Figure: R regression diagnostic plot with "Standardized residuals" on the y-axis]

What are the residuals standardized over? That is, let us assume that in my model, there are 100 predicted values; hence 100 residuals.

  1. Is the standardized residual for $e_i$ defined as $(e_i - \bar e)/s_e$, that is, (realized residual - mean of all 100 realized residuals)/(standard deviation of all 100 realized residuals)?
  2. Since each residual $e_i$ is itself a realized value out of a distribution of possible realizations for this single residual $e_i$, is this residual $e_i$ normalized by its own mean $\bar e_i$ and variance $\text{Var}(e_i)$ (as opposed to the mean and variance from all other values 1 to 100 as described above)?

I looked for documentation clarifying this distinction but could not find anything unambiguous.

Nick Cox
dval

2 Answers

If you look at the code for plot.lm (by typing stats:::plot.lm), you see these snippets in there (the comments are mine; they're not in the original):

r <- residuals(x)                                # <---  r contains residuals

...

if (any(show[2L:6L])) {
    s <- if (inherits(x, "rlm")) 
        x$s
    else if (isGlm) 
        sqrt(summary(x)$dispersion)   
    else sqrt(deviance(x)/df.residual(x))        #<---- value of s
    hii <- lm.influence(x, do.coef = FALSE)$hat  #<---- value of hii

...

    r.w <- if (is.null(w)) 
        r                                        #<-- r.w  for unweighted regression
    else sqrt(w) * r
    rs <- dropInf(r.w/(s * sqrt(1 - hii)), hii)  # <-- std. residual in plots

So, if you don't use weights, the code defines its standardized residuals as the internally studentized residuals described here (with weights, each residual is first multiplied by the square root of its weight):

http://en.wikipedia.org/wiki/Studentized_residual#How_to_studentize

which is to say:

$${\widehat{\varepsilon}_i\over \widehat{\sigma} \sqrt{1-h_{ii}\ }}$$

(where $\widehat{\sigma}^2={1 \over n-m}\sum_{j=1}^n \widehat{\varepsilon}_j^{\,2}$, $m$ is the column dimension of $X$, and $h_{ii}$ is the $i$th diagonal element of the hat matrix $H = X(X^\top X)^{-1}X^\top$).
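The formula above can be checked numerically. The following is a minimal sketch, assuming an unweighted `lm` fit; the `mtcars` model is an arbitrary illustration (not from the question), and `rs_manual` is a name introduced here. It rebuilds the quantity from the same pieces `plot.lm` uses and compares it with `rstandard()`.

```r
# Arbitrary illustrative model on a built-in dataset (an assumption, not the asker's data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

e   <- residuals(fit)                           # raw residuals
s   <- sqrt(deviance(fit) / df.residual(fit))   # sigma-hat, as in plot.lm
hii <- lm.influence(fit, do.coef = FALSE)$hat   # leverages h_ii

# Internally studentized residuals, term by term as in the formula
rs_manual <- e / (s * sqrt(1 - hii))

# Compare with the built-in accessor; names are dropped so only values are compared
all.equal(unname(rs_manual), unname(rstandard(fit)))
```

If the comparison holds, the values on the y-axis of the diagnostic plots are the ones returned by `rstandard()` for an unweighted fit.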

Glen_b
  • Thanks for the explanation. Some clarification because I am so surprised: so that means that standardized residuals are actually _just_ studentized residuals? Not $(e_i - \bar e)/s_e$? It is surprising to me because of the confusion incurred by naming it "standardized" instead of "studentized" - or maybe this is common practice? – dval Mar 20 '13 at 01:16
  • The term 'standardized residual' is not a standardized term; different people use it to mean somewhat different things. The meaning used in plot.lm is easily the most common one in regression packages, though. Note that in a regression with an intercept, $\bar{e}$ is 0. – Glen_b Mar 20 '13 at 01:43
  • haha, the irony! thank you so much. so just to confirm, the standardized residual shown in the plot is just a studentized residual - as the code shows? – dval Mar 20 '13 at 02:30
  • Internally studentized residual as defined in the formula I gave, yes. If you're unconvinced, you could compute an internally studentized residual from the available regression information directly and compare. – Glen_b Mar 20 '13 at 03:48
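Two points from this comment exchange can be illustrated in R: the mean residual is zero when the model has an intercept, and the "subtract the mean, divide by the overall standard deviation" reading from the question does not reproduce what is plotted. This is only a sketch; the `mtcars` model and the names `z_naive` and `r_std` are assumptions made for illustration.

```r
# Arbitrary illustrative model with an intercept (an assumption, not the asker's data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
e   <- residuals(fit)

# Zero up to floating-point error, because the model includes an intercept
mean(e)

# Reading 1 from the question: a plain z-score over all residuals
z_naive <- (e - mean(e)) / sd(e)

# The quantity plot.lm shows for an unweighted fit
r_std <- rstandard(fit)

# Largest discrepancy between the two definitions
max(abs(z_naive - r_std))
```

The discrepancy comes from two sources: `sd(e)` divides by $n-1$ rather than $n-m$, and the z-score ignores the leverage term $\sqrt{1-h_{ii}}$.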
Standardized (or studentized) residuals are the residuals divided by estimates of their standard deviations. The standard deviation of a residual in a regression model can vary a great deal from point to point, so it often makes sense to standardize each residual by its own standard deviation to make comparisons more meaningful.
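Why the standard deviation differs from point to point can be sketched as follows, assuming the usual linear model $y = X\beta + \varepsilon$ with uncorrelated errors of common variance $\sigma^2$, and writing $H = X(X^\top X)^{-1}X^\top$ for the hat matrix (notation introduced here, not in the answer above):

$$\widehat{\varepsilon} = (I - H)\,y \quad\Longrightarrow\quad \text{Var}(\widehat{\varepsilon}) = \sigma^2 (I - H), \qquad \text{Var}(\widehat{\varepsilon}_i) = \sigma^2 (1 - h_{ii})$$

The second step uses the fact that $I - H$ is symmetric and idempotent. Under these assumptions a high-leverage point (large $h_{ii}$) has a residual with smaller variance, which is why each residual is divided by $\widehat{\sigma}\sqrt{1-h_{ii}}$ rather than by a single common standard deviation.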

Eric Peterson