34

This looks like a similar question and didn't get many responses.

Omitting tests such as Cook's D, and just looking at residuals as a group, I am interested in how others use residuals when assessing goodness-of-fit. I use the raw residuals:

  1. in a QQ-plot, for assessing normality
  2. in a scatterplot of $y$ versus residuals, for eyeball check of (a) heteroscedasticity and (b) serial autocorrelation.

For plotting $y$ versus residuals to examine the values for $y$ where outliers may occur, I prefer to use the studentized residuals. The reason for my preference is that it allows easy viewing of which residuals at which $y$-values are problematic, although standardised residuals provide an extremely similar result. My theory on which is used is that it depends on which university one went to.

Is this similar to how others use residuals? Do others use this number of graphs in combination with summary statistics?

Michelle
  • 4
    Studentized residuals are undoubtedly better in detecting outliers, and, maybe, a little bit better in heteroscedasticity inspection. For other purposes, it makes no difference for me what residuals to use. – ttnphns Feb 12 '12 at 06:04
  • To bring attention to a question, Michelle, or ask for a change in its status (such as CW), please follow the "flag" link beneath the question. This will automatically notify all moderators. Embedding requests in questions, comments, or replies is hit-or-miss because it relies on the hope a moderator (or other high-rep user) will actually read it within a reasonable time! – whuber Feb 16 '12 at 15:28
  • @whuber Ah, see I did think one of you would read it eventually. :) Thanks for the tip on using flags. – Michelle Feb 16 '12 at 19:23
  • @Michelle, did this get enough attention for your satisfaction? I was interested in seeing a variety of responses, but it seems not to have gotten much of a range of opinions thus far. – gung - Reinstate Monica Jun 06 '12 at 20:09
  • 1
    Hi @ttnphns Why would they be better? In particular, why would studentized be better than standardized? (I've never really known the answer here) – Peter Flom Oct 04 '12 at 19:12
  • 4
    @Peter, Studentized residuals are less "distorted" by the OLS fitting algo and are closer to theoretical notion of ["errors"](http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics). They can be directly compared at different regions of the fit line, thence are better in decision if a point is an outlier. – ttnphns Oct 05 '12 at 06:58
  • What does it mean to eyeball-check heteroscedasticity and serial autocorrelation? – abc Dec 18 '12 at 20:03

2 Answers

9

This isn't so much an answer as a clarification on terminology. Your question asks about raw, standardized, and studentized residuals. However, this is not the terminology used by most statisticians, though I note your class notes state that it is.

Raw: same as you have it.

Standardized: this is actually the raw residuals divided by the true standard deviation of the residuals. As the true standard deviation is rarely known, a standardized residual is almost never used.

Internally Studentized: because the true standard deviation of the residuals is not typically known, the estimated standard deviation is used instead. This is an internally studentized residual, and it is what you called standardized.

Externally Studentized: the same as the internally studentized residual, except that the estimate of the standard deviation of the residuals is calculated from a regression leaving out the observation in question.

Pearson: the raw residual divided by the standard deviation of the response variable (the y variable) rather than of the residuals. You don't have this one listed.

"leave one out": Doesn't have a formal name, but it is the same as the class notes.

standarized "leave one out": also doesn't have a formal name, but this is not what the class notes call studentized.
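
To make the distinctions above concrete, here is a minimal NumPy sketch that computes raw, internally studentized, and externally studentized residuals for an ordinary least squares fit. The function name `residual_types` and the simulated data are purely illustrative. The sketch assumes the usual textbook convention that the estimated standard deviation of the $i$-th residual is $s\sqrt{1-h_i}$, where $h_i$ is the leverage; that leverage correction is an assumption on my part and is not spelled out in the definitions above.

```python
import numpy as np

def residual_types(X, y):
    """Raw, internally and externally studentized OLS residuals.

    X is the design matrix (including an intercept column), y the response.
    Hypothetical helper for illustration only.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    raw = y - X @ beta

    # Leverages h_i: diagonal of the hat matrix, obtained via a QR decomposition.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    # Internally studentized: error variance estimated from all n observations.
    s2 = raw @ raw / (n - p)
    internal = raw / np.sqrt(s2 * (1.0 - h))

    # Externally studentized: error variance estimated with observation i left out.
    # The leave-one-out sum of squares follows from SSE - e_i^2 / (1 - h_i),
    # so no refitting is needed.
    s2_loo = (raw @ raw - raw**2 / (1.0 - h)) / (n - p - 1)
    external = raw / np.sqrt(s2_loo * (1.0 - h))
    return raw, internal, external

# Simulated example data (not from the question).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=30)
raw, internal, external = residual_types(X, y)
```

For well-behaved points the two studentized versions are nearly identical; they diverge mainly for observations with large residuals, which is why the externally studentized version is often preferred for flagging outliers.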

Sources:

  1. the same wiki link you have about studentized residuals ("a studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation")

  2. documentation for residual calculation in SAS

gung - Reinstate Monica
  • 2
    +1 Certainly some statisticians have used the terms in the OP's question (and not always perfectly consistently with others using the same words). I think the terms you use are becoming more common but I'm not sure on what basis we could guess at their relatively worldwide usage among statisticians -- papers, for example, don't necessarily help because the average statistician won't be actively publishing. You may be right -- but how would we know? [If you happen to edit again, you may want to replace "standarized" near the end with "standardized".] – Glen_b Aug 04 '14 at 22:42
2

Re: plots,

There is such a thing as overfitting, but overplotting cannot really do much harm, especially at the diagnostics stage. A standardized normal probability plot cannot hurt next to your QQ-plot. I find the probability plot better for assessing the middle of the distribution.

Re: residuals,

I run both standardized and studentized residuals at the draft stage and usually end up coding the standardized ones. I don't know what other people actually run, because diagnostics are rarely included in the replication material that I find online.

Re: diagnostics,

For a linear model, I usually add variance inflation factors (with the vif command in Stata) and a few homoscedasticity tests (e.g. with the hettest command in Stata), as well as model decomposition with nested regression to check if the $R^2$ makes any sense.
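
For readers without Stata, the two numeric diagnostics mentioned above can be sketched in plain NumPy. This is a rough illustration, not a reimplementation of Stata's commands: the helper names `r_squared`, `vif`, and `breusch_pagan_lm` are hypothetical, and the heteroscedasticity statistic shown is the studentized (Koenker) form of the Breusch-Pagan test, $n R^2$ from regressing squared residuals on the predictors, which may differ from the default that `hettest` reports.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - resid @ resid / tss

def vif(Z):
    """Variance inflation factor for each column of Z (predictors, no intercept)."""
    n, k = Z.shape
    out = np.empty(k)
    for j in range(k):
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
        out[j] = 1.0 / (1.0 - r_squared(others, Z[:, j]))
    return out

def breusch_pagan_lm(Z, y):
    """Koenker-style LM statistic: n * R^2 of squared residuals on the predictors.

    Under homoscedasticity it is approximately chi-square with Z.shape[1]
    degrees of freedom.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), Z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2
    return n * r_squared(X, e2)
```

A VIF near 1 indicates a predictor that is almost uncorrelated with the others, while a large LM statistic relative to the chi-square reference suggests that the residual variance changes with the predictors.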

Fr.