
It is often said that heteroskedasticity can be assessed graphically, for instance by looking at the residuals of a regression. However, this seems quite subjective to me. For instance, in the following pictures:

http://www.hosting.universalsite.org/image-alpha-E3E5_58CE5E81.png

How would you tell whether the residuals are homoskedastic or heteroskedastic, and to what extent? And based on what, exactly?

Is there a best practice for this kind of analysis?

gung - Reinstate Monica
user25954
    Possible duplicate of [Interpreting the residuals vs. fitted values plot for verifying the assumptions of a linear model](http://stats.stackexchange.com/questions/76226/interpreting-the-residuals-vs-fitted-values-plot-for-verifying-the-assumptions) – gung - Reinstate Monica Mar 19 '17 at 12:59
    I think you will find the information you need in the linked thread. Please read it. If it isn't what you want / you still have a question afterwards, come back here & edit your question to state what you learned & what you still need to know. Then we can provide the information you need without just duplicating material elsewhere that already didn't help you. – gung - Reinstate Monica Mar 19 '17 at 12:59

1 Answer


As a complement to the plots that you show (and some others shown in the answers linked on the right-hand-side of this page), a range-mean plot is easy to get and is often informative.

The idea is to split the series of residuals into blocks of length, say, $k=\lfloor\sqrt n\rfloor$, where $n$ is the number of residuals; i.e., the first block contains residuals 1 to $k$, the second one contains residuals $k+1$ to $2k$, and so on.

The mean and the range are obtained for each block, and the range is plotted against the mean. If the variance is homogeneous over time, the points will lie around a horizontal line; otherwise, an increasing, decreasing, or more complex pattern will be observed.

As the residuals will, in principle, have a constant mean, it is better to plot the range against the mean of the times at which the observations in each block were recorded.
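The block construction described above can be sketched in base R as follows. This is only a minimal illustration: `block_ranges` is a hypothetical helper name (it is not part of lmtest or any other package), and the toy input is made up so that the result can be checked by hand.

```r
# Hypothetical helper (not from any package): range of the residuals in each
# block of length k, paired with the mean time index of that block.
block_ranges <- function(e, k = floor(sqrt(length(e)))) {
  stopifnot(is.numeric(e), length(e) > 0, k >= 1)
  # Block label of each observation: 1,1,...,2,2,...; the last block may be shorter
  block <- ceiling(seq_along(e) / k)
  data.frame(
    time  = as.vector(tapply(seq_along(e), block, mean)),
    range = as.vector(tapply(e, block, function(x) diff(range(x))))
  )
}

# Toy input: 10 values in blocks of 3, i.e. 1:3, 4:6, 7:9 and a final block {10}
br <- block_ranges(as.numeric(1:10), k = 3)
print(br)
```

Plotting `br$range` against `br$time` then gives the range-against-time version of the plot; a roughly flat cloud of points is what homogeneous variance would look like.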


Example (taken from the documentation of the R package lmtest).

# Residuals of 'dy' in data set 'jocci' regressed on six lags
library(lmtest)  # provides the 'jocci' data set
data(jocci)
fit <- lm(dy ~ dy1 + dy2 + dy3 + dy4 + dy5 + dy6, data = jocci)
e   <- residuals(fit)

As noted above, there is no trend in the residuals, so the block means of the residuals themselves are not informative:

k  <- floor(sqrt(length(e)))
le <- split(e, gl(ceiling(length(e)/k), k)[seq_along(e)])
r  <- unlist(lapply(le, FUN=function(x) diff(range(x))))
m  <- unlist(lapply(le, FUN=mean))
plot(m, r, ylab="range (residuals)", xlab="mean (residuals)", 
     main="range against mean of residuals")
abline(lm(r ~ m))

Alternatively, plot the range against the mean time of the observations in each block. The range then increases as we advance through the series of residuals, and the fitted regression line shows a significant upward trend. This suggests a heteroskedastic pattern.

par(mfrow=c(2,1), mar=c(4,4,3,3))
plot(e, type="h", main="residuals")
r   <- unlist(lapply(le, FUN=function(x) diff(range(x))))
lid <- split(seq_along(e), gl(ceiling(length(e)/k), k)[seq_along(e)])
m   <- unlist(lapply(lid, FUN=mean))
plot(m, r, ylab="range (residuals)", xlab="mean (times)",
     main="range of residuals against time")
trend <- lm(r ~ m)  # separate name, so the original model 'fit' is not overwritten
abline(trend)
summary(trend)

range-mean plot
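To make "a significant trend" concrete without relying on the `jocci` data, here is a minimal sketch on simulated residuals. Everything in it is an assumption made for illustration: the seed, the sample size, and a standard deviation that grows linearly from 1 to 5. With only about $\sqrt n$ points, the p-value of the slope is a rough indication rather than a formal test.

```r
set.seed(1)                                     # arbitrary seed, for reproducibility only
n <- 400
e <- rnorm(n, sd = seq(1, 5, length.out = n))   # simulated residuals whose spread grows over time

k     <- floor(sqrt(n))                         # block length (20 here)
block <- ceiling(seq_len(n) / k)                # block label of each observation
r     <- as.vector(tapply(e, block, function(x) diff(range(x))))  # range per block
m     <- as.vector(tapply(seq_len(n), block, mean))               # mean time per block

# Regress the block ranges on the block times and read off the slope and its p-value
trend <- lm(r ~ m)
slope <- coef(summary(trend))["m", c("Estimate", "Pr(>|t|)")]
print(slope)
```

A clearly positive slope with a small p-value points to a spread that increases over time; a slope near zero is what one would expect under homoskedasticity.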

gung - Reinstate Monica
javlacalle