5

I need to check the homogeneity of variances of the residuals of a linear regression. I read that the Kruskal-Wallis test is also usable without assuming a normal distribution, but I don't know whether it applies in my case. I'm running a linear regression on two vectors of stock prices. I also checked normality with the Anderson-Darling test (ad.test from the nortest package), and the residuals do not seem to be normally distributed, but maybe I'm choosing the wrong test to check this.

So, my goal is to check whether the variance of the residuals is homogeneous. To that end, I split the residual vector into two groups:

res[1:300]
res[301:600]

What can I do?

Dail

2 Answers

11

If I understand correctly, you have one predictor (explanatory variable $x$) and one criterion (predicted variable $y$) in a simple linear regression. The significance tests rest on the model assumption that for each observation $i$ $$ y_{i} = \beta_{0} + \beta_{1} x_{i} + \epsilon_{i} $$ where $\beta_{0}, \beta_{1}$ are the parameters we want to estimate and test hypotheses about, and the errors $\epsilon_{i} \sim N(0, \sigma^{2})$ are normally distributed random variables with mean 0 and constant variance $\sigma^{2}$. All $\epsilon_{i}$ are assumed to be independent of each other, and of the $x_{i}$. The $x_{i}$ themselves are assumed to be error free.

You used the term "homogeneity of variances" which is typically used when you have distinct groups (as in ANOVA), i.e., when the $x_{i}$ only take on a few distinct values. In the context of regression, where $x$ is continuous, the assumption that the error variance is $\sigma^{2}$ everywhere is called homoscedasticity. This means that all conditional error distributions have the same variance. This assumption cannot be tested with a test for distinct groups (Fligner-Killeen, Levene).

The following diagram tries to illustrate the idea of identical conditional error distributions (R-code here).

[Figure: identical normal error distributions centered on the regression line at several values of $x$]

Tests for heteroscedasticity include the Breusch-Pagan-Godfrey test (bptest() from package lmtest or ncvTest() from package car) and the White test (white.test() from package tseries). You can also consider just using heteroscedasticity-consistent standard errors (modified White estimator; see function hccm() from package car or vcovHC() from package sandwich). These standard errors can then be used in combination with function coeftest() from package lmtest, as described on pages 184-186 of Fox & Weisberg (2011), An R Companion to Applied Regression.
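To make the above concrete, here is a minimal sketch, assuming the lmtest, car, and sandwich packages are installed. The data are simulated with an error SD that grows with $x$ (a hypothetical example, not the OP's price series), so the tests should flag heteroscedasticity:

```r
# Simulated regression with heteroscedastic errors (hypothetical data)
set.seed(1)
x   <- runif(200, 1, 10)
y   <- 2 + 0.5*x + rnorm(200, sd=0.5*x)      # error SD proportional to x
fit <- lm(y ~ x)

library(lmtest)                              # bptest()
bptest(fit)                                  # Breusch-Pagan: low p-value suggests heteroscedasticity

library(car)                                 # ncvTest()
ncvTest(fit)                                 # score test for non-constant error variance

library(sandwich)                            # vcovHC()
coeftest(fit, vcov=vcovHC(fit, type="HC3"))  # t-tests with heteroscedasticity-consistent SEs
```

The last line shows the robust-standard-error route: rather than testing the assumption, you keep the OLS coefficients but replace the usual standard errors with a White-type estimator before testing the parameters.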

You could also just plot the empirical residuals (or some transform thereof) against the fitted values. Typical transforms are the studentized residuals (spread-level plot) or the square root of the absolute residuals (scale-location plot). These plots should not reveal an obvious trend in the residual spread as a function of the fitted values.

[Figure: spread-level and scale-location plots for the simulated data below]

N <- 100                                  # number of observations
X <- seq(from=75, to=140, length.out=N)   # predictor
Y <- 0.6*X + 10 + rnorm(N, 0, 10)         # DV
fit   <- lm(Y ~ X)                        # regression
E     <- residuals(fit)                   # raw residuals
Estud <- rstudent(fit)                    # studentized residuals

plot(fitted(fit), Estud, pch=20, ylab="studentized residuals",
     xlab="prediction", main="Spread-Level-Plot")
abline(h=0, col="red", lwd=2)
plot(fitted(fit), sqrt(abs(E)), pch=20, ylab="sqrt(|residuals|)",
     xlab="prediction", main="Scale-Location-Plot")
caracal
  • wow, wonderful explanation! So are you advising me to use a homoscedasticity test like the Breusch-Pagan test? I remember I used it before, but it doesn't work well with outliers (I mean, if my series has outliers it could say the series is homoscedastic when it is not). What is the best method to check homoscedasticity in time series? – Dail Sep 12 '11 at 15:46
  • @Dail Yes, you can consider the Breusch-Pagan-Godfrey-Test or the White-Test. You can also have a look at heteroscedasticity-consistent standard errors (Huber-White). There may be more methods specifically suited for time series (GARCH, ...), but I'm not familiar with them. – caracal Sep 12 '11 at 16:05
  • It seems that only the BP test is available in R... do you know a package that has the HW test? – Dail Sep 12 '11 at 16:10
  • caracal, it seems that the BP test always returns a very low p-value if I do bptest(mod), where mod is my linear regression model. Do I have to "format" my data before doing the BP test? Maybe use this formula: diff(resid(mod))~1? – Dail Sep 12 '11 at 18:02
  • @Dail I've updated my answer for functions that implement the White-Test and the heteroscedasticity-consistent standard errors. – caracal Sep 12 '11 at 19:10
  • caracal, perfect! Thank you so much. Only one doubt about white.test: with the BP test I pass the model (linear regression), but white.test doesn't accept that model; it expects x and y. Do I have to do something like white.test(prices[,1], prices[,2])? – Dail Sep 13 '11 at 06:43
  • @caracal, then how do I apply the pages you pointed me to in "Fox & Weisberg (2011), An R Companion to Applied Regression"? I saw hccm() but I didn't understand how to calculate the p-value. – Dail Sep 13 '11 at 07:08
  • @Dail Re Fox & Weisberg, for the example above: `library(car); library(lmtest); coeftest(fit, vcov=hccm)`. This is basically the normal t-test for regression parameters, just with the different standard errors. – caracal Sep 13 '11 at 08:07
  • I get: `Estimate Std. Error t value Pr(>|t|); (Intercept) -0.02994069 0.09444855 -0.317 0.7513; prices[, 2] 0.06069896 0.00064885 93.549 <2e-16 ***`. Do I have to look at <2e-16? And is the white.test example I wrote above correct? – Dail Sep 13 '11 at 08:15
  • `white.test` from package `tseries` is not about homoscedasticity, but nonlinearity. This function "generically computes the White neural network test for neglected nonlinearity". – Baumann Jan 13 '14 at 16:15
2

The straightforward answer seems to be Levene's test, also described at Wikipedia. Levene's test is applicable in your case because it is less sensitive to departures from normality than an alternative, Bartlett's test. Levene's is parametric but suitable even with some degree of non-normality. If the distribution departed radically from normality, as with extreme outliers, you would want to use a non-parametric alternative.
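A minimal sketch of Levene's test on two residual groups, assuming the car package; the residuals here are simulated stand-ins for the res[1:300] and res[301:600] split in the question:

```r
library(car)                                  # leveneTest()

# Hypothetical stand-in for the OP's residuals res[1:300] and res[301:600]
set.seed(42)
res   <- rnorm(600)
group <- factor(rep(c("first", "second"), each=300))

# Levene's test; car's default centers on the median (the Brown-Forsythe
# variant), which is the more outlier-robust choice
leveneTest(res, group)
```

A non-significant p-value is consistent with homogeneous variances across the two halves of the residual vector.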

I don't see the Kruskal-Wallis test as applicable here. But you'll also want to check other threads such as this one.

rolando2
  • OK, but how do I know whether I need a parametric or a non-parametric test? As I said, I need to check the homogeneity of the residuals of a linear regression (two vectors with stock prices). – Dail Sep 12 '11 at 09:50
  • Please see edited answer. – rolando2 Sep 12 '11 at 17:49
  • Yes, take a look at my comment on your answer. Thanks. – Dail Sep 12 '11 at 18:00