
I am running a pooled OLS regression using the plm package in R. However, my question is more about basic statistics, so I am posting it here first ;)

Since my regression yields heteroskedastic residuals, I would like to try heteroskedasticity-robust standard errors. From coeftest(mod, vcov. = vcovHC(mod, type = "HC0")) I get a table containing estimates, standard errors, t-values and p-values for each independent variable, which are basically my "robust" regression results.

To discuss the importance of the different variables, I would like to plot the share of variance explained by each independent variable, so I need the respective sums of squares. However, when using aov(), I don't know how to tell R to use robust standard errors.

Now my question is: how do I get the ANOVA table/sums of squares that correspond to robust standard errors? Is it possible to calculate them from the ANOVA table of the regression with ordinary standard errors?
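To make the setup concrete, here is a stripped-down sketch of what I am doing, using the Grunfeld data shipped with plm as a stand-in for my own data:

library("plm")
library("lmtest")
library("sandwich")

## pooled OLS (stand-in data and variables)
data("Grunfeld", package = "plm")
mod <- plm(inv ~ value + capital, data = Grunfeld,
  index = c("firm", "year"), model = "pooling")

## "robust" regression table with HC0 standard errors
coeftest(mod, vcov. = vcovHC(mod, type = "HC0"))

## what I am missing: an ANOVA-style sum-of-squares decomposition
## that corresponds to these robust standard errors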

Edit:

In other words, and disregarding my R issues:

If R$^2$ is not affected by using robust standard errors, will the respective contributions of the different explanatory variables to the explained variance also be unchanged?
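What I have in mind is a check like the following, with a plain lm() fit on built-in data (the point estimates, and hence R$^2$, do not depend on which covariance matrix is used for inference):

library("sandwich")
library("lmtest")

fit <- lm(mpg ~ wt + hp, data = mtcars)

coeftest(fit)                  ## classical standard errors
coeftest(fit, vcov. = vcovHC)  ## robust standard errors, identical estimates

summary(fit)$r.squared         ## R^2 depends only on the fitted values
anova(fit)                     ## sequential sums of squares from the same fit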

Edit:

In R, does aov(mod) actually give a correct ANOVA table for a panel model (a plm object)?

Aki

2 Answers


The ANOVA in linear regression models is equivalent to the Wald test (and the likelihood ratio test) of the corresponding nested models. So if you want to conduct such a test using heteroskedasticity-consistent (HC) standard errors, it cannot be obtained from a decomposition of the sums of squares, but you can carry out the Wald test based on an HC covariance estimate. This idea is implemented in Anova() and linearHypothesis() from the car package as well as in coeftest() and waldtest() from the lmtest package. The latter three can also be used with plm objects.

A simple (albeit not very interesting/meaningful) example is the following. We use the standard model from the ?plm manual page and want to carry out a Wald test for the significance of both log(pcap) and unemp. We need these packages:

library("plm")
library("sandwich")
library("car")
library("lmtest")

The model (under the alternative) is:

data("Produc", package = "plm")
mod <- plm(log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp,
  data = Produc, index = c("state", "year"))

First, let's look at the marginal Wald tests with HC standard errors for all individual coefficients:

coeftest(mod, vcov = vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
log(pc)    0.2920069  0.0617425  4.7294 2.681e-06 ***
log(emp)   0.7681595  0.0816652  9.4062 < 2.2e-16 ***
log(pcap) -0.0261497  0.0603262 -0.4335   0.66480    
unemp     -0.0052977  0.0024958 -2.1226   0.03411 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And then we carry out a Wald test for both log(pcap) and unemp:

linearHypothesis(mod, c("log(pcap)", "unemp"), vcov = vcovHC)

Linear hypothesis test

Hypothesis:
log(pcap) = 0
unemp = 0

Model 1: restricted model
Model 2: log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp

Note: Coefficient covariance matrix supplied.

  Res.Df Df  Chisq Pr(>Chisq)  
1    766                       
2    764  2 7.2934    0.02608 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Alternatively, we can also fit the model under the null hypothesis (mod0 say) without the two coefficients and then call waldtest():

mod0 <- plm(log(gsp) ~ log(pc) + log(emp),
  data = Produc, index = c("state", "year"))
waldtest(mod0, mod, vcov = vcovHC)

Wald test

Model 1: log(gsp) ~ log(pc) + log(emp)
Model 2: log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp
  Res.Df Df  Chisq Pr(>Chisq)  
1    766                       
2    764  2 7.2934    0.02608 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The test statistic and p-value computed by linearHypothesis() and waldtest() are exactly the same; only the interface and output formatting differ somewhat. In some cases one or the other is simpler or more intuitive.

Note: If you supply a covariance matrix estimate (i.e., a matrix like vcovHC(mod)) instead of a covariance matrix estimator (i.e., a function like vcovHC), make sure that you supply the HC covariance matrix estimate of the model under the alternative, i.e., the non-restricted model.
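For example, with mod and mod0 from above, the following calls should give the same result; the last two pass the HC estimate computed from the unrestricted mod as a matrix:

## covariance matrix estimator (a function), evaluated internally
waldtest(mod0, mod, vcov = vcovHC)

## covariance matrix estimate (a matrix), computed from the unrestricted 'mod'
waldtest(mod0, mod, vcov = vcovHC(mod))
linearHypothesis(mod, c("log(pcap)", "unemp"), vcov = vcovHC(mod))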

Achim Zeileis
    If I understand it correctly, the Wald-test shows whether including certain variables is significant or not. But is there a measure of _how much_ they actually improve the model, as e.g. sum of squares? – Aki Jan 07 '15 at 14:41
  • How can I implement HAC standard errors? I tried coeftest(mod, vcov = vcovHAC) but got an error message saying "no applicable method for 'estfun' applied to an object of class "c('plm', 'panelmodel')". It seems I need to calculate weights or an estimation function first. What method would you recommend? – Aki Feb 07 '16 at 16:45
  • While the `plm` package has methods for the `vcovHC()` generic from the `sandwich` package, it does not provide methods for `vcovHAC()`. Instead `plm` ships with its own dedicated functions for computing covariance matrices in panel models that potentially include serial correlation as well. See `vcovNW()` or `vcovSCC()` in package `plm` (a short usage sketch follows after these comments). – Achim Zeileis Feb 07 '16 at 23:03
  • Thank you! As far as I understand both functions relate to autocorrelation-robust SE. Is there any function that can be used for both heteroscedasticity- and autocorrelation-robust SE? – Aki Feb 08 '16 at 08:12
  • All three functions (`vcovHAC`, `vcovNW`, `vcovSCC`) are _H_eteroskedasticity and _A_utocorrelation _C_onsistent...that's what HAC stands for. – Achim Zeileis Feb 08 '16 at 11:31
  • OK. In the plm description it sounded like they were only correcting for autocorrelation, since it said nothing about heteroskedasticity. coeftest(mod, vcov = vcovNW) gives the error message "object 'vcovNW' not found", although all packages are loaded. Also the example in the plm-package description gives the same error. What am I doing wrong here? – Aki Feb 08 '16 at 13:39
  • You need to check the corresponding papers which types of correlation and/or heteroskedasticity are exactly accounted for and which are not. There are more possible patterns in panel data than in cross-section data - and the details differ between the different estimators. And `vcovNW` was added in recent versions of `plm` (1.5-x). – Achim Zeileis Feb 08 '16 at 15:48
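
A minimal usage sketch of the panel covariance estimators mentioned in the last comments, reusing mod from above (assuming plm 1.5 or later, where vcovNW() and vcovSCC() are available):

## Newey-West type HAC covariance for panel models
coeftest(mod, vcov = vcovNW)

## Driscoll-Kraay (SCC) covariance, additionally robust to
## cross-sectional dependence
coeftest(mod, vcov = vcovSCC)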

This can be done with the Anova function in the car package. See this excellent answer for more detail and a review of other techniques for dealing with heteroskedasticity in ANOVA.
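For a plain lm() fit (Anova() does not accept plm objects, see the comments below), a minimal sketch looks like this; the white.adjust argument requests a heteroskedasticity-corrected (White/HC) coefficient covariance matrix:

library("car")

fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

## Type II tests with an HC3 heteroskedasticity-corrected covariance matrix
Anova(fit, white.adjust = "hc3")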

shadowtalker
  • Thank you. The problem is that Anova() does not seem to work with models of type plm (panelmodel). – Aki Jan 06 '15 at 16:48
  • @Aki if I'm not mistaken the pooled OLS is equivalent to plain OLS, based on what it says in the vignette: http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf – shadowtalker Jan 06 '15 at 16:55
  • @Aki however it sounds like you might be interested in a richer ANOVA model. There are some R examples here: http://stats.stackexchange.com/questions/3874/unbalanced-mixed-effect-anova-for-repeated-measures – shadowtalker Jan 06 '15 at 18:47