2

I need to estimate a linear regression with the OLS method. Since I assume the error terms to be correlated, I would like to account for heteroskedasticity and autocorrelation in the error terms. I already wrote some code, but I wonder whether it is correct.

So far, my code is:

myregression <- lm(x ~ y + z)
coeftest(myregression, vcov=NeweyWest(myregression))

Any help is highly appreciated!

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
user144267
  • 55
  • 1
  • 2
  • 8
  • 1
    Presumably you are using the sandwich and lmtest packages. You can type ?coeftest and ?NeweyWest for examples (at the end of the help page). I personally prefer dynlm objects for timeseries type data, in case that's where your autocorrelation comes from... – P.Windridge Jan 04 '17 at 16:47
  • Thank you very much P.Windridge! I am using these packages. And I allready wrote the instructions. However, since it is very important that I do not make any mistake, I decided to ask for help here. As far as I understand you would rather use myregression – user144267 Jan 04 '17 at 16:52
  • 2
    If your question is only about R code, it is off topic here. If you have a statistical question about autocorrelation or sandwich estimators, please edit to clarify. – gung - Reinstate Monica Jan 04 '17 at 17:41
  • Be aware that the `NeweyWest` function performs a procedure called prewhitening before actually computing the HAC standard errors. This is done in order to "increase the performance" of the HAC algorithm and might be a good idea in your case, but you should be aware that this is done by default (you can turn it off though). See [here](https://stats.stackexchange.com/q/312297/182174) for a related question and [here](https://www.rdocumentation.org/packages/sandwich/versions/2.4-0/topics/NeweyWest) for the documentation of the `NeweyWest` function. – Candamir Nov 08 '17 at 21:18

1 Answers1

1

This question is about code but seeing as I've been looking at HAC estimates recently in R I will "answer".

  1. I have not checked the R implementation of Newey-West is exactly as in their original paper. However, your code does indeed calculate R's NeweyWest HAC estimate using the default bandwidth selection/lag method. (You can view this parameter with the "verbose=T" option.)

  2. If you know the form of the correlations in your data then you can take less of a "sledgehammer" approach than Newey-West. E.g. Prais-Winsten or Cochrane-Orcutt.

  3. Be aware that serial correlation is being examined here and so the order that your observations are sorted in does matter.

If you are interested you might consider a toy example where you generate correlated residuals on purpose to see how the Newey-West std error estimates/p-values differ. In the example below the correlated residuals make it "look like" the response can be fitted against a straight line. The lag parameter is automatically (and correctly) chosen = 1 (seen with verbose=T option). If you play with the process generating the residuals you can see how it changes.

set.seed(04012017)
n<-50
correlated_residuals<-arima.sim(list(ar = .9), n)
y<-correlated_residuals
x<-1:n
plot(x,correlated_residuals)
fit<-lm(y~x)
abline(fit)
summary(fit) # standard estimates
coeftest(fit,vcov=NeweyWest(fit,verbose=T))
P.Windridge
  • 2,075
  • 12
  • 13