
The Python package `statsmodels` provides a `use_correction` option when computing HAC standard errors for an OLS model, which purportedly corrects for small sample sizes. When I dug into the code, however, I encountered the following comment:

> just guessing on correction factor, need reference

This caused a little alarm, since this correction factor significantly affects the interpretation of my fit.

As far as I understand the code, the correction consists in simply multiplying the usual HAC covariance matrix by $n / (n - k)$, where $n$ is the number of observations and $k$ is the number of parameters in the model. While this seems plausible, I am no expert, and (like the code's author) would very much appreciate a justification or a reference for this factor.
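To make the factor concrete, here is a minimal numpy sketch of a Newey-West-style HAC estimator with the described $n/(n-k)$ scaling applied at the end. The function name `ols_hac_cov` and the choice of a Bartlett kernel are my own assumptions for illustration; this is not the actual `statsmodels` implementation.

```python
import numpy as np

def ols_hac_cov(X, y, maxlags, use_correction=True):
    """Newey-West HAC covariance for OLS coefficients (illustrative sketch).

    When use_correction is True, the result is multiplied by n/(n-k),
    which is the small-sample factor described in the question.
    """
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ beta                          # OLS residuals
    Xu = X * u[:, None]                       # score contributions x_t * u_t
    S = Xu.T @ Xu                             # lag-0 ("meat") term
    for lag in range(1, maxlags + 1):
        w = 1.0 - lag / (maxlags + 1.0)       # Bartlett kernel weight
        G = Xu[lag:].T @ Xu[:-lag]            # lag-`lag` cross products
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ S @ XtX_inv               # sandwich form
    if use_correction:
        cov *= n / (n - k)                    # the factor under discussion
    return cov
```

By construction, the corrected and uncorrected covariance matrices differ only by the scalar $n/(n-k)$, so all standard errors are inflated by $\sqrt{n/(n-k)}$.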

Anthony

  • If I'm not mistaken, the code uses "HC1", in the terminology of [Long & Ervin (2000)](https://www.tandfonline.com/doi/abs/10.1080/00031305.2000.10474549?casa_token=g2Mx5LsetxkAAAAA:T68GCelpMoywdKr5KboZgnMezemHT_ocFy7ZvsF2ph3J9tVaHXGfV4e3524h6AUfv1SZCbjA65c), which can serve as a reference. – COOLSerdash Nov 25 '21 at 14:10
  • @COOLSerdash An excellent reference, thank you. The $N/(N-K)$ does indeed appear in the formula for HC1. I'm missing, however, the "autocorrelation" part of HAC in your article: `statsmodels` does have different options for HC0 and HAC: https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html – Anthony Nov 25 '21 at 14:32
  • @COOLSerdash It seems as though `statsmodels` is computing the equivalent of HC1 in the article you linked to, but for the HAC covariance matrix instead of the HC0 one. – Anthony Nov 25 '21 at 14:34
  • This recent question is related: the correction factor would make the matrix estimator conditionally unbiased under a homoskedastic and "balanced" design: https://stats.stackexchange.com/questions/552311/unbiasedness-of-covariance-matrix-estimator-in-ols/552418#552418 – Christoph Hanck Nov 26 '21 at 07:19
  • @ChristophHanck Thanks for this. I would accept your comment as an answer if you would elaborate a little further. – Anthony Nov 26 '21 at 08:39
  • Thanks! That elaboration would however just be (at least I currently see nothing I could add beyond that answer) a duplicate of my linked answer, so that would add little. – Christoph Hanck Nov 26 '21 at 08:44
  • @ChristophHanck Fair enough! Would you maybe quickly comment on my above question about the relationship between HC1 and HAC? (i) Are they the same thing? (ii) If they are not, is the argument for the $n/(n-k)$ correction identical? (iii) Finally, in the Long and Ervin paper linked above, they argue for using HC3: would the corresponding correction factor then be $1 / (1 - h_{ii})^2 = (n / (n - k))^2$? – Anthony Nov 26 '21 at 08:54
  • Ad (i), to me, HC1 stands for version 1 of a "heteroskedasticity consistent" variance estimator, while HAC stands for "heteroskedasticity and autocorrelation consistent". See e.g. https://stats.stackexchange.com/questions/139221/newey-west-standard-errors-in-ols/139377#139377 and https://stats.stackexchange.com/questions/153444/what-is-the-long-run-variance/153543#153543 for the idea of the latter. (ii) The justification I gave in my comment would then (or so I'd say) not hold anymore, but it could still be an ad hoc correction that is useful. (iii) Rather $1/(1-k/n)^2$, I'd say. – Christoph Hanck Nov 26 '21 at 09:12
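The leverage identity behind point (iii) of the comments can be checked numerically: the diagonal entries $h_{ii}$ of the hat matrix sum to $k$, so they average to $k/n$, which is what licenses approximating the per-observation HC3 weight $1/(1-h_{ii})^2$ by the single scalar $1/(1-k/n)^2$ under a "balanced" design. The design below is made up purely for illustration.

```python
import numpy as np

# Check that leverages average to k/n (trace of a rank-k projection is k).
rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat (projection) matrix
h = np.diag(H)                          # leverages h_ii
mean_leverage = h.mean()                # equals trace(H)/n = k/n
```

When the $h_{ii}$ are all close to their mean $k/n$, each HC3 weight $1/(1-h_{ii})^2$ is close to $1/(1-k/n)^2 = (n/(n-k))^2$; with unbalanced leverages the two can differ substantially.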

0 Answers