
I'm trying to implement Newey-West from scratch to better understand each component. I'm currently having trouble replicating a basic numerical example of Newey-West with lag=1 from statsmodels.

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
df = pd.DataFrame({'invest':[90.9, 97.4, 113.5, 125.7, 122.8, 133.3, 149.3, 144.2, 166.4, 195, 229.8, 228.7, 206.1, 257.9, 324.1, 386.6, 423, 401.9, 474.9, 414.5],
                   'price':[0.7167, 0.7277, 0.7436, 0.7676, 0.7906, 0.8254, 0.8679, 0.9145, 0.9601, 1, 1.0575, 1.1508, 1.2579, 1.3234, 1.4005, 1.5042, 1.6342, 1.7842, 1.9514, 2.0688]})
df['r_invest'] = df.eval("invest/price")

# intercept-only OLS with Newey-West (HAC) standard errors using 1 lag
reg = smf.ols('r_invest ~ 1',data=df).fit(cov_type='HAC',cov_kwds={'maxlags':1})
reg.summary()

This returns a regression summary in which the NW-adjusted standard error of the intercept is 11.418.
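For reference, the HAC covariance behind that number can be sketched as an OLS sandwich, (X'X)^-1 S (X'X)^-1, where S adds Bartlett-weighted (1 - j/(maxlags + 1)) lagged cross products of the scores x_t * e_t to the lag-0 term. The NumPy sketch below assumes residuals from the full-sample fit and no small-sample correction; `newey_west_cov` is a hypothetical helper, not a statsmodels function. With only an intercept column, X'X = N, so under these assumptions it should reproduce the 11.418 above.

```python
import numpy as np


def newey_west_cov(X, y, maxlags):
    """Newey-West (Bartlett-kernel HAC) covariance of OLS coefficients.

    Sketch assumptions: residuals come from the full-sample OLS fit and no
    small-sample correction is applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    xu = X * resid[:, None]  # score contributions x_t * e_t, shape (n, k)

    # "Meat": lag-0 outer products plus Bartlett-weighted lagged cross products
    meat = xu.T @ xu
    for j in range(1, maxlags + 1):
        weight = 1.0 - j / (maxlags + 1.0)
        gamma_j = xu[j:].T @ xu[:-j]  # sum over t of (x_t e_t)(x_{t-j} e_{t-j})'
        meat += weight * (gamma_j + gamma_j.T)

    bread = np.linalg.inv(X.T @ X)
    return bread @ meat @ bread, beta


invest = np.array([90.9, 97.4, 113.5, 125.7, 122.8, 133.3, 149.3, 144.2, 166.4, 195,
                   229.8, 228.7, 206.1, 257.9, 324.1, 386.6, 423, 401.9, 474.9, 414.5])
price = np.array([0.7167, 0.7277, 0.7436, 0.7676, 0.7906, 0.8254, 0.8679, 0.9145, 0.9601, 1,
                  1.0575, 1.1508, 1.2579, 1.3234, 1.4005, 1.5042, 1.6342, 1.7842, 1.9514, 2.0688])
r_invest = invest / price

X = np.ones((len(r_invest), 1))  # intercept-only design, like 'r_invest ~ 1'
cov, beta = newey_west_cov(X, r_invest, maxlags=1)
se = np.sqrt(np.diag(cov))
print(beta[0], se[0])
```

Because the bread is (X'X)^-1 and the meat is an unscaled sum, the two factors of 1/N appear automatically; no separate divisor is needed.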

The following is my own implementation:

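N = df.shape[0]
divisor = N
# lag-0 term: sample variance of r_invest, divided by N
v = df['r_invest'].std()**2/divisor
lag = 1
# lag-1 term: covariance between r_invest and its first lag, divided by N
cov_shift_1 = df['r_invest'].shift(1).cov(df['r_invest'])/divisor

# Bartlett weight 1 - 1/(lag+1) on the lag-1 term
nw_v = v + 2*(1-(1/(lag+1)))*cov_shift_1
nw_se = np.sqrt(nw_v)
print(nw_se)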
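For comparison, here is a sketch that puts the calculation above next to one following the convention assumed for the HAC estimator with default settings (no small-sample correction): residuals are deviations from the full-sample mean, and both the lag-0 and the lag-1 sums are divided by N. Relative to the code above, the assumed differences are the `ddof=1` default of `Series.std()`/`Series.var()` and `Series.cov()`, and the fact that `Series.cov()` on the shifted series drops the NaN pair and centers the 19 overlapping pairs on their own means. Under that assumption, `nw_se` should come out at about 11.418, the statsmodels value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'invest': [90.9, 97.4, 113.5, 125.7, 122.8, 133.3, 149.3, 144.2, 166.4, 195,
                              229.8, 228.7, 206.1, 257.9, 324.1, 386.6, 423, 401.9, 474.9, 414.5],
                   'price': [0.7167, 0.7277, 0.7436, 0.7676, 0.7906, 0.8254, 0.8679, 0.9145, 0.9601, 1,
                             1.0575, 1.1508, 1.2579, 1.3234, 1.4005, 1.5042, 1.6342, 1.7842, 1.9514, 2.0688]})
r = df['invest'] / df['price']
N = len(r)
lag = 1
w1 = 1 - 1 / (lag + 1)  # Bartlett weight, 0.5 for lag=1

# Approach from the question: ddof=1 variance, and Series.cov on the overlapping pairs
# (pandas drops the NaN pair, centers each side on its own mean, ddof=1)
user_se = np.sqrt(r.var() / N + 2 * w1 * r.shift(1).cov(r) / N)

# Assumed HAC convention: residuals around the full-sample mean, every term divided by N
e = r - r.mean()
gamma0 = (e ** 2).sum() / N
gamma1 = (e * e.shift(1)).sum() / N  # sum() skips the NaN at the first observation
nw_se = np.sqrt((gamma0 + 2 * w1 * gamma1) / N)

print(user_se, nw_se)
```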

The result is 11.8587, which is slightly off. Any idea why? Am I calculating Newey-West incorrectly, or is statsmodels applying some additional, non-standard adjustment?

I was also able to confirm statsmodels' numbers using the NeweyWest function from R's sandwich package: with lag=1 and prewhite=FALSE, it gives the same answer, 11.418.

nwly