
In a regression context, $$ Y_i = \alpha + \beta T_i + \varepsilon_i, $$ my textbook defines the EHW robust variance estimator as $$ \widehat{\mathbb{V}_{\rm EHW}}(\widehat{\alpha}, \widehat{\beta} | \mathbf{X}) = \left(\sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^T \right)^{-1} \left( \sum_{i=1}^n \hat{\varepsilon}_i^2 \mathbf{X}_i \mathbf{X}_i^T \right) \left( \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^T \right)^{-1}. \quad (1) $$ Here $n$ is the number of observations, $\mathbf{X}_i = (1, T_i)^T$, and $\mathbf{X} = (\mathbf{X}_1, \cdots, \mathbf{X}_n)^T$.

It also says that when $T_i$ is binary, $$ \begin{align} \widehat{\mathbb{V}_{\rm EHW}}(\widehat{\beta} | \mathbf{T}) &= \frac{s_1^2}{n_1} + \frac{s_0^2}{n_0},\quad (2)\\ s_t^2 &= \sum_{i=1}^n I(T_i=t)(Y_i - \bar{Y}_t)^2, \end{align} $$ where $\bar{Y}_t$ is the sample mean of $Y$ in group $t$, $n_t$ is the number of observations with $T_i = t$, and $n_1 + n_0 = n$.

How can we get (2) from (1)?

Christoph Hanck
user2978524
  • Related: https://stats.stackexchange.com/questions/211793/linear-probability-model-dummy-variables-and-the-same-standard-errors-on-all-es/211813#211813 https://stats.stackexchange.com/questions/234958/co-variance-of-beta-coefficients-for-dummy-variable-regression-with-intercept/235097#235097 https://stats.stackexchange.com/questions/234727/standard-errors-of-regression-coefficients-in-a-dummy-variable-regression-model/234734#234734 – Christoph Hanck Jul 01 '18 at 16:53

1 Answer


There is a small mistake in your statement: your $s_t^2$, $t=0,1$, are the sums of squared residuals of the two groups of observations. The formula you refer to (unless your textbook has a typo) instead defines $s_t^2$ as the estimate of the error variance of group $t$ (under heteroskedasticity, these two variances are allowed to differ). Hence, the textbook's quantities are your $s_t^2$ divided by $n_t$, i.e., the average of the squared residuals in each group.
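
Concretely, writing $\tilde{s}_t^2$ for the textbook's group variance estimates (a separate symbol used here only to keep the two definitions apart),
$$
\tilde{s}_t^2 = \frac{1}{n_t}\sum_{i=1}^n I(T_i=t)(Y_i - \bar{Y}_t)^2 = \frac{s_t^2}{n_t},
\qquad\text{so that}\qquad
\frac{\tilde{s}_1^2}{n_1} + \frac{\tilde{s}_0^2}{n_0} = \frac{s_1^2}{n_1^2} + \frac{s_0^2}{n_0^2}.
$$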

Thus, we need to show that, in your notation,
$$
\widehat{\mathbb{V}_{\text{EHW}}}(\hat\beta|\mathbf{X})=\frac{s_0^2}{n_0^2}+\frac{s_1^2}{n_1^2}.
$$

First, from this answer, note that
$$
(\mathbf{X}^T\mathbf{X})^{-1}= \frac{1}{n_0n_1} \begin{pmatrix} n_1&-n_1\\ -n_1&n_0+n_1 \end{pmatrix},
$$
where I have used $n=n_0+n_1$ and hence $nn_1-n_1^2=n_0n_1$.

Next, the "meat" matrix of the sandwich is, in matrix notation,
$$
\sum_{i=1}^n\hat{\epsilon}_i^2\mathbf{X}_i\mathbf{X}_i^T=\mathbf{X}^T\Sigma_{\hat\epsilon}\mathbf{X},
$$
where $\Sigma_{\hat\epsilon}$ is a diagonal matrix with the squared OLS residuals $\hat{\epsilon}_i^2=\sum_{t=0}^{1}I(T_i=t)(Y_i-\bar{Y}_t)^2$ on the main diagonal. Thus, using $T_i^2=T_i$ for a binary regressor,
$$
\mathbf{X}^T\Sigma_{\hat\epsilon}\mathbf{X}= \begin{pmatrix} \sum_{i=1}^n\hat{\epsilon}_i^2&\sum_{i=1}^nI(T_i=1)\hat{\epsilon}_i^2\\ \sum_{i=1}^nI(T_i=1)\hat{\epsilon}_i^2&\sum_{i=1}^nI(T_i=1)\hat{\epsilon}_i^2 \end{pmatrix},
$$
which, as the total sum of squared residuals is of course nothing but the sum of the sums of squared residuals of the two subsamples, equals
$$
\mathbf{X}^T\Sigma_{\hat\epsilon}\mathbf{X}= \begin{pmatrix} s_0^2+s_1^2&s_1^2\\ s_1^2&s_1^2 \end{pmatrix}
$$
in your notation.

Putting things together gives
$$
\begin{align*}
\widehat{\mathbb{V}_{\text{EHW}}}(\hat\alpha,\hat\beta|\mathbf{X})&=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\Sigma_{\hat\epsilon}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\\
&= \frac{1}{n_0^2n_1^2} \begin{pmatrix} n_1&-n_1\\ -n_1&n_0+n_1 \end{pmatrix} \begin{pmatrix} s_0^2+s_1^2&s_1^2\\ s_1^2&s_1^2 \end{pmatrix} \begin{pmatrix} n_1&-n_1\\ -n_1&n_0+n_1 \end{pmatrix}.
\end{align*}
$$
The $(2,2)$ element of that matrix (which corresponds to what we seek, i.e., $\widehat{\mathbb{V}_{\text{EHW}}}(\hat\beta|\mathbf{X})$) is, after direct multiplication,
$$
\frac{-n_1n_0s_1^2+n_1^2s_0^2+(n_0+n_1)n_0s_1^2}{n_0^2n_1^2}=\frac{n_1^2s_0^2+n_0^2s_1^2}{n_0^2n_1^2}.
$$
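
To spell out the direct multiplication, multiplying the last two matrices first gives
$$
\begin{pmatrix} s_0^2+s_1^2&s_1^2\\ s_1^2&s_1^2 \end{pmatrix} \begin{pmatrix} n_1&-n_1\\ -n_1&n_0+n_1 \end{pmatrix} = \begin{pmatrix} n_1s_0^2 & -n_1s_0^2+n_0s_1^2\\ 0 & n_0s_1^2 \end{pmatrix},
$$
and pre-multiplying by $\begin{pmatrix} n_1&-n_1\\ -n_1&n_0+n_1 \end{pmatrix}$ yields $-n_1(-n_1s_0^2+n_0s_1^2)+(n_0+n_1)n_0s_1^2 = n_1^2s_0^2+n_0^2s_1^2$ in the $(2,2)$ position, so that indeed
$$
\widehat{\mathbb{V}_{\text{EHW}}}(\hat\beta|\mathbf{X})=\frac{n_1^2s_0^2+n_0^2s_1^2}{n_0^2n_1^2}=\frac{s_0^2}{n_0^2}+\frac{s_1^2}{n_1^2}.
$$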

Here is a little illustration in R:

library(sandwich)
library(lmtest)

n <- 10
y <- rnorm(n)                   # some dependent variable
x1 <- rbinom(n, size=1, p=.4)   # a dummy regressor

n1 <- length(y[x1==1])          # the no. of y's belonging to T_i=1
n0 <- length(y[x1==0])          # the no. of y's belonging to T_i=0

reg <- lm(y~x1)
prepackaged <- coeftest(reg, vcov = vcovHC(reg, "HC0"))[2,2]^2 # square to get variance instead of standard error

s1.squared <- sum((y[x1==1] - mean(y[x1==1]))^2) # "your" s_1^2
s0.squared <- sum((y[x1==0] - mean(y[x1==0]))^2) # "your" s_0^2
sum.of.variances <- s1.squared/n1^2 + s0.squared/n0^2

X <- cbind(rep(1,n),x1)
# formula (1) by hand: bread %*% meat %*% bread; the (2,2) element is the variance of the slope
matrix.formulation <- (solve(t(X)%*%X) %*% t(X)%*%diag(resid(reg)^2)%*%X %*% solve(t(X)%*%X))[2,2]

all.equal(prepackaged, sum.of.variances)
[1] TRUE
all.equal(sum.of.variances, matrix.formulation)
[1] TRUE
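
If you want to check the intermediate matrices too, the two pieces of the sandwich can be compared with their closed forms derived above (a small sketch reusing the objects reg, X, n0, n1, s0.squared and s1.squared from the code above; both comparisons should return TRUE):

# "bread" of the sandwich: (X'X)^{-1}, closed form 1/(n0*n1) * [n1, -n1; -n1, n0+n1]
bread.numeric <- solve(t(X)%*%X)
bread.closed  <- matrix(c(n1, -n1, -n1, n0+n1), 2, 2)/(n0*n1)
all.equal(bread.numeric, bread.closed, check.attributes=FALSE)

# "meat" of the sandwich: X' Sigma X, closed form [s0^2+s1^2, s1^2; s1^2, s1^2]
meat.numeric <- t(X)%*%diag(resid(reg)^2)%*%X
meat.closed  <- matrix(c(s0.squared+s1.squared, s1.squared, s1.squared, s1.squared), 2, 2)
all.equal(meat.numeric, meat.closed, check.attributes=FALSE)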
Christoph Hanck