
I am running regressions of the sort:

$$ y_{i}= \alpha + \beta T_{i} + \gamma G_{i} + \delta( T_{i} * G_{i}) + \rho X_{i} + \epsilon_{i} $$

where $T_{i}$ is a binary treatment variable, $G_{i}$ is a binary variable indicating whether observation $i$ belongs to a group of interest, and $X_{i}$ is a set of covariates.

I am interested in estimating $\beta + \delta$, the average treatment effect for the group of interest. To do so, I split the full sample into two subsamples and estimate:

  • $y_{i}= (\alpha+\gamma) + (\beta + \delta) T_{i} + \epsilon_{i}$ for all $i$ such that $G_{i}=1$ and,
  • $y_{i}= \alpha + \beta T_{i} + \epsilon_{i}$ for all $i$ such that $G_{i}=0$

which provide me with estimates of $(\beta + \delta)$ and $\beta$. Next, to estimate $\delta$ alone, I use:

$$ y_{i}= \alpha + \beta T_{i} + \gamma G_{i} + \delta( T_{i} * G_{i}) + \epsilon_{i} $$

These regressions are equivalent: I get $\widehat{(\beta + \delta)}=\hat{\beta} + \hat{\delta}$, which is fine.
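The equivalence in the no-covariate case can be checked numerically. The sketch below is only an illustration on simulated data: the sample size, the coefficient values, and the helper `ols` (a bare-bones normal-equations solver written with the standard library) are all assumptions made for the example, not part of the original analysis.

```python
import random

def ols(X, y):
    """Return OLS coefficients by solving the normal equations X'X b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data (hypothetical coefficients, chosen only for illustration).
random.seed(0)
n = 400
T = [random.randint(0, 1) for _ in range(n)]
G = [random.randint(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * t + 0.5 * g + 1.5 * t * g + random.gauss(0, 1)
     for t, g in zip(T, G)]

# Pooled regression with the interaction term, no covariates.
alpha_hat, beta_hat, gamma_hat, delta_hat = ols(
    [[1, t, g, t * g] for t, g in zip(T, G)], y)

# Subsample regression on the group of interest (G = 1).
sub = [(t, yi) for t, g, yi in zip(T, G, y) if g == 1]
_, beta_sub = ols([[1, t] for t, _ in sub], [yi for _, yi in sub])

print("pooled beta_hat + delta_hat:", beta_hat + delta_hat)
print("subsample coefficient on T: ", beta_sub)
```

The two printed numbers should agree up to floating-point error, because with only $T_{i}$, $G_{i}$ and their product the pooled model is saturated: it fits the four cell means exactly, just as the two subsample regressions do.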

Now, when I introduce the set of covariates into these regressions, they are no longer equivalent. That is, when using:

  • $y_{i}= (\alpha+\gamma) + (\beta + \delta) T_{i} + \rho X_{i} + \epsilon_{i}$ for all $i$ such that $G_{i}=1$

    and

  • $y_{i}= \alpha + \beta T_{i} + \gamma G_{i} + \delta( T_{i} * G_{i}) + \rho X_{i} + \epsilon_{i}$,

I no longer get $\widehat{(\beta + \delta)}=\hat{\beta} + \hat{\delta}$.

Is that normal? I can't understand why it happens. And how can I estimate $\beta$, $\delta$ and $(\beta + \delta)$ when controlling for covariates?

gung - Reinstate Monica
Greg

2 Answers


You are using a very odd strategy. You should simply form a new variable as the product of the two variables whose interaction you want to estimate ($Z_{i} = T_{i} \times G_{i}$), and then include it, together with $T_{i}$, $G_{i}$ and the covariates, in a single multiple regression model. The reason you get different answers using your method when you include the covariates is that they are not orthogonal to your treatment and group variables. You can learn more about that in my answer here: Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression?
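A small simulation can make the discrepancy concrete. This is a hedged sketch, not the asker's data: the covariate `x`, the coefficient values, and the helper `ols` are assumptions made for illustration. It also adds one step that goes beyond the text above: the subsample regression implicitly lets the coefficient on $X_{i}$ differ by group, whereas the pooled model imposes a single $\rho$, so adding a $G_{i} \times X_{i}$ term to the pooled model should reproduce the subsample estimate.

```python
import random

def ols(X, y):
    """Return OLS coefficients by solving the normal equations X'X b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data; x is deliberately correlated with T and G (non-orthogonal).
random.seed(1)
n = 400
T = [random.randint(0, 1) for _ in range(n)]
G = [random.randint(0, 1) for _ in range(n)]
x = [0.8 * t + 0.5 * g + random.gauss(0, 1) for t, g in zip(T, G)]
y = [1.0 + 2.0 * t + 0.5 * g + 1.5 * t * g + 1.0 * xi + random.gauss(0, 1)
     for t, g, xi in zip(T, G, x)]
rows = list(zip(T, G, x))

# (a) Subsample regression with the covariate, G = 1 only.
sub = [(t, xi, yi) for (t, g, xi), yi in zip(rows, y) if g == 1]
_, beta_sub, _ = ols([[1, t, xi] for t, xi, _ in sub], [s[2] for s in sub])

# (b) Pooled regression with Z = T*G and a common coefficient on x.
_, beta_p, _, delta_p, _ = ols([[1, t, g, t * g, xi] for t, g, xi in rows], y)

# (c) Pooled regression that also lets the x slope differ by group.
full = ols([[1, t, g, t * g, xi, g * xi] for t, g, xi in rows], y)
beta_f, delta_f = full[1], full[3]

print("subsample (G = 1):          ", beta_sub)
print("pooled, common x slope:     ", beta_p + delta_p)
print("pooled, group-specific slope:", beta_f + delta_f)
```

Under these assumptions, (a) and (c) should agree up to floating-point error, while (b) generally differs slightly because it estimates one covariate slope from both groups. Which specification is appropriate depends on whether a common $\rho$ is a reasonable restriction for the data at hand.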

gung - Reinstate Monica

Yes, that's normal, unless all the covariates are unrelated to the other variables, which is very unlikely to be exactly true even in a randomized study and unlikely to be even approximately true in an observational study.

You get an estimate of $\beta + \delta$ by adding the estimates $\hat{\beta}$ and $\hat{\delta}$ from your original regression, the one that includes the interaction term and the covariates.
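As a supplement to the point above (a standard result, not something spelled out in the original answer), the standard error of the sum needs the covariance between the two estimates, not just their individual variances:

$$ \widehat{\operatorname{Var}}(\hat{\beta} + \hat{\delta}) = \widehat{\operatorname{Var}}(\hat{\beta}) + \widehat{\operatorname{Var}}(\hat{\delta}) + 2\,\widehat{\operatorname{Cov}}(\hat{\beta}, \hat{\delta}) $$

An equivalent route is to reparametrize the same model so that the sum appears as a single coefficient, using $\beta T_{i} + \delta (T_{i} * G_{i}) = \beta\, T_{i}(1 - G_{i}) + (\beta + \delta)(T_{i} * G_{i})$:

$$ y_{i}= \alpha + \beta\, T_{i}(1 - G_{i}) + \gamma G_{i} + (\beta + \delta)( T_{i} * G_{i}) + \rho X_{i} + \epsilon_{i} $$

The fit is unchanged, and the coefficient on $T_{i} * G_{i}$ together with its reported standard error then refers directly to $\beta + \delta$.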

Peter Flom