Interaction variable in multiple regressions

Question

I am running regressions of the sort:

$$ y_{i}= \alpha + \beta T_{i} + \gamma G_{i} + \delta( T_{i} * G_{i}) + \rho X_{i} + \epsilon_{i} $$

where $T_{i}$ is a binary treatment variable, $G_{i}$ is a binary variable indicating whether observation $i$ belongs to a group of interest, and $X_{i}$ is a set of covariates.

I am interested in estimating $\beta + \delta$, the average treatment effect within the group of interest. To do so, I split the full sample into two subsamples and estimate:

$y_{i}= (\alpha+\gamma) + (\beta + \delta) T_{i} + \epsilon_{i}$ for all $i$ such that $G_{i}=1$ and,
$y_{i}= \alpha + \beta T_{i} + \epsilon_{i}$ for all $i$ such that $G_{i}=0$

which give me estimates of $(\beta + \delta)$ and $\beta$, respectively. Next, to estimate $\delta$ on its own, I use the pooled regression:

$$ y_{i}= \alpha + \beta T_{i} + \gamma G_{i} + \delta( T_{i} * G_{i}) + \epsilon_{i} $$

The two approaches are equivalent: I get $\widehat{\beta + \delta}=\hat{\beta} + \hat{\delta}$, as expected.
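A quick numerical check of this equivalence, written as a minimal sketch in Python with NumPy on simulated data (the coefficient values and the helper `ols` are hypothetical, chosen only for illustration). Because the pooled model with $T$, $G$ and $T \times G$ has one free mean per $(T, G)$ cell, its treatment coefficients coincide with the slopes from the two subsample regressions.

```python
import numpy as np

def ols(D, y):
    """Return OLS coefficients for design matrix D (hypothetical helper)."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef

# Simulated data; the coefficient values are hypothetical.
rng = np.random.default_rng(0)
n = 500
T = rng.integers(0, 2, n).astype(float)
G = rng.integers(0, 2, n).astype(float)
y = 1.0 + 2.0 * T + 0.5 * G + 1.5 * T * G + rng.normal(size=n)
ones = np.ones(n)

# Pooled regression: y ~ 1 + T + G + T*G
a, b, g, d = ols(np.column_stack([ones, T, G, T * G]), y)

# Separate regressions of y on 1 + T within each group
g1, g0 = G == 1, G == 0
slope1 = ols(np.column_stack([ones[g1], T[g1]]), y[g1])[1]
slope0 = ols(np.column_stack([ones[g0], T[g0]]), y[g0])[1]

print(np.isclose(slope0, b), np.isclose(slope1, b + d))  # True True
```

The agreement is exact up to floating-point error, not just approximate, because both specifications fit the same four cell means.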

Now, when I introduce the set of covariates into these regressions, they are no longer equivalent. That is, when using:

$y_{i}= (\alpha+\gamma) + (\beta + \delta) T_{i} + \rho X_{i} + \epsilon_{i}$ for all $i$ such that $G_{i}=1$

and
$y_{i}= \alpha + \beta T_{i} + \gamma G_{i} + \delta( T_{i} * G_{i}) + \rho X_{i} + \epsilon_{i}$,

I no longer get $\widehat{\beta + \delta}=\hat{\beta} + \hat{\delta}$.

Is that normal? I don't understand why. How, then, can I estimate $\beta$, $\delta$ and $(\beta + \delta)$ with control variables (covariates)?

score 1 · Answer 1 · edited Apr 13 '17 at 12:44


You are using a very odd strategy. You should simply form a new variable as the product of the terms whose interaction you want to estimate ($Z = T\times G$), and then include that in a single multiple regression model. The reason you get different answers using your method when you include the covariates is that they are not orthogonal to your treatment and group variables. You can learn more about that in my answer here: Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression?
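A minimal sketch of this single-model approach in Python with NumPy. It assumes one continuous covariate and noise-free simulated data (all variable names and coefficient values are hypothetical) so that the recovered coefficients can be checked exactly. $\hat\beta$ and $\hat\delta$ are read off the fit, and $\hat\beta+\hat\delta$ is simply their sum.

```python
import numpy as np

# Hypothetical noise-free data, so OLS recovers the true coefficients exactly.
rng = np.random.default_rng(0)
n = 200
T = rng.integers(0, 2, n).astype(float)   # binary treatment
G = rng.integers(0, 2, n).astype(float)   # binary group indicator
X = rng.normal(size=n)                    # one continuous covariate
Z = T * G                                 # the interaction variable, Z = T x G
y = 1.0 + 2.0 * T + 0.5 * G + 1.5 * Z + 0.8 * X

# Single multiple regression: y ~ 1 + T + G + Z + X
D = np.column_stack([np.ones(n), T, G, Z, X])
alpha, beta, gamma, delta, rho = np.linalg.lstsq(D, y, rcond=None)[0]

print(round(beta, 6), round(delta, 6), round(beta + delta, 6))  # 2.0 1.5 3.5
```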

answered May 20 '14 at 14:13 by gung - Reinstate Monica (last edited by Community)

Am I missing something? It looks like his first regression does that. (I agree that that is the way to go). – Peter Flom May 20 '14 at 14:14
Then you have the interaction term already. You're done. – gung - Reinstate Monica May 20 '14 at 14:19
Thank you both for your answers. This is the answer I suspected, but I wanted to make sure. Now my concern is: using my first regression, how do I get the s.e. of (β+δ)? – Greg May 20 '14 at 14:22
I'm not sure what that would mean. However, if you want to test for both effects simultaneously, you can perform a nested model test. – gung - Reinstate Monica May 20 '14 at 14:48

Greg: $\sqrt{\text{Var}(\hat\beta+\hat\delta)}=\sqrt{\text{Var}(\hat\beta)+\text{Var}(\hat\delta)+2 \text{Cov}(\hat\beta,\hat\delta)}$ – Glen_b May 20 '14 at 16:32
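To make Glen_b's formula concrete, here is a minimal sketch in Python with NumPy. The simulated data and the helper `ols_with_cov` are hypothetical, and it uses the classical homoskedastic OLS covariance matrix; with robust or clustered standard errors you would plug that covariance matrix into the same formula. As a cross-check, it reparameterizes the model so that $\beta+\delta$ appears as a single coefficient, using $\beta T + \delta TG = \beta T(1-G) + (\beta+\delta) TG$.

```python
import numpy as np

def ols_with_cov(D, y):
    """OLS coefficients and classical (homoskedastic) covariance matrix (hypothetical helper)."""
    n, p = D.shape
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    sigma2 = resid @ resid / (n - p)
    return coef, sigma2 * np.linalg.inv(D.T @ D)

# Simulated data with one covariate; all values are hypothetical.
rng = np.random.default_rng(1)
n = 400
T = rng.integers(0, 2, n).astype(float)
G = rng.integers(0, 2, n).astype(float)
X = rng.normal(size=n) + 0.5 * G
y = 1.0 + 2.0 * T + 0.5 * G + 1.5 * T * G + 0.8 * X + rng.normal(size=n)
ones = np.ones(n)

# Columns: alpha, beta, gamma, delta, rho
coef, cov = ols_with_cov(np.column_stack([ones, T, G, T * G, X]), y)

# Var(b + d) = Var(b) + Var(d) + 2 Cov(b, d)
est = coef[1] + coef[3]
se = np.sqrt(cov[1, 1] + cov[3, 3] + 2 * cov[1, 3])

# Reparameterized model: the coefficient on T*G is now beta + delta directly.
coef2, cov2 = ols_with_cov(np.column_stack([ones, T * (1 - G), G, T * G, X]), y)

print(np.isclose(est, coef2[3]), np.isclose(se, np.sqrt(cov2[3, 3])))  # True True
```

Both design matrices span the same column space, so the residuals are identical and the two routes give the same estimate and standard error.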
Great! Thank you very much to all of you! It's much clearer now. – Greg May 20 '14 at 17:50

score 1 · Answer 2 · answered May 20 '14 at 14:13

Yes, that's normal, unless all the covariates are unrelated to the other variables, which is very unlikely to be exactly true even in a randomized study and unlikely to be even approximately true in an observational study.
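A minimal simulation of this in Python with NumPy. The data-generating process is hypothetical: a covariate correlated with $T$ whose slope differs between groups. The $G=1$ subsample regression estimates its own slope on $X$, while the pooled model imposes a common $\rho$, so the two estimates of $\beta+\delta$ differ. Interacting $X$ with $G$ as well lets the slope differ by group and reproduces the subsample estimate exactly.

```python
import numpy as np

def ols(D, y):
    """Return OLS coefficients for design matrix D (hypothetical helper)."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 2000
T = rng.integers(0, 2, n).astype(float)
G = rng.integers(0, 2, n).astype(float)
X = 0.8 * T + rng.normal(size=n)   # covariate correlated with treatment
# Hypothetical DGP: the slope on X is 1.0 when G == 0 and 3.0 when G == 1.
y = 1.0 + 2.0 * T + 0.5 * G + 1.5 * T * G + (1.0 + 2.0 * G) * X + rng.normal(size=n)
ones = np.ones(n)

# Subsample regression for G == 1: y ~ 1 + T + X
m = G == 1
sub = ols(np.column_stack([ones[m], T[m], X[m]]), y[m])[1]

# Pooled model with a common rho: y ~ 1 + T + G + T*G + X
c = ols(np.column_stack([ones, T, G, T * G, X]), y)
pooled = c[1] + c[3]

# Pooled model that also interacts X with G
c_full = ols(np.column_stack([ones, T, G, T * G, X, X * G]), y)
full = c_full[1] + c_full[3]

print(abs(sub - pooled) > 0.1)  # True: a common rho shifts the estimate
print(np.isclose(sub, full))    # True: full interaction matches the subsample fit
```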

You get an estimate of $\beta + \delta$ by adding the estimates of those two coefficients from your original regression.
