
I have a general econometrics question. If I have a regression with an interaction between two dummies, do I also need to include each of those dummies on its own in the regression?

I have this regression. Is it right?

$$ y_i=\beta_0 + \beta_1d\_sex_i*d\_country_i +\beta_2d\_country_i + \varepsilon_i $$

or do I need to do this:

$$ y_i=\beta_0 + \beta_1d\_sex_i*d\_country_i +\beta_2d\_country_i + \beta_3d\_sex_i + \varepsilon_i $$

Robert Long
lulube
  • This has been asked and answered many times. You should include the main effects along with the interaction. See [here](https://stats.stackexchange.com/questions/11009/including-the-interaction-but-not-the-main-effects-in-a-model) – Robert Long Oct 22 '20 at 18:34

1 Answer


Generally, the second version is the default. If you are interested in the effect of two dummy variables, include both of them in the model first, and only then consider their interaction.
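
To see what is at stake, it can help to write out the group means each specification implies. This is only a sketch derived from the two equations in the question, and it assumes $E[\varepsilon_i \mid d\_sex_i, d\_country_i] = 0$, which the question does not state. Under the second version:

$$
\begin{aligned}
E[y_i \mid d\_sex_i=0,\ d\_country_i=0] &= \beta_0 \\
E[y_i \mid d\_sex_i=1,\ d\_country_i=0] &= \beta_0 + \beta_3 \\
E[y_i \mid d\_sex_i=0,\ d\_country_i=1] &= \beta_0 + \beta_2 \\
E[y_i \mid d\_sex_i=1,\ d\_country_i=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}
$$

The first version is the same model with $\beta_3 = 0$ imposed, so it forces both sexes to share the mean $\beta_0$ when $d\_country_i = 0$. The gap between the sexes when $d\_country_i = 1$ is $\beta_1 + \beta_3$ in the second version, whereas the first version attributes all of that gap to $\beta_1$.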

However, you may have additional information (from theory) suggesting that the interaction matters in the model but one of the variables on its own does not. In that case you can consider removing that main effect in a next step, if it does in fact turn out not to be significant. If the omitted coefficient really is zero, this could improve the efficiency of the estimator. It is not a common approach, though (personally, I have never seen it in any acknowledged research).
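
A small simulation may make the risk of dropping a main effect more concrete. The sketch below is illustrative only: the coefficient values, the sample size, and the helper `ols` are hypothetical choices of mine, not anything from the question. It generates data from the second version with a nonzero sex main effect and then fits both versions by least squares, using only the Python standard library.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(0)

# Hypothetical "true" coefficients, chosen only for illustration
b0, b_inter, b_country, b_sex = 1.0, 0.5, 2.0, 1.5

rows = []
for _ in range(4000):
    sex = random.randint(0, 1)
    country = random.randint(0, 1)
    y_i = (b0 + b_inter * sex * country + b_country * country
           + b_sex * sex + random.gauss(0, 1))
    rows.append((sex, country, y_i))

y = [r[2] for r in rows]
# Column order: intercept, interaction, country, sex
X_full = [[1.0, s * c, c, s] for s, c, _ in rows]
# Same model without the sex main effect (the first version in the question)
X_restricted = [[1.0, s * c, c] for s, c, _ in rows]

full = ols(X_full, y)
restricted = ols(X_restricted, y)

print("interaction coefficient, main effects included:", round(full[1], 2))
print("interaction coefficient, sex main effect dropped:", round(restricted[1], 2))
```

With the sex dummy left out, the interaction coefficient no longer estimates the interaction: it picks up the whole sex gap within `country = 1`, that is, the interaction plus the omitted main effect. If the true main effect were zero, the two fits would instead agree up to sampling noise, which is the only situation in which dropping it is harmless.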

cure
  • define *"not significant"* please. – Robert Long Oct 22 '20 at 18:32
  • Thanks for help with improving the answer. I added a link to the concept of statistical significance. – cure Oct 22 '20 at 18:47
  • OK. So assuming we use the arbitrary value of 0.05, you would think of removing the main effect for a variable that had a p value of 0.050001, but not if it was 0.049999? – Robert Long Oct 22 '20 at 18:49
  • That is a different issue, and a topic for another question. Please keep in mind that the first reason I gave for taking such an uncommon step was theory (also an unequivocal concept). Using statistical significance as confirmation is not that bad in this case, provided there is a strong theoretical justification. – cure Oct 22 '20 at 18:56
  • It's not another issue, it is very important and relevant to the question. I agree, regarding theoretical justification (which is very rare in my experience), but it should not involve p values. – Robert Long Oct 22 '20 at 19:01
  • I definitely agree that sticking to any fixed significance level is a bad idea. However, I still think that p-values can be informative. The question could be reversed here: what if we had a strong (but not perfect) theoretical justification, and at the same time the data suggested otherwise (that is, the estimate was significant)? When writing this, I thought that significance in such a case could be an important warning that something may be wrong and that it definitely needs further investigation. – cure Oct 22 '20 at 19:09
  • That's a good point. In that case I would suspect there to be an artifact present in the sample which isn't present in the population. I don't think I have ever encountered a situation where there was strong theoretical justification for removing a main effect, yet the main effect was "significant". I mean, it would be easy to construct a pathological example through simulation, but I would really like to see a real-world example! (+1) to your answer btw :) – Robert Long Oct 22 '20 at 19:15