
I am having some difficulty interpreting an interaction between two categorical/dummy variables. For example, let's say there is an interaction term between an individual's gender and her race.

sex = 1 if male (0 otherwise), and race = 1 if white (0 otherwise).

The interaction term between sex and race is sex*race.

Let's say this is the regression model:

$$wage = \beta_0 + \beta_1\, educ + \beta_2\, sex + \beta_3\, race + \beta_4\,(sex \times race) + e$$
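Setting each dummy to 0 or 1 while holding educ fixed (and assuming the error term has mean zero) gives the expected wage in each of the four sex/race groups. This is a worked derivation from the model above, not part of the original post:

$$
\begin{aligned}
E[wage \mid educ,\ sex=0,\ race=0] &= \beta_0 + \beta_1\, educ \\
E[wage \mid educ,\ sex=1,\ race=0] &= \beta_0 + \beta_1\, educ + \beta_2 \\
E[wage \mid educ,\ sex=0,\ race=1] &= \beta_0 + \beta_1\, educ + \beta_3 \\
E[wage \mid educ,\ sex=1,\ race=1] &= \beta_0 + \beta_1\, educ + \beta_2 + \beta_3 + \beta_4
\end{aligned}
$$

Subtracting these rows, $\beta_4 = \big[(\beta_2+\beta_3+\beta_4) - \beta_3\big] - \big[\beta_2 - 0\big]$: the male-female wage gap among whites minus the male-female wage gap among non-whites.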

How would you interpret $\beta_4$ in this model? I presume that if an individual is male and white, his wage will increase by $\beta_4$. $\beta_2$ would mean that if an individual is male, his wage will increase by $\beta_2 + \beta_4 \cdot race$, and $\beta_3$ would mean that if an individual is white, the wage will increase by $\beta_3 + \beta_4 \cdot sex$. Is this a correct interpretation? If not, please help me understand any flaws in my intuition. I appreciate the help, thank you.

Danny Brown
  • Related: [Interpretation of betas when there are multiple categorical variables](https://stats.stackexchange.com/q/120030/7290), & [Interpretation of interaction term](https://stats.stackexchange.com/q/122246/7290). – gung - Reinstate Monica May 16 '18 at 00:45



Your interpretation is correct. Here is another way to interpret these terms:

  • If the person is male but not white, the wage is increased by $\beta_2$ (or decreased if $\beta_2$ is negative).
  • If the person is not male but is white, the wage is increased by $\beta_3$.
  • If the person is male and white, the wage is increased by $\beta_2+\beta_3+\beta_4$. That is, the term $sex \times race$ makes your model non-linear. Without this term, the increase for a white male would simply be the increase for being male plus the increase for being white, which is one property (additivity) of linear models. In other words, this term captures the extra change in wage (positive or negative) for employees who are both male and white, beyond the sum of the two separate effects.
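To make the bullets concrete, here is a small Python sketch that plugs made-up coefficients into the model and computes each group's wage gap relative to the sex=0, race=0 group at the same education level. The values in `BETA` and the helper names (`predicted_wage`, `gap_vs_baseline`, `male_effect`) are hypothetical, chosen only for illustration. It also shows that the effect of being male depends on race, i.e. it equals $\beta_2 + \beta_4 \cdot race$.

```python
# Hypothetical coefficients for illustration only; not estimated from any data.
BETA = {"b0": 5.0, "b1": 1.5, "b2": 2.0, "b3": 1.0, "b4": 0.5}


def predicted_wage(educ: float, sex: int, race: int, b: dict = BETA) -> float:
    """Fitted wage: b0 + b1*educ + b2*sex + b3*race + b4*(sex*race)."""
    return (b["b0"] + b["b1"] * educ + b["b2"] * sex
            + b["b3"] * race + b["b4"] * sex * race)


def gap_vs_baseline(sex: int, race: int, educ: float = 12.0) -> float:
    """Wage difference from the baseline group (sex=0, race=0) at the same educ."""
    return predicted_wage(educ, sex, race) - predicted_wage(educ, 0, 0)


def male_effect(race: int, educ: float = 12.0) -> float:
    """Effect of sex=1 vs sex=0 within one race group, i.e. b2 + b4*race."""
    return predicted_wage(educ, 1, race) - predicted_wage(educ, 0, race)


if __name__ == "__main__":
    for sex in (0, 1):
        for race in (0, 1):
            gap = gap_vs_baseline(sex, race)
            print(f"sex={sex}, race={race}: gap vs baseline = {gap:+.2f}")
    print(f"male effect among non-white: {male_effect(0):+.2f}")
    print(f"male effect among white:     {male_effect(1):+.2f}")
```

Because education enters additively, the gaps do not depend on the `educ` value chosen; only the dummy terms differ between groups.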
Hossein
  • Thank you so much, that was extremely helpful and intuitive. I have another question regarding your answer; I apologize if this is stupid. Regarding your statement that the interaction term "makes your model **nonlinear**", does this not violate the first CLRM assumption of a linear population model? – Danny Brown Mar 12 '17 at 08:33
  • No, it does not violate this assumption. The assumption requires your model to be linear in the **parameters**; it does not require the model to be linear in the **variables**. – Hossein Mar 12 '17 at 10:09
  • When you say "If the person is male and white, the wage is increased by β2+β3+β4", what are you comparing to? What is the reference category? Black men? White women? Or black women? – Anders Madsen Sep 13 '17 at 11:49
  • They are all compared to the case where both dummy variables are 0, that is, black women. – Peter Flom Sep 13 '17 at 11:58
  • @AndersMadsen If I understand your question correctly, we compare to the case where the person is a woman or black. (The complement of `A and B` is `not(A) or not(B)`.) In that case (where the person is a woman or black), we cannot say that the wage is increased by `β2+β3+β4`; the increase is obviously different for different combinations of sex and race. – Hossein Sep 13 '17 at 14:18
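The two points raised in the comments (the model stays linear in the parameters, and the gap depends on which reference group you compare to) can both be checked numerically. The sketch below is an illustration under stated assumptions: it simulates data from hypothetical true coefficients, fits the model by ordinary least squares with NumPy's `np.linalg.lstsq`, and then computes the gap for a white man relative to each of the other three groups. Here race = 0 is labelled "non-white", matching the dummy coding in the question (the commenters call this group "black").

```python
import numpy as np

# Simulated data with hypothetical true coefficients (b0, b1, b2, b3, b4);
# these numbers are illustrative, not estimates from real wage data.
rng = np.random.default_rng(0)
n = 10_000
true_beta = np.array([5.0, 1.5, 2.0, 1.0, 0.5])

educ = rng.integers(8, 21, size=n).astype(float)  # years of schooling, 8..20
sex = rng.integers(0, 2, size=n).astype(float)     # 1 = male
race = rng.integers(0, 2, size=n).astype(float)    # 1 = white

# The interaction is just one more column of the design matrix, so the model
# is still linear in the parameters and ordinary least squares applies as usual.
X = np.column_stack([np.ones(n), educ, sex, race, sex * race])
wage = X @ true_beta + rng.normal(0.0, 1.0, size=n)
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)
b0, b1, b2, b3, b4 = beta_hat

# Estimated gap for a white man relative to each possible reference group
# (same educ): the answer's b2 + b3 + b4 applies only to the sex=0, race=0 group.
contrasts = {
    "vs non-white woman (sex=0, race=0)": b2 + b3 + b4,
    "vs white woman (sex=0, race=1)": b2 + b4,
    "vs non-white man (sex=1, race=0)": b3 + b4,
}

if __name__ == "__main__":
    print("estimated betas:", np.round(beta_hat, 2))
    for label, value in contrasts.items():
        print(f"white man {label}: {value:+.2f}")
```

Each contrast is simply the difference between two rows of the fitted model, so any other pairwise comparison can be read off the same way.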