I have a model: $$ \ln({\rm earnings}) = a+b_1{\rm female}+b_2{\rm white}+b_3{\rm female}\times{\rm white} $$ ${\rm female}$ and ${\rm white}$ are dummy variables.

I have interpreted $b_1$ and $b_2$:

  • $b_1$ = difference in log earnings between females and males, among non-whites
  • $b_2$ = difference in log earnings between whites and non-whites, among males

But I am unable to interpret the coefficient of the interaction term ($b_3$). Please help me with this.

Let me make clearer what I need from this regression: $$ \ln({\rm earnings}) = 2.618656-.0899657{\rm female}+.382019{\rm white}-.2754126 {\rm female}\times{\rm white} $$ From $b_1$ I know there is a gender pay difference, and from $b_2$ I know there is a race pay difference. What I need from $b_3$ is whether there is a gender pay gap among whites only. How can I work that out from the regression above, without running a separate test?

1 Answer

$b_3$ is the difference between the mean for white females and the sum $a+b_1+b_2$. That is, it is the mean for white females minus the sum of three things: the mean for non-white males, the difference between non-white females and non-white males, and the difference between white males and non-white males.
\begin{align} b_3 = \bar x_\text{white female} - \big[\, &\bar x_\text{non-white male} \\ &{} + (\bar x_\text{non-white female} - \bar x_\text{non-white male}) \\ &{} + (\bar x_\text{white male} - \bar x_\text{non-white male}) \,\big] \end{align}
Honestly, it's a bit of a mess to interpret in this way. More typically, we interpret the test of $b_3$ as a test of the additivity of the effects of ${\rm white}$ and ${\rm female}$. (The expression within the square brackets $[]$ is the additive effect of ${\rm white}$ and ${\rm female}$.) Then we make more substantive interpretations only of simple effects (i.e., the effect of one factor within a pre-specified level of the other factor). People rarely try to interpret the interaction effect / coefficient in isolation.
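As a sketch of what the additivity reading implies, the expression for $b_3$ can be rearranged by cancelling the $\bar x_\text{non-white male}$ terms. This is only algebra on the cell means already defined, not an additional result:
\begin{align} b_3 &= (\bar x_\text{white female} - \bar x_\text{white male}) - (\bar x_\text{non-white female} - \bar x_\text{non-white male}) \\ &= (\bar x_\text{white female} - \bar x_\text{non-white female}) - (\bar x_\text{white male} - \bar x_\text{non-white male}) \end{align}
Read this way, $b_3$ is how much the female-male difference among whites departs from the female-male difference among non-whites (or, equivalently, how much the white/non-white difference among females departs from that among males). Under additivity the two differences are equal and $b_3 = 0$. The simple effect of ${\rm female}$ among whites is therefore $b_1 + b_3$, whereas among non-whites it is $b_1$ alone.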

It may also help you to read my answer here: Interpretation of betas when there are multiple categorical variables, which covers an analogous, but simpler, situation without the interaction.
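To make the simple effects concrete, here is a minimal Python sketch that plugs the point estimates quoted in the question into the model. The helper names (`predicted_log_earnings`, `pct`) are made up for illustration, and the percentage conversion assumes the outcome is the natural log of earnings.

```python
import math

# Point estimates copied from the fitted regression in the question.
a, b1, b2, b3 = 2.618656, -0.0899657, 0.382019, -0.2754126


def predicted_log_earnings(female, white):
    """Predicted ln(earnings) for one cell; female and white are 0 or 1."""
    return a + b1 * female + b2 * white + b3 * female * white


def pct(log_diff):
    """Exact percentage difference implied by a difference in logs."""
    return 100 * (math.exp(log_diff) - 1)


# Simple effects of female within each level of white.
gap_nonwhite = predicted_log_earnings(1, 0) - predicted_log_earnings(0, 0)  # b1
gap_white = predicted_log_earnings(1, 1) - predicted_log_earnings(0, 1)  # b1 + b3

print(f"female - male, non-white: {gap_nonwhite:.4f} log points ({pct(gap_nonwhite):.1f}%)")
print(f"female - male, white:     {gap_white:.4f} log points ({pct(gap_white):.1f}%)")
print(f"difference between gaps:  {gap_white - gap_nonwhite:.4f} (this is b3)")
```

The female-male gap among whites comes out at $b_1 + b_3 \approx -0.365$ log points, which is roughly 31% lower earnings once exponentiated. This gives only the point estimate: a standard error for $b_1 + b_3$ also needs the covariance between the two coefficient estimates, which is not part of the output quoted in this thread.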

gung - Reinstate Monica
  • OK, can I interpret b3 as: the earnings of white females differ by b3 from the earnings of females, or of whites? –  Nov 01 '14 at 04:48
  • @ahmed, b3 is *not* white female - non-white female, nor is it white female - white male. I'll try to make my answer clearer. – gung - Reinstate Monica Nov 01 '14 at 14:31
  • t: female = -1.65, white = 8.86, female*white = -4.61, cons = 66.07; p>|t|: female = 0.100, white = 0.000, female*white = 0.000, cons = 0.000; std. error: female = 0.0546456, white = 0.043098, female*white = 0.059699, cons = 0.0396351 (95% confidence interval). Now can we answer how to know whether there is a gender pay gap among whites only? –  Nov 01 '14 at 18:36
  • @ahmed, the way you have this set up, non-white males are the reference group. The other coefficients are mostly being tested against that (with the interaction term as specified above). If you want to test males vs. females within whites, make white males (or white females) the reference group, and the test of female (male) will do it. Alternatively, you can do a t-test on those 2 subgroups. – gung - Reinstate Monica Nov 02 '14 at 03:11
  • @gung, last question: we multiply every coefficient by 100 to get a % change because the outcome is ln(earnings). Does that mean we also multiply alpha (2.62) by 100? That would give 262%, which doesn't sound right. Please comment on this. –  Nov 02 '14 at 20:15
  • @ahmed, in a situation like this, the percentages aren't bound by 0 & 1. So one group can have salaries 2.5 times as high as another group (whether that's true is an empirical question). For more on the interpretation of log transformed variables in regression, see this excellent CV thread: [Interpretation of log transformed predictor](http://stats.stackexchange.com/q/18480/7290). – gung - Reinstate Monica Nov 03 '14 at 00:00
  • The CV thread explains that every coefficient is multiplied by 100 to give a % change when the dependent variable is logged, but my question is whether the intercept is also multiplied by 100 to give a % change, which that thread does not explain. I think the intercept does not give a % change in this case, so it should be left as 2.62. –  Nov 03 '14 at 03:43
  • The intercept isn't a % change, it's just a constant. You can exponentiate that value to get a value on the original scale for that cell, if you want. The other coefficients represent (approximate) % changes. – gung - Reinstate Monica Nov 03 '14 at 03:47
  • *"we interpret the test of b3 as a test of the additivity of the effects of white and female"* Could you elaborate on this point? What do you mean by "additivity of the effects of white and female"? What does it indicate when we reject the null for `b3`? – landroni May 16 '15 at 15:25
  • @landroni, additivity is covered in my previous answer. It assumes that the mean of `white females` (there `black females`) is simply that of the reference group (here `non-white males`) plus the effect of being `female` plus the effect of being `white`. You can see this clearly in the previous answer in that the lines are parallel. If we reject the null for $b_3$, we are rejecting additivity. – gung - Reinstate Monica May 16 '15 at 16:48
  • I see, it makes sense. I understand that `b3` is effectively a test for additivity when interacting two dummy variables. But if we had a dummy variable interacted with a categorical variable with 3 cases, am I correct that in such a case the interaction coefficients individually won't have even this interpretation (i.e. they have no straightforward, immediate interpretation)? See associated question: http://stats.stackexchange.com/questions/148007/does-a-factor-by-factor-interaction-term-have-any-literal-interpretation – landroni May 17 '15 at 18:59
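
The reference-group suggestion in the comments above can be illustrated with a short Python sketch. It relies on the fact that a model with both dummies and their interaction reproduces the four cell means exactly, so the coefficients under either coding can be read off those means. The cell means are the ones implied by the estimates in the question, and `saturated_coefs` is a hypothetical helper, not a function from any library.

```python
import math

# Cell means of ln(earnings) implied by the question's estimates,
# keyed by (female, white).
cell_mean = {
    (0, 0): 2.618656,
    (1, 0): 2.618656 - 0.0899657,
    (0, 1): 2.618656 + 0.382019,
    (1, 1): 2.618656 - 0.0899657 + 0.382019 - 0.2754126,
}


def saturated_coefs(cell_mean, ref_white):
    """Coefficients of the 2x2 model when males with white == ref_white
    are the reference group and the race dummy marks the other group."""
    other = 1 - ref_white
    intercept = cell_mean[(0, ref_white)]
    female = cell_mean[(1, ref_white)] - intercept
    race = cell_mean[(0, other)] - intercept
    interaction = cell_mean[(1, other)] - (intercept + female + race)
    return {"intercept": intercept, "female": female,
            "race": race, "interaction": interaction}


original = saturated_coefs(cell_mean, ref_white=0)   # non-white males as reference
releveled = saturated_coefs(cell_mean, ref_white=1)  # white males as reference

print("female coefficient, non-white male reference:", round(original["female"], 4))
print("female coefficient, white male reference:    ", round(releveled["female"], 4))
print("interaction under each coding:",
      round(original["interaction"], 4), round(releveled["interaction"], 4))

# Exponentiating an intercept gives that cell's value on the original scale
# (a geometric mean, because the model is fitted to logs).
print("exp(intercept), non-white males:", round(math.exp(original["intercept"]), 2))
```

With white males as the reference group, the coefficient on female equals $b_1 + b_3$ from the original coding, and the interaction coefficient only changes sign. In real software the refit would also report a standard error and t-test for that coefficient, which this sketch cannot reproduce from the point estimates alone.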