Does Principle of Marginality apply to interactions of categorical variables?

Question

Suppose we have a factor $X$ with $n$ levels and a factor $M$ with $p$ levels. Then $\hat{Y} = X+M+X\cdot M$ and $\hat{Y} = X \cdot M$ give two parametrizations of the same model, since in either case we can estimate only $np-1$ coefficients besides the intercept. What, if anything, is gained by including the main effects directly in the model?

Consider the following example:

library(car)
data("Chile")

The full model:

Call:
lm(formula = log(income) ~ sex * education, data = Chile)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9759 -0.5220  0.1081  0.4281  2.6984 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       9.35894    0.03608 259.382  < 2e-16 ***
sexM              0.14874    0.05381   2.764  0.00575 ** 
educationPS       1.44101    0.07282  19.789  < 2e-16 ***
educationS        0.67606    0.05220  12.953  < 2e-16 ***
sexM:educationPS -0.17359    0.09966  -1.742  0.08166 .  
sexM:educationS  -0.04592    0.07584  -0.605  0.54491    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.8764 on 2587 degrees of freedom
  (107 observations deleted due to missingness)
Multiple R-squared: 0.2417, Adjusted R-squared: 0.2403 
F-statistic: 164.9 on 5 and 2587 DF,  p-value: < 2.2e-16

The interaction only model:

Call:
lm(formula = log(income) ~ sex:education, data = Chile)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9759 -0.5220  0.1081  0.4281  2.6984 

Coefficients: (1 not defined because of singularities)
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      10.13783    0.03786 267.803   <2e-16 ***
sexF:educationP  -0.77889    0.05230 -14.894   <2e-16 ***
sexM:educationP  -0.63015    0.05501 -11.454   <2e-16 ***
sexF:educationPS  0.66212    0.07371   8.982   <2e-16 ***
sexM:educationPS  0.63727    0.06685   9.533   <2e-16 ***
sexF:educationS  -0.10283    0.05344  -1.924   0.0544 .  
sexM:educationS        NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.8764 on 2587 degrees of freedom
  (107 observations deleted due to missingness)
Multiple R-squared: 0.2417, Adjusted R-squared: 0.2403 
F-statistic: 164.9 on 5 and 2587 DF,  p-value: < 2.2e-16

The outputs imply that these are the same model under two different parametrizations: the residuals, residual standard error, R-squared and F-statistic are identical. In fact, any coefficient in either model is a linear combination of estimates from the other model.
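
As a rough check of that claim, a few coefficients of the second fit can be rebuilt from the printed estimates of the first. The cell-mean notation $\hat\mu_{\text{sex},\,\text{education}}$ is introduced here only for illustration, and the printed values are rounded, so agreement holds only to about five decimals.

$$\hat\mu_{M,S} = 9.35894 + 0.14874 + 0.67606 - 0.04592 = 10.13782 \approx 10.13783 \quad \text{(intercept of the second fit)}$$

$$\hat\mu_{F,P} - \hat\mu_{M,S} = 9.35894 - 10.13783 = -0.77889 \quad \text{(sexF:educationP)}$$

$$\hat\mu_{F,PS} - \hat\mu_{M,S} = (9.35894 + 1.44101) - 10.13783 = 0.66212 \quad \text{(sexF:educationPS)}$$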

Possible duplicate of this: http://stats.stackexchange.com/questions/11009/including-the-interaction-but-not-the-main-effects-in-a-model — Patrick Coulombe, Feb 23 '14 at 06:05
@Patrick: I think it's distinguished from the other question by its focus on a specific case. — Scortchi - Reinstate Monica, Feb 23 '14 at 08:50

Scortchi - Reinstate Monica · Accepted Answer · 2014-02-23T22:57:28.743

In the full model there are $n-1$ coefficients for the main effect of $X$, $p-1$ for the main effect of $M$, & $np -n - p +1$ for the interaction, giving a total, as you say, of $np-1$. In the model with interaction only, there are just $np -n - p +1$ coefficients; so some combinations of levels of $X$ & $M$ share the same coefficient, which ones depending on the coding scheme. So the principle of marginality applies; indeed it's unusual for violating it to be justified by a meaningful interpretation.
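
The count can be written out as follows; the only thing added here is the factorisation of the interaction term.

$$(n-1) + (p-1) + (n-1)(p-1) = np - 1, \qquad (n-1)(p-1) = np - n - p + 1.$$

For the example in the question, $n = 2$ (sex) and $p = 3$ (education), so the full model has $1 + 2 + 2 = 5$ coefficients besides the intercept, which agrees with the 5 numerator degrees of freedom in the printed F-statistic. An interaction-only model would keep just the last $2$.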

[Here are the combinations of dummy variables used in the full model, your first one:

(Intercept) sexM educationPS educationS sexM:educationPS sexM:educationS
          1    0           0          0                0               0
          1    0           0          1                0               0
          1    0           1          0                0               0
          1    1           0          0                0               0
          1    1           0          1                0               1
          1    1           1          0                1               0

A true interactions-only model would use only the first & last two columns as predictors, thus lumping together all females with males having only primary education ('P'). Adapting @Peter's example, you'd be saying that for females, educational level had no effect on income; & that for people with only primary education, sex had no effect. I'm not sure there's a lot more to be said about the marginality principle other than that you'd want such a constraint to be a deliberate modelling decision based on substantive knowledge rather than an accidental consequence of the coding scheme.]
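
A minimal sketch of how this could be inspected with `model.matrix`, assuming a toy grid with one row per sex-by-education cell rather than the Chile data itself. The names `d`, `X_full`, `X_colon`, `X_int` and `lumped` are made up for illustration.

```r
# Toy grid (an assumption, not the Chile data): one row per sex x education cell,
# with the same level names and ordering as in the question's output.
d <- expand.grid(sex = factor(c("F", "M")),
                 education = factor(c("P", "PS", "S")))

# Design matrix of the full model, sex * education
X_full <- model.matrix(~ sex * education, data = d)

# Design matrix R builds for sex:education alone: an intercept plus one
# indicator per cell, i.e. 7 columns of which only 6 are linearly independent
X_colon <- model.matrix(~ sex:education, data = d)

# A "true" interactions-only design: the intercept plus the two product columns
X_int <- X_full[, c("(Intercept)", "sexM:educationPS", "sexM:educationS")]

qr(X_full)$rank   # rank of the full design
qr(X_colon)$rank  # rank of the sex:education design
qr(X_int)$rank    # rank of the product-columns-only design

# Cells whose product columns are all zero share the intercept, so they are
# constrained to have the same fitted mean
lumped <- rowSums(X_int[, -1]) == 0
d[lumped, ]
```

If this runs as intended, the first two ranks agree (which is why the two fits in the question coincide), the third is smaller, and the lumped cells are all females plus males with primary education.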

Perhaps I'm interpreting the results incorrectly, but I count np-1 coefficients in the interactions only model. See the example in the edited OP — kevinykuo, Feb 23 '14 at 21:41
That's not an interactions-only model. Therefore everything you say about equivalence to the full model &c. is correct. I'd have expected `:` to give an interactions-only model but it seems R has other ideas - perhaps precisely because of the usual silliness of interaction-only models for categorical predictors. — Scortchi - Reinstate Monica, Feb 23 '14 at 21:47
Ahh I see. So what are the equations we should be fitting in a pure interaction model? Both R and SAS are giving me the same results... — kevinykuo, Feb 23 '14 at 22:27
E.g. just estimating sexM:educationPS & sexM:educationS from the first model. I'd suggest you use `model.matrix` to see exactly how the coding scheme works in terms of dummies - interaction is then just multiplication of dummies. — Scortchi - Reinstate Monica, Feb 23 '14 at 22:33

score 2 · Answer 2 · answered Feb 23 '14 at 11:55

@scortchi gave you a good answer, but I thought a specific example might be useful, if not for you then for others who will see this.

Suppose your dependent variable is log(income) and your two categorical independent variables are sex (male, female, other) and race (White, Black, Asian, Native American, Hawaiian/Pacific Islander). Let's say the reference categories are male and White.

With just the interaction you are assuming that, for White people, sex has no effect and, for males, race has no effect.

There may be situations where this is a sensible model, but I can't think of any, offhand.
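
To make the constraint concrete, here is a small sketch that lists which sex-by-race cells would share the intercept if only the product columns were kept. It assumes treatment (dummy) coding with male and White as reference levels, as in the example; the grid is purely hypothetical and the object names are made up for illustration.

```r
# Hypothetical levels from the example above; no real data are involved
sex_levels  <- c("male", "female", "other")
race_levels <- c("White", "Black", "Asian", "Native American",
                 "Hawaiian/Pacific Islander")

# One row per cell, with the reference level listed first in each factor
cells <- expand.grid(sex  = factor(sex_levels,  levels = sex_levels),
                     race = factor(race_levels, levels = race_levels))

# Full design matrix; the interaction columns are the ones with ":" in the name
X <- model.matrix(~ sex * race, data = cells)
int_cols <- grep(":", colnames(X), fixed = TRUE)

# With only the interaction columns kept, every cell whose interaction
# dummies are all zero is forced to share the intercept
shares_intercept <- rowSums(X[, int_cols]) == 0
cells[shares_intercept, ]
```

The listed cells should be exactly those with sex equal to male or race equal to White, which is the "no effect of sex for White people, no effect of race for males" constraint described above.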

Peter Flom


Thanks for the example. I'm still trying to wrap my head around this. In the example I added, for the interaction only model, sexM:educationS is the reference level, and I'm interpreting the coefficient of sexF:educationS to be the effect of sex on educationS. Is this incorrect? – kevinykuo Feb 23 '14 at 17:04
No. Education is an independent variable. – Peter Flom Feb 23 '14 at 22:22
