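Suppose we have a factor X with $n$ levels and a factor M with $p$ levels. Then $\hat{Y} = X+M+X\cdot M$ and $\hat{Y} = X \cdot M$ (in R formula notation, `y ~ X * M` and `y ~ X:M`) give two parametrizations of the same model, since in either case we can estimate only $np-1$ coefficients besides the intercept: one per cell of the $n \times p$ design, minus the cell absorbed by the intercept. What, if anything, is gained by including the main effects directly in the model?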
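To make the parameter count concrete, here is a sketch of the two parametrizations written in terms of the cell means $\mu_{ij}$ (level $i$ of X, level $j$ of M). The symbols $\beta_0$, $\alpha_i$, $\gamma_j$, $(\alpha\gamma)_{ij}$, $\delta_0$ and $\delta_{ij}$ are notation introduced here for illustration, and treatment (reference-level) coding is assumed, which is the default in R for unordered factors.

$$
\begin{aligned}
\text{main effects + interaction:}\quad & \mu_{ij} = \beta_0 + \alpha_i + \gamma_j + (\alpha\gamma)_{ij}, \qquad \alpha_1 = \gamma_1 = (\alpha\gamma)_{1j} = (\alpha\gamma)_{i1} = 0, \\
& \text{free parameters: } 1 + (n-1) + (p-1) + (n-1)(p-1) = np, \\
\text{interaction only:}\quad & \mu_{ij} = \delta_0 + \delta_{ij}, \qquad \delta_{i^\ast j^\ast} = 0 \text{ for one aliased cell } (i^\ast, j^\ast), \\
& \text{free parameters: } 1 + (np - 1) = np.
\end{aligned}
$$

Under these assumptions both forms have exactly $np$ free parameters and can reproduce any set of $np$ cell means, so they span the same model space and differ only in what each coefficient measures.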
Consider the following example:
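    library(car)   # in recent versions the Chile data come from carData, which car loads
    data("Chile")
    summary(lm(log(income) ~ sex * education, data = Chile))  # full model
    summary(lm(log(income) ~ sex:education, data = Chile))    # interaction-only model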
The full model:
    Call:
    lm(formula = log(income) ~ sex * education, data = Chile)

    Residuals:
        Min      1Q  Median      3Q     Max
    -2.9759 -0.5220  0.1081  0.4281  2.6984

    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
    (Intercept)       9.35894    0.03608 259.382  < 2e-16 ***
    sexM              0.14874    0.05381   2.764  0.00575 **
    educationPS       1.44101    0.07282  19.789  < 2e-16 ***
    educationS        0.67606    0.05220  12.953  < 2e-16 ***
    sexM:educationPS -0.17359    0.09966  -1.742  0.08166 .
    sexM:educationS  -0.04592    0.07584  -0.605  0.54491
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.8764 on 2587 degrees of freedom
      (107 observations deleted due to missingness)
    Multiple R-squared:  0.2417,  Adjusted R-squared:  0.2403
    F-statistic: 164.9 on 5 and 2587 DF,  p-value: < 2.2e-16
The interaction-only model:
    Call:
    lm(formula = log(income) ~ sex:education, data = Chile)

    Residuals:
        Min      1Q  Median      3Q     Max
    -2.9759 -0.5220  0.1081  0.4281  2.6984

    Coefficients: (1 not defined because of singularities)
                     Estimate Std. Error t value Pr(>|t|)
    (Intercept)      10.13783    0.03786 267.803   <2e-16 ***
    sexF:educationP  -0.77889    0.05230 -14.894   <2e-16 ***
    sexM:educationP  -0.63015    0.05501 -11.454   <2e-16 ***
    sexF:educationPS  0.66212    0.07371   8.982   <2e-16 ***
    sexM:educationPS  0.63727    0.06685   9.533   <2e-16 ***
    sexF:educationS  -0.10283    0.05344  -1.924   0.0544 .
    sexM:educationS        NA         NA      NA       NA
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.8764 on 2587 degrees of freedom
      (107 observations deleted due to missingness)
    Multiple R-squared:  0.2417,  Adjusted R-squared:  0.2403
    F-statistic: 164.9 on 5 and 2587 DF,  p-value: < 2.2e-16
The outputs show that these are the same model under different parametrizations: the residuals, residual standard error, $R^2$, and F-statistic are identical. In fact, every coefficient in either model is a linear combination of the coefficients of the other model.
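The linear-combination claim can be checked by hand from the printed estimates. The following is a minimal sketch that needs only base R: it copies the rounded coefficients from the two summaries and rebuilds the six cell means of log(income) under each parametrization. The object names (`full`, `inter`, `cell_full`, `cell_inter`) are made up for this illustration, and it assumes R's default treatment coding, with sex F and education P as the reference cell of the full model and the aliased `sexM:educationS` cell absorbed into the intercept of the interaction-only model.

```r
# Coefficients copied from the two summaries (rounded to 5 decimals).
full <- c("(Intercept)" = 9.35894, "sexM" = 0.14874,
          "educationPS" = 1.44101, "educationS" = 0.67606,
          "sexM:educationPS" = -0.17359, "sexM:educationS" = -0.04592)
inter <- c("(Intercept)" = 10.13783, "sexF:educationP" = -0.77889,
           "sexM:educationP" = -0.63015, "sexF:educationPS" = 0.66212,
           "sexM:educationPS" = 0.63727, "sexF:educationS" = -0.10283)

# Full model: reference cell is (F, P); other cells add the relevant terms.
b0 <- full[["(Intercept)"]]
cell_full <- c(
  F.P  = b0,
  M.P  = b0 + full[["sexM"]],
  F.PS = b0 + full[["educationPS"]],
  M.PS = b0 + full[["sexM"]] + full[["educationPS"]] + full[["sexM:educationPS"]],
  F.S  = b0 + full[["educationS"]],
  M.S  = b0 + full[["sexM"]] + full[["educationS"]] + full[["sexM:educationS"]]
)

# Interaction-only model: the intercept is the (M, S) cell (its own term is NA),
# and every other coefficient is that cell's offset from the intercept.
d0 <- inter[["(Intercept)"]]
cell_inter <- c(
  F.P  = d0 + inter[["sexF:educationP"]],
  M.P  = d0 + inter[["sexM:educationP"]],
  F.PS = d0 + inter[["sexF:educationPS"]],
  M.PS = d0 + inter[["sexM:educationPS"]],
  F.S  = d0 + inter[["sexF:educationS"]],
  M.S  = d0
)

# The two sets of cell means agree up to the rounding of the printed output.
print(round(cbind(cell_full, cell_inter), 4))
stopifnot(max(abs(cell_full - cell_inter)) < 1e-4)
```

Read the other way, each interaction-only coefficient is a difference of two cell means from the full model; for example, `sexF:educationP` is the (F, P) cell minus the (M, S) cell, 9.35894 - 10.13783 = -0.77889.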