Let's say that you're fitting a model to predict some score as a linear function of two categorical inputs. Something like the following data set:
score x y
---------------------
46.12 A B
42.61 A F
39.14 B A
57.72 B C
48.65 B D
56.29 B E
45.45 B F
42.43 C B
52.68 C E
43.03 D B
52.54 D C
44.32 E B
54.43 E C
For interpretability, I would like to treat each input category as an indicator variable. That is, each data point has 3 input dimensions: the intercept, the x category, and the y category (all with value 1). However, the default coding behavior for lm() is to fold the first value of the first variable in with the intercept... which throws away the xA predictor in my example:
Call:
lm(formula = score ~ x + y, data = d)
Residuals:
1 2 3 4 5 6 7 8 9 10 11 12 13
4.281e-01 -4.281e-01 8.882e-16 1.171e-01 -4.441e-16 -5.452e-01 4.281e-01 -5.452e-01 5.452e-01 2.086e-01 -2.086e-01 -9.143e-02 9.143e-02
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.1562 1.0316 36.019 4.71e-05 ***
xB 1.9838 0.7379 2.689 0.074504 .
xC -2.7167 0.8324 -3.264 0.047005 *
xD -2.8705 0.8212 -3.495 0.039610 *
xE -1.2805 0.8212 -1.559 0.216828
yB 8.5357 0.9824 8.688 0.003209 **
yC 18.4629 0.9439 19.561 0.000292 ***
yD 9.5100 1.0195 9.328 0.002609 **
yE 17.6952 0.9569 18.492 0.000345 ***
yF 5.8819 0.9569 6.147 0.008662 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7209 on 3 degrees of freedom
Multiple R-squared: 0.9964, Adjusted R-squared: 0.9856
F-statistic: 92.5 on 9 and 3 DF, p-value: 0.001657
Of course I can achieve what I want with other machine learning libraries like scikit-learn, but they don't provide standard errors or p-values out-of-the-box. Is there a way to force R to explicitly code using every categorical value in the model (in addition to the intercept)?
EDIT: To be clear, I have multiple categorical inputs, and I want to keep the intercept. I did a pretty thorough search of this site (and others) and haven't found a discussion of how to do this yet. Also, the categories are not ordinal.