I'm trying to understand what is going on when I remove the intercept from my model using y ~ x + 0.
Why do these two models produce identical predicted values, even though one of them has no intercept?
glm1 <- glm(Survived ~ Pclass + Sex + Age, family = binomial(link = "logit"), data = titanic_train)
glm2 <- glm(Survived ~ 0 + Pclass + Sex + Age, family = binomial(link = "logit"), data = titanic_train)
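For context, here is a minimal self-contained sketch with simulated data (the variable names are illustrative, not from titanic_train) showing what I think is happening: when the model contains a factor, 0 + doesn't remove any information — R just switches from "intercept plus treatment contrast" to "one coefficient per factor level".

```r
# Simulated data (illustrative names, not titanic_train): removing the
# intercept when a factor is present makes R estimate one coefficient
# per factor level instead of an intercept plus a contrast.
set.seed(1)
n   <- 200
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
age <- runif(n, 1, 80)
y   <- rbinom(n, 1, plogis(2 - 2.5 * (sex == "male") - 0.03 * age))

m1 <- glm(y ~ sex + age,     family = binomial)  # intercept + contrast
m2 <- glm(y ~ 0 + sex + age, family = binomial)  # one coef per sex level

coef(m1)  # (Intercept), sexmale, age
coef(m2)  # sexfemale, sexmale, age

# Same model, different parameterization:
#   sexfemale (m2) == (Intercept) (m1)
#   sexmale   (m2) == (Intercept) + sexmale (m1)
all.equal(unname(coef(m2)["sexfemale"]),
          unname(coef(m1)["(Intercept)"]))
all.equal(unname(coef(m2)["sexmale"]),
          unname(coef(m1)["(Intercept)"] + coef(m1)["sexmale"]))
all.equal(fitted(m1), fitted(m2))  # identical fitted probabilities
```

That relationship is exactly what shows up in the two summaries below: Sexfemale in the second model equals the first model's intercept, and Sexmale in the second model equals intercept plus Sexmale from the first.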
> summary(glm1)

Call:
glm(formula = Survived ~ Pclass + Sex + Age, family = binomial(link = "logit"),
    data = titanic_train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7270  -0.6799  -0.3947   0.6483   2.4668  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  5.056006   0.502128  10.069  < 2e-16 ***
Pclass      -1.288545   0.139259  -9.253  < 2e-16 ***
Sexmale     -2.522131   0.207283 -12.168  < 2e-16 ***
Age         -0.036929   0.007628  -4.841 1.29e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 964.52  on 713  degrees of freedom
Residual deviance: 647.29  on 710  degrees of freedom
  (177 observations deleted due to missingness)
AIC: 655.29

Number of Fisher Scoring iterations: 5
> summary(glm2)

Call:
glm(formula = Survived ~ 0 + Pclass + Sex + Age, family = binomial(link = "logit"),
    data = titanic_train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7270  -0.6799  -0.3947   0.6483   2.4668  

Coefficients:
           Estimate Std. Error z value Pr(>|z|)    
Pclass    -1.288545   0.139259  -9.253  < 2e-16 ***
Sexfemale  5.056006   0.502128  10.069  < 2e-16 ***
Sexmale    2.533875   0.456247   5.554 2.80e-08 ***
Age       -0.036929   0.007628  -4.841 1.29e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 989.81  on 714  degrees of freedom
Residual deviance: 647.29  on 710  degrees of freedom
  (177 observations deleted due to missingness)
AIC: 655.29

Number of Fisher Scoring iterations: 5
And a summary of the resulting predictions:
> summary(predict.glm(glm1, newdata = titanic_test))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-3.5660 -1.9688 -0.5642 -0.3827  0.7782  3.1027      86 
> summary(predict.glm(glm2, newdata = titanic_test))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-3.5660 -1.9688 -0.5642 -0.3827  0.7782  3.1027      86 
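To check that the match isn't an artifact of comparing five-number summaries, I also ran a self-contained version with simulated data (illustrative names again, not titanic_train), and the predictions on new data agree element by element:

```r
# Simulated check: predictions on *new* data are identical under both
# parameterizations, element-wise, not just in their summaries.
set.seed(42)
train <- data.frame(
  sex = factor(sample(c("female", "male"), 300, replace = TRUE)),
  age = runif(300, 1, 80)
)
train$y <- rbinom(300, 1,
                  plogis(1.5 - 2 * (train$sex == "male") - 0.02 * train$age))

g1 <- glm(y ~ sex + age,     family = binomial, data = train)
g2 <- glm(y ~ 0 + sex + age, family = binomial, data = train)

newd <- data.frame(sex = factor(c("female", "male")), age = c(30, 60))
all.equal(predict(g1, newdata = newd),
          predict(g2, newdata = newd))  # TRUE: same linear predictors
```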
If it simply reparameterizes the model, is there any advantage to keeping the intercept term, or can I safely use the second model for easier interpretation?