I'm trying to understand what is going on when I remove the intercept from my model using y ~ x + 0.
Why do these two models produce identical predicted values, even though one of them has no intercept?
glm1 <- glm(Survived ~ Pclass + Sex + Age, family = binomial(link = "logit"), data = titanic_train)
glm2 <- glm(Survived ~ 0 + Pclass + Sex + Age, family = binomial(link = "logit"), data = titanic_train)
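For context, here is a minimal self-contained sketch with simulated data (the variable names are illustrative, not from titanic_train) showing what I think is happening: when the model contains a factor, 0 + doesn't remove any information — R just switches from "intercept plus treatment contrast" to "one coefficient per factor level".

```r
# Simulated data (illustrative names, not titanic_train): removing the
# intercept when a factor is present makes R estimate one coefficient
# per factor level instead of an intercept plus a contrast.
set.seed(1)
n   <- 200
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
age <- runif(n, 1, 80)
y   <- rbinom(n, 1, plogis(2 - 2.5 * (sex == "male") - 0.03 * age))

m1 <- glm(y ~ sex + age,     family = binomial)  # intercept + contrast
m2 <- glm(y ~ 0 + sex + age, family = binomial)  # one coef per sex level

coef(m1)  # (Intercept), sexmale, age
coef(m2)  # sexfemale, sexmale, age

# Same model, different parameterization:
#   sexfemale (m2) == (Intercept) (m1)
#   sexmale   (m2) == (Intercept) + sexmale (m1)
all.equal(unname(coef(m2)["sexfemale"]),
          unname(coef(m1)["(Intercept)"]))
all.equal(unname(coef(m2)["sexmale"]),
          unname(coef(m1)["(Intercept)"] + coef(m1)["sexmale"]))
all.equal(fitted(m1), fitted(m2))  # identical fitted probabilities
```

That relationship is exactly what shows up in the two summaries below: Sexfemale in the second model equals the first model's intercept, and Sexmale in the second model equals intercept plus Sexmale from the first.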
> summary(glm1)

Call:
glm(formula = Survived ~ Pclass + Sex + Age, family = binomial(link = "logit"),
    data = titanic_train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7270  -0.6799  -0.3947   0.6483   2.4668  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  5.056006   0.502128  10.069  < 2e-16 ***
Pclass      -1.288545   0.139259  -9.253  < 2e-16 ***
Sexmale     -2.522131   0.207283 -12.168  < 2e-16 ***
Age         -0.036929   0.007628  -4.841 1.29e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 964.52  on 713  degrees of freedom
Residual deviance: 647.29  on 710  degrees of freedom
  (177 observations deleted due to missingness)
AIC: 655.29

Number of Fisher Scoring iterations: 5
> summary(glm2)

Call:
glm(formula = Survived ~ 0 + Pclass + Sex + Age, family = binomial(link = "logit"),
    data = titanic_train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7270  -0.6799  -0.3947   0.6483   2.4668  

Coefficients:
           Estimate Std. Error z value Pr(>|z|)    
Pclass    -1.288545   0.139259  -9.253  < 2e-16 ***
Sexfemale  5.056006   0.502128  10.069  < 2e-16 ***
Sexmale    2.533875   0.456247   5.554 2.80e-08 ***
Age       -0.036929   0.007628  -4.841 1.29e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 989.81  on 714  degrees of freedom
Residual deviance: 647.29  on 710  degrees of freedom
  (177 observations deleted due to missingness)
AIC: 655.29

Number of Fisher Scoring iterations: 5
And a summary of the resulting predictions:
> summary(predict.glm(glm1, newdata = titanic_test))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-3.5660 -1.9688 -0.5642 -0.3827  0.7782  3.1027      86 
> summary(predict.glm(glm2, newdata = titanic_test))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-3.5660 -1.9688 -0.5642 -0.3827  0.7782  3.1027      86 
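To check that the match isn't an artifact of comparing five-number summaries, I also ran a self-contained version with simulated data (illustrative names again, not titanic_train), and the predictions on new data agree element by element:

```r
# Simulated check: predictions on *new* data are identical under both
# parameterizations, element-wise, not just in their summaries.
set.seed(42)
train <- data.frame(
  sex = factor(sample(c("female", "male"), 300, replace = TRUE)),
  age = runif(300, 1, 80)
)
train$y <- rbinom(300, 1,
                  plogis(1.5 - 2 * (train$sex == "male") - 0.02 * train$age))

g1 <- glm(y ~ sex + age,     family = binomial, data = train)
g2 <- glm(y ~ 0 + sex + age, family = binomial, data = train)

newd <- data.frame(sex = factor(c("female", "male")), age = c(30, 60))
all.equal(predict(g1, newdata = newd),
          predict(g2, newdata = newd))  # TRUE: same linear predictors
```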
If it simply reparameterizes the model, is there any advantage to keeping the intercept term, or can I safely use the second model for easier interpretation?