I'm trying to understand the output of glm
when a categorical variable has more than 2 categories.
I'm analysing if age affects death. Age is a categorical variable with 4 categories
I use the following code in R:
mydata <- read.delim("Data.txt", header = TRUE)
mydata$Agecod <- factor(mydata$Agecod)
mylogit <- glm(Death ~ Agecod, data = mydata, family = "binomial")
summary(mylogit)
Obtaining the following output:
Call:
glm(formula = Death ~ Agecod, family = "binomial", data = mydata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4006 -0.8047 -0.8047 1.2435 2.0963
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.5108 0.7303 0.699 0.4843
Agecod2 -0.6650 0.7715 -0.862 0.3887
Agecod3 -1.4722 0.7658 -1.922 0.0546 .
Agecod4 -2.5903 1.0468 -2.474 0.0133 *
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 237.32 on 184 degrees of freedom
Residual deviance: 223.73 on 181 degrees of freedom
(1 observation deleted due to missingness)
AIC: 231.73
Number of Fisher Scoring iterations: 4
Since I have p-values for Agecod2
, Agecod3
and Agecod4
and only Agecod4
has a significant p-value my questions are:
- Is really
Age
associated with death? - Is only the 4th age category associated with death?
- What happens with the first category since I don't have its p-value?
Update:
Since Antoni Parellada says “It seems as though you have proven that old age is a good predictor of death” and Gung points “You cannot tell from your output if Age is associated with death” I’m still confused.
I understand that “Intercept” is representing Agecod1 and is the “reference level”. According to Gung “The Estimates for the rest are the differences between the indicated level and the reference level. The associated p-values are for the tests of the indicated level vs. the reference level in isolation.”
My question now is:
Since Agecod4 p-value (0.0133) is significantly different from Agecod1 (reference lelvel) it doesn’t mean that age is associated with death?
I have also tried to perform a nested test with the following command:
anova(mylogit, test="LRT")
Obtaining:
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 184 237.32
Agecod 3 13.583 181 223.73 0.003531 *
Does it mean that Age is definitively associated with death?
Update2:
I have solved my problem using binary logistic regression in SPSS. The output is the same than “mylogit” but with SPSS I obtain a global p-value for the overall variable Agecod which is 0.008.
I don’t know if is possible to obtain this “global p-value” with R, but since I know that I can use SPSS is not a big problem for me.